An Opinionated Monitoring and Self-Healing System

Quickly create validation scripts to monitor systems, applications, or infrastructure. Easily take action when behavior changes or a problem is detected!

Plugin Based

Like other frameworks, Lifeguard was built with mechanisms that facilitate the creation of plugins, separating what is essential from what is optional. Example: the core must decide whether or not to notify, but there is no need to load the MS Teams notification routines if you are going to use Google Chat.
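The general idea behind module-based plugins can be sketched with the stdlib (this is an illustration of the pattern, not Lifeguard's actual plugin loader): the core only initializes the plugin modules the user listed, so optional integrations stay out of the process unless requested.

```python
# Sketch of module-based plugins: the core iterates over a
# user-declared PLUGINS list and initializes only those modules.
# (Illustrative only -- not Lifeguard's actual loader.)
import types


def make_plugin(name):
    """Build a tiny stand-in plugin module with an init() hook."""
    plugin = types.ModuleType(name)
    plugin.init = lambda context: context.setdefault("loaded", []).append(name)
    return plugin


# Only the plugins the user declares are loaded and initialized.
PLUGINS = [make_plugin("notifications_google_chat")]

context = {}
for plugin in PLUGINS:
    plugin.init(context)

print(context["loaded"])  # → ['notifications_google_chat']
```

Because plugins are plain modules, adding one is just importing it and appending it to the list, as the settings file below does with lifeguard_mongodb.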

Proactive Validations

Most monitoring systems, like the Grafana Alertmanager solution, work reactively: alerts are triggered based on things (in many cases, errors) that have already happened. Lifeguard's focus is on making it easy to create proactive validations that look for errors and send easily understandable alerts.

Deploy and Scale Easily

Lifeguard consists of two fundamental elements: a web server that exposes APIs on which a GUI can be built (the GUI also works through plugins), and one or more queues that execute validations. They can run together or separately and can scale independently as needed. Validations can be assigned to specific queue instances. This organization makes it much easier to scale resources according to the type of use.
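The queue-routing idea can be sketched with the stdlib (a minimal illustration, not Lifeguard's implementation): validations are dispatched to named queue instances, and each queue gets its own pool of workers, so a slow or heavy group of checks can be scaled without touching the others.

```python
# Sketch of routing validations to named queue instances, each
# drained by its own worker. (Illustrative only -- Lifeguard's
# real queues can run on separate processes or machines.)
import queue
import threading

queues = {"default": queue.Queue(), "heavy": queue.Queue()}
results = []


def worker(q):
    while True:
        validation = q.get()
        if validation is None:  # sentinel: stop this worker
            break
        results.append(validation())  # run the validation


# One worker per queue here; each queue could have several.
threads = [threading.Thread(target=worker, args=(q,)) for q in queues.values()]
for t in threads:
    t.start()

# Dispatch each validation to a specific queue instance.
queues["heavy"].put(lambda: "heavy check ok")
queues["default"].put(lambda: "quick check ok")

for q in queues.values():
    q.put(None)
for t in threads:
    t.join()

print(sorted(results))  # → ['heavy check ok', 'quick check ok']
```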

Starting a project

To start a project, you need to install the Lifeguard package using pip:

pip install lifeguard

Create a new directory and generate the main settings file used by Lifeguard:

mkdir myproject
cd myproject
lifeguard -g

The -g parameter creates a new file called lifeguard_settings.py with the initial structure. An example is shown below.

# import lifeguard plugin for MongoDB
# This plugin enables persisting validation results
# To use it, install it with pip:
# pip install lifeguard-mongodb
import lifeguard_mongodb

from lifeguard.settings import SettingsManager
from lifeguard.auth import BASIC_AUTH_METHOD

# list of plugins used by this instance
PLUGINS = [lifeguard_mongodb]

def setup(lifeguard_context):
    # Register user and password for basic authentication
    lifeguard_context.auth_method = BASIC_AUTH_METHOD
    lifeguard_context.users = [
        {"username": "user", "password": "pass"}
    ]

Simple Example Validation

from lifeguard import NORMAL, PROBLEM, change_status
from lifeguard.actions.database import save_result_into_database
from lifeguard.http_client import get
from lifeguard.logger import lifeguard_logger as logger
from lifeguard.validations import ValidationResponse, validation


# The validation decorator is used to register a new validation.
# The schedule parameter is used to define when the validation will be executed.
# The actions parameter is used to define what actions will be
# executed after the validation.
# The validation function must return a ValidationResponse object.
# The file containing the validation must be placed in the validations
# folder and its name must end with the "_validation.py" suffix.
@validation(
    "check if pudim is alive",
    actions=[save_result_into_database],
    schedule={"every": {"minutes": 1}},
)
def pudim_is_alive():
    status = NORMAL
    result = get("http://pudim.com.br")
    logger.info("pudim status code: %s", result.status_code)

    if result.status_code != 200:
        status = change_status(status, PROBLEM)

    return ValidationResponse(
        status,
        {"status_code": result.status_code},
    )
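The change_status call above escalates the validation's status: once a check reports PROBLEM, a later, milder status should not mask it. A minimal stdlib sketch of that idea (Lifeguard's own constants and severity ordering may differ):

```python
# Sketch of severity-aware status escalation: keep the more severe
# of the current and new statuses. (Illustrative only -- not
# Lifeguard's actual implementation.)
SEVERITY = {"NORMAL": 0, "WARNING": 1, "PROBLEM": 2}


def change_status(current, new):
    """Return the more severe of the two statuses."""
    return new if SEVERITY[new] > SEVERITY[current] else current


status = "NORMAL"
status = change_status(status, "PROBLEM")
status = change_status(status, "NORMAL")  # does not downgrade
print(status)  # → PROBLEM
```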

To execute the validation, run the command:

lifeguard

The validation will be executed every minute and the result will be saved in the database. You can see the result at http://localhost:5567/lifeguard/status/complete. Don't forget to use the configured user and password.

Settings

To list all core and plugins settings, run the command:

lifeguard -d

Settings registered with the SettingsManager also appear in the same output.
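The pattern behind such a settings registry can be sketched with the stdlib (an illustration of the idea, not the SettingsManager API; the LIFEGUARD_SERVER_PORT name and default are assumptions for the example): each setting pairs an environment variable with a default and a description, so a single command can list them all.

```python
# Sketch of an env-var-backed settings registry with defaults and
# descriptions. (Illustrative only -- not Lifeguard's SettingsManager;
# the setting name below is a hypothetical example.)
import os

REGISTRY = {}


def register(name, default, description):
    REGISTRY[name] = {"default": default, "description": description}


def read_value(name):
    """Read from the environment, falling back to the registered default."""
    return os.environ.get(name, REGISTRY[name]["default"])


register("LIFEGUARD_SERVER_PORT", "5567", "Port of the Lifeguard web server")

print(read_value("LIFEGUARD_SERVER_PORT"))  # prints the default unless the env var is set
for name, info in REGISTRY.items():
    print(f"{name}\t{info['default']}\t{info['description']}")
```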

Table with settings

[Beta] Using OpenAI

The pictures show an action, executed after a validation, that uses the OpenAI API to generate an explanation of the root cause of a given traceback.

A complete example can be found in this validation. In this example, the function pods_validation investigates the problem in the pods' events or logs and puts the problem in the details of the validation_response. The action in the lifeguard-openai plugin uses the tracebacks to generate the explanation.

In the output used as an example (sent to a Telegram channel), the pod was in a crash loop and the root cause explanation was generated by the OpenAI API.

Example of OpenAI

In the next example, the first pod has an invalid image and the pull is failing. The second pod has an invalid command to start the container. Both explanations were generated based on events in the pods.

Example of OpenAI

Built Plugins

MongoDB

This plugin is used to store the results of validations in a MongoDB database.

TinyDB

This plugin is used to store the results of validations in a TinyDB database. TinyDB stores the data in a single JSON file.

Google Chat Notification

This plugin is used to send notifications to Google Chat.

MS Teams Notification

This plugin is used to send notifications to MS Teams.

Telegram

This plugin is used to send notifications to Telegram.

The plugin provides a simple way to integrate bot commands with Lifeguard.

RabbitMQ

This plugin contains some common validations for RabbitMQ queues.

Simple Dashboard

This plugin is used to create a simple dashboard with the results of validations. It also serves as an example of how to build a simple GUI.

OpenAI

This plugin provides actions to interact with the OpenAI API. The first action analyzes a traceback and explains what the problem is.