Quickly create validation scripts to monitor systems, applications, or infrastructure. Easily take action when behavior changes or a problem is detected!
Like other frameworks, Lifeguard was built with mechanisms that facilitate the creation of plugins, separating what is essential from what is optional. Example: the core must know whether or not to notify, but it doesn't need to load the MS Teams notification routines if we are going to use Google Chat.
Most monitoring systems, like the Grafana Alertmanager solution, work reactively: alerts are triggered by things (in many cases, errors) that have already happened. Lifeguard's focus is on making it easy to create proactive validations that look for errors and send easily understandable alerts.
Lifeguard consists of two fundamental elements: a web server that exposes some APIs on top of which you can build a GUI (the GUI also works through plugins), and a queue, or queues, that perform the validations. The two can run together or separately and can scale independently as needed. Validations can be assigned to specific queue instances, and this organization makes it much easier to scale resources depending on the type of use.
To start a project, you need to install the Lifeguard package using pip:
pip install lifeguard
Create a new directory and generate the main settings file used by Lifeguard:
mkdir myproject
cd myproject
lifeguard -g
The -g parameter creates a new file called lifeguard_settings.py with the initial structure, like the following example.
# import lifeguard plugin for MongoDB
# This plugin enables persisting the results of validations
# To use this plugin, you need to install it using pip:
# pip install lifeguard-mongodb
import lifeguard_mongodb

from lifeguard.settings import SettingsManager
from lifeguard.auth import BASIC_AUTH_METHOD

# list of plugins used by this instance
PLUGINS = [lifeguard_mongodb]


def setup(lifeguard_context):
    # Register user and password for basic authentication
    lifeguard_context.auth_method = BASIC_AUTH_METHOD
    lifeguard_context.users = [
        {"username": "user", "password": "pass"}
    ]
from lifeguard import NORMAL, PROBLEM, change_status
from lifeguard.actions.database import save_result_into_database
from lifeguard.http_client import get
from lifeguard.logger import lifeguard_logger as logger
from lifeguard.validations import ValidationResponse, validation


# The validation decorator is used to register a new validation.
# The schedule parameter defines when the validation will be executed.
# The actions parameter defines which actions will be executed
# after the validation.
# The validation function must return a ValidationResponse object.
# The file with the validation must be in the validations folder and
# must be named with the "_validation.py" suffix.
@validation(
    "check if pudim is alive",
    actions=[save_result_into_database],
    schedule={"every": {"minutes": 1}},
)
def pudim_is_alive():
    status = NORMAL

    result = get("http://pudim.com.br")
    logger.info("pudim status code: %s", result.status_code)

    if result.status_code != 200:
        status = change_status(status, PROBLEM)

    return ValidationResponse(
        status,
        {"status_code": result.status_code},
    )
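Given the naming rules in the comments above, a minimal project layout looks like this (the validation filename pudim_validation.py is just an illustrative choice that satisfies the "_validation.py" suffix rule):

myproject/
├── lifeguard_settings.py
└── validations/
    └── pudim_validation.py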
To execute the validation, run the command:
lifeguard
The validation will be executed every minute and the result will be saved in the database. You can see the result at http://localhost:5567/lifeguard/status/complete. Don't forget to use the user and password registered in the settings file.
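The registration mechanics behind the @validation decorator can be pictured with a plain-Python sketch. This is illustrative only, not Lifeguard's actual internals: a registry maps each validation to its schedule and actions, and a scheduler later runs the function and feeds the response to each action.

```python
# Illustrative sketch of a validation registry (not Lifeguard's internals).
VALIDATIONS = {}


def validation(description, actions=None, schedule=None):
    def register(func):
        # store the function together with its metadata at import time
        VALIDATIONS[func.__name__] = {
            "description": description,
            "actions": actions or [],
            "schedule": schedule or {},
            "func": func,
        }
        return func

    return register


def save_result_into_database(result):
    # stand-in action: a real action would persist the result
    print("saving:", result)


@validation(
    "check if pudim is alive",
    actions=[save_result_into_database],
    schedule={"every": {"minutes": 1}},
)
def pudim_is_alive():
    return "NORMAL"


# A scheduler would look up the entry at the scheduled time,
# call the function, and run each registered action on the response.
entry = VALIDATIONS["pudim_is_alive"]
response = entry["func"]()
for action in entry["actions"]:
    action(response)
```

This mirrors the structure of the example above: the decorator stores metadata when the module is imported, and execution is driven later by the schedule.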
To list all core and plugins settings, run the command:
lifeguard -d
Settings registered with the SettingsManager appear in the same output.
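As a rough sketch of this pattern (an assumed API shape, not necessarily Lifeguard's exact signature), a settings manager declares each setting with a default and a description, and reads overrides from environment variables:

```python
import os


# Illustrative settings manager: each setting is declared with a default
# value and a description, and can be overridden via an environment variable.
class SettingsManager:
    def __init__(self, settings):
        self.settings = settings

    def read_value(self, name):
        # environment variable wins over the declared default
        return os.environ.get(name, self.settings[name]["default"])

    def describe(self):
        # the descriptions are what a "list settings" command would print
        return {name: attrs["description"] for name, attrs in self.settings.items()}


# MY_APP_ENDPOINT is a hypothetical setting name used only for illustration.
SETTINGS_MANAGER = SettingsManager(
    {
        "MY_APP_ENDPOINT": {
            "default": "http://localhost",
            "description": "endpoint checked by the validation",
        }
    }
)

print(SETTINGS_MANAGER.read_value("MY_APP_ENDPOINT"))
```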
The picture shows an action, executed after a validation, that uses the OpenAI API to generate an explanation of the root cause of a given traceback.
A complete example can be found in this validation. In this example the function pods_validation investigates the problem in the pod events or in the pod logs and puts the problems in the details of the validation_response. The action in the lifeguard-openai plugin uses the tracebacks to generate the explanation.
In the output used as an example (sent to a Telegram channel) the pod was in crash loop and the root cause explanation was generated by the OpenAI API.
In the next example, the first pod has an invalid image and the pull is failing. The second pod has an invalid command to start the container. Both explanations were generated based on events in the pods.
This plugin is used to store the results of validations in a MongoDB database.
This plugin is used to store the results of validations in a TinyDB database. TinyDB produces a single json file with data.
This plugin is used to send notifications to Google Chat.
This plugin is used to send notifications to MS Teams.
This plugin is used to send notifications to Telegram.
The plugin provides a simple way to integrate bot commands with Lifeguard.
This plugin contains some common validations for RabbitMQ queues.
This plugin is used to create a simple dashboard with the results of validations. It also serves as an example of how to build a simple GUI.
This plugin provides actions to interact with the OpenAI API. The first action analyzes a traceback and explains what the problem is.