Session 14 Alerting
Alerting
Ram N Sangwan
Agenda
• Introduction to Alerting
• AlertManager
• Alerting Rules
• Setting up Alerts
Alerting Overview
• Alerting with Prometheus is separated into two parts.
• Alerting rules in Prometheus servers send alerts to an Alertmanager.
• The Alertmanager then manages those alerts, including silencing, inhibition, aggregation
and sending out notifications via methods such as email and chat platforms.
• The main steps to setting up alerting and notifications are:
• Setup and configure the Alertmanager
• Configure Prometheus to talk to the Alertmanager
• Create alerting rules in Prometheus
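To make steps 2 and 3 concrete, here is a minimal sketch of the Prometheus side. It assumes the Prometheus configuration lives at /etc/prometheus/prometheus.yml and that Alertmanager listens on its default port 9093 on the same host. The rule file name alert.rules.yml is a hypothetical example.

```yaml
# /etc/prometheus/prometheus.yml (excerpt). Path and rule file name are assumptions.
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "localhost:9093"      # Alertmanager's default listen port

rule_files:
  - "alert.rules.yml"               # hypothetical; resolved relative to prometheus.yml's directory
```

Prometheus evaluates the rules listed under rule_files and pushes any firing alerts to every target under alerting.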
Alertmanager
• The Alertmanager handles alerts sent by client applications such as the
Prometheus server.
• It takes care of deduplicating, grouping, and routing them to the correct
receiver integration such as email, webhook, or OpsGenie.
• It also takes care of silencing and inhibition of alerts.
How Alertmanager Works
Set Up AlertManager For Prometheus and Grafana Alerts
Set Up Alertmanager for Prometheus Alerts
• Alertmanager rules are conceptualized as routes, giving you the ability to write
sophisticated sets of rules that determine where notifications should end up.
• A default receiver must be configured on the root route, so every notification has
somewhere to go.
• Additional receivers can then be configured through child routes that match
conditions on alert labels, such as an alert's severity.
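A sketch of a routing tree with a default receiver and two child routes. The receiver names and the team and service labels are hypothetical, and each receiver would still need its own entry under receivers:

```yaml
# alertmanager.yml (excerpt). Receiver names and labels are hypothetical.
route:
  receiver: "default"           # used when no child route matches
  routes:
    - match:                    # exact label match
        team: database
      receiver: "db-oncall"
    - match_re:                 # regex match; Alertmanager anchors the pattern at both ends
        service: 'frontend|api'
      receiver: "web-oncall"
```

Child routes are checked in order, and an alert stops at the first one that matches unless that route sets continue: true.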
Grouping of Alerts
Grouping categorizes alerts of similar nature into a single notification.
• Example: Dozens of instances of a service are running in your cluster when a
network partition occurs.
• Half of your service instances can no longer reach the database.
• Alerting rules in Prometheus were configured to send an alert for each service instance if it
cannot communicate with the database.
• As a result, hundreds of alerts are sent to Alertmanager.
• One can configure Alertmanager to group alerts by their cluster and alertname
so it sends a single compact notification.
• Grouping of alerts, timing for the grouped notifications, and the receivers of
those notifications are configured by a routing tree in the configuration file.
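A sketch of the routing-tree settings for the network-partition example, assuming every alert carries a cluster label. The receiver name and timing values are illustrative:

```yaml
# alertmanager.yml (excerpt). Assumes alerts carry "cluster" and "alertname" labels.
route:
  receiver: "default"
  group_by: ['cluster', 'alertname']  # one notification per (cluster, alertname) pair
  group_wait: 30s       # wait before the first notification for a new group
  group_interval: 5m    # wait before notifying about new alerts added to an existing group
  repeat_interval: 4h   # wait before re-sending a notification that is still firing
```

With this grouping, the hundreds of per-instance database alerts from one cluster collapse into a single notification.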
Inhibition of Alerts
Inhibition is a concept of suppressing notifications for certain alerts if certain
other alerts are already firing.
• Example:
• An alert is firing that reports that an entire cluster is unreachable.
• Alertmanager can be configured to mute all other alerts concerning this cluster if that
particular alert is firing.
• This prevents notifications for hundreds or thousands of firing alerts that are unrelated to
the actual issue.
• Inhibitions are configured through the Alertmanager's configuration file.
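A sketch of the cluster-unreachable example as an inhibit_rules entry (the source_match/target_match syntax used by the v0.21 release installed below). The alert name ClusterUnreachable and the cluster label are hypothetical:

```yaml
# alertmanager.yml (excerpt). "ClusterUnreachable" is a hypothetical alert name.
inhibit_rules:
  - source_match:               # the alert that does the muting
      alertname: ClusterUnreachable
    target_match_re:            # the alerts that get muted
      severity: 'warning|info'
    equal: ['cluster']          # mute only when source and target share the same cluster value
```

The target matcher here is deliberately narrower than "all other alerts": it mutes only warning and info alerts, so other critical alerts from the same cluster still get through. Widen it if you really want everything muted.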
Silences for Alerts
• Silences are a straightforward way to simply mute alerts for a given time.
• A silence is configured based on matchers, just like the routing tree.
• Incoming alerts are checked whether they match all the equality or regular
expression matchers of an active silence.
• If they do, no notifications will be sent out for that alert.
• Silences are configured in the web interface of the Alertmanager.
Client Behavior and High Availability
Client Behavior
• The Alertmanager has special requirements for behavior of its client.
• Those are only relevant for advanced use cases where Prometheus is not
used to send alerts.
High Availability
• Alertmanager supports configuration to create a cluster for high availability.
This can be configured using the --cluster.* flags.
• It's important not to load balance traffic between Prometheus and its
Alertmanagers, but instead, point Prometheus to a list of all Alertmanagers.
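A minimal sketch of the Prometheus side of a highly available setup. It assumes three Alertmanager instances already started with --cluster.peer flags pointing at each other (the cluster port defaults to 9094). The host names are hypothetical:

```yaml
# prometheus.yml (excerpt). Host names are hypothetical placeholders.
# Prometheus sends every alert to each listed Alertmanager; the clustered
# Alertmanagers then deduplicate notifications among themselves.
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "alertmanager-1.example.com:9093"
            - "alertmanager-2.example.com:9093"
            - "alertmanager-3.example.com:9093"
```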
Install Alertmanager
• Create the user for alertmanager:
# useradd --no-create-home --shell /bin/false alertmanager
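• Download Alertmanager, extract it, copy the binaries to /usr/local/bin, and create the
configuration directory:
# wget https://github.com/prometheus/alertmanager/releases/download/v0.21.0/alertmanager-0.21.0.linux-amd64.tar.gz
# tar xvf alertmanager-0.21.0.linux-amd64.tar.gz
# cp alertmanager-0.21.0.linux-amd64/alertmanager /usr/local/bin/
# cp alertmanager-0.21.0.linux-amd64/amtool /usr/local/bin/
# mkdir -p /etc/alertmanager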
• Ensure that the correct permissions are in place:
# chown alertmanager:alertmanager /usr/local/bin/alertmanager
# chown alertmanager:alertmanager /usr/local/bin/amtool
Create alertmanager.yml
Create /etc/alertmanager/alertmanager.yml file:
global:
  slack_api_url: "https://hooks.slack.com/services/XXXXXXXXXXXXXXX"

route:
  receiver: "default"
  routes:
    - match:
        severity: info
      receiver: slack
    - match:
        severity: critical
      receiver: email
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 5m
receivers:
  - name: default
    email_configs:
      - to: '[email protected]'
        from: '[email protected]'
        smarthost: 'smtp.host.com:2525'
        auth_username: "smtpusername"
        auth_password: "smtppassword"
        html: '{{ template "email" .}}'
  - name: slack
    slack_configs:
      - send_resolved: true
        username: '{{ template "slack.default.username" . }}'
        color: '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}'
        title: '{{ template "slack.default.title" . }}'
        title_link: '{{ template "slack.default.titlelink" . }}'
        pretext: '{{ .CommonAnnotations.summary }}'
        text: >-
          {{ range .Alerts }}
            *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`:bar_chart:
            *Description:* {{ .Annotations.description }}
            *Details:*
            {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
            {{ end }}
          {{ end }}
  - name: email
    email_configs:
      - to: '[email protected]'
        from: '[email protected]'
        smarthost: 'smtp.host.com:2525'
        auth_username: "smtpusername"
        auth_password: "smtppassword"
        html: '{{ template "email" .}}'

templates:
  - 'alert.tmpl'
Configure Alertmanager
• Create a data directory, /data/alertmanager, and give the alertmanager user ownership of it
and of /etc/alertmanager:
# mkdir -p /data/alertmanager
# chown alertmanager:alertmanager -R /etc/alertmanager /data/alertmanager
Configure AlertManager
The configuration above has four main components.
• global: settings shared by all receivers, such as slack_api_url.
• route: the routing block. Here alerts are routed on their severity label: if severity ==
'info', the alert goes to Slack; if severity == 'critical', it goes out by email. Alerts that match
neither child route go to the default receiver. You can route on whichever label suits you.
• receivers: the channels through which alerts are sent. This example defines only email and
Slack.
• Other receiver types are described in the official Alertmanager documentation.
• templates: template files for notifications; here, the HTML template for email alerts.
Templates are not restricted to email; you can define one for any channel.
• More details about templates are in the official Alertmanager documentation.
Sample template
Create /etc/alertmanager/alert.tmpl:
{{ define "email" }}
<html>
<head>
<style type="text/css">
table {
  font-family: verdana,arial,sans-serif;
  font-size:11px;
  color:#333333;
  border-width: 1px;
  border-color: #999999;
  border-collapse: collapse;
}
table th {
  background-color:#ff6961;
  border-width: 1px;
  padding: 8px;
  border-style: solid;
  border-color: #F54C44;
}
table td {
  border-width: 1px;
  padding: 8px;
  border-style: solid;
  border-color: #F54C44;
  text-align: right;
}
</style>
</head>
<body>
<table border=1>
<thead>
<tr>
<th>Alert name</th>
<th>Host</th>
<th>Summary</th>
<th>Description</th>
</tr>
</thead>
<tbody>
{{ range .Alerts }}
<tr>
<td>{{ .Labels.alertname }}</td>
<td>{{ .Annotations.host }}</td>
<td>{{ .Annotations.summary }}</td>
<td>{{ .Annotations.description }}</td>
</tr>
{{ end }}
</tbody>
</table>
</body>
</html>
{{end}}
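Start alertmanager service
• Start Alertmanager (this assumes a systemd unit named alertmanager has been created for it),
then restart Prometheus so it picks up the alerting configuration: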
# systemctl start alertmanager
# systemctl restart prometheus
Test complete Integrations
• If everything is set up correctly, an alert is triggered once an alerting rule's condition
becomes true.
• You can enable Alertmanager's debug log (--log.level=debug) for troubleshooting.
• To test an alert, temporarily lower a rule's threshold to a very small value such as 0% or 1%;
after about 30s the alert should fire.
• Visit http://localhost:9093.
• Any firing alerts are listed on the dashboard.
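The threshold test above needs an alerting rule in Prometheus. A minimal sketch follows, assuming node_exporter is being scraped (it provides node_cpu_seconds_total) and that the rule file is listed under rule_files in prometheus.yml. The file name alert.rules.yml and the alert name HighCPUUsage are hypothetical. The > 1 threshold is the deliberately low test value; raise it (for example to 80) once the pipeline works.

```yaml
# /etc/prometheus/alert.rules.yml (hypothetical name; list it under rule_files)
groups:
  - name: test-alerts
    rules:
      - alert: HighCPUUsage
        # CPU usage in percent per instance; "> 1" is a deliberately low test threshold
        expr: '100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100) > 1'
        for: 30s              # the condition must hold this long before the alert fires
        labels:
          severity: critical  # routed to the "email" receiver by alertmanager.yml
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is {{ $value }}% on {{ $labels.instance }}"
          host: "{{ $labels.instance }}"   # fills the Host column of the email template
```

The severity label ties the rule to the routing tree, and the summary, description and host annotations are the fields the Slack text and the HTML email template read.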
What Next?
• Explore the official Alertmanager documentation.
• A Good Collection of Alerting Rules
- https://awesome-prometheus-alerts.grep.to/rules.html
Thank You