This document explains how to resolve monitoring alerts according to the following workflow:
Monitoring a host via Zabbix → Receiving alert notification → Analyzing → Marking alerts as acknowledged → Resolving according to instructions provided
It covers the following topics:
- Understanding alerts and triggers
- Understanding priorities
- Finding, acknowledging and resolving alerts
Understanding Alerts and Triggers
Zabbix is the monitoring system the platform uses to track the status of servers. Zabbix uses triggers to represent system state. A trigger is a mechanism that initiates an action (in this case, an alarm) when an event occurs. It fires automatically when monitored data changes or exceeds a threshold value.
A trigger may have the following values:
|Value||Description|
|PROBLEM||Normally means that something happened. For example, processor load is too high.|
|OK||This is a normal trigger state.|
|UNKNOWN||Zabbix cannot evaluate the trigger expression. This may happen for several reasons.|
When something goes wrong with a server, the corresponding trigger changes to the PROBLEM state, and Zabbix sends you an email and/or SMS notification. The notification text looks like the following:
|Processor load is too high on app.jelastic.***.net|
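As a rough illustration of the mechanism described above, the following Python sketch evaluates a threshold-style trigger and builds a notification text shaped like the example message. The function names, the threshold of 5.0, the host name `app.example.net`, and the use of `None` for a missing value are all assumptions made for illustration; this is not how Zabbix itself is implemented or configured.

```python
from enum import Enum
from typing import Optional


class TriggerState(Enum):
    OK = "OK"
    PROBLEM = "PROBLEM"
    UNKNOWN = "UNKNOWN"


def evaluate_trigger(value: Optional[float], threshold: float) -> TriggerState:
    """Return the trigger state for one collected value (hypothetical helper)."""
    if value is None:
        # No data to evaluate, so the expression cannot be computed.
        return TriggerState.UNKNOWN
    return TriggerState.PROBLEM if value > threshold else TriggerState.OK


def notification_text(trigger_name: str, host: str) -> str:
    """Build a message shaped like the example notification."""
    return f"{trigger_name} on {host}"


# Placeholder host name and threshold, for illustration only.
state = evaluate_trigger(7.2, threshold=5.0)
if state is TriggerState.PROBLEM:
    print(notification_text("Processor load is too high", "app.example.net"))
```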
The platform provides a set of triggers with preconfigured notifications. By default, you receive SMS notifications for issues with Disaster severity and email notifications for all other issues.
You can also configure notifications yourself (see the section Configuring Zabbix agent) and choose which problems are reported by email and which by SMS.
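The default routing described above (SMS for Disaster, email for everything else) can be pictured as a small lookup with optional user overrides. This is only a sketch: the function name, the channel labels, and the override dictionary are hypothetical and do not correspond to actual Zabbix settings.

```python
from typing import Dict, Optional, Set

# Default described in the text: Disaster goes to SMS, the rest to email.
DEFAULT_CHANNELS: Dict[str, Set[str]] = {"Disaster": {"sms"}}


def channels_for(severity: str,
                 overrides: Optional[Dict[str, Set[str]]] = None) -> Set[str]:
    """Return the notification channels for a severity (hypothetical helper)."""
    routing = dict(DEFAULT_CHANNELS)
    if overrides:
        routing.update(overrides)  # user-defined configuration takes precedence
    return routing.get(severity, {"email"})


print(channels_for("Disaster"))
print(channels_for("Warning"))
# A custom rule: High severity reported through both channels.
print(channels_for("High", {"High": {"email", "sms"}}))
```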
Understanding Priorities
The platform provides its users with a set of triggers, each assigned an appropriate severity. Trigger severity indicates the priority of a trigger according to its importance.
Zabbix supports the following trigger severities (priorities), listed from the lowest to the highest priority:
|Severity||Definition||Colour|
|Not classified||Unknown severity.||Grey|
|Information||For information purposes.||Light green|
|Warning||Be warned.||Light yellow|
|Average||Average problem.||Dark red|
|High||Something important has happened.||Red|
|Disaster||Problem needs to be solved immediately (financial losses, etc.).||Bright red|
Trigger severity names and colours can be configured in Administration → General → Trigger severities.
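Because the severities form an ordered scale, they can be modelled as an ordered enumeration and used to decide which issues to handle first. In the sketch below, the numeric values 0 to 5 simply follow the order of the table, and the issue names are made-up sample data.

```python
from enum import IntEnum


class Severity(IntEnum):
    NOT_CLASSIFIED = 0
    INFORMATION = 1
    WARNING = 2
    AVERAGE = 3
    HIGH = 4
    DISASTER = 5


# Made-up sample issues: (trigger name, severity).
issues = [
    ("Free disk space is low", Severity.WARNING),
    ("Host is unreachable", Severity.DISASTER),
    ("Processor load is too high", Severity.AVERAGE),
]

# Handle the most severe issues first.
by_priority = sorted(issues, key=lambda issue: issue[1], reverse=True)
for name, severity in by_priority:
    print(f"{severity.name:<14} {name}")
```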
We recommend setting SMS notifications from Zabbix for issues with severity status Disaster.
For more information about triggers and their configuration, follow the link.
Finding, Acknowledging and Resolving Alerts
An alert is raised when a trigger changes status. When a trigger fires and switches to PROBLEM, Zabbix sends you an alert notification. Trigger statuses and all other changes are shown on the Zabbix dashboard. After receiving an email or SMS notification from Zabbix, you should take action to resolve the issue.
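The link between trigger status changes and notifications can be sketched as a simple transition check: a change to PROBLEM produces a problem notification, and a change from PROBLEM back to OK produces a recovery message. The function name and the returned labels are hypothetical, and ignoring changes that involve UNKNOWN is a simplifying assumption of this sketch.

```python
from typing import Optional


def notification_for(previous: str, current: str) -> Optional[str]:
    """Map a trigger status change to a notification kind (hypothetical helper)."""
    if previous == current:
        return None  # no status change, nothing to send
    if current == "PROBLEM":
        return "problem"  # the trigger has fired
    if previous == "PROBLEM" and current == "OK":
        return "recovery"  # the issue has been resolved
    return None  # other changes (e.g. involving UNKNOWN) are ignored here


for before, after in [("OK", "PROBLEM"), ("PROBLEM", "PROBLEM"), ("PROBLEM", "OK")]:
    print(before, "->", after, ":", notification_for(before, after))
```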
If you can't resolve the issue immediately, organize its troubleshooting (for example, create a ticket for the issue that names the person responsible, or draw up a plan for solving the problem) and mark the issue as acknowledged. To do so, follow the steps below:
Go to Zabbix dashboard → Last 20 issues and choose the issue you want to mark as acknowledged.
Acknowledge the alert by clicking No in the Ack (acknowledge alarm) column.
In the Acknowledge alarm dialog, type your message (if possible, link it to the appropriate ticket with a description or troubleshooting plan) and click Acknowledge & Return.
When the ticket linked to your issue is resolved, the alert will disappear from the list.
Note: If the issue is resolved without acknowledgement, it also disappears from the list automatically when its trigger status changes from PROBLEM to OK.
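The dashboard steps above are the procedure this guide describes. For teams that script their workflow, Zabbix also exposes a JSON-RPC API with an `event.acknowledge` method. The sketch below only builds a request body for that method and does not send it. The event ID, ticket reference, and token are placeholders, and passing the token in an `auth` field is an assumption that holds only for some Zabbix versions, so check the API reference for your installation before relying on it.

```python
import json
from typing import Any, Dict, List


def build_ack_request(event_ids: List[str], message: str,
                      auth_token: str, request_id: int = 1) -> Dict[str, Any]:
    """Build a JSON-RPC body for event.acknowledge (nothing is sent)."""
    return {
        "jsonrpc": "2.0",
        "method": "event.acknowledge",
        "params": {
            "eventids": event_ids,
            "message": message,  # e.g. a reference to the troubleshooting ticket
        },
        "auth": auth_token,  # assumption: token passed in the request body
        "id": request_id,
    }


# Placeholder values for illustration only.
body = build_ack_request(["12345"], "Tracked in ticket OPS-1", "<token>")
print(json.dumps(body, indent=2))
```

Some Zabbix versions expect additional parameters in `params` (such as `action`), so treat the field list here as a starting point rather than a complete request.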
After the issue is resolved and its trigger status changes from PROBLEM to OK, you will receive an email notification with a recovery message by default. You can also set up SMS notifications for this purpose (see the instructions in Monitoring Guide. Configurations).