Feature Overview
Orchestrator adds an alerting feature that measures KPIs (Key Performance Indicators) and generates site-based events at regular intervals. Users can set customizable thresholds that raise alerts based on these events.
Key Alerts
SITEAVAILABILITYALERT: Monitors the site-level Access Point count and the health status of those Access Points
RECURRINGPODCRASHALERT: Monitors edge-level pods and measures the recurring pod crash count
Support Guide
New events are automatically generated by the backend event generator batch process and should start appearing in the Events section. These events carry severity levels of INFO or WARN based on their criticality.
Site Availability Based Events
Pod Crash Based Events
To generate alarms for these new events, enable the threshold-based alerts in the Alerts section. Once SITEAVAILABILITYALERT and RECURRINGPODCRASHALERT are activated, the backend notification engine processes the corresponding events and checks them against the configured threshold. When the measured KPI satisfies the threshold condition, an alert is generated; it appears in the Alerts section and triggers notifications, for example on a Slack channel.
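The flow described above (events arrive, only enabled alert types are evaluated, and a match produces an alert plus a notification) can be sketched roughly as follows. This is an illustration only, not Orchestrator's implementation: the Event and AlertRule classes, the process_events function, and the notify callback are all hypothetical names, and the comparison is assumed to be "strictly greater than" to match the "more than" wording of the default thresholds.

```python
from dataclasses import dataclass

# Hypothetical types for illustration; not part of any Orchestrator API.
@dataclass
class Event:
    alert_type: str   # e.g. "SITEAVAILABILITYALERT"
    site: str
    kpi_value: float  # measured KPI carried by the event

@dataclass
class AlertRule:
    threshold: float
    enabled: bool = False  # alerts must be enabled before they can fire

def process_events(events, rules, notify):
    """Return alerts for events whose KPI exceeds an enabled rule's threshold."""
    alerts = []
    for event in events:
        rule = rules.get(event.alert_type)
        if rule is None or not rule.enabled:
            continue  # the event is still recorded, but no alert is raised
        if event.kpi_value > rule.threshold:  # assumed: strictly "more than"
            alert = f"{event.alert_type} at {event.site}: {event.kpi_value} > {rule.threshold}"
            alerts.append(alert)
            notify(alert)  # stand-in for a notification channel such as Slack
    return alerts

rules = {
    "SITEAVAILABILITYALERT": AlertRule(threshold=10.0, enabled=True),
    "RECURRINGPODCRASHALERT": AlertRule(threshold=3, enabled=False),
}
events = [
    Event("SITEAVAILABILITYALERT", "site-a", 12.5),  # above threshold, enabled
    Event("SITEAVAILABILITYALERT", "site-b", 4.0),   # below threshold
    Event("RECURRINGPODCRASHALERT", "site-a", 5),    # above threshold, but disabled
]
sent = []
alerts = process_events(events, rules, sent.append)
print(alerts)
```

In this sketch only the first event produces an alert: the second is below its threshold, and the third belongs to an alert type that has not been enabled.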
Default Thresholds
SITEAVAILABILITYALERT: Triggers when AP availability drops by more than 10% in a given 5-minute time window.
RECURRINGPODCRASHALERT: Activates when more than 3 pods crash within a minute.
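As a worked example of the SITEAVAILABILITYALERT default, the sketch below assumes that availability is the percentage of a site's Access Points that are healthy, and that "drops by more than 10%" means a fall of more than 10 percentage points between the start and end of the 5-minute window. Both are assumptions made for illustration (the source does not define the calculation), and the function names are hypothetical.

```python
def availability_pct(healthy_aps: int, total_aps: int) -> float:
    """Percentage of a site's Access Points that are healthy."""
    if total_aps == 0:
        return 0.0  # assumed convention for a site with no Access Points
    return 100.0 * healthy_aps / total_aps

def availability_drop_exceeded(start_pct: float, end_pct: float,
                               threshold_pct: float = 10.0) -> bool:
    """True when availability fell by more than threshold_pct percentage points."""
    return (start_pct - end_pct) > threshold_pct

# 40 of 40 APs healthy at the start of the window, 35 of 40 at the end.
start = availability_pct(40, 40)
end = availability_pct(35, 40)
print(start, end, availability_drop_exceeded(start, end))
```

Under these assumptions a site that goes from 40 to 35 healthy Access Points (a 12.5-point drop) would trigger the alert, while a drop to 36 (exactly 10 points) would not, because the default is "more than 10%".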
Alerts View
The Alerts section of the Orchestrator shows new alerts as they are generated. For example, if site availability degrades by more than 10% in a 5-minute time window, the corresponding SITEAVAILABILITYALERT alert is generated.
Similarly, if there are more than 3 recurring pod crashes in any given minute, a new RECURRINGPODCRASHALERT alert is generated.
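One way to read "in any given minute" is as a sliding 60-second window rather than fixed clock minutes. The sketch below takes that reading, which is an assumption; the CrashWindow class is hypothetical, and it expects crash timestamps (in seconds) to arrive in non-decreasing order.

```python
from collections import deque

class CrashWindow:
    """Counts pod crashes inside a sliding time window (hypothetical helper)."""

    def __init__(self, max_crashes: int = 3, window_seconds: float = 60):
        self.max_crashes = max_crashes
        self.window_seconds = window_seconds
        self._times = deque()

    def record(self, timestamp: float) -> bool:
        """Record a crash; return True if the window now holds more than max_crashes."""
        self._times.append(timestamp)
        # Discard crashes that are a full window or more older than the newest one.
        while timestamp - self._times[0] >= self.window_seconds:
            self._times.popleft()
        return len(self._times) > self.max_crashes

window = CrashWindow()
results = [window.record(t) for t in (0, 10, 20, 30, 100)]
print(results)
```

Here the fourth crash (at 30 seconds) is the first to put more than 3 crashes inside one minute, so it is the only call that returns True; by the crash at 100 seconds the earlier ones have aged out of the window.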