Description:
To reduce noise, OpenWISP Monitoring avoids sending alerts for metrics that flap—i.e., alternate between healthy and unhealthy states—within a defined tolerance period.
Currently, this is implemented by loading all data points for the metric from the time series database within the threshold window and iterating over them to detect flapping. However, this approach does not scale well when the threshold window spans several hours or days, especially for high-frequency metrics or large deployments.
        # if the latest results are consistent, the metric being
        # monitored is not flapping and we can confidently return
        # whether the value crosses the threshold or not
        if len(set(results)) == 1:
            return value_crossed
        # otherwise, the results are flapping, the situation has not changed
        # we will return a value that will not trigger changes
        return not self.metric.is_healthy_tolerant
    # otherwise keep looking back
    continue
# the search has not yielded any conclusion
# return result based on the current value and time
time = timezone.now()
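To make the decision rule in the excerpt easier to follow, here is a minimal, self-contained sketch of it. It is a simplified model, not the code in `models.py`: the function name `is_threshold_crossed` and its parameters are hypothetical, and it assumes every reading in the tolerance window has already been loaded into a list, which is exactly the cost this issue is about.

```python
from operator import gt, lt


def is_threshold_crossed(values, threshold, operator, currently_crossed):
    """Decide the alert state from every reading in the tolerance window.

    values: all readings in the window, loaded in full (hypothetical input).
    operator: ">" or "<", the comparison that marks a reading as unhealthy.
    currently_crossed: the state already recorded for the metric.
    """
    compare = gt if operator == ">" else lt
    # one boolean per reading, collapsed into a set to test consistency
    results = {compare(value, threshold) for value in values}
    if len(results) == 1:
        # consistent readings: the metric is not flapping
        return results.pop()
    # mixed (or missing) readings: flapping, keep the current state
    return currently_crossed
```

Both memory use and running time grow with the number of points in the window, since each point is fetched and compared individually.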
Problem:
Inefficient performance for large time windows.
Potentially high memory usage and long processing time when analyzing high-volume data.
Expected Behavior:
Optimize the flapping detection logic so that it works efficiently for long threshold windows without loading all data points into memory. Where possible, the detection should be performed with database queries instead of iterating over points in application code.
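One possible direction, sketched under stated assumptions: for a simple threshold check, whether all readings in a window agree can be derived from the window's minimum and maximum alone, so a single aggregate query could replace the point-by-point scan. The example below uses an in-memory SQLite table purely as a stand-in for the time series database; the `points` schema and the helpers `build_demo_db` and `window_state` are hypothetical, and the real tolerance logic may need more than this simplified classification.

```python
import sqlite3


def build_demo_db():
    """Create a stand-in table: metric 1 is steady, metric 2 alternates."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE points (metric_id INTEGER, time INTEGER, value REAL)")
    steady = [(1, t, 95.0) for t in range(10)]
    flapping = [(2, t, 95.0 if t % 2 else 50.0) for t in range(10)]
    conn.executemany("INSERT INTO points VALUES (?, ?, ?)", steady + flapping)
    return conn


def window_state(conn, metric_id, start, threshold, operator):
    """Classify the window with one aggregate query instead of a full scan.

    Returns "crossed", "ok", "flapping" or "no_data".
    """
    lowest, highest = conn.execute(
        "SELECT MIN(value), MAX(value) FROM points "
        "WHERE metric_id = ? AND time >= ?",
        (metric_id, start),
    ).fetchone()
    if lowest is None:
        return "no_data"
    if operator == ">":
        if lowest > threshold:
            return "crossed"  # every reading is above the threshold
        if highest <= threshold:
            return "ok"  # no reading is above the threshold
    else:  # operator == "<"
        if highest < threshold:
            return "crossed"  # every reading is below the threshold
        if lowest >= threshold:
            return "ok"  # no reading is below the threshold
    return "flapping"  # readings fall on both sides of the threshold
```

With this shape the database returns two numbers per check regardless of how many points the window contains, so the application's memory use stays constant; the actual cost then depends on how efficiently the backend computes the aggregates.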
nemesifier changed the title from "[feature] Optimize the logic that catches flapping metrics" to "[change] Optimize the logic that catches flapping metrics" on May 19, 2025
Code referenced above: openwisp-monitoring/openwisp_monitoring/monitoring/base/models.py, lines 988 to 1021 in a9993c7