You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I don't have the exact repro steps. I think this is a race condition when a queue is deleted while the metrics are collected and the queue is found by the collector but then returns empty values ('') instead of the expected numbers. I've seen this occur in 3 different places:
crasher:
initial call: rabbit_mgmt_metrics_collector:init/1
pid: <0.699.0>
registered_name: connection_coarse_metrics_metrics_collector
exception error: an error occurred when evaluating an arithmetic expression
in operator -/2
called as '' - 19495760921
in call from rabbit_mgmt_metrics_collector:difference/2 (rabbit_mgmt_metrics_collector.erl, line 649)
in call from rabbit_mgmt_metrics_collector:aggregate_entry/4 (rabbit_mgmt_metrics_collector.erl, line 234)
in call from lists:foldl/3 (lists.erl, line 1594)
in call from ets:do_foldl/4 (ets.erl, line 656)
in call from ets:foldl/3 (ets.erl, line 645)
in call from rabbit_mgmt_metrics_collector:aggregate_metrics/2 (rabbit_mgmt_metrics_collector.erl, line 172)
in call from rabbit_mgmt_metrics_collector:handle_info/2 (rabbit_mgmt_metrics_collector.erl, line 106)
crasher:
initial call: rabbit_mgmt_metrics_collector:init/1
pid: <0.7180.0>
registered_name: connection_coarse_metrics_metrics_collector
exception error: an error occurred when evaluating an arithmetic expression
in operator +/2
called as 19340950184 + ''
in call from rabbit_mgmt_metrics_collector:sum_entry/2 (rabbit_mgmt_metrics_collector.erl, line 631)
in call from rabbit_mgmt_metrics_collector:'-insert_entry_op/4-fun-0-'/2 (rabbit_mgmt_metrics_collector.erl, line 598)
in call from maps:update_with/4 (maps.erl, line 296)
in call from rabbit_mgmt_metrics_collector:insert_entry_op/4 (rabbit_mgmt_metrics_collector.erl, line 597)
in call from lists:foldl/3 (lists.erl, line 1594)
in call from rabbit_mgmt_metrics_collector:aggregate_entry/4 (rabbit_mgmt_metrics_collector.erl, line 235)
in call from lists:foldl/3 (lists.erl, line 1594)
crasher:
initial call: cowboy_stream_h:request_process/3
pid: <0.95144.0>
registered_name: []
exception error: an error occurred when evaluating an arithmetic expression
in operator +/2
called as '' + 4193012
in call from prometheus_rabbitmq_core_metrics_collector:'-get_data/3-fun-5-'/3 (prometheus_rabbitmq_core_metrics_collector.erl, line 648)
in call from lists:foldl/3 (lists.erl, line 1594)
in call from ets:do_foldl/4 (ets.erl, line 656)
in call from ets:foldl/3 (ets.erl, line 645)
in call from prometheus_rabbitmq_core_metrics_collector:get_data/3 (prometheus_rabbitmq_core_metrics_collector.erl, line 633)
in call from prometheus_rabbitmq_core_metrics_collector:'-collect/5-lc$^0/1-0-'/6 (prometheus_rabbitmq_core_metrics_collector.erl, line 327)
in call from prometheus_rabbitmq_core_metrics_collector:collect/5 (prometheus_rabbitmq_core_metrics_collector.erl, line 329)
Observed on the main branch on November 26
Reproduction steps
Not clear
Expected behavior
No crashes
Additional context
No response
The text was updated successfully, but these errors were encountered:
I'm not sure if that's because I'm testing a slightly different scenario perhaps these errors are caused by some recent changes but I haven't seen them in the past at all and now I see them very often.
We have seen these before, usually on a freshly booted node when some values are not yet available. '' is used as undefined historically by a few metrics that come from the environment, e.g. free disk space, which is not computed instantly the node starts booting.
The Prometheus plugin should filter such values out because if a data point isn't available yet, what else can it do?
@mkuratczyk yup, that can be another case where some metrics no longer exist. During a boot, they do not yet exist, and after queue deletion, they no longer exist.
I am all for making the code more defensive but if some samples are missing… what other than an error can the Prometheus scraping API endpoint return?
Uh oh!
There was an error while loading. Please reload this page.
Describe the bug
I don't have the exact repro steps. I think this is a race condition when a queue is deleted while the metrics are collected and the queue is found by the collector but then returns empty values (
''
) instead of the expected numbers. I've seen this occur in 3 different places:Observed on the
main
branch on November 26Reproduction steps
Not clear
Expected behavior
No crashes
Additional context
No response
The text was updated successfully, but these errors were encountered: