Add delayed_jobs_ready to DelayedJobs plugin and collect_by_queue option for GoodJob plugin #302
Conversation
One question: how would you handle the fact that the number of jobs returned can decrease because of clean-up scripts?
I'm not sure I fully understand the question/problem statement here - any clarification would be great :D If you're asking about GoodJob's auto clean-up, which deletes jobs after X amount of time (default 2 weeks), you can either disable the clean-up and leave the records in the db, or implement some good_job on-delete hook to increment a counter somewhere. Though I'm not sure if that's within the scope of the prometheus_exporter gem, or my PR, so maybe I'm misunderstanding the question 😅
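(A minimal sketch of the first option, keeping finished job records around so counts don't shrink; the configuration keys here assume GoodJob's standard Rails config and should be checked against the GoodJob docs for your version:)

```ruby
# config/initializers/good_job.rb
# Assumed GoodJob configuration keys -- verify against the GoodJob docs.
Rails.application.configure do
  # Keep finished job records in the database instead of deleting them.
  config.good_job.preserve_job_records = true
  # Push the clean-up horizon far out so preserved records are effectively never purged.
  config.good_job.cleanup_preserved_jobs_before_seconds_ago = 10.years.to_i
end
```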
Apologies for not addressing this PR for so long, but I'm doing a general clean-up of issues/PRs in this repository and will be closing this as stale. I'll reopen the issue/PR if I get a comment to do this.
Hi tgxworld, no worries, life is busy :) We would still prefer to have these changes upstreamed, and I believe most current and future users of the library would too. Each change aligns the respective plugins to have functionality that the other plugins have, creating a more consistent library experience regardless of the background job processor.
@benngarcia I have reopened the PR. Can you rebase it for me to review? Thank you!
For the Delayed Job part, should this metric exclude failed and running jobs? If so, would it be best to update the query to:
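(A sketch of the kind of query being suggested, assuming the standard delayed_jobs schema with run_at, failed_at, and locked_at columns; not the exact snippet from the original comment:)

```ruby
# Count only jobs that are due, have not failed, and are not currently
# locked by a worker. Column names assume the standard delayed_job schema.
Delayed::Job
  .where("run_at <= ?", Time.now.utc)
  .where(failed_at: nil, locked_at: nil)
  .count
```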
Forgive me if I misunderstood your goal with this. I am also looking for a better way to alert when delayed jobs are "stuck", meaning they should have run but haven't yet and haven't failed.
@chrisrohr Good catch! We don't keep failed jobs around, and our running job count is always capped lower than the threshold that we autoscale at, so our systems never had issues with that. But you're spot on with understanding the original intention :) Will address this & rebase today.
delayed_jobs_ready instance variable added to metrics
don't forget the documentation!
RuntimeMetric unused struct removed
GoodJobCollector refactor
empty? changed to length.zero? to respect expiry
metrics_container changes reverted since I didn't use any enumerable methods post refactor
When we do `group(:queue_name).size` it returns the count by queue:
`{"queue_a"=>3, "queue_d"=>1}`
The problem is that when a queue is empty it is simply excluded from the results instead of returning a count of 0. The result we want is:
`{"queue_a"=>3, "queue_b"=>0, "queue_c"=>0, "queue_d"=>1}`
Without returning 0, the queue count in the Prometheus metrics stays at the last non-zero value, meaning we can't auto-scale the workers down.
Bug Fix
Refactor
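(A minimal sketch of the zero-filling described above; the GoodJob::Job model and the explicit queue list are assumptions for illustration, not necessarily how the collector implements it:)

```ruby
# Fill in 0 for queues that group(:queue_name).size omits, so the Prometheus
# gauge can drop back to zero when a queue empties.
queue_names = %w[queue_a queue_b queue_c queue_d] # assumed known queue list
counts = GoodJob::Job.where(finished_at: nil).group(:queue_name).size

counts_by_queue = queue_names.each_with_object({}) do |name, result|
  result[name] = counts.fetch(name, 0)
end
# => {"queue_a"=>3, "queue_b"=>0, "queue_c"=>0, "queue_d"=>1}
```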
This PR was born out of metric gathering for our auto-scaling needs, as we're migrating hosting platforms and are mid-migration between queue libraries.
This PR adds a new metric to the DelayedJobs plugin, `delayed_jobs_ready`. This can be thought of as all of the jobs whose `run_at < now()`. We needed this metric rather than queued or pending, since those include all of our jobs, which could be days, weeks, or months out.

This PR also adds the ability to view GoodJob metrics sliced by queue, similar to the DelayedJobs plugin. It's fairly self-explanatory why scaling queue workers based on how many jobs are enqueued in a given queue may be beneficial.
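(A hypothetical sketch of enabling the per-queue GoodJob metrics; the start signature and the placement of the collect_by_queue option are assumptions based on this PR's description, not a confirmed API:)

```ruby
# Assumed usage -- verify the exact option name and signature once merged.
require "prometheus_exporter/instrumentation"

PrometheusExporter::Instrumentation::GoodJob.start(
  client: PrometheusExporter::Client.default,
  collect_by_queue: true # report per-queue gauges instead of a single total
)
```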