Add delayed_jobs_ready to DelayedJobs plugin and collect_by_queue option for GoodJob plugin #302
Conversation
One question: how would you handle the fact that the number of jobs returned can decrease because of clean-up scripts?
I'm not sure I fully understand the question/problem statement here - any clarification would be great :D If you're asking about GoodJob's auto clean-up, which deletes jobs after X amount of time (default 2 weeks), you can either disable the clean-up and leave the records in the db, or implement some good_job on-delete hook to increment a counter somewhere. Though I'm not sure if that's within the scope of the prometheus_exporter gem, or my PR, so maybe I'm misunderstanding the question 😅
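(A minimal sketch of the first option, keeping finished job records around so counts don't shrink; the configuration keys here assume GoodJob's standard Rails config and should be checked against the GoodJob docs for your version:)

```ruby
# config/initializers/good_job.rb
# Assumed GoodJob configuration keys -- verify against the GoodJob docs.
Rails.application.configure do
  # Keep finished job records in the database instead of deleting them.
  config.good_job.preserve_job_records = true
  # Push the clean-up horizon far out so preserved records are effectively never purged.
  config.good_job.cleanup_preserved_jobs_before_seconds_ago = 10.years.to_i
end
```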
Apologies for not addressing this PR for so long, but I'm doing a general clean-up of issues/PRs in this repository and will be closing this as stale. I'll reopen the issue/PR if I get a comment to do this.
Hi tgxworld, no worries, life is busy :) We would still prefer to have these changes upstreamed, and I believe most current and future users of the library would too. Each change aligns the respective plugins to have functionality that the other plugins have, creating a more consistent library experience regardless of the background job processor.
@benngarcia I have reopened the PR. Can you rebase it for me to review? Thank you!
For the Delayed Job part, should this metric exclude failed and running jobs? If so, would it be best to update the query to:
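(A sketch of the kind of query being suggested, assuming the standard delayed_jobs schema with run_at, failed_at, and locked_at columns; not the exact snippet from the original comment:)

```ruby
# Count only jobs that are due, have not failed, and are not currently
# locked by a worker. Column names assume the standard delayed_job schema.
Delayed::Job
  .where("run_at <= ?", Time.now.utc)
  .where(failed_at: nil, locked_at: nil)
  .count
```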
Forgive me if I misunderstood your goal with this. I am also looking for a better way to alert when delayed jobs are "stuck", meaning they should have run but haven't yet and haven't failed.
@chrisrohr Good catch! We don't keep failed jobs around, and our running job count is always capped lower than the threshold that we autoscale at, so our systems never had issues with that. But you're spot on with understanding the original intention :) Will address this & rebase today.
delayed_jobs_ready instance variable added to metrics
don't forget the documentation!
RuntimeMetric unused struct removed
GoodJobCollector refactor
empty? changed to length.zero? to respect expiry
metrics_container changes reverted since I didn't use any enumerable methods post refactor
When we do `group(:queue_name).size` it returns the count by queue:
`{"queue_a"=>3, "queue_d"=>1}`
The problem is that when a queue is empty it is simply excluded from the results instead of returning a count of 0. The result we want is:
`{"queue_a"=>3, "queue_b"=>0, "queue_c"=>0, "queue_d"=>1}`
Without returning 0, the queue count in the Prometheus metrics stays at the last non-zero value, meaning we can't auto-scale the workers down.
Bug Fix
Refactor
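(A minimal sketch of the zero-filling described above; the GoodJob::Job model and the explicit queue list are assumptions for illustration, not necessarily how the collector implements it:)

```ruby
# Fill in 0 for queues that group(:queue_name).size omits, so the Prometheus
# gauge can drop back to zero when a queue empties.
queue_names = %w[queue_a queue_b queue_c queue_d] # assumed known queue list
counts = GoodJob::Job.where(finished_at: nil).group(:queue_name).size

counts_by_queue = queue_names.each_with_object({}) do |name, result|
  result[name] = counts.fetch(name, 0)
end
# => {"queue_a"=>3, "queue_b"=>0, "queue_c"=>0, "queue_d"=>1}
```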
This PR was born out of metric gathering for our auto-scaling needs, as we're migrating hosting platforms and are mid-migration between queue libraries.
This PR adds a new metric to the DelayedJobs plugin, `delayed_jobs_ready`. This can be thought of as all of the jobs whose `run_at < now()`. We needed this metric rather than queued or pending, since those include all of our jobs, which could be days, weeks, or months out.

This PR also adds the ability to view GoodJob metrics sliced by queue, similar to the DelayedJobs plugin. It's fairly self-explanatory why scaling queue workers based on how many jobs are enqueued in a given queue may be beneficial.
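(A hypothetical sketch of enabling the per-queue GoodJob metrics; the start signature and the placement of the collect_by_queue option are assumptions based on this PR's description, not a confirmed API:)

```ruby
# Assumed usage -- verify the exact option name and signature once merged.
require "prometheus_exporter/instrumentation"

PrometheusExporter::Instrumentation::GoodJob.start(
  client: PrometheusExporter::Client.default,
  collect_by_queue: true # report per-queue gauges instead of a single total
)
```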