Skip to content

Aip 86: add deadlines to DagRunResponse #50957

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

rawwar
Copy link
Contributor

@rawwar rawwar commented May 22, 2025

closes #50913

@boring-cyborg boring-cyborg bot added area:airflow-ctl area:API Airflow's REST/HTTP API area:UI Related to UI/UX. For Frontend Developers. labels May 22, 2025
@rawwar rawwar requested a review from ferruzzi May 22, 2025 16:43
@@ -68,6 +69,7 @@ class DAGRunResponse(BaseModel):
end_date: datetime | None
data_interval_start: datetime | None
data_interval_end: datetime | None
deadlines: list[DeadlineResponse] | None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I missed this. A dag run can have multiple deadlines?

Copy link
Contributor Author

@rawwar rawwar May 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, here's an example that @ferruzzi provided:

with DAG(
    dag_id="dag_with_deadlines",
    deadline=[
        DeadlineAlert(
            reference = DeadlineReference.DAGRUN_QUEUED_AT,
            interval=timedelta(hours=1),
            callback=warn_me_it_may_be_late
            ),
        DeadlineAlert(
            reference = DeadlineReference.DAGRUN_QUEUED_AT,
            interval=timedelta(hours=2),
            callback=definitely_late
        ),
        DeadlineAlert(
            reference = DeadlineReference.DAGRUN_QUEUED_AT,
            interval=timedelta(hours=3),
            callback=email_my_boss_that_it_wasnt_my_fault
        ),
    ]
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok and then on the DagResponse we will have the full DeadlineAlert with the reference, interval and callback?

Copy link
Contributor

@bugraoz93 bugraoz93 May 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about making this a separate endpoint and calling it for the DagRun retrieved? Simply passing the dag_run_id. This way, we don't increase the DagRunResponse too much. It already has a lot of information. We can have deadline routes as a separate if it is okay from the UI perspective

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think keeping some deadline information on DagRunResponse is valuable because I think "show me recent dag runs with passed deadlines" would be a key feature.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I missed this. A dag run can have multiple deadlines?

To be fair, it was not part of the initial proposal but added due to overwhelming popular demand, as they say. I had several users ask me to make sure that is possible so they can do tiered/escalated response patterns and it seemed like an easy enough addition so I said I would make it happen. It may or may not have ever made it into the official AIP yet.

Copy link
Contributor

@ferruzzi ferruzzi May 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "show me recent dag runs with passed deadlines" would be a key feature.

From my side, this may be an issue. I haven't implemented the "what happens when a deadline is missed"portion yet, but my intention is essentially:

===

On dag run creation check if there is deadline in the DAG. If yes then fetch or calculate the Reference, add the Interval, and store (the resulting timestamp, callback, callback kwargs, dag_id, and run_id) in the Deadlines table

On dagrun completion/cleanup, if there is a deadline then remove all rows matching this run_id from the table

In the scheduler loop get Deadline.MIN(timestamp) and if it is in the past then send the callback for execution and remove the row from the table [[TBD, the scheduler loop may loop here or may trigger a subprocess to loop here until MIN(timestamp) is not in the past]]

===

Which means that the way I have this system designed, it is inherently destructive. The rows are removed when they are no longer needed, so if you want that data to be persistent, we'll need some way to do that. It could just be a bool flag in the dagrun table for "deadline was missed" or maybe something else, I'm not sure. Just throwing it out there in case you were counting on the Deadline table to be a reliable source.

Copy link
Contributor Author

@rawwar rawwar May 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the scheduler loop get Deadline.MIN(timestamp) and if it is in the past then send the callback for execution and remove the row from the table [[TBD, the scheduler loop may loop here or may trigger a subprocess to loop here until MIN(timestamp) is not in the past]]

Is removing data from deadlines table necessary?

If we remove it, we have no way to show that past DagRun's have missed their SLA. Infact, there's no way for front-end to show any kind of Deadline breach alert persistently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't remove rows, maybe just mark them archieved via a boolean in the db. This way we can still access the history for UI or audit. Also adding a column completed_at to know when was the deadline completed can even be an interesting option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to change the plan at this point, please start a dev list discussion for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:airflow-ctl area:API Airflow's REST/HTTP API area:UI Related to UI/UX. For Frontend Developers.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add deadline to the DagRunResponse
5 participants