It seems there is a race condition when posting a build command from inside a build that has restarted automatically, that is, long-running builds that Celery kills after 1 hour and then "restarts".
The issue can be summarized as follows:

- The build starts, and the sphinx-build step may take more than 1 hour. Celery kills this build. This is a separate, known issue.
- After the task restarts, at some point posting a build command results in a 404 [1].
- This causes the build to restart yet again, in an even more broken state. We are not sure how to explain this yet.
- Eventually the build also fails with a 403 because the build's API token has expired.
The underlying problem is hard to describe, and it may also involve a timing issue. For one of the events in that Sentry issue, the build currently has 3 build commands:
```
In [11]: list(Build.objects.get(pk=28816462).commands.values_list("pk", flat=True))
Out[11]: [278037179, 278037193, 278037205]
```
However, the Sentry exception shows that we were trying to `patch()` this build command:
```
Client Error 404: https://{host}/api/v2/command/278032405/
```
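If we inspect build command 278032405, it doesn't exist at all. Its pk is also lower than that of every command currently attached to the build, which suggests it was created by an earlier attempt of the same build.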
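One possible explanation, not confirmed (the restart behavior above is still unexplained), is that a pk cached during the first attempt is reused after the restart, once that attempt's commands are gone. The sketch below reproduces that pattern with a hypothetical in-memory stand-in for the `/api/v2/command/` endpoint; `FakeCommandAPI`, `reset_build`, `first_attempt`, and `restarted_attempt` are illustrative names, not Read the Docs code, and the assumption that a restart deletes the previous attempt's commands is ours.

```python
import itertools


class NotFound(Exception):
    """Stands in for the HTTP 404 returned by the API."""


class FakeCommandAPI:
    """Hypothetical in-memory stand-in for /api/v2/command/ (not the real API)."""

    def __init__(self):
        self._ids = itertools.count(1)  # auto-increment pks, like a database
        self.commands = {}  # pk -> {"build": build_pk, "output": str}

    def post(self, build_pk, output):
        pk = next(self._ids)
        self.commands[pk] = {"build": build_pk, "output": output}
        return pk

    def patch(self, pk, output):
        if pk not in self.commands:
            raise NotFound(f"Client Error 404: /api/v2/command/{pk}/")
        self.commands[pk]["output"] = output

    def reset_build(self, build_pk):
        # Assumption: a restarted build discards the previous attempt's commands.
        stale = [pk for pk, c in self.commands.items() if c["build"] == build_pk]
        for pk in stale:
            del self.commands[pk]


def first_attempt(api, build_pk, state):
    # Attempt 1 creates a command, caches its pk, then is killed by the time limit.
    state["command_pk"] = api.post(build_pk, "sphinx-build (running)")


def restarted_attempt(api, build_pk, state):
    api.reset_build(build_pk)
    api.post(build_pk, "git clone")
    api.post(build_pk, "pip install")
    # Illustrated bug: the pk cached by attempt 1 is reused after the reset.
    api.patch(state["command_pk"], "sphinx-build (finished)")
```

Running `first_attempt` and then `restarted_attempt` on the same `state` raises `NotFound` for pk 1 while the build is left with commands 2 and 3, mirroring the observation that the missing command's pk is lower than every existing one.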
Footnotes

[1] Sentry Issue: READTHEDOCS-ORG-T7M