Commit 13203e4
Allow admins to customize the time gap between sessions (#1292)
* feat(projects): wire session_time_gap_seconds into event grouping
The session_time_gap_seconds project setting was added in #918 but
never used. group_images_into_events() still hardcoded a 120-minute
gap. This wires the field through and exposes it in the Processing
settings UI so each project can override the session boundary.
Closes #906
Co-Authored-By: Claude <noreply@anthropic.com>
* feat(api): add regroup-sessions action to deployment viewset
POST /api/v2/deployments/<pk>/regroup-sessions/ queues the existing
ami.tasks.regroup_events Celery task for the deployment. The task
calls group_images_into_events(), which now reads the project's
session_time_gap_seconds setting.
User-facing terminology is "sessions"; "Event" is retained as the
internal model name for backwards compatibility.
Returns 202 with task_id, deployment_id, project_id.
Refs #906
Co-Authored-By: Claude <noreply@anthropic.com>
* test(grouping): cover cross-midnight bursts and project-setting fallback
Three new TestImageGrouping cases that exercise the new
session_time_gap_seconds wiring:
- test_cross_midnight_bursts_split_by_short_gap: documents the user-
reported pain (2 bursts with off-window > default gap, crossing
midnight, split into 2 Events on different dates) and proves that
bumping max_time_gap past the off-window collapses them to 1 Event.
- test_same_date_bursts_merge_regardless_of_gap: documents the inverse
behavior — bursts on the same calendar date collide on group_by
regardless of gap setting. Acts as a regression target for any future
rework of the date-keyed Event reuse path (per the #904 caveat at
models.py:1516).
- test_session_time_gap_seconds_is_used_when_no_explicit_gap: verifies
group_images_into_events falls back to the project setting when no
max_time_gap is passed, including refreshing the cached relation
after the setting changes.
Refs #906
Co-Authored-By: Claude <noreply@anthropic.com>
* feat(perms,tasks): regroup_sessions_deployment perm + idempotent regroup task
Addresses takeaway-review feedback on PR #1292:
- Add `regroup_sessions_deployment` custom Project permission so non-superusers
with the ProjectManager role can hit `POST /deployments/<pk>/regroup-sessions/`
(mirrors `sync_deployment`). Without this, `BaseModel.check_custom_permission`
computes the codename `regroup_sessions_deployment` and the perm doesn't
exist, so the action 403s for everyone except superusers.
- Wrap `ami.tasks.regroup_events` in a per-deployment cache lock. Concurrent
enqueues (double-clicks, sync_captures auto-regroup racing with manual
trigger) collapse to a single run instead of overlapping on the same rows.
Lower-level than the view so all callers (Deployment.save, admin bulk
action, new API action) are protected.
- Defensive clamp on `session_time_gap_seconds` — fall back to the historical
120-minute default if the value is 0 / negative, with a warning log. Stops
pathological "every gap = new event" behavior when admins enter bad values.
- Migration 0085 adds the new perm to `Project.Meta.permissions` and grants it
to existing `ProjectManager` role groups via a data migration (mirrors how
0084 revoked `delete_job`).
- Tests:
- test_invalid_session_time_gap_falls_back_to_default exercises 0 / -1 / -7200
- test_regroup_events_task_is_idempotent_under_concurrent_calls pre-takes
the cache lock, asserts a second call skips group_images_into_events
Co-Authored-By: Claude <noreply@anthropic.com>
* refactor(regroup): wrap regroup in Job framework, fix lock, unify call sites
Replaces the direct-Celery API/admin entry points with a new
RegroupEventsJob so admins can see regroup progress and failures in the
Jobs UI like every other long-running task. The existing
DataStorageSyncJob now runs grouping as an explicit second stage instead
of relying on Deployment.save() autoregroup, so a post-sync regroup
failure flips the Job to FAILURE rather than silently succeeding
(closes #1157, #1158).
Other changes in this commit:
* Move the per-deployment idempotency lock from ami.tasks.regroup_events
into group_images_into_events() itself, so all four call sites
(RegroupEventsJob, sync-stage regroup, Deployment.save autoregroup,
the bare Celery task) collapse to a single in-flight run rather than
only the Celery-task entry point being protected.
* Use a uuid token on the lock value and only delete the key if it
still matches our token. Fixes the lock-release-clobbers-newer-owner
race CodeRabbit flagged.
* Add run_regroup_events_job permission and grant it to MLDataManager
(which ProjectManager inherits from). regroup_sessions_deployment
stays on ProjectManager. Both wired through the renamed migration
0085_add_regroup_permissions.
* Strip the UI form field for session_time_gap_seconds; field is now
admin-only until we decide whether it belongs on Project or Deployment.
* Swap the admin "regroup" and "sync" bulk actions to create Jobs so
the admin path matches the API path.
Tests:
- TestImageGrouping: replaced the old task-level idempotency test with
one that exercises the new in-function lock; added a token-release
test that verifies an expired-then-reacquired lock isn't clobbered.
- TestRegroupEventsJob (new): end-to-end Job run, stats populated, and
a failure-propagates assertion.
- TestDataStorageSyncJobIncludesRegroupStage (new): sync Job exposes
the regroup stage and a regroup failure surfaces as Job failure.
Co-Authored-By: Claude <noreply@anthropic.com>
* fix(migrations): rebase regroup perms migration onto main's 0088
Main has moved on to migration 0088. Rename 0085_add_regroup_permissions
to 0089 and depend on 0088_detection_det_srcimg_created_idx to resolve
the multiple-leaf-nodes conflict CI surfaced.
Co-Authored-By: Claude <noreply@anthropic.com>
* fix(regroup): correct events_created accounting + duplicate-timestamp log; add e2e command
The regroup stage param `events_created` was incrementing for every event
visited rather than only for events newly created by `get_or_create`. The bool
returned from `get_or_create` was being discarded. Track `events_created_count`
separately and use it in the stage-param write.
Drive-by: `logger.warning(f"Found {len(values)} ...")` was reporting the
character length of the newline-joined duplicate-timestamp report rather than
the duplicate count. Pre-existing (since 0954c38, 2024-09-12) but sits inside
the very block whose stats this PR now surfaces. Use `duplicate_timestamp_count`,
and cap the in-log sample at 20 timestamps so very dirty deployments don't
flood logs.
Add `manage.py test_regroup_job_e2e {regroup|sync|concurrent}` to exercise the
new Job framework path against a real stack: `regroup` runs a `RegroupEventsJob`
and dumps stage params; `sync` runs a `DataStorageSyncJob` and asserts the
two-stage chain; `concurrent` fires two regroups back-to-back and verifies the
lock semantics. Mirrors `test_ml_job_e2e`. Used to validate PR #1292 against
deployment 74 of the Antenna dev stack.
Co-Authored-By: Claude <noreply@anthropic.com>
* fix(admin): speed up deployment list view (~12× faster)
The DeploymentAdmin list view ran four live per-row aggregates for each
deployment displayed:
- `start_date` → `SourceImage.objects.filter(event__deployment=obj).aggregate(Min)`
(joins SourceImage→Event, scans the full per-deployment timeline)
- `end_date` → `SourceImage.objects.filter(deployment=obj).aggregate(Max)`
- `events_count` → `obj.events.count()`
- `captures_count` → re-read of `data_source_total_files` (cached, but
wrapped in a method that suggested it was computed)
With 251 deployments on the page and ~5 expensive queries per row, the list
view became unusable on stacks holding deployments with >100k captures
(e.g. one deployment with 638k captures dominated end_date's wall time).
Deployment already denormalizes `captures_count`, `events_count`,
`data_source_total_size`, `first_capture_timestamp` and
`last_capture_timestamp` via `update_calculated_fields()`. Switch the admin
list to read those instead and drop the custom aggregates. Add
`@admin.display(ordering=...)` so the columns remain sortable against the
underlying cached fields.
Wall time on the dev box (251 deployments): ~0.6s warm, ~0.76s cold.
Co-Authored-By: Claude <noreply@anthropic.com>
* feat(jobs): human-readable regroup stage param names
The regroup stage params were stored with machine-style names
(`captures_grouped`, `events_created`, ...), which is what the Jobs UI
displays to admins. Other JobType stages already use title-cased,
space-separated names (`Total files`, `Captures added`, etc.); this commit
brings the regroup stage in line.
Param **names** (UI labels):
Captures grouped, Events created, Events touched,
Empty events deleted, Duplicate timestamps,
Ungrouped captures, Captures missing timestamp.
Param **keys** (for retrieval, slugify(name) with `-` → `_`):
captures_grouped, events_created, events_touched,
empty_events_deleted (was `events_deleted_empty`),
duplicate_timestamps, ungrouped_captures,
captures_missing_timestamp (was `no_timestamp_captures`).
Five keys are stable; two changed shape. Existing Job rows keep their
historical name/key on disk — only new runs use the new labels.
Centralise the list as `REGROUP_STAGE_PARAM_NAMES` in ami.jobs.models so
RegroupEventsJob and DataStorageSyncJob's regroup stage share one source of
truth. In `_group_images_into_events_locked`, swap the kwargs-style
`update_stage(...)` call (which couldn't carry names with spaces) for an
explicit `add_or_update_stage_param(stage_key, name, value)` loop driven by
a label→value dict.
Update tests to look up by the new keys and assert the human-readable names
exist; update `test_regroup_job_e2e` to print `name [key]: value` so the
stable retrieval key is visible alongside the UI label.
Co-Authored-By: Claude <noreply@anthropic.com>
* fix(regroup): replace assert with validation; restore update_children on sync path
Two CodeRabbit findings on the latest revision.
1. `ami/main/api/views.py:376` used `assert deployment.project` to guard the
API action. `assert` is stripped under `python -O`, so a deployment whose
project FK is null (the schema allows it — `Project` FK is nullable on
`Deployment`) would 500 with a confusing AttributeError later in the view
instead of returning a clean 400. Replace with an explicit
`api_exceptions.ValidationError` that returns 400 with a structured
detail.
2. `Deployment.sync_captures()`'s new `regroup_after=False` branch (added in
this PR so `DataStorageSyncJob` can drive the regroup as a separate
tracked stage) bypassed `Deployment.save()`'s `update_calculated_fields=
True` block, which is where `self.update_children()` realigns child
`Event`/`Occurrence`/`SourceImage` `project` pointers. Result: a sync Job
whose deployment had moved between projects would leave child rows with
stale project FKs, breaking project-scoped queries (default filters,
visibility checks, etc.). The user-triggered `RegroupEventsJob` path is
unaffected because it doesn't run sync_captures — the regression is
specific to the new sync→regroup chain.
Add an explicit `self.update_children()` call after the
`update_calculated_fields(save=True)` refresh on the regroup_after=False
branch, gated on `project_id` like the corresponding block in
`Deployment.save()`.
Existing regroup tests still pass (4/4).
Co-Authored-By: Claude <noreply@anthropic.com>
* fix(jobs): warn on sync→regroup lock-miss instead of re-enqueuing
CodeRabbit raised a real concern about the sync→regroup chain: if a
concurrent regroup holds the lock when DataStorageSyncJob's regroup stage
runs, the lock-miss branch in `group_images_into_events` returns `[]` and
the just-synced captures sit ungrouped until the next save/sync.
A previous draft of this fix added a post-lock re-enqueue
(`regroup_events.delay()` after the `with _regroup_lock:` block) that called
`deployment_events_need_update` and chained a follow-up task. A fresh
review found that approach risks cascading enqueues on busy deployments —
the lock serialises work but not enqueueing, so concurrent
`Deployment.save` autoregroups during a slow run can pile redundant tasks
into the queue (no infinite loop, but tens of redundant runs plausible).
Drop the re-enqueue. Instead, surface a clear warning on the Job itself
when the sync stage's regroup is skipped: we know `total_files` from the
preceding sync stage, so we can say "sync added N files but regroup was
skipped" directly on the Job that an admin is looking at. The captures
will be picked up by the next Deployment.save autoregroup
(models.py:1110-1113), which is the existing safety net for the general
case.
Trade-off vs the previous draft: a sync that races with a concurrent
regroup defers its new captures to the next save/sync (small window, no
data loss) but does NOT silently appear to have succeeded — the warning
lands in the Job log where it belongs.
No new tests; existing 4 regroup/lock tests + 12 grouping tests pass
(16/16).
Co-Authored-By: Claude <noreply@anthropic.com>
* docs(project): expand session_time_gap_seconds help_text to flag manual-regroup requirement
The previous help text just said "Time gap in seconds to consider a new
session" — it didn't tell the user that changing the value doesn't
regroup existing captures, which is the most common point of confusion
when admins edit the field.
Now points users to the Deployments admin "Regroup captures into
sessions" bulk action so they can apply the change. Surfaces via
Django admin form, DRF browsable API, and OPTIONS schema.
Co-Authored-By: Claude <noreply@anthropic.com>
* fix(regroup): apply takeaway-review findings
Four fixes surfaced by an independent takeaway review:
- Count actual SourceImage rows assigned to an event for the "Captures
grouped" stat instead of the distinct-timestamp count. Two captures can
share a timestamp, so the old count understated the work done.
- Lower the regroup lock TTL and the matching Celery time limits to 10 min
soft / 11 min hard. group_images_into_events is mostly batch SQL and
finishes in seconds, so a tight limit bounds the worst-case stuck-lock
window when a worker dies hard (OOM, SIGKILL, eviction) and the finally
block never runs to release the lock.
- Document that RegroupEventsJob and the bare regroup_events task only
group; propagating project_id to children stays on Deployment.save().
- Log a clear "regroup skipped" message when the per-deployment lock is
already held, instead of a misleading "now has 0 events".
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>1 parent 0e329fb commit 13203e4
12 files changed
Lines changed: 1119 additions & 77 deletions
File tree
- ami
- jobs
- management/commands
- tests
- main
- api
- migrations
- models_future
- users
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
| 179 | + | |
| 180 | + | |
| 181 | + | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
672 | 672 | | |
673 | 673 | | |
674 | 674 | | |
| 675 | + | |
| 676 | + | |
| 677 | + | |
| 678 | + | |
| 679 | + | |
| 680 | + | |
| 681 | + | |
| 682 | + | |
| 683 | + | |
| 684 | + | |
| 685 | + | |
| 686 | + | |
| 687 | + | |
| 688 | + | |
| 689 | + | |
| 690 | + | |
675 | 691 | | |
| 692 | + | |
| 693 | + | |
| 694 | + | |
| 695 | + | |
| 696 | + | |
| 697 | + | |
| 698 | + | |
| 699 | + | |
| 700 | + | |
| 701 | + | |
676 | 702 | | |
677 | 703 | | |
| 704 | + | |
| 705 | + | |
678 | 706 | | |
679 | 707 | | |
680 | 708 | | |
| |||
683 | 711 | | |
684 | 712 | | |
685 | 713 | | |
| 714 | + | |
686 | 715 | | |
687 | | - | |
| 716 | + | |
688 | 717 | | |
689 | 718 | | |
| 719 | + | |
| 720 | + | |
| 721 | + | |
| 722 | + | |
| 723 | + | |
690 | 724 | | |
691 | 725 | | |
692 | 726 | | |
693 | 727 | | |
694 | 728 | | |
695 | 729 | | |
696 | 730 | | |
697 | | - | |
698 | | - | |
699 | | - | |
700 | | - | |
701 | | - | |
702 | | - | |
703 | | - | |
704 | | - | |
705 | | - | |
706 | 731 | | |
707 | | - | |
| 732 | + | |
| 733 | + | |
| 734 | + | |
| 735 | + | |
| 736 | + | |
| 737 | + | |
| 738 | + | |
| 739 | + | |
| 740 | + | |
| 741 | + | |
708 | 742 | | |
709 | | - | |
710 | | - | |
711 | | - | |
712 | | - | |
713 | | - | |
| 743 | + | |
| 744 | + | |
| 745 | + | |
| 746 | + | |
| 747 | + | |
| 748 | + | |
| 749 | + | |
| 750 | + | |
| 751 | + | |
| 752 | + | |
| 753 | + | |
| 754 | + | |
| 755 | + | |
| 756 | + | |
| 757 | + | |
| 758 | + | |
| 759 | + | |
| 760 | + | |
| 761 | + | |
| 762 | + | |
| 763 | + | |
| 764 | + | |
| 765 | + | |
| 766 | + | |
714 | 767 | | |
715 | | - | |
716 | | - | |
717 | 768 | | |
| 769 | + | |
| 770 | + | |
718 | 771 | | |
719 | 772 | | |
720 | 773 | | |
| |||
834 | 887 | | |
835 | 888 | | |
836 | 889 | | |
| 890 | + | |
| 891 | + | |
| 892 | + | |
| 893 | + | |
| 894 | + | |
| 895 | + | |
| 896 | + | |
| 897 | + | |
| 898 | + | |
| 899 | + | |
| 900 | + | |
| 901 | + | |
| 902 | + | |
| 903 | + | |
| 904 | + | |
| 905 | + | |
| 906 | + | |
| 907 | + | |
| 908 | + | |
| 909 | + | |
| 910 | + | |
| 911 | + | |
| 912 | + | |
| 913 | + | |
| 914 | + | |
| 915 | + | |
| 916 | + | |
| 917 | + | |
| 918 | + | |
| 919 | + | |
| 920 | + | |
| 921 | + | |
| 922 | + | |
| 923 | + | |
| 924 | + | |
| 925 | + | |
| 926 | + | |
| 927 | + | |
| 928 | + | |
| 929 | + | |
| 930 | + | |
| 931 | + | |
| 932 | + | |
| 933 | + | |
| 934 | + | |
| 935 | + | |
| 936 | + | |
| 937 | + | |
837 | 938 | | |
838 | 939 | | |
839 | 940 | | |
| |||
847 | 948 | | |
848 | 949 | | |
849 | 950 | | |
| 951 | + | |
850 | 952 | | |
851 | 953 | | |
852 | 954 | | |
| |||
0 commit comments