Skip to content

Add verification status to Taxa views#1317

Merged
mihow merged 18 commits into
mainfrom
worktree-taxa-verification-counts
May 27, 2026
Merged

Add verification status to Taxa views#1317
mihow merged 18 commits into
mainfrom
worktree-taxa-verification-counts

Conversation

@mihow
Copy link
Copy Markdown
Collaborator

@mihow mihow commented May 22, 2026

Summary

Adds a new filter for filtering by verification status in the taxa list view. This is the beginning of a workflow that allows identifiers to quickly verify the presence of at least one occurrence of every predicted taxa. Other filters can be combined to verify the presence in different scenarios, e.g. one occurrence per taxa, per station.

Also adds the number of verified occurrences per taxon, which is recursive (shows all taxa verified within a genus, subfamily, etc).

Also refactors how subqueries are performed for calculating stats about taxa observations within a project. Most of the notes below are analysis and comments about the performance of that aspect. I am trying to go as far as we can before we start needing introduce a TaxonObserved model (proposed in #490) and start caching numbers, so that figures stay fresh and we don't have to introduce more complex invalidation logic, hooks and tests.

The taxa view should be overall faster for all queries after merging this, except when filtering by a Capture Set (SourceImageCollection) on smaller projects. Filtering by capture sets on medium & large projects currently times out on the live site now, so still considering it a win!

Screenshots

image

Performance

Measured against the deployed branch on an internal dev box copy of production data (six projects, 58k–234k occurrences each), cold queryset evaluation, cachalot flushed between every request, query count = 5 on main / 7 on this PR (two extra annotations: verified_count, id__in membership).

Default path (/api/v2/taxa/?project_id=<id>&limit=25&ordering=-occurrences_count)

project occurrences verified-rollup taxa main this PR delta
A 234,625 236 2.4s 0.7s −71%
B 179,468 707 2.3s 2.6s +13%
C 144,156 67 0.5s 0.7s +40%
D 108,370 188 0.5s 1.0s +100%
E 84,295 71 0.7s 1.0s +43%
F 58,590 261 0.3s 0.5s +67%

Mixed. Biggest project wins big (main's per-taxon EXISTS membership scales with total occurrences; the PR's materialised id__in does not). Mid-size projects pay the verified-count precompute + CASE annotation. Default-path cost on this PR scales with verified-rollup cardinality (size of the CASE map) — project B's 707-taxa rollup map is 3× bigger than project A's 236, so B is slower than A despite having fewer occurrences.

Collection path (/api/v2/taxa/?project_id=<id>&collection=<id>&limit=25)

project occurrences main this PR delta
A 234,625 TIMEOUT (>30s) 6.2s fixed
B 179,468 TIMEOUT (>30s) 12.9s fixed
C 144,156 10.7s 6.2s −42%
D 108,370 TIMEOUT (>30s) 13.4s fixed
E 84,295 TIMEOUT (>30s) 14.0s fixed
F 58,590 0.8s 6.9s +760% (regression)

PR rescues four of six projects from the 30s statement-timeout. Project F regresses hard — small project where main's correlated-subquery + EXISTS shape was already cheap, but the PR's conditional-aggregate shape pays full Taxon→Occurrence→Detection join cost regardless of selectivity. Net positive: main has 30s timeouts in production today.

Collection path on this PR is 6–14s on the four big projects. UX-wise the filter is opt-in via the capture-set dropdown, so the slow path doesn't hit the default taxa view. Tracking under a follow-up perf issue (size-based hybrid dispatch, or push verified-count out of the aggregate).

Changes

Backend

  • ami/main/models.py — new TaxonQuerySet helpers. Count logic moved off the viewset:
    • with_observation_counts_subqueries(project, request, ...) — correlated Subquery for occurrences_count, best_determination_score, last_detected. Index-served by the composite (determination_id, project_id, event_id, determination_score) index on Occurrence. Used by all non-collection paths.
    • with_observation_counts_aggregated(project, request, ...) — conditional Count(distinct=True) / Max(...) on the Taxon→occurrences reverse relation, one GROUP BY. Used by the ?collection= path.
    • observed_in_project_subqueries(project, request, ...)id__in materialised membership. Replaces main's per-taxon EXISTS.
    • with_verification_counts(project, request, ...) — Python pass over verified occurrences building {taxon_id: count} (including ancestors via parents_json), applied as a sparse CASE annotation. Filter resolves from the same map via id__in.
  • ami/main/api/views.pyTaxonViewSet.get_taxa_observed is now an orchestrator. Dispatches between with_observation_counts_subqueries and with_observation_counts_aggregated based on "collection" in request.query_params. TaxonCollectionFilter filter backend removed — redundant JOIN was a planner blow-up; the collection filter is enforced inside the aggregate now.
  • verified_count added to ordering_fields. verified=true|false registered on the filterset.
  • Migration 0087_taxon_parents_json_gin_indexCREATE INDEX CONCURRENTLY IF NOT EXISTS on Taxon.parents_json (jsonb_path_ops). Serves literal-RHS containment (the occurrence-list ?taxon=<id> filter, project default-taxa filter). Does not serve the rollup — see "Why not OuterRef" below.

Frontend

  • Sortable Verified column on the taxa list, linking to that taxon's verified occurrences in the occurrence-list.
  • Verification status filter pill (All / Verified / Unverified) → verified= query param. Reuses the existing occurrence-list filter component.
  • Verification panel on the taxon detail page showing the verified count.

Why not a per-taxon OuterRef subquery for the rollup

First implementation in this branch used parents_json @> [{"id": OuterRef("pk")}] to walk the ancestor chain per row. With the GIN jsonb_path_ops index in place this still timed out at 30s on a 1k-taxa / 17k-occurrence project — the GIN containment index only serves a constant right-hand side. An OuterRef RHS degrades to a per-row scan, the index is not used. The timeout hit even with the Verified column hidden, because the annotation is always-on and the verified=false COUNT evaluates the subquery for every taxon in the project.

The single-pass precompute + sparse CASE map is what unblocks it. The rollup runs over a sparse subset (verified occurrences are bounded by human review effort) and the CASE annotation is select-only — stripped from COUNT queries by the ORM.

The GIN index still ships because it serves the pre-existing literal-RHS consumers (the occurrence-list ?taxon=<id> filter and the project default-taxa filter via build_occurrence_default_filters_q). Not load-bearing for this PR's data path, but it closes the #1307 follow-up.

Count semantics

  • occurrences_count — direct match (unchanged).
  • verified_count — rolls up descendants. For species rows the two are equivalent. For Family / Order rows verified_count can exceed the direct occurrences_count.

Tests

ami.main.tests.TestTaxaVerification covers:

  • Hierarchical rollup to ancestors (species verifies → genus / family / order rows all increment).
  • verified=true|false filter, strict-complement property, apply_defaults interaction.
  • ordering=verified_count.
  • Detection fan-out dedup under ?collection=<id> — regression for the rollup .values() query: a verified occurrence with multiple detections must not double-count.

Full backend taxa test suite (TestTaxaVerification, TestTaxonomyViews, TestTaxonListQueryCount, TestProjectDefaultTaxaFilter, TestCachedCountsDefaultFilters) green, 40/40.

Dev-environment caveat

Projects with a verified-rollup map of ~700+ distinct taxa (like project B in the table above) trip sqlparse's 10,000-token limit inside django-debug-toolbar's reformat_sql step. The underlying query has already succeeded — the toolbar's pretty-printer chokes formatting the SQL for the panel and returns a 500 after-the-fact. Production doesn't load the toolbar; endpoint serves normally. Worth a trivial follow-up to wrap reformat_sql in try/except inside config/settings/local.py. Unrelated to the data path.

Reference

Pattern, gotchas, and dispatch matrix captured in docs/claude/reference/hierarchical-rollup-query-performance.md for future rollup work on this endpoint or others.

Test plan

  • Backend tests pass in CI (full taxa suite, 40/40).
  • Performance: main vs this PR benched against six 58k–234k-occurrence project copies on internal dev box (tables above).
  • Browser check of the column / filter pill / detail panel on the deployed branch.
  • Optional follow-up: address the small-project collection-path regression (project F: 0.8s → 6.9s) — likely a size-based dispatch heuristic or pushing verified-count out of the aggregate.

Deferred to follow-up

  • Add model-agreement stats to Taxa views #1319 — model-agreement metrics (agreed_with_prediction_count, agreed_exact_count, with_agreement). Originally bundled here, deferred because no FE consumer at merge time.
  • Presence verification workflow from the taxa view #1320 — verification workflow from the Taxa view (single-click identification modal per predicted taxon, plus optional links from the last_seen / best_score cells to their source occurrences).
  • Collection-path size-based dispatch heuristic to address the small-project regression in the table above.
  • Wrap debug_toolbar.panels.sql.utils.reformat_sql in try/except for dev-box stability with large CASE maps.

Summary by CodeRabbit

Release Notes

  • New Features

    • Species table displays a new sortable "Verified" column showing verification counts per species.
    • Species details page includes a "Verification" section with verification metrics and links to verified occurrences.
    • Species can now be filtered by verification status.
  • Chores

    • Database performance improvements for verification queries.

Review Change Stack

…lter

Adds to GET /api/v2/taxa/ (issue #1316):
- verified_count and agreed_with_prediction_count annotations (always on),
  rolled up over descendant occurrences via a hierarchical parents_json match.
- agreed_exact_count, gated behind with_agreement=true (and always on the
  detail view).
- verified=true|false filter (EXISTS / strict complement), project-scoped and
  respecting apply_default_filters.
- verified_count added to ordering_fields.

The hierarchical descendant match uses a Postgres jsonb @> containment built
from an OuterRef (literal __contains can't embed an OuterRef). Migration 0085
adds the supporting GIN index on Taxon.parents_json (jsonb_path_ops).

Frontend: sortable "Verified" column + "Verification status" filter on the taxa
list, and a Verification panel on the taxon detail page.

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 22, 2026 04:46
@netlify
Copy link
Copy Markdown

netlify Bot commented May 22, 2026

Deploy Preview for antenna-preview ready!

Name Link
🔨 Latest commit c4132e9
🔍 Latest deploy log https://app.netlify.com/projects/antenna-preview/deploys/6a1641539c3abb0008931235
😎 Deploy Preview https://deploy-preview-1317--antenna-preview.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.
Lighthouse
Lighthouse
1 paths audited
Performance: 62 (🔴 down 3 from production)
Accessibility: 81 (🔴 down 8 from production)
Best Practices: 92 (🔴 down 8 from production)
SEO: 92 (no change from production)
PWA: 80 (no change from production)
View the detailed breakdown and full score reports
🤖 Make changes Run an agent on this branch

To edit notification comments on pull requests, go to your Netlify project configuration.

@netlify
Copy link
Copy Markdown

netlify Bot commented May 22, 2026

Deploy Preview for antenna-ssec ready!

Name Link
🔨 Latest commit c4132e9
🔍 Latest deploy log https://app.netlify.com/projects/antenna-ssec/deploys/6a164153f8bed8000853b00d
😎 Deploy Preview https://deploy-preview-1317--antenna-ssec.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.
🤖 Make changes Run an agent on this branch

To edit notification comments on pull requests, go to your Netlify project configuration.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 22, 2026

📝 Walkthrough

Walkthrough

This PR implements end-to-end per-taxon verification count aggregation: a GIN index on Taxon.parents_json, new ORM methods for hierarchical rollups via sparse CASE annotations, ViewSet refactoring to dispatch between annotation strategies, API serialization of verified_count, frontend UI for display and filtering, comprehensive test coverage, and detailed performance documentation.

Changes

Per-Taxon Verification Counts

Layer / File(s) Summary
Database index and Taxon model interface
ami/main/migrations/0087_taxon_parents_json_gin_index.py, ami/main/models.py
GIN index on parents_json using jsonb_path_ops enables fast constant-RHS JSON containment; Taxon.verified_count() accessor returns the annotated value or None.
QuerySet annotation methods
ami/main/models.py
Five new methods replace with_occurrence_counts: _case_from_map (map→CASE converter), with_observation_counts_subqueries, with_observation_counts_aggregated, observed_in_project_subqueries, and with_verification_counts (precomputes ancestor/descendant rollups via Python then applies as constant-time annotations).
ViewSet orchestration and filters
ami/main/api/views.py
Removes TaxonCollectionFilter backend; refactors get_occurrence_filters to accept optional accessor for relation-path prefixing; refactors get_taxa_observed to choose annotation strategy based on ?collection= presence, apply verification rollups, and parse verified=true/false list filters.
API serialization
ami/main/api/serializers.py
Adds verified_count field to TaxonListSerializer and TaxonSerializer to expose per-taxon verified occurrence count in responses.
Frontend data and routing
ui/src/data-services/models/species.ts, ui/src/utils/getAppRoute.ts
Species.numVerified getter exposes server verified_count; getAppRoute routing utility extends FilterType to include 'verified'.
Frontend list UI
ui/src/pages/species/species-columns.tsx, ui/src/pages/species/species.tsx
New sortable "Verified" column (right-aligned, linked to filtered occurrences); enabled by default; new FilterControl for field="verified" in sidebar.
Frontend detail page
ui/src/pages/species-details/species-details.tsx
"Verification" info block shows verified count with link and agreement-related metadata.
Comprehensive test suite
ami/main/tests.py
TestTaxaVerification validates hierarchical rollups, list field values, filter complements, ordering, apply_defaults, and collection-filtering deduplication.
Performance and design reference
docs/claude/reference/hierarchical-rollup-query-performance.md
Documents the hierarchical rollup problem, trade-offs (correlated subqueries vs sparse+CASE vs aggregation), GIN constraints, deduplication, gotchas, and future denormalization.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

  • #1319: This PR implements the per-taxon verified_count primitives (TaxonQuerySet methods, sparse CASE-from-map pattern, GIN index, test structure) that issue #1319 explicitly states should be reused as the foundation for adding model-agreement counts; the two PRs are directly sequential.
  • #1316: This PR fully implements per-taxon verified_count on the backend (models, viewset, migration, tests) and frontend (Species model, UI columns, filters, detail view) matching the exact scope and objectives of issue #1316.

Poem

🐰 A verification tale unfolds in code,
Where parents_json holds the map's abode,
CASE expressions dance in constant time,
Ancestors and kin in hierarchic rhyme,
No more correlated bloat—just sparse delight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.63% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the primary change: adding verification status display to the taxa views, which is the main feature across both backend and frontend changes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description follows the required template structure with Summary, Related Issues (implied by context), Detailed Description with changes and performance analysis, Screenshots, and Deployment Notes sections.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch worktree-taxa-verification-counts

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds per-taxon verification and model-agreement metrics to the taxa API and surfaces them in the taxa (“species”) UI, including a new verified filter and verified-count sorting/links.

Changes:

  • Backend: annotate taxa with verified_count + agreed_with_prediction_count, optionally agreed_exact_count, and add verified=true|false filtering; add GIN index migration for parents_json.
  • Frontend: add “Verified” column, verified filter control, and taxon detail “Verification” panel; propagate verified query param via routing utils.
  • Tests/docs: add API tests covering counts, gating, filter complement behavior, ordering, and apply-defaults behavior; add planning doc.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
ui/src/utils/getAppRoute.ts Adds verified to supported query filter keys for route generation.
ui/src/pages/species/species.tsx Enables verified filter control and default column visibility.
ui/src/pages/species/species-columns.tsx Adds sortable “Verified” column linking to verified occurrences for a taxon.
ui/src/pages/species-details/species-details.tsx Displays verification/agreement counts on the taxon detail panel with a link to verified occurrences.
ui/src/data-services/models/species.ts Exposes new server fields via getters (verified_count, agreement counts).
docs/claude/planning/2026-05-20-taxa-verification-guidance-ticket.md Planning/spec documentation for the feature and performance considerations.
ami/main/tests.py Adds TestTaxaVerification coverage for annotations, gating, filtering, ordering, and apply-defaults.
ami/main/models.py Adds Taxon stub methods for new annotated fields to support serialization.
ami/main/migrations/0085_taxon_parents_json_gin_index.py Adds concurrent GIN index on Taxon.parents_json for hierarchical containment performance.
ami/main/api/views.py Implements hierarchical “under taxon” occurrence correlation, annotations, verified filter, and ordering field addition.
ami/main/api/serializers.py Gates agreed_exact_count field on list responses unless with_agreement=true.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ami/main/api/views.py Outdated
Comment thread ami/main/api/views.py
…bquery

The per-taxon correlated parents_json subquery for verified_count /
agreed_with_prediction_count / agreed_exact_count did not scale: on a large
project (~1k taxa, ~17k occurrences) the taxa list timed out at the 30s
statement limit even with the column hidden and on the default sort, because
each page row (and the verified=false COUNT) ran a JSONB containment scan the
GIN index can't serve when the @> right-hand side is an OuterRef.

All three counts only concern verified occurrences (those with a non-withdrawn
Identification), which are sparse. Compute the hierarchical rollup in a single
pass over that small set in Python and apply it as constant-time CASE
annotations; resolve the verified filter from the same precomputed set via
id__in. Page values, sort, and the pagination COUNT are now constant-time.

Also fixes ancestor rollup returning 0: parents_json round-trips through the
pydantic schema field, so elements may be TaxonParent objects rather than dicts.

Measured on the large project: default page 30s timeout -> ~0.6s; verified=false
30s timeout -> ~0.2s; ordering=verified_count 30s timeout -> ~0.04s.

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ami/main/api/views.py (1)

1429-1433: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Gate verified_count ordering to project-scoped requests (or add a default annotation).

TaxonViewSet.ordering_fields advertises verified_count, but when project_id is not provided TaxonViewSet.get_queryset() does not call add_verification_data() and does not annotate verified_count (only occurrences_count/events_count are set to None). With NullsLastOrderingFilter, ordering by verified_count on the non-project path will attempt to order by a non-existent DB value, causing the queryset/ordering to fail. Restrict verified_count from ordering_fields when no active project is present, or annotate a default verified_count in that branch.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/main/api/views.py` around lines 1429 - 1433, TaxonViewSet advertises
ordering by "verified_count" but get_queryset() only annotates verified_count
via add_verification_data() when a project is provided, causing ordering to fail
on non-project requests; fix by either gating TaxonViewSet.ordering_fields to
remove "verified_count" when no project is active or by adding a default
annotation for verified_count in the non-project branch of
TaxonViewSet.get_queryset() (e.g., annotate verified_count=Value(0) or similar)
so NullsLastOrderingFilter always has a column to order by; update references to
add_verification_data(), TaxonViewSet.ordering_fields, and any filters using
NullsLastOrderingFilter accordingly.
🧹 Nitpick comments (5)
docs/claude/planning/2026-05-20-taxa-verification-guidance-ticket.md (5)

46-46: ⚡ Quick win

Clarify which four fields are included in detail view.

The text states "all four fields above" but only three new annotations are defined in this section: verified_count, agreed_with_prediction_count, and agreed_exact_count. The fourth field is ambiguous. If you're counting occurrences_count, that field pre-existed this feature and shouldn't be listed as part of "above."

✏️ Suggested clarification
-`GET /api/v2/taxa/<id>/` should include all four fields above unconditionally — single-row cost is negligible.
+`GET /api/v2/taxa/<id>/` should include all three verification fields above unconditionally (`verified_count`, `agreed_with_prediction_count`, `agreed_exact_count`) — single-row cost is negligible.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/claude/planning/2026-05-20-taxa-verification-guidance-ticket.md` at line
46, The doc currently says "all four fields above" but only three new
annotations are listed; update the GET /api/v2/taxa/<id>/ detail description to
explicitly enumerate the four fields returned: verified_count,
agreed_with_prediction_count, agreed_exact_count, and occurrences_count (noting
that occurrences_count is pre-existing), or remove "above" and state that these
four fields will be included unconditionally in the detail view to eliminate
ambiguity.

162-165: 💤 Low value

Consider removing specific line numbers from references.

Like the earlier reference on line 38, these hardcoded line numbers will become stale as the codebase evolves. For a planning document that may be referenced later, consider referencing by module and function/class name only, or note that line numbers are approximate as of a specific date.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/claude/planning/2026-05-20-taxa-verification-guidance-ticket.md` around
lines 162 - 165, The document currently pins specific line numbers for locations
like ami/main/api/views.py:1403 (TaxonViewSet), ami/main/api/views.py:1576
(get_taxa_observed), ami/main/models.py:2440 (Identification model /
agreed_with_prediction FK), and ami/main/models.py:3383
(update_occurrence_determination), which will go stale; remove the numeric line
references and instead cite only the module and symbol names (e.g.,
ami.main.api.views — TaxonViewSet; ami.main.api.views — get_taxa_observed;
ami.main.models — Identification; ami.main.models —
update_occurrence_determination) or add a short parenthetical like “line numbers
approximate as of 2026-05-20” if you want to keep a temporal anchor.

38-38: ⚡ Quick win

Avoid hardcoded line numbers in planning documentation.

The reference to ami/main/models.py:2528, 3383-3393 will become stale as the codebase evolves. Consider referencing the function name or module-level documentation instead, or note that line numbers are approximate as of the PR date.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/claude/planning/2026-05-20-taxa-verification-guidance-ticket.md` at line
38, Replace the hardcoded file line references with stable identifiers: remove
"ami/main/models.py:2528, 3383-3393" and instead reference the function and
class names (e.g. the update_occurrence_determination function and
Identification.save method in ami/main/models.py) or state that the line numbers
are approximate as of the PR date; this keeps the doc resilient to code reflows
while still directing readers to the correct implementation.

25-36: ⚡ Quick win

Update implementation approach description to match actual implementation.

The document describes the implementation using "EXISTS subquery" (line 25) and "correlated subqueries per row" (lines 31, 36), but the PR objectives indicate the final implementation switched to a "single-pass computation: precompute hierarchical rollup over verified occurrences in Python, apply results as CASE annotations and id__in for the filter" to resolve performance issues. This planning document should be updated to reflect the architecture that was actually shipped.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/claude/planning/2026-05-20-taxa-verification-guidance-ticket.md` around
lines 25 - 36, Update the text for TaxonViewSet's `verified=true|false` and the
three annotations (`verified_count`, `agreed_with_prediction_count`,
`agreed_exact_count`) to reflect the actual shipped architecture: replace
references to "EXISTS subquery" and "correlated subqueries per row" with a
description that we perform a single-pass computation that precomputes a
hierarchical rollup of verified occurrences in Python and then applies those
results back to the queryset as CASE annotations (for counts) and an id__in
filter (for `verified=true`), and note that `agreed_exact_count` is computed
only when `with_agreement=true` and therefore gated for performance; keep the
same behavioral semantics (descendant-inclusive via parents_json__contains or
determination_id) and preserve the existing notes about which fields are
always-on vs gated.

48-60: ⚡ Quick win

Resolve inconsistency about GIN index status.

Lines 50 mentions the GIN index is "already flagged as a follow-up to #1307" but line 57 states "Treat shipping the GIN index as a hard blocker." These statements conflict. Based on the PR objectives, the migration 0085 does include the GIN index, suggesting it was indeed treated as a blocker. Clarify the dependency status or update the language to reflect that the blocker was resolved.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/claude/planning/2026-05-20-taxa-verification-guidance-ticket.md` around
lines 48 - 60, The text is inconsistent about whether the GIN index on
Taxon.parents_json is still a blocker; update the doc to state the resolved
dependency by referencing the actual migration that applied it (migration 0085)
and the index name main_taxon_parents_json_gin_idx so readers know the GIN index
was shipped, and remove or reword the "hard blocker" assertion (or conversely,
if it truly remains unshipped, change the line that says it was "already
flagged" to indicate it is planned in `#1307`); ensure you mention
Taxon.parents_json and main_taxon_parents_json_gin_idx and either mark the
blocker as resolved (index applied in migration 0085) or clearly state it is
still required before enabling recursive rollup correctness.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ami/main/api/views.py`:
- Around line 1696-1701: The count loop is iterating SQL rows from
verified_occurrences (the Occurrence queryset built with variable
verified_occurrences and annotated with _agreed_prediction_id), but when
occurrence_filters includes detections__source_image__collections a single
Occurrence can join to multiple Detection rows, inflating counts; fix by
de-duplicating Occurrence rows before the Python rollup—call
verified_occurrences = verified_occurrences.distinct('pk') (or
.values_list('pk', flat=True).distinct() and re-query if you need full objects)
immediately after the annotated queryset is built and before the
values()/counting loop that computes verified_count,
agreed_with_prediction_count, and agreed_exact_count.

---

Outside diff comments:
In `@ami/main/api/views.py`:
- Around line 1429-1433: TaxonViewSet advertises ordering by "verified_count"
but get_queryset() only annotates verified_count via add_verification_data()
when a project is provided, causing ordering to fail on non-project requests;
fix by either gating TaxonViewSet.ordering_fields to remove "verified_count"
when no project is active or by adding a default annotation for verified_count
in the non-project branch of TaxonViewSet.get_queryset() (e.g., annotate
verified_count=Value(0) or similar) so NullsLastOrderingFilter always has a
column to order by; update references to add_verification_data(),
TaxonViewSet.ordering_fields, and any filters using NullsLastOrderingFilter
accordingly.

---

Nitpick comments:
In `@docs/claude/planning/2026-05-20-taxa-verification-guidance-ticket.md`:
- Line 46: The doc currently says "all four fields above" but only three new
annotations are listed; update the GET /api/v2/taxa/<id>/ detail description to
explicitly enumerate the four fields returned: verified_count,
agreed_with_prediction_count, agreed_exact_count, and occurrences_count (noting
that occurrences_count is pre-existing), or remove "above" and state that these
four fields will be included unconditionally in the detail view to eliminate
ambiguity.
- Around line 162-165: The document currently pins specific line numbers for
locations like ami/main/api/views.py:1403 (TaxonViewSet),
ami/main/api/views.py:1576 (get_taxa_observed), ami/main/models.py:2440
(Identification model / agreed_with_prediction FK), and ami/main/models.py:3383
(update_occurrence_determination), which will go stale; remove the numeric line
references and instead cite only the module and symbol names (e.g.,
ami.main.api.views — TaxonViewSet; ami.main.api.views — get_taxa_observed;
ami.main.models — Identification; ami.main.models —
update_occurrence_determination) or add a short parenthetical like “line numbers
approximate as of 2026-05-20” if you want to keep a temporal anchor.
- Line 38: Replace the hardcoded file line references with stable identifiers:
remove "ami/main/models.py:2528, 3383-3393" and instead reference the function
and class names (e.g. the update_occurrence_determination function and
Identification.save method in ami/main/models.py) or state that the line numbers
are approximate as of the PR date; this keeps the doc resilient to code reflows
while still directing readers to the correct implementation.
- Around line 25-36: Update the text for TaxonViewSet's `verified=true|false`
and the three annotations (`verified_count`, `agreed_with_prediction_count`,
`agreed_exact_count`) to reflect the actual shipped architecture: replace
references to "EXISTS subquery" and "correlated subqueries per row" with a
description that we perform a single-pass computation that precomputes a
hierarchical rollup of verified occurrences in Python and then applies those
results back to the queryset as CASE annotations (for counts) and an id__in
filter (for `verified=true`), and note that `agreed_exact_count` is computed
only when `with_agreement=true` and therefore gated for performance; keep the
same behavioral semantics (descendant-inclusive via parents_json__contains or
determination_id) and preserve the existing notes about which fields are
always-on vs gated.
- Around line 48-60: The text is inconsistent about whether the GIN index on
Taxon.parents_json is still a blocker; update the doc to state the resolved
dependency by referencing the actual migration that applied it (migration 0085)
and the index name main_taxon_parents_json_gin_idx so readers know the GIN index
was shipped, and remove or reword the "hard blocker" assertion (or conversely,
if it truly remains unshipped, change the line that says it was "already
flagged" to indicate it is planned in `#1307`); ensure you mention
Taxon.parents_json and main_taxon_parents_json_gin_idx and either mark the
blocker as resolved (index applied in migration 0085) or clearly state it is
still required before enabling recursive rollup correctness.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fae23575-10c9-4680-aad2-adee1ce29d2b

📥 Commits

Reviewing files that changed from the base of the PR and between e721c8b and 16b1468.

📒 Files selected for processing (11)
  • ami/main/api/serializers.py
  • ami/main/api/views.py
  • ami/main/migrations/0085_taxon_parents_json_gin_index.py
  • ami/main/models.py
  • ami/main/tests.py
  • docs/claude/planning/2026-05-20-taxa-verification-guidance-ticket.md
  • ui/src/data-services/models/species.ts
  • ui/src/pages/species-details/species-details.tsx
  • ui/src/pages/species/species-columns.tsx
  • ui/src/pages/species/species.tsx
  • ui/src/utils/getAppRoute.ts

Comment thread ami/main/api/views.py Outdated
…reference

The parents_json GIN index (migration 0085) no longer backs the verification
rollup (now a Python single-pass). Update the migration docstring and add an
inline comment at its real consumer — the literal-RHS containment in the
occurrence-list taxon filter — so the index's purpose is clear.

Add docs/claude/reference/hierarchical-rollup-query-performance.md capturing the
anti-pattern (per-taxon correlated parents_json subquery) vs the pattern
(precompute over the sparse set + CASE), the COUNT/cachalot/pydantic-field
gotchas, and the denormalized TaxonObserved direction.

Co-Authored-By: Claude <noreply@anthropic.com>
mihow and others added 7 commits May 25, 2026 18:03
… filter

When ?collection=<id> joins Occurrence to detections, a single occurrence
yields one row per matching detection, inflating verified_count /
agreed_with_prediction_count / agreed_exact_count. Select pk and .distinct()
the values() rollup so each occurrence is counted once. Adds a regression test.

Co-Authored-By: Claude <noreply@anthropic.com>
… main merge

main merged #1305 which took 0085/0086 (project activity + recent-capture
indexes). Repoint the GIN-index migration onto 0086 and update the references
to it (views.py comment, rollup perf doc).

Co-Authored-By: Claude <noreply@anthropic.com>
The observed-set membership used a correlated EXISTS; under ?collection=<id>
the occurrence_filters join to detections turned it into a per-taxon scan,
timing out the pagination COUNT (no LIMIT to short-circuit). Replace with a
single distinct-determination id__in subquery. Default path unchanged; the
collection path drops from a 30s timeout to sub-second.

Co-Authored-By: Claude <noreply@anthropic.com>
The IN-subquery form (b92b2b0) still embedded the detections m2m join in
both the COUNT and page queries; under ?collection=<id> the planner
re-evaluated it pathologically (collection path ~87s). Materialize the
distinct-determination id set in one query (~0.2s) so COUNT/page reduce to a
plain indexed id IN (...). Mirrors the verified-filter pattern.

Co-Authored-By: Claude <noreply@anthropic.com>
…nce pass

The taxa list computed occurrences_count / best_determination_score /
last_detected as three per-taxon correlated subqueries. Index-served for the
default filters, but under ?collection=<id> the detections join turned each
into a per-row scan (~20s for a 25-row page on a large project), on top of the
already-fixed COUNT membership.

Replace them with annotate_taxon_counts: one GROUP BY over the shared filtered
occurrence set builds {taxon_id: value} maps, applied as constant-time CASE
annotations via _case_from_map. The same base feeds the verification rollup
(_annotate_verification_counts, formerly add_verification_data), and its
determination ids serve the observed-set membership filter, so no separate
membership query is needed. Count(distinct) also dedupes the collection join
fan-out for occurrences_count. No denormalized model introduced.

Co-Authored-By: Claude <noreply@anthropic.com>
The previous commit applied occurrences_count / best_determination_score /
last_detected as CASE-from-map annotations. That works for the sparse verified
counts but not for these dense aggregates: one CASE branch per observed taxon
blew past the SQL parser token limit (SQLParseError: Maximum number of tokens
exceeded) on large projects, breaking every taxa request.

Switch the dense aggregates to conditional aggregation over the Taxon->occurrences
reverse relation (Count/Max with filter=, the pattern already used for Event
counts), which is one GROUP BY with constant-size SQL. occurrences_count__gt=0
becomes the observed-set membership (HAVING). The sparse verified/agreement
counts stay as CASE annotations. get_occurrence_filters gains an accessor arg to
express the same filters through the reverse relation.

Measured on a ~1k-taxa / ~17k-occurrence project: collection-filtered list page
~0.25s and COUNT ~0.31s (was a 30s+ timeout); default/verified/ordering ~0.1-0.4s.

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
ami/main/tests.py (1)

4906-4906: ⚡ Quick win

Use timezone-aware datetime for timestamp.

datetime.datetime.now() returns a naive datetime. For consistency with Django's timezone settings and the rest of the codebase, consider using timezone.now() from django.utils.timezone.

⏰ Proposed fix
+from django.utils import timezone
+
         extra_detection.classifications.create(taxon=self.cardui, score=0.9, timestamp=datetime.datetime.now())
+        extra_detection.classifications.create(taxon=self.cardui, score=0.9, timestamp=timezone.now())
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/main/tests.py` at line 4906, Replace the naive datetime used when
creating the classification: change the timestamp argument in
extra_detection.classifications.create(...) to use timezone-aware
django.utils.timezone.now() instead of datetime.datetime.now(), and add the
corresponding import (from django.utils import timezone) or reference
timezone.now() where imports live so the created classification timestamp is
timezone-aware and consistent with the project.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@ami/main/tests.py`:
- Line 4906: Replace the naive datetime used when creating the classification:
change the timestamp argument in extra_detection.classifications.create(...) to
use timezone-aware django.utils.timezone.now() instead of
datetime.datetime.now(), and add the corresponding import (from django.utils
import timezone) or reference timezone.now() where imports live so the created
classification timestamp is timezone-aware and consistent with the project.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 12fc2c33-b893-4699-a7ad-92962dd9a3e4

📥 Commits

Reviewing files that changed from the base of the PR and between 16b1468 and 838f9d7.

📒 Files selected for processing (5)
  • ami/main/api/views.py
  • ami/main/migrations/0087_taxon_parents_json_gin_index.py
  • ami/main/models.py
  • ami/main/tests.py
  • docs/claude/reference/hierarchical-rollup-query-performance.md
✅ Files skipped from review due to trivial changes (1)
  • docs/claude/reference/hierarchical-rollup-query-performance.md

mihow and others added 4 commits May 25, 2026 18:48
Including the default taxa include/exclude filter in the conditional-aggregate
filter added a parents_json containment join the planner couldn't reconcile with
the detections (?collection=) join, turning the collection page into a multi-minute
scan. It is redundant: occurrences_count groups by determination = the taxon row,
so the per-occurrence taxa filter just mirrors filter_by_project_default_taxa
(already applied to the queryset). Keep only the per-occurrence score threshold in
the aggregate; the verification base still gets the full filters (sparse, cheap).

Collection-filtered list now ~0.3s (page + COUNT); default/verified/ordering ~0.1-0.4s.

Co-Authored-By: Claude <noreply@anthropic.com>
The filter_backends chain included TaxonCollectionFilter, which unconditionally
appends queryset.filter(occurrences__detections__source_image__collections=<id>)
to the Taxon queryset on top of whatever annotate_taxon_counts already does. The
collection filter is now fully enforced inside the conditional aggregates (via
get_occurrence_filters(accessor="occurrences")) and the observed-set HAVING, so
the backend only added a redundant JOIN on the main qs. Combined with the
aggregate GROUP BY, that JOIN turned the collection-filtered taxa page into a
multi-minute scan; removing it brings ?collection= back to sub-second.

Co-Authored-By: Claude <noreply@anthropic.com>
…conditional aggregation

Captures the two findings from the collection-path timeout fix: (1) the
CASE-from-map precompute pattern only scales for sparse maps because dense
maps blow past sqlparse's 10000-token limit, and (2) the two gotchas that
turn conditional aggregation from sub-second into a multi-minute scan
(taxa filter redundant inside the aggregate, redundant collection JOIN
backend on top of the aggregate GROUP BY).

Co-Authored-By: Claude <noreply@anthropic.com>
… TaxonQuerySet

Captures the two follow-ups: (1) restore correlated Subquery aggregates for the
default path (regressed 0.82s -> 2.14s under conditional aggregation) while
keeping conditional aggregation for ?collection=; (2) move with_observation_counts
/ observed_in_project / with_verification_counts onto TaxonQuerySet to lighten
the viewset. Includes the live timings, the three gotchas to preserve, and the
commit chronology so the next session can pick up cold.

Co-Authored-By: Claude <noreply@anthropic.com>
Comment thread ami/main/api/serializers.py Outdated
Comment thread ami/main/api/serializers.py Outdated
Comment thread ami/main/api/views.py Outdated
…gregation dispatch

Lifts per-taxon count annotation logic out of TaxonViewSet into four new
TaxonQuerySet methods, matching the "Custom QuerySet Methods (Always Use These)"
pattern in CLAUDE.md.

- with_observation_counts_subqueries — three correlated Subquery annotations
  (occurrences_count / best_determination_score / last_detected), index-served by
  the composite (determination_id, project_id, event_id, determination_score)
  index on Occurrence. Default / event / deployment / verified / ordering paths.
- with_observation_counts_aggregated — conditional aggregation over the
  Taxon→occurrences reverse relation with Count(distinct) dedup. Required under
  ?collection=<id> where the detections join turns correlated subqueries into
  per-row scans.
- observed_in_project_subqueries — materialised id__in membership for the
  subquery path (the aggregate path uses HAVING via occurrences_count__gt=0).
- with_verification_counts — sparse CASE-from-map rollup of verified_count /
  agreed_with_prediction_count / (gated) agreed_exact_count over ancestor
  parents_json, with optional verified=true|false filter.

TaxonViewSet.get_taxa_observed shrinks to a dispatcher that picks the count
shape based on "collection" in request.query_params and chains the queryset
methods. _case_from_map moves to a module-level helper alongside the new
queryset methods.

Removes the now-redundant TaxonCollectionFilter backend (its INNER JOIN on
queryset.filter(occurrences__detections__source_image__collections=<id>) was
unreconcilable with the aggregate GROUP BY).

Co-Authored-By: Claude <noreply@anthropic.com>
PR scope narrowed to match issue #1316 (per-taxon verification UX). The
model-agreement signals (`agreed_with_prediction_count`,
`agreed_exact_count`, `with_agreement` query gate) had no FE consumer at
merge time and serve a different audience (ML model evaluation) than the
human-trust `verified_count`. They are deferred to follow-up issue #1319
where they can be paired with a real FE consumer and renamed to a
`model_agreed_*` prefix to disambiguate from human verifications.

Removed from the backend:
- Classification + Identification subqueries inside
  `TaxonQuerySet.with_verification_counts` for `_best_machine_taxon_id`
  and `_agreed_prediction_id`
- `include_agreement` parameter on `with_verification_counts`
- `TaxonViewSet._include_agreement` and the `with_agreement` query param
- `agreement_requested` helper and the field-pop logic in
  `TaxonListSerializer.__init__`
- `agreed_with_prediction_count` / `agreed_exact_count` fields on
  `TaxonListSerializer` and `TaxonSerializer`
- Property stubs `Taxon.agreed_with_prediction_count` /
  `Taxon.agreed_exact_count`
- Tests for the agreed counts (kept the rollup, dedup, list, and ordering
  tests for the verified count)

Removed from the frontend:
- `Species.numAgreedWithPrediction` / `Species.numAgreedExact` getters
- "Agreed with prediction" and "Matched model exactly" rows in the
  species detail view (Verification block keeps just the Verified row)

40 of the 43 taxa tests pass under CI compose; the dropped three covered
the agreed counts.

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow mihow changed the title feat(taxa): per-taxon verification & agreement counts + verified filter (#1316) Add verification status to Taxa views May 26, 2026
mihow and others added 2 commits May 26, 2026 17:56
… stale planning + prompt files [skip ci]

- Expanded `docs/claude/reference/hierarchical-rollup-query-performance.md` to be
  the single canonical reference for per-taxon rollup queries on the taxa
  endpoint. Adds the `TaxonQuerySet` API surface and dispatch decision table,
  the detection fan-out dedup pattern, and the model-agreement split-out
  rationale (deferred to issue #1319).
- Deleted `docs/claude/planning/2026-05-20-taxa-verification-guidance-ticket.md`
  (superseded by the PR description and the reference doc; the deferred
  model-agreement scope now lives on issue #1319).
- Deleted `docs/claude/prompts/NEXT_SESSION_PROMPT.md` (work shipped).

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ami/main/api/views.py (1)

1526-1576: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the collection_id alias on taxa endpoints.

get_occurrence_filters() and the aggregation switch now only look for collection, but OccurrenceCollectionFilter in this file still treats both collection_id and collection as supported. Requests like /api/v2/taxa/?project_id=...&collection_id=... will silently ignore the collection filter and take the wrong query path.

🔧 Minimal fix
-        collection_id = self.request.query_params.get("collection")
+        collection_id = self.request.query_params.get("collection") or self.request.query_params.get(
+            "collection_id"
+        )
...
-        use_aggregation = "collection" in request.query_params
+        use_aggregation = any(param in request.query_params for param in ("collection", "collection_id"))

Also applies to: 1645-1647

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/main/api/views.py` around lines 1526 - 1576, get_occurrence_filters
currently only reads "collection" from query params so requests using the legacy
"collection_id" alias are ignored; update get_occurrence_filters (and the
related aggregation switch logic) to accept both
request.query_params.get("collection") and
request.query_params.get("collection_id") (e.g. use collection_id =
self.request.query_params.get("collection") or
self.request.query_params.get("collection_id")) so the existing
OccurrenceCollectionFilter behavior is preserved and taxa endpoints honoring
collection_id continue to filter correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@ami/main/api/views.py`:
- Around line 1526-1576: get_occurrence_filters currently only reads
"collection" from query params so requests using the legacy "collection_id"
alias are ignored; update get_occurrence_filters (and the related aggregation
switch logic) to accept both request.query_params.get("collection") and
request.query_params.get("collection_id") (e.g. use collection_id =
self.request.query_params.get("collection") or
self.request.query_params.get("collection_id")) so the existing
OccurrenceCollectionFilter behavior is preserved and taxa endpoints honoring
collection_id continue to filter correctly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5364474b-4745-4814-949d-a907c80120dc

📥 Commits

Reviewing files that changed from the base of the PR and between e014a73 and c4132e9.

📒 Files selected for processing (6)
  • ami/main/api/serializers.py
  • ami/main/api/views.py
  • ami/main/models.py
  • ami/main/tests.py
  • docs/claude/reference/hierarchical-rollup-query-performance.md
  • ui/src/data-services/models/species.ts
💤 Files with no reviewable changes (2)
  • ui/src/data-services/models/species.ts
  • ami/main/api/serializers.py

Copy link
Copy Markdown
Collaborator Author

@mihow mihow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested in dev environment with large projects with many taxa & verifications.

@mihow mihow merged commit f585ddc into main May 27, 2026
5 checks passed
@mihow mihow deleted the worktree-taxa-verification-counts branch May 27, 2026 13:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants