Skip to content

Endpoint for stats about verified occurrences#1307

Open
mihow wants to merge 17 commits into
mainfrom
feat/human-model-agreement-endpoint
Open

Endpoint for stats about verified occurrences#1307
mihow wants to merge 17 commits into
mainfrom
feat/human-model-agreement-endpoint

Conversation

@mihow
Copy link
Copy Markdown
Collaborator

@mihow mihow commented May 14, 2026

Summary

Adds GET /api/v2/occurrences/stats/model-agreement/ — verified-occurrence rate + human↔model agreement rates (with Wilson 95% confidence intervals and Cohen's kappa) over the same filter set the /occurrences/ list view accepts. Designed for the project-overview dashboard widget and occurrence-list sidebar panel (consumed by #1308).

Stats viewset convention established in #1296 (see docs/claude/reference/api-stats-pattern.md): scalar response under the entity it's computed over, namespaced under /stats/.

Filter parity

Stats endpoint accepts every query param the /occurrences/ list endpoint accepts, minus ordering/search (don't apply to scalars).

Param Source Notes
project_id=<int> ProjectMixin required
apply_defaults=true/false apply_default_filters Defaults to true. Bypass = ignore project default taxa lists + score thresholds.
taxon=<id> or determination=<id> CustomOccurrenceDeterminationFilter Hierarchical via parents_json — matches the taxon and all descendants.
event=<id> DjangoFilterBackend
deployment=<id> DjangoFilterBackend
determination__rank=<RANK> DjangoFilterBackend e.g. SPECIES, GENUS, FAMILY.
detections__source_image=<id> DjangoFilterBackend
collection=<id> or collection_id=<id> OccurrenceCollectionFilter Capture collection containing the occurrence's detections.
algorithm=<id> (repeatable) OccurrenceAlgorithmFilter Inclusive list of detection-algorithm ids.
not_algorithm=<id> (repeatable) OccurrenceAlgorithmFilter Exclusive list.
date_start=<YYYY-MM-DD> OccurrenceDateFilter ISO date.
date_end=<YYYY-MM-DD> OccurrenceDateFilter ISO date.
verified=true/false OccurrenceVerified Has-any-ident filter (independent of identifier).
verified_by_me=true/false OccurrenceVerifiedByMeFilter Auth-gated — short-circuits to no-op for anon.
taxa_list_id=<id> OccurrenceTaxaListFilter
not_taxa_list_id=<id> OccurrenceTaxaListFilter

Backed by the same OCCURRENCE_FILTER_BACKENDS + OCCURRENCE_FILTERSET_FIELDS tuples wired into OccurrenceViewSet, so the two endpoints stay in lock-step.

Endpoint-specific param

Param Default Notes
agreement_coarsest_rank=<RANK> absent (no threshold) Optional. When supplied, response includes agreed_coarser_rank_* counting only LCAs at or deeper than the given rank. Accepts any TaxonRank name (case-insensitive); UNKNOWN and unknown strings → 400.

Response shape

{
  "project_id": 18,
  "total_occurrences": 43149,
  "verified_count": 45,
  "verified_pct": 0.001,
  "verified_with_prediction_count": 24,
  "no_prediction_count": 21,
  "agreed_exact_count": 12,
  "agreed_exact_pct": 0.5,
  "agreed_exact_ci_low": 0.3146,
  "agreed_exact_ci_high": 0.6854,
  "agreed_any_rank_count": 17,
  "agreed_any_rank_pct": 0.7083,
  "agreed_any_rank_ci_low": 0.5023,
  "agreed_any_rank_ci_high": 0.8521,
  "cohens_kappa": 0.41,
  "agreement_coarsest_rank": null,
  "agreed_coarser_rank_count": null,
  "agreed_coarser_rank_pct": null
}

With ?agreement_coarsest_rank=FAMILY, the bottom three fields populate:

{
  "agreement_coarsest_rank": "FAMILY",
  "agreed_coarser_rank_count": 14,
  "agreed_coarser_rank_pct": 0.5833
}

Field semantics

  • verified_* = at least one non-withdrawn Identification.
  • verified_with_prediction_count = verified AND has a machine prediction; used as the denominator for agreed_*_pct since occurrences with no prediction can't agree or disagree.
  • no_prediction_count = verified but no machine prediction (surfaced so consumers can see why the agreement denominator differs from verified_count).
  • agreed_exact_* = user's best identification taxon equals the model's best prediction.
  • agreed_any_rank_* = exact matches plus disagreements whose LCA is at any real taxonomic rank (UNKNOWN excluded, since it sorts after SPECIES in TaxonRank.OrderedEnum). The upstream filter (e.g. a Lepidoptera include list) is what bounds the meaningful scope, not a hardcoded threshold in this function.
  • agreed_*_ci_low / agreed_*_ci_high = Wilson score 95% confidence interval bounds for the corresponding agreed_*_pct. Wilson stays inside [0, 1] and is honest at the small n typical of verified sets, where the normal approximation breaks down. null when verified_with_prediction_count is 0. Reads as "94%, somewhere between 87% and 97%" — wide when few verified, tight when many.
  • cohens_kappa = exact-taxon Cohen's kappa: human↔model agreement beyond chance, range [-1, 1] (negative = worse than chance). Plain agreement % rewards luck — in a project dominated by one common species, human and model agree most of the time just by both naming the common one. Kappa subtracts that expected-by-chance agreement. null when there are no doubly-classified occurrences or expected agreement is exactly 1.0 (a single taxon category, undefined). Computed from the same human/model pairs already in memory — no extra query.
  • agreed_coarser_rank_* = exact matches plus disagreements whose LCA is at the supplied agreement_coarsest_rank or deeper. null when no threshold supplied.
  • agreement_coarsest_rank = the threshold rank that was applied (echoed back to the caller). null when the param was absent.

Disagreement counts are not surfaced explicitly — derivable as verified_with_prediction_count - agreed_*_count.

Usage examples

# Whole project, project defaults applied
curl '.../api/v2/occurrences/stats/model-agreement/?project_id=18'

# Bypass project default filters (broader denominator)
curl '.../api/v2/occurrences/stats/model-agreement/?project_id=18&apply_defaults=false'

# One deployment
curl '.../api/v2/occurrences/stats/model-agreement/?project_id=18&deployment=42'

# Apply a coarser-rank threshold (also count LCAs at FAMILY or deeper)
curl '.../api/v2/occurrences/stats/model-agreement/?project_id=18&agreement_coarsest_rank=FAMILY'

# One taxon and its descendants (hierarchical match)
curl '.../api/v2/occurrences/stats/model-agreement/?project_id=18&taxon=567'

# Multiple algorithms (repeated param)
curl '.../api/v2/occurrences/stats/model-agreement/?project_id=18&algorithm=3&algorithm=7'

# Combine: deployment + collection + bypass project defaults
curl '.../api/v2/occurrences/stats/model-agreement/?project_id=18&deployment=42&collection=99&apply_defaults=false'

The frontend consumer (#1308) wraps this in useModelAgreement(projectId, filters), which accepts an arbitrary filter map (including arrays for repeated params), so the occurrence list page's filter state can be threaded through unchanged.

Implementation notes

  • Base queryset is deduplicated (queryset.distinct()) before counting so the join chain from apply_default_filters (e.g. verified_by_meIdentification, taxa_list_idparents_json) can't inflate total_occurrences vs the verified branch.
  • Aggregation is scoped to the verified set (occurrences with at least one non-withdrawn identification), which is typically a small fraction of total. The expensive correlated subqueries — best user identification (over Identification) and best machine prediction (over Classification) — evaluate only on verified rows, not on the full filtered queryset.
  • LCA on disagreements is deduplicated to distinct (user_taxon, machine_taxon) pairs before computation.
  • Reuses apply_default_filters so apply_defaults=false bypasses project default taxa lists + score thresholds.
  • *_pct fields are bounded to [0.0, 1.0] in the serializer; cohens_kappa to [-1.0, 1.0].
  • Wilson CI + Cohen's kappa are pure-Python over the verified_rows list already materialized for the agreement counts — one extra pass, zero extra DB queries.

Bench

Project 18 (43,149 occurrences, 45 verified): 928ms → 350ms cold / 146ms warm after scoping subqueries to the verified set.

Across all production projects with non-zero identifications:

Project Total Verified Cold Warm
P#85 SEC-SEQ AI Symposium 36,253 13,140 1.18s 343ms
P#20 Barro Colorado Island 40,958 1,351 0.92s 153ms
P#84 Pennsylvania Tebufenozide 18,407 251 0.56s 139ms
P#24 Atlantic Forestry Centre 2,797 274 0.50s 203ms
P#46 AMBER - Panama 10,248 48 0.44s 125ms
P#23 Insectarium de Montréal 20,393 74 0.43s 190ms
P#16 Aarhus Ecoscience 644 8 0.38s 128ms
P#18 Vermont Atlas of Life 43,149 45 0.35s 146ms
P#49 Marc Bélisle's Lab 439 71 0.32s 128ms
P#38 MothBox - Gamboa 21 10 0.31s 122ms

Pre-rework state on project 18 with apply_defaults=false: 159s curl timeout.

Test plan

  • ami.main.tests.TestLcaRankBetween — 7 unit tests including UNKNOWN-rank regression.
  • ami.main.tests.TestModelAgreementForProject — empty-project + 4-bucket canonical case + coarsest_rank threshold filtering.
  • ami.main.tests.TestOccurrenceStatsViewSet — HTTP coverage for envelope shape, draft-project 404, filter passthrough, apply_defaults=false bypass, exact-match happy path, sister-species any-rank bucket, invalid rank → 400, UNKNOWN rejection, threshold echo, CI/kappa field presence + null-on-empty.
  • ami.main.tests.TestWilsonInterval + ami.main.tests.TestCohensKappa — 9 pure-Python unit tests (textbook 8/10 CI value, unit-interval clamping, interval-tightens-with-n; kappa known 2×2 value, perfect agreement, single-category-undefined, negative kappa, empty).
  • Full model-agreement suite green locally: 37/37. The any-rank bucket test was made deterministic (it previously flaked ~50% on the random fixture taxon — pinned to two fixed Vanessa species).
  • Live smoke against project 18 + 9 other production projects, including the heaviest verified-set (P#85, 13,140 idents).

This PR is backend-only. The frontend consumer — the useModelAgreement hook + the occurrence-list stats panel — lives in #1308.

Follow-ups (out of scope, calling out for next rounds)

apply_default_filters is the dominant cost on hot stats paths

Update: #1317 (taxa verification counts) addresses part of this — its perf rewrite (precompute + CASE aggregation) eliminated a 30s timeout on the per-taxon stats path that has the same root cause (the apply_default_filters + valid() stack being applied per-row). The compound index + GIN index below are still outstanding follow-ups.

For the heaviest project (P#85, 36k post-filter occurrences) the agreement subqueries on the verified set run in <50ms. The rest of the response time is the apply_default_filters + valid() filter stack on Occurrence. EXPLAIN ANALYZE on P#85 reveals:

  1. No compound index on (project_id, determination_score) — Postgres does a Parallel Seq Scan on main_occurrence and discards 195,549 rows by filter to find 36,253 matching the project-default score threshold. Hot path: ~60ms.
  2. No GIN index on Taxon.parents_json — for projects with default include/exclude taxa lists, the parents_json__contains JSONB containment check is a row-by-row evaluation. P#85 has no taxa lists so this didn't show up here, but it would dominate for projects that do (e.g. via OccurrenceFilter's recursive taxa filter).
  3. valid()'s anti-join to main_detection is fine (index scan, 36k loops on hot cache, <60ms).

This affects every endpoint that calls apply_default_filters() or Occurrence.objects.valid()/occurrences/, /captures/, /events/, and the other stats actions on this viewset. Anywhere a project default threshold is non-zero, the same seq-scan is happening.

Note: the entire bench table above was measured without these indexes (they don't exist yet), so those numbers are the no-index baseline — worst case 1.18s cold / 343ms warm. That's acceptable for a cached dashboard widget, so the indexes are a follow-up, not a blocker for this PR.

Likely cheap wins:

  • CREATE INDEX CONCURRENTLY main_occurrence_project_score_idx ON main_occurrence (project_id, determination_score) — index range scan instead of seq-scan-then-filter.
  • CREATE INDEX CONCURRENTLY main_taxon_parents_json_gin_idx ON main_taxon USING gin (parents_json jsonb_path_ops) — index lookup for parents_json__contains instead of full-row JSONB eval.

Filter-driven occurrence exports

This PR's filter parity wiring (OCCURRENCE_FILTER_BACKENDS + OCCURRENCE_FILTERSET_FIELDS) sets up a natural follow-up: let users click "Export" on /occurrences/ with the current filters applied and get a job whose output matches that filtered set, without first materializing a SourceImageCollection. The export infra already has a "filters JSON → re-run backends in worker" pattern (ami/exports/utils.py:13-72 generate_fake_request() + apply_filters()) but is hardwired to OccurrenceCollectionFilter. Wiring the same shared backend tuple into the exporter would close the gap.

Explicit auth gate on stats viewset

OccurrenceStatsViewSet uses IsActiveStaffOrReadOnly. verified_by_me=true from an anon caller is safe today only because OccurrenceVerifiedByMeFilter.filter_queryset short-circuits on is_authenticated. Worth an explicit gate at the viewset level rather than relying on the filter's internal short-circuit.

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features
    • Added a new API endpoint to retrieve agreement metrics between human and machine predictions, including agreement rates, confidence intervals, and statistical measures like Cohen's kappa. Optional filtering by coarser taxonomic ranks is now supported.

Review Change Stack

Copilot AI review requested due to automatic review settings May 14, 2026 17:44
@netlify
Copy link
Copy Markdown

netlify Bot commented May 14, 2026

Deploy Preview for antenna-preview ready!

Name Link
🔨 Latest commit 336c1fe
🔍 Latest deploy log https://app.netlify.com/projects/antenna-preview/deploys/6a16f1b150815300089266c5
😎 Deploy Preview https://deploy-preview-1307--antenna-preview.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.
Lighthouse
Lighthouse
1 paths audited
Performance: 61 (🔴 down 4 from production)
Accessibility: 81 (🔴 down 8 from production)
Best Practices: 92 (🔴 down 8 from production)
SEO: 92 (no change from production)
PWA: 80 (no change from production)
View the detailed breakdown and full score reports
🤖 Make changes Run an agent on this branch

To edit notification comments on pull requests, go to your Netlify project configuration.

@netlify
Copy link
Copy Markdown

netlify Bot commented May 14, 2026

Deploy Preview for antenna-ssec ready!

Name Link
🔨 Latest commit 336c1fe
🔍 Latest deploy log https://app.netlify.com/projects/antenna-ssec/deploys/6a16f1b112143e00083d3fd4
😎 Deploy Preview https://deploy-preview-1307--antenna-ssec.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.
🤖 Make changes Run an agent on this branch

To edit notification comments on pull requests, go to your Netlify project configuration.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 14, 2026

Caution

Review failed

Failed to post review comments

📝 Walkthrough

Walkthrough

This PR adds a stats endpoint /occurrences/stats/model-agreement/ that computes verified-occurrence counts and human–model agreement metrics (exact, any-rank, optional coarser-rank) over filtered occurrences, plus Wilson 95% CIs and Cohen's kappa, with supporting utilities, serializer, tests, and docs update.

Changes

Model Agreement Stats Endpoint

Layer / File(s) Summary
Statistical utility functions
ami/utils/stats.py
New module adds WILSON_Z_95, wilson_interval() for 95% Wilson confidence bounds on proportions, and cohens_kappa() for inter-rater agreement; both return rounded, bounded results or None when undefined.
Taxonomic LCA and agreement computation
ami/main/models_future/occurrence.py, ami/main/tests.py
Adds TaxonTuple type, lca_rank_between() to find deepest shared non-UNKNOWN rank, and model_agreement_for_project() to compute deduplicated totals, verified subsets, exact/any/coarser-rank agreement counts/percents, Wilson CIs, and Cohen's kappa. Includes unit tests for LCA, stats helpers, and agreement aggregation.
API response serializer
ami/main/api/serializers.py
Adds ModelAgreementSerializer describing project-scoped totals, verified counts/pcts, exact and any-rank agreement counts/pcts with nullable Wilson CI fields, nullable Cohen's kappa, and optional coarser-rank agreement fields keyed to agreement_coarsest_rank.
API endpoint wiring and exposure
ami/main/api/views.py, ami/main/tests.py
Centralizes occurrence filter settings, adds filter plumbing to OccurrenceStatsViewSet, and implements a non-paginated model_agreement action that validates project visibility and agreement_coarsest_rank, applies default and DRF filters, computes agreement via model_agreement_for_project, and returns serialized results. HTTP tests cover validation, filtering, CI/kappa presence, coarsest-rank handling, and default-filter bypass.
API reference documentation
docs/claude/reference/api-stats-pattern.md
Updates example endpoint path to GET /occurrences/stats/model-agreement/.

Sequence Diagram

sequenceDiagram
    participant Client
    participant ViewSet as OccurrenceStatsViewSet
    participant DRFFilter as DRF Filters
    participant Logic as model_agreement_for_project
    participant Serializer as ModelAgreementSerializer
    
    Client->>ViewSet: GET /stats/model-agreement?project_id=X
    ViewSet->>ViewSet: Validate project visibility
    ViewSet->>ViewSet: Parse & validate agreement_coarsest_rank
    ViewSet->>ViewSet: Build base queryset with defaults
    ViewSet->>DRFFilter: filter_queryset()
    DRFFilter->>ViewSet: Filtered QuerySet
    ViewSet->>Logic: Compute agreement metrics
    Logic->>Logic: Deduplicate, count verified/unverified
    Logic->>Logic: Compute exact & any-rank agreement
    Logic->>Logic: Calculate rates, Wilson CIs, Cohen's kappa
    Logic->>ViewSet: Agreement dict
    ViewSet->>Serializer: Serialize result
    Serializer->>Client: JSON response (counts, rates, CIs, kappa)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Poem

🐰 I nudge through taxon trees with nimble feet,
Counting matches where human and model meet.
Wilson bounds hush the noisy debate,
Kappa whispers chance we should negate —
A hop, a stat, agreement looks sweet!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 52.17% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Endpoint for stats about verified occurrences' accurately describes the main change: a new API endpoint for model agreement statistics. It is clear and specific enough for a teammate to understand the primary addition.
Description check ✅ Passed The description includes Summary, List of Changes (detailed), Related Issues reference, Detailed Description with comprehensive field semantics, test plan coverage, and Implementation notes. However, the Checklist section is absent entirely.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/human-model-agreement-endpoint

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new scalar stats endpoint GET /occurrences/stats/human-model-agreement/ that reports verified-occurrence and human-vs-model agreement rates (exact and "under-order") for an occurrence queryset, reusing the existing /occurrences/ filter stack and apply_default_filters.

Changes:

  • New aggregation helper human_model_agreement_for_project plus pure-Python lca_rank_between over Taxon.parents_json in ami/main/models_future/occurrence.py.
  • New action on OccurrenceStatsViewSet plus HumanModelAgreementSerializer; extracts OccurrenceViewSet filter backends/fields into module-level tuples (OCCURRENCE_FILTER_BACKENDS, OCCURRENCE_FILTERSET_FIELDS) shared by both viewsets.
  • React Query hook useHumanModelAgreement and supporting planning/scoping docs.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
ami/main/models_future/occurrence.py New LCA helper + Python-side aggregation function over a pre-filtered Occurrence queryset.
ami/main/api/views.py Extracts shared occurrence filter config and adds human_model_agreement action on OccurrenceStatsViewSet.
ami/main/api/serializers.py New HumanModelAgreementSerializer describing the response shape.
ami/main/tests.py Unit tests for lca_rank_between, aggregation tests, and HTTP-level tests for the new action.
ui/src/data-services/hooks/occurrences/stats/useHumanModelAgreement.ts New typed React Query hook for the endpoint.
docs/claude/planning/2026-05-14-human-model-agreement-endpoint.md Implementation plan document for the feature.
docs/claude/planning/occurrence-filter-driven-exports.md Side-research scoping stub for filter-driven exports (out of scope of this PR).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ami/main/models_future/occurrence.py Outdated
Comment thread ami/main/models_future/occurrence.py Outdated
Comment thread ami/main/models_future/occurrence.py Outdated
Comment thread ami/main/models_future/occurrence.py Outdated
Comment thread ami/main/api/views.py
Comment thread ami/main/tests.py Outdated
Comment thread ui/src/data-services/hooks/occurrences/stats/useHumanModelAgreement.ts Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
ami/main/api/serializers.py (1)

1765-1769: ⚡ Quick win

Constrain percentage fields to the documented 0.0..1.0 range

These fields are contractually bounded; adding serializer bounds gives fast failure on accidental regressions and keeps response validation self-documenting.

Proposed diff
-    verified_pct = serializers.FloatField(help_text="verified_count / total_occurrences")
+    verified_pct = serializers.FloatField(
+        min_value=0.0,
+        max_value=1.0,
+        help_text="verified_count / total_occurrences",
+    )
@@
-    agreed_exact_pct = serializers.FloatField(help_text="agreed_exact_count / verified_count")
+    agreed_exact_pct = serializers.FloatField(
+        min_value=0.0,
+        max_value=1.0,
+        help_text="agreed_exact_count / verified_count",
+    )
@@
-    agreed_under_order_pct = serializers.FloatField(help_text="agreed_under_order_count / verified_count")
+    agreed_under_order_pct = serializers.FloatField(
+        min_value=0.0,
+        max_value=1.0,
+        help_text="agreed_under_order_count / verified_count",
+    )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/main/api/serializers.py` around lines 1765 - 1769, The percentage fields
verified_pct, agreed_exact_pct, and agreed_under_order_pct are currently
unbounded; update their declarations to add validation bounds (min_value=0.0,
max_value=1.0) on the serializers.FloatField instances so the serializer
enforces the documented 0.0..1.0 contract and fails fast on invalid values.
ui/src/data-services/hooks/occurrences/stats/useHumanModelAgreement.ts (2)

4-13: 💤 Low value

Consider renaming Response to avoid shadowing the global DOM type.

Response is the name of the global fetch response type. Shadowing it at module scope is harmless today but creates a foot-gun if anyone later references the DOM Response in this file. A domain-prefixed name (e.g., HumanModelAgreementResponse) is clearer at call sites too.

♻️ Proposed rename
-interface Response {
+interface HumanModelAgreementResponse {
   project_id: number
   total_occurrences: number
   verified_count: number
   verified_pct: number
   agreed_exact_count: number
   agreed_exact_pct: number
   agreed_under_order_count: number
   agreed_under_order_pct: number
 }
@@
-  const { data, isLoading, isFetching, error } = useAuthorizedQuery<Response>({
+  const { data, isLoading, isFetching, error } = useAuthorizedQuery<HumanModelAgreementResponse>({
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ui/src/data-services/hooks/occurrences/stats/useHumanModelAgreement.ts`
around lines 4 - 13, Rename the module-scoped interface named Response to a
domain-specific name (e.g., HumanModelAgreementResponse) to avoid shadowing the
global DOM Response type; update the interface declaration and all references to
it within useHumanModelAgreement.ts (and any exported types/imports) so code
that needs the DOM Response can still reference it unambiguously and call sites
use the new HumanModelAgreementResponse identifier.

20-32: ⚡ Quick win

Single-value filter map drops multi-value query params (e.g., algorithm).

OccurrenceAlgorithmFilter reads algorithm and not_algorithm via request.query_params.getlist(...) on the backend, so callers can legitimately pass multiple algorithm IDs. The current Record<string, string | number | boolean | undefined> plus params.set(...) collapses any such filter to a single value, so this hook can't fully reproduce the /occurrences/ filter set the PR objectives describe.

Consider widening the value type and switching to append per item:

♻️ Proposed change
 export const useHumanModelAgreement = (
   projectId?: string,
-  filters?: Record<string, string | number | boolean | undefined>
+  filters?: Record<
+    string,
+    string | number | boolean | Array<string | number> | undefined
+  >
 ) => {
   const url = `${API_URL}/${API_ROUTES.OCCURRENCES}/stats/human-model-agreement/`

   const params = new URLSearchParams()
   if (projectId) params.set('project_id', projectId)
   if (filters) {
     Object.entries(filters).forEach(([key, value]) => {
-      if (value !== undefined && value !== '' && value !== null) {
-        params.set(key, String(value))
-      }
+      if (value === undefined || value === null || value === '') return
+      if (Array.isArray(value)) {
+        value.forEach((v) => {
+          if (v !== undefined && v !== null && v !== '') {
+            params.append(key, String(v))
+          }
+        })
+      } else {
+        params.set(key, String(value))
+      }
     })
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ui/src/data-services/hooks/occurrences/stats/useHumanModelAgreement.ts`
around lines 20 - 32, The hook useHumanModelAgreement currently types filters as
Record<string, string | number | boolean | undefined> and calls params.set(...),
which collapses multi-value query params; update the filters param type to allow
string[] (e.g., Record<string, string | number | boolean | string[] |
undefined>) and when iterating Object.entries(filters) detect arrays and call
params.append(key, String(item)) for each element (fall back to params.set for
single values), ensuring multi-value keys like "algorithm" and "not_algorithm"
are preserved in the generated URL.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ami/main/models_future/occurrence.py`:
- Line 187: The code eagerly materializes the entire QuerySet into memory by
doing occurrences = list(qs); change this to a memory-safe iteration or paging
approach: replace the full list() with chunked processing using
qs.iterator(chunk_size=1000) or loop over qs in paginated batches (e.g.,
Paginator or manual offset/limit) and aggregate/write results per-chunk, and
avoid prefetching everything at once (adjust or remove the prefetch_related on
identifications or use values()/only()/defer() to limit fetched fields) so
memory usage stays bounded; update any downstream logic that expects a full list
to work with incremental processing or collect results into a streaming response
instead.

In `@docs/claude/planning/2026-05-14-human-model-agreement-endpoint.md`:
- Around line 26-43: The fenced code block listing project files lacks a
language tag (triggering markdownlint MD040); update the opening fence for the
block that contains entries like "ami/ ... occurrence.py # ADD:
human_model_agreement_for_project(), _lca_rank_of() helper", "serializers.py #
ADD: HumanModelAgreementSerializer", and "useHumanModelAgreement.ts # ADD: typed
React Query hook" to include a language identifier (e.g., ```text) so the block
is explicitly labeled; keep the same block content and closing fence unchanged.

---

Nitpick comments:
In `@ami/main/api/serializers.py`:
- Around line 1765-1769: The percentage fields verified_pct, agreed_exact_pct,
and agreed_under_order_pct are currently unbounded; update their declarations to
add validation bounds (min_value=0.0, max_value=1.0) on the
serializers.FloatField instances so the serializer enforces the documented
0.0..1.0 contract and fails fast on invalid values.

In `@ui/src/data-services/hooks/occurrences/stats/useHumanModelAgreement.ts`:
- Around line 4-13: Rename the module-scoped interface named Response to a
domain-specific name (e.g., HumanModelAgreementResponse) to avoid shadowing the
global DOM Response type; update the interface declaration and all references to
it within useHumanModelAgreement.ts (and any exported types/imports) so code
that needs the DOM Response can still reference it unambiguously and call sites
use the new HumanModelAgreementResponse identifier.
- Around line 20-32: The hook useHumanModelAgreement currently types filters as
Record<string, string | number | boolean | undefined> and calls params.set(...),
which collapses multi-value query params; update the filters param type to allow
string[] (e.g., Record<string, string | number | boolean | string[] |
undefined>) and when iterating Object.entries(filters) detect arrays and call
params.append(key, String(item)) for each element (fall back to params.set for
single values), ensuring multi-value keys like "algorithm" and "not_algorithm"
are preserved in the generated URL.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a97a61f4-518e-4bf7-b6b5-fb325dc4e97d

📥 Commits

Reviewing files that changed from the base of the PR and between aeb57c1 and b81a987.

📒 Files selected for processing (7)
  • ami/main/api/serializers.py
  • ami/main/api/views.py
  • ami/main/models_future/occurrence.py
  • ami/main/tests.py
  • docs/claude/planning/2026-05-14-human-model-agreement-endpoint.md
  • docs/claude/planning/occurrence-filter-driven-exports.md
  • ui/src/data-services/hooks/occurrences/stats/useHumanModelAgreement.ts

Comment thread ami/main/models_future/occurrence.py Outdated
Comment thread docs/claude/planning/2026-05-14-human-model-agreement-endpoint.md Outdated
@mihow mihow marked this pull request as draft May 14, 2026 20:01
@mihow mihow changed the title feat(occurrence-stats): add /occurrences/stats/human-model-agreement/ endpoint Endpoint for stats about verified occurrences May 14, 2026
@annavik
Copy link
Copy Markdown
Member

annavik commented May 14, 2026

Oh yes!!

mihow added a commit that referenced this pull request May 15, 2026
… review fixes

Captures: review findings from Copilot + CodeRabbit, perf bench evidence
(43k rows → 159s timeout on apply_defaults=false), and the planned changes
for the next session (rename to model-agreement, push aggregation into
SQL/ORM, fix UNKNOWN rank LCA + denominator + verified_by_me anon gap +
test gaps).

Co-Authored-By: Claude <noreply@anthropic.com>
mihow added a commit that referenced this pull request May 15, 2026
…ion to SQL

Addresses review feedback on PR #1307:

Rename (drop "human"):
- URL: /occurrences/stats/human-model-agreement/ -> /model-agreement/
- Function: human_model_agreement_for_project -> model_agreement_for_project
- Serializer: HumanModelAgreementSerializer -> ModelAgreementSerializer
- Viewset action + url_path: human_model_agreement -> model_agreement
- FE hook: useHumanModelAgreement -> useModelAgreement (file + symbol)
- FE type: Response -> ModelAgreementResponse (fixes DOM Response shadow)
- Test class: TestHumanModelAgreementForProject -> TestModelAgreementForProject

SQL push-down (Copilot+CodeRabbit perf flag):
- Replace list(qs) full-row materialization with annotated aggregate().
- Annotate best_user_taxon_id via Subquery over Identification
  (BEST_IDENTIFICATION_ORDER). Drop the prefetch + select_related("taxon")
  on identifications since only taxon_id is read.
- aggregate() Count(filter=Q(...)) for total/verified/exact/no-prediction.
- For under-order disagreement: group disagreement set by distinct
  (user_taxon, machine_taxon) pair before LCA. Each pair's LCA runs once.
- Bench against project 18 (43,149 occurrences): pre-rework apply_defaults=false
  curl timed out at 159s; post-rework 1.96s unfiltered / 3.4s with bypass
  (93,019 occurrences post-filter).

Denominator fix (Copilot):
- agreed_*_pct now divides by verified_with_prediction_count instead of
  verified_count. A verified occurrence with no machine prediction can't
  agree or disagree; including it in the denominator drags the rate down
  without representing actual model disagreement.
- Surface no_prediction_count + verified_with_prediction_count as sibling
  fields so consumers can see how many such occurrences exist.

UNKNOWN rank bug (Copilot):
- TaxonRank.UNKNOWN sorts after SPECIES in OrderedEnum definition order,
  so without explicit exclusion UNKNOWN >= ORDER is True and a shared
  UNKNOWN ancestor would wrongly count as under-order agreement. Filter
  UNKNOWN out of lca_rank_between's candidate ranks. Add regression test.

Tests:
- New: test_unknown_rank_excluded_from_lca (LCA regression)
- New: test_agreement_under_order_bucket (HTTP coverage for sister-species
  case, previously only exact-match shortcut was exercised)
- Updated: happy-path asserts verified_with_prediction_count and
  no_prediction_count.

22/22 backend tests green:
  docker compose exec django python manage.py test
    ami.main.tests.TestLcaRankBetween
    ami.main.tests.TestModelAgreementForProject
    ami.main.tests.TestOccurrenceStatsViewSet

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow mihow changed the title Endpoint for stats about verified occurrences feat(occurrence-stats): /occurrences/stats/model-agreement/ endpoint May 15, 2026
@mihow mihow changed the title feat(occurrence-stats): /occurrences/stats/model-agreement/ endpoint Endpoint for stats about verified occurrences May 15, 2026
@mihow mihow marked this pull request as ready for review May 21, 2026 00:20
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ui/src/data-services/hooks/occurrences/stats/useModelAgreement.ts (1)

22-33: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Support repeated query params for multi-select filters.

Record<string, primitive> + params.set(...) drops repeated keys, so multi-value filters (e.g., repeated algorithm / not_algorithm) can’t be forwarded faithfully from occurrence filters.

💡 Proposed fix
-export const useModelAgreement = (
-  projectId?: string,
-  filters?: Record<string, string | number | boolean | undefined>
-) => {
+type FilterPrimitive = string | number | boolean
+type FilterValue = FilterPrimitive | FilterPrimitive[] | null | undefined
+
+export const useModelAgreement = (
+  projectId?: string,
+  filters?: Record<string, FilterValue>
+) => {
   const url = `${API_URL}/${API_ROUTES.OCCURRENCES}/stats/model-agreement/`

   const params = new URLSearchParams()
   if (projectId) params.set('project_id', projectId)
   if (filters) {
     Object.entries(filters).forEach(([key, value]) => {
-      if (value !== undefined && value !== '' && value !== null) {
-        params.set(key, String(value))
-      }
+      if (Array.isArray(value)) {
+        value.forEach((item) => {
+          if (item !== undefined && item !== '' && item !== null) {
+            params.append(key, String(item))
+          }
+        })
+        return
+      }
+      if (value !== undefined && value !== '' && value !== null) {
+        params.set(key, String(value))
+      }
     })
   }
+  const queryString = params.toString()

   const { data, isLoading, isFetching, error } =
     useAuthorizedQuery<ModelAgreementResponse>({
       queryKey: [
         API_ROUTES.OCCURRENCES,
         'stats',
         'model-agreement',
         projectId,
-        filters,
+        queryString,
       ],
-      url: `${url}?${params.toString()}`,
+      url: `${url}?${queryString}`,
     })

Also applies to: 38-46

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ui/src/data-services/hooks/occurrences/stats/useModelAgreement.ts` around
lines 22 - 33, The current implementation converts multi-value filters into a
Record<string, primitive> and uses params.set(...), which overwrites duplicate
query keys and loses multi-select filters; update the filters type to allow
arrays (e.g., Record<string, string | number | boolean | string[] | undefined>),
and when building URLSearchParams switch to using params.append(...) for
repeated values: if the filter value is an array loop and params.append(key,
String(v)) for each entry, otherwise call params.append(key, String(value));
ensure the same change is applied in the other occurrence mentioned (the block
around the second params handling at lines ~38-46). Use the existing params and
filters identifiers so the change is localized.
🧹 Nitpick comments (1)
docs/claude/prompts/NEXT_SESSION_PROMPT.md (1)

1-86: ⚡ Quick win

Planning document appears stale and may confuse future readers.

This file is titled "Next session" and describes tasks "for this session" (lines 7-68), but according to the PR objectives summary, the work described here has already been completed:

  • Renaming from "human-model-agreement" to "model-agreement" ✓
  • SQL aggregation push ✓
  • UNKNOWN rank bug fix ✓
  • Denominator fix (verified_with_prediction_count) ✓

Including a "NEXT_SESSION_PROMPT" document that describes already-completed work as if it's pending creates confusion for future developers who might try to execute these tasks again or wonder what state the codebase is in.

Additionally, line 5 references the old endpoint URL that will 404 after the renaming.

Consider one of:

  1. Archive/rename this to docs/claude/planning/2026-05-14-session-notes-pr-1307.md (historical record, past tense)
  2. Remove it if the other planning doc at line 18 already serves as the historical record
  3. Add a header clearly stating "Historical planning document - work completed in commits X-Y"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/claude/prompts/NEXT_SESSION_PROMPT.md` around lines 1 - 86, The
"NEXT_SESSION_PROMPT.md" planning doc is stale and misleading; update it by
either (a) renaming/archiving it (e.g., to
docs/claude/planning/2026-05-14-session-notes-pr-1307.md) and leaving as
historical record, (b) deleting it if redundant, or (c) editing the top of
NEXT_SESSION_PROMPT.md to a clear "Historical planning document — work completed
in commits <sha-range>" header and update/remove the old endpoint URL reference;
ensure you touch the file named NEXT_SESSION_PROMPT.md and fix the line that
references the old endpoint URL
(http://localhost:8000/api/v2/occurrences/stats/human-model-agreement/?) so it
no longer points to a non-existent route and include the completed-commits SHAs
or a pointer to the merged PR in the header.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ami/main/models_future/occurrence.py`:
- Around line 201-213: The count is taken from the raw queryset (total =
queryset.count()) but the verified branch uses a deduped queryset (.distinct()),
so duplicates in the incoming queryset can inflate total; change to operate on a
deduplicated base queryset (e.g., replace/count using queryset.distinct() or
assign deduped = queryset.distinct() and use deduped for total and downstream
operations like the block that builds verified_rows and any other aggregations)
so that total, verified_rows and agreement numerators use the same deduplicated
set (refer to total, verified_rows and the use of .distinct() in this
file/function).

In `@docs/claude/prompts/NEXT_SESSION_PROMPT.md`:
- Line 86: The TODO about updating MEMORY.md is incomplete—either perform the
update or remove/clarify the note: add a new entry named
project_pr_1307_human_model_agreement.md into MEMORY.md summarizing the current
PR state (references to PR `#1307`, the plan doc at
docs/claude/planning/occurrence-filter-driven-exports.md, and the exported
stub), or delete the parenthetical “(TODO this session start)” and replace it
with a clear status line (e.g., “updated” or “needs follow-up”) so the commit
message and NEXT_SESSION_PROMPT.md reflect an accurate, actionable state.

---

Outside diff comments:
In `@ui/src/data-services/hooks/occurrences/stats/useModelAgreement.ts`:
- Around line 22-33: The current implementation converts multi-value filters
into a Record<string, primitive> and uses params.set(...), which overwrites
duplicate query keys and loses multi-select filters; update the filters type to
allow arrays (e.g., Record<string, string | number | boolean | string[] |
undefined>), and when building URLSearchParams switch to using
params.append(...) for repeated values: if the filter value is an array loop and
params.append(key, String(v)) for each entry, otherwise call params.append(key,
String(value)); ensure the same change is applied in the other occurrence
mentioned (the block around the second params handling at lines ~38-46). Use the
existing params and filters identifiers so the change is localized.

---

Nitpick comments:
In `@docs/claude/prompts/NEXT_SESSION_PROMPT.md`:
- Around line 1-86: The "NEXT_SESSION_PROMPT.md" planning doc is stale and
misleading; update it by either (a) renaming/archiving it (e.g., to
docs/claude/planning/2026-05-14-session-notes-pr-1307.md) and leaving as
historical record, (b) deleting it if redundant, or (c) editing the top of
NEXT_SESSION_PROMPT.md to a clear "Historical planning document — work completed
in commits <sha-range>" header and update/remove the old endpoint URL reference;
ensure you touch the file named NEXT_SESSION_PROMPT.md and fix the line that
references the old endpoint URL
(http://localhost:8000/api/v2/occurrences/stats/human-model-agreement/?) so it
no longer points to a non-existent route and include the completed-commits SHAs
or a pointer to the merged PR in the header.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c229eb53-6f25-4d87-bd0b-622992ee75eb

📥 Commits

Reviewing files that changed from the base of the PR and between b81a987 and 0924027.

📒 Files selected for processing (8)
  • ami/main/api/serializers.py
  • ami/main/api/views.py
  • ami/main/models_future/occurrence.py
  • ami/main/tests.py
  • docs/claude/planning/2026-05-14-human-model-agreement-endpoint.md
  • docs/claude/prompts/NEXT_SESSION_PROMPT.md
  • docs/claude/reference/api-stats-pattern.md
  • ui/src/data-services/hooks/occurrences/stats/useModelAgreement.ts
✅ Files skipped from review due to trivial changes (1)
  • docs/claude/planning/2026-05-14-human-model-agreement-endpoint.md

Comment thread ami/main/models_future/occurrence.py
Comment thread docs/claude/prompts/NEXT_SESSION_PROMPT.md Outdated
mihow pushed a commit that referenced this pull request May 21, 2026
…pport

Renames `agreed_under_order_*` → `agreed_any_rank_*` to match the
backend's dropped ORDER threshold. Adds optional `agreement_coarsest_rank` +
`agreed_coarser_rank_*` fields to the response type (not consumed by the UI
yet — the stats panel still renders `verified_pct` + `agreed_any_rank_pct`).

Also widens `filters` to accept arrays and appends repeated query params so
multi-value filters (e.g. `algorithm`, `not_algorithm` — backend reads via
`request.query_params.getlist(...)`) survive. Same fix CodeRabbit flagged
in PR #1307 review.

Co-Authored-By: Claude <noreply@anthropic.com>
mihow pushed a commit that referenced this pull request May 21, 2026
…pport

Renames `agreed_under_order_*` → `agreed_any_rank_*` to match the
backend's dropped ORDER threshold. Adds optional `agreement_coarsest_rank` +
`agreed_coarser_rank_*` fields to the response type (not consumed by the UI
yet — the stats panel still renders `verified_pct` + `agreed_any_rank_pct`).

Also widens `filters` to accept arrays and appends repeated query params so
multi-value filters (e.g. `algorithm`, `not_algorithm` — backend reads via
`request.query_params.getlist(...)`) survive. Same fix CodeRabbit flagged
in PR #1307 review.

Co-Authored-By: Claude <noreply@anthropic.com>
mihow pushed a commit that referenced this pull request May 21, 2026
One-line field rename in the occurrence stats panel to match the backend's
dropped ORDER threshold. Hook type rename + multi-value filter support
landed on the base branch (4a92c0b on #1307).

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow
Copy link
Copy Markdown
Collaborator Author

mihow commented May 21, 2026

Claude says: Pushed two commits addressing this round of CodeRabbit feedback + a follow-on design extension the user signed off on.

Code changes (0565f06)feat(occurrence-stats): drop ORDER threshold; add coarsest_rank query param

  • Drops the hardcoded lca >= TaxonRank.ORDER agreement gate. Replaces agreed_under_order_* with agreed_any_rank_* (exact + any non-null LCA at a real rank; UNKNOWN still excluded). Idea: the upstream filter (e.g. a Lepidoptera include list) already bounds the meaningful scope, so a fixed threshold in this function is unnecessary noise.
  • Adds optional ?agreement_coarsest_rank=<RANK> for callers who do want an explicit floor. When supplied, response also includes agreed_coarser_rank_* (exact + LCAs at-or-deeper-than the threshold). Always echoes the applied rank in agreement_coarsest_rank so consumers can disambiguate. Invalid rank or UNKNOWN → 400.
  • Addresses CodeRabbit:
    • Dedupes base queryset (queryset.distinct()) so default-filter joins (verified_by_me, taxa_list_id) can't inflate total_occurrences vs the verified branch.
    • Bounds the *_pct FloatField serializers to [0.0, 1.0].
  • Tests: existing buckets renamed; new cases for the coarsest-rank threshold filtering shallow LCAs, invalid rank → 400, UNKNOWN rejection, and the threshold echo.

FE follow-up (4a92c0b)feat(ui): align model-agreement hook with BE rename + multi-value query params

  • Renames agreed_under_order_*agreed_any_rank_* in the hook's response type; adds the optional coarser-rank fields.
  • Widens filters to accept arrays and uses params.append(...) so multi-value filters (algorithm, not_algorithm — backend reads via request.query_params.getlist(...)) survive instead of getting collapsed by params.set.

Tests: 19/19 in TestModelAgreementForProject + TestOccurrenceStatsViewSet pass (--keepdb). Live render on localhost:4000 still shows VERIFIED 0% / AGREEMENT 71% on P18.

Stale NEXT_SESSION_PROMPT.md flagged in the latest review is deferred — useful as narrative for this rename and not on the production path.

🤖 Generated with Claude Code

mihow pushed a commit that referenced this pull request May 21, 2026
One-line field rename in the occurrence stats panel to match the backend's
dropped ORDER threshold. Hook type rename + multi-value filter support
landed on the base branch (4a92c0b on #1307).

Co-Authored-By: Claude <noreply@anthropic.com>
mihow pushed a commit that referenced this pull request May 21, 2026
One-line field rename in the occurrence stats panel to match the backend's
dropped ORDER threshold. Hook type rename + multi-value filter support
landed on the base branch (4a92c0b on #1307).

Co-Authored-By: Claude <noreply@anthropic.com>
mihow added a commit that referenced this pull request May 22, 2026
useModelAgreement.ts belongs with the frontend consumer (#1308), not the
backend endpoint PR. Keeps #1307 backend-only.

Co-Authored-By: Claude <noreply@anthropic.com>
mihow pushed a commit that referenced this pull request May 22, 2026
One-line field rename in the occurrence stats panel to match the backend's
dropped ORDER threshold. Hook type rename + multi-value filter support
landed on the base branch (4a92c0b on #1307).

Co-Authored-By: Claude <noreply@anthropic.com>
mihow added a commit that referenced this pull request May 22, 2026
Typed React Query wrapper for /occurrences/stats/model-agreement/.
Owned by this UI PR (#1308); the backend PR (#1307) is now backend-only.

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow mihow mentioned this pull request May 22, 2026
4 tasks
@mihow
Copy link
Copy Markdown
Collaborator Author

mihow commented May 26, 2026

Claude says: Follow-up from #1316 / #1317, which builds the per-taxon version of this. Two findings here that are relevant to this PR:

1. The parents_json GIN index follow-up — shipped, but it isn't the fix.

#1316 carried over the suggestion to add a GIN index on Taxon.parents_json (jsonb_path_ops) as the cure for slow higher-rank (Family/Order) rollups. We shipped it (migration 0085). Measured outcome: with the index in place, the per-taxon rollup still timed out (30s statement limit) on a production-scale project (~1k observed taxa, ~17k occurrences).

Reason: the rollup ran as a correlated subquery with parents_json @> [{"id": OuterRef(...)}], and a GIN index can't serve a containment whose right-hand side is an OuterRef — it's a per-row parameter, not a constant. So it degraded to a per-row scan, run once per page row plus once per taxon in the pagination COUNT. The index is still worth keeping (it serves the literal-RHS containment filters: occurrence-list taxon=<id>, the default-taxa filter), but it does not fix correlated rollups.

2. The pattern that did fix it — relevant to this endpoint's in-memory concern.

The reviews here flagged model_agreement_for_project materializing the full filtered queryset into Python on large projects. The approach we landed in #1317 sidesteps both problems at once: the counts only concern verified occurrences (those with a non-withdrawn Identification), which are sparse — bounded by human review effort, not total occurrences. Compute the rollup in a single pass over just that small set → constant-time CASE annotations (and id__in for the membership filter). Same project went from 30s timeout to ~0.3s across the default list / verified=true|false / ordering=verified_count. Cost is O(verified occurrences × ancestor depth), not O(all occurrences) — might be a cheaper path for this endpoint's large-project case too.

Writeup with the anti-pattern vs pattern and the gotchas (pagination COUNT annotation stripping, cachalot polluting repeated benchmarks, django-pydantic-field .values() returning pydantic objects rather than dicts): docs/claude/reference/hierarchical-rollup-query-performance.md (added in #1317).

mihow added a commit that referenced this pull request May 26, 2026
… review fixes

Captures: review findings from Copilot + CodeRabbit, perf bench evidence
(43k rows → 159s timeout on apply_defaults=false), and the planned changes
for the next session (rename to model-agreement, push aggregation into
SQL/ORM, fix UNKNOWN rank LCA + denominator + verified_by_me anon gap +
test gaps).

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ami/main/api/views.py`:
- Around line 1404-1417: The code treats an empty query param as omission
because it only branches when coarsest_rank_param is truthy; change the logic in
the agreement_coarsest_rank handling so that you distinguish between None
(missing) and empty string: if
request.query_params.get("agreement_coarsest_rank") is None keep coarsest_rank =
None, otherwise if coarsest_rank_param.strip() == "" raise
api_exceptions.ValidationError for "agreement_coarsest_rank" (reject blank), and
then continue to map via TaxonRank[...] and handle KeyError/TaxonRank.UNKNOWN as
before; refer to the coarsest_rank_param variable, TaxonRank enum, and the
existing ValidationError branches in ami/main/api/views.py.

In `@ami/main/models_future/occurrence.py`:
- Around line 229-291: The agreement denominator currently uses
verified_with_pred (count of verified rows with a machine prediction) but all
numerators and cohens_kappa drop rows where best_user_taxon_id is None, causing
inconsistent rates; create a new cohort (e.g., verified_both_present or
verified_with_both) that filters verified_rows to rows where both
best_user_taxon_id and best_machine_prediction_taxon_id are not None, compute
agreed_exact, agreed_any_rank, agreed_coarser_rank, the wilson CIs (exact_ci,
any_rank_ci) and cohens_kappa over that cohort, and use its length as the
denominator for _pct and CI calls so all agreement metrics and kappa are
computed from the same comparable population (update references to
verified_with_pred where appropriate and ensure pair_counts and
both_present_pairs derive from the same filtered set).

In `@ami/utils/stats.py`:
- Around line 29-35: The function computing the Wilson interval currently only
checks total but not successes, so invalid inputs like successes < 0 or
successes > total can make the sqrt negative; add an explicit validation in the
same scope (before computing phat) that checks successes is an integer/number
within [0, total] and if not, raise a clear ValueError (or TypeError for wrong
type) with a message including the invalid successes and total values; reference
the local variables successes, total, phat, z, denom, center, margin so you
place the check immediately before computing phat = successes / total to ensure
deterministic failure instead of a math-domain crash.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 43600deb-32ad-4d7b-921c-0f114f388ab3

📥 Commits

Reviewing files that changed from the base of the PR and between 0924027 and 9347277.

📒 Files selected for processing (6)
  • ami/main/api/serializers.py
  • ami/main/api/views.py
  • ami/main/models_future/occurrence.py
  • ami/main/tests.py
  • ami/utils/stats.py
  • docs/claude/reference/api-stats-pattern.md
✅ Files skipped from review due to trivial changes (1)
  • docs/claude/reference/api-stats-pattern.md

Comment thread ami/main/api/views.py
Comment on lines +1404 to +1417
coarsest_rank_param = request.query_params.get("agreement_coarsest_rank")
coarsest_rank = None
if coarsest_rank_param:
try:
coarsest_rank = TaxonRank[coarsest_rank_param.upper()]
except KeyError:
valid = ", ".join(r.name for r in TaxonRank if r.name != "UNKNOWN")
raise api_exceptions.ValidationError(
{"agreement_coarsest_rank": f"Invalid rank '{coarsest_rank_param}'. Must be one of: {valid}."}
)
if coarsest_rank == TaxonRank.UNKNOWN:
raise api_exceptions.ValidationError(
{"agreement_coarsest_rank": "UNKNOWN is not a valid threshold rank."}
)
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reject blank agreement_coarsest_rank values.

Line 1406 only validates truthy input, so ?agreement_coarsest_rank= is treated the same as omission and returns null coarser-rank fields instead of the documented 400 for an invalid rank.

🛠️ Proposed fix
-        coarsest_rank_param = request.query_params.get("agreement_coarsest_rank")
         coarsest_rank = None
-        if coarsest_rank_param:
+        if "agreement_coarsest_rank" in request.query_params:
+            coarsest_rank_param = request.query_params.get("agreement_coarsest_rank")
+            if not coarsest_rank_param:
+                raise api_exceptions.ValidationError(
+                    {"agreement_coarsest_rank": "This parameter cannot be blank."}
+                )
             try:
                 coarsest_rank = TaxonRank[coarsest_rank_param.upper()]
             except KeyError:
                 valid = ", ".join(r.name for r in TaxonRank if r.name != "UNKNOWN")
                 raise api_exceptions.ValidationError(
🧰 Tools
🪛 Ruff (0.15.14)

[warning] 1411-1413: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/main/api/views.py` around lines 1404 - 1417, The code treats an empty
query param as omission because it only branches when coarsest_rank_param is
truthy; change the logic in the agreement_coarsest_rank handling so that you
distinguish between None (missing) and empty string: if
request.query_params.get("agreement_coarsest_rank") is None keep coarsest_rank =
None, otherwise if coarsest_rank_param.strip() == "" raise
api_exceptions.ValidationError for "agreement_coarsest_rank" (reject blank), and
then continue to map via TaxonRank[...] and handle KeyError/TaxonRank.UNKNOWN as
before; refer to the coarsest_rank_param variable, TaxonRank enum, and the
existing ValidationError branches in ami/main/api/views.py.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also use proper validation here? add Enum support to SingleParamSerializer if need be, or pydantic? Or a DRF serializer?

## Query parameters
Use `SingleParamSerializer[T].clean(...)` from `ami/base/serializers.py`.
It runs a DRF `serializers.IntegerField` / etc. through the standard
validation pipeline and raises `ValidationError` → DRF returns 400 with
the field-level error body the frontend expects.
```python
limit = SingleParamSerializer[int].clean(
param_name="limit",
field=serializers.IntegerField(required=False, min_value=1, max_value=50, default=5),
data=request.query_params,
)
```

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Look at SingleParamSerializer implementation
ast-grep --pattern 'class SingleParamSerializer$$$'

echo "---"
rg -n "SingleParamSerializer" ami/base/serializers.py -A 30 | head -80

echo "=== api-stats-pattern.md ==="
cat docs/claude/reference/api-stats-pattern.md | head -200

echo "=== TaxonRank enum ==="
rg -n "class TaxonRank" -A 20

Length of output: 14976


🏁 Script executed:

# Confirm current imports in ami/base/serializers.py to ensure EnumField won't conflict
rg -n "^from|^import" ami/base/serializers.py | head -30
# Confirm EnumField is not already defined
rg -n "EnumField" ami/base/serializers.py

Length of output: 804



Good, no conflicts. Here's the full plan:

Task: Proper enum validation via SingleParamSerializer + EnumField

  • Add EnumField to ami/base/serializers.py
  • Update model_agreement in views.py to use it

🧬 Code Graph Analysis Results

Relevant snippets from ami/main/api/views.py

1) SourceImageCollectionViewSet._get_source_image uses SingleParamSerializer[int] for query/body param validation

ami/main/api/views.py (approx. L560-L575)

def _get_source_image(self):
    """
    Get source image from either GET query param or in the PUT/POST request body.
    """
    key = "source_image"
    source_image_id = SingleParamSerializer[int].clean(
        key,
        field=serializers.IntegerField(required=True, min_value=0),
        data=dict(self.request.data, **self.request.query_params),
    )

    try:
        return SourceImage.objects.get(id=source_image_id)
    except SourceImage.DoesNotExist:
        raise api_exceptions.NotFound(detail=f"SourceImage with id {source_image_id} not found")

2) SourceImageUploadViewSet.perform_create uses SingleParamSerializer[bool] for process_now

ami/main/api/views.py (approx. L620-L650)

def perform_create(self, serializer):
    """
    Save the SourceImageUpload with the current user and create the associated SourceImage.
    """
    from ami.base.serializers import get_current_user
    from ami.main.models import create_source_image_from_upload

    # Get current user from request
    user = get_current_user(self.request)

    # Create the SourceImageUpload object with the user
    obj = serializer.save(user=user)

    # Get process_now flag from project feature flags
    process_now = SingleParamSerializer[bool].clean(
        param_name="process_now",
        field=serializers.BooleanField(required=True),
        data=self.request.query_params,
    )

    # Create source image from the upload
    source_image = create_source_image_from_upload(
        image=obj.image,
        deployment=obj.deployment,
        request=self.request,
        process_now=process_now,
    )

    # Update the source_image reference and save
    obj.source_image = source_image
    obj.save()

3) OccurrenceStatsViewSet.top_identifiers uses SingleParamSerializer[int] for limit

ami/main/api/views.py (approx. L800-L830)

limit = SingleParamSerializer[int].clean(
    param_name="limit",
    field=serializers.IntegerField(required=False, min_value=1, max_value=50, default=5),
    data=request.query_params,
)
top_users = list(top_identifiers_for_project(project)[:limit])

4) OccurrenceStatsViewSet.model_agreement manually validates agreement_coarsest_rank via TaxonRank[...] (enum-style parsing)

ami/main/api/views.py (approx. L830-L880)

coarsest_rank_param = request.query_params.get("agreement_coarsest_rank")
coarsest_rank = None
if coarsest_rank_param:
    try:
        coarsest_rank = TaxonRank[coarsest_rank_param.upper()]
    except KeyError:
        valid = ", ".join(r.name for r in TaxonRank if r.name != "UNKNOWN")
        raise api_exceptions.ValidationError(
            {"agreement_coarsest_rank": f"Invalid rank '{coarsest_rank_param}'. Must be one of: {valid}."}
        )
    if coarsest_rank == TaxonRank.UNKNOWN:
        raise api_exceptions.ValidationError(
            {"agreement_coarsest_rank": "UNKNOWN is not a valid threshold rank."}
        )

5) EventViewSet.timeline uses Django form field cleaning for resolution_minutes

ami/main/api/views.py (approx. L230-L250)

event = self.get_object()
resolution_minutes = IntegerField(required=False, min_value=1).clean(
    request.query_params.get("resolution_minutes", 1)
)
resolution = datetime.timedelta(minutes=resolution_minutes)

6) Storage connection status uses a hardcoded list with @TODO use ENUM

ami/main/api/views.py (approx. L1000-L1025)

_STORAGE_CONNECTION_STATUS = [
    # These come from the ConnetionStatus react component
    # `@TODO` use ENUM
    "NOT_CONNECTED",
    "CONNECTING",
    "CONNECTED",
    "ERROR",
]

class StorageStatus(APIView):
    ...
    def post(self, request):
        """`@TODO` not totally sure how to use the serializer here yet."""
        ...
        data = {
            "data_source": data_source,
            "status": _STORAGE_CONNECTION_STATUS[1],
            ...
        }
        return Response(data)

🧠 Learnings used
Learnt from: mihow
Repo: RolnickLab/antenna PR: 1130
File: ami/ml/orchestration/nats_queue.py:217-217
Timestamp: 2026-02-13T23:11:30.276Z
Learning: In the RolnickLab/antenna project (Pydantic v1), use .dict() for serialization of Pydantic models instead of .model_dump(). Update code paths in Python files where model_dump() is used to ensure compatibility with Pydantic v1 behavior. If any files are later migrated to Pydantic v2, consider switching to .model_dump() or using model_dump(..., mode='json') as appropriate.

Learnt from: mihow
Repo: RolnickLab/antenna PR: 1222
File: ami/ml/orchestration/nats_queue.py:126-132
Timestamp: 2026-04-13T22:08:32.779Z
Learning: In code that uses the `nats-py` JetStream client, do not divide `ConsumerInfo.config.ack_wait` (or the `ack_wait` returned by `consumer_info()` / `add_consumer()`) by `1e9`. `nats-py` already deserializes the server’s raw nanosecond value by dividing by `_NANOSECOND` (1e9), so application-level code should treat `ack_wait` as already human-readable seconds (e.g., `30.0` means 30 seconds). Only apply nanosecond-to-second conversion if you are working with the raw server nanosecond field directly (not the `ack_wait` property exposed by `ConsumerInfo`/`ConsumerConfig`).

Learnt from: mihow
Repo: RolnickLab/antenna PR: 1214
File: ami/main/api/serializers.py:1396-1425
Timestamp: 2026-04-17T05:08:02.267Z
Learning: When serializers access related model fields (e.g., `prediction.taxon` / `prediction.algorithm` after `obj.best_prediction` in `get_best_machine_prediction`), ensure the underlying queryset uses `select_related` for those relations (e.g., `.select_related("taxon", "algorithm")`) so the serializer does not trigger per-row lazy loads (N+1 queries). If an N+1 remains due to calling `best_prediction` itself, treat it separately (it may be pre-existing and tracked elsewhere), and don’t conflate it with missing `select_related` coverage for the accessed relations.

Learnt from: mihow
Repo: RolnickLab/antenna PR: 1261
File: ami/jobs/tasks.py:162-171
Timestamp: 2026-04-21T07:40:49.264Z
Learning: In this repo, when using `cachalot_disabled` imported from `cachalot.api`, treat it as supported in both forms: `with cachalot_disabled():` and `cachalot_disabled()`. It is implemented as a `contextmanager` that also behaves as a `ContextDecorator` (Python 3.2+), so the decorator form is not a no-op and creates a fresh generator instance per invocation. During code review, do not raise false positives that claim the decorator usage is unsupported or ineffective.

Learnt from: mihow
Repo: RolnickLab/antenna PR: 1131
File: ami/exports/dwca/rows.py:96-146
Timestamp: 2026-04-22T22:40:24.069Z
Learning: For this repo, `Detection.bbox`/`BoundingBox` values use an absolute pixel coordinate space (not normalized [0–1] floats). When code constructs crops (e.g., `PIL.Image.crop((x1, y1, x2, y2))`), use integer pixel coordinates and pass them through directly—do not convert from normalized floats. When emitting DWCA bounding box measurements (e.g., in `iter_mof_rows`), ensure the bounding box row uses `measurementUnit="pixels"` to match the canonical schema in `ami/ml/schemas.py`.

The requested changes have been implemented and a pull request has been created: View PR

Comment on lines +229 to +291
verified = len(verified_rows)
no_prediction = sum(1 for r in verified_rows if r["best_machine_prediction_taxon_id"] is None)
verified_with_pred = verified - no_prediction
agreed_exact = sum(
1
for r in verified_rows
if r["best_machine_prediction_taxon_id"] is not None
and r["best_user_taxon_id"] == r["best_machine_prediction_taxon_id"]
)

# Dedupe disagreement pairs so each (user_taxon, machine_taxon) LCA runs once.
pair_counts: collections.Counter = collections.Counter()
for r in verified_rows:
m_id = r["best_machine_prediction_taxon_id"]
u_id = r["best_user_taxon_id"]
if m_id is None or u_id is None or u_id == m_id:
continue
pair_counts[(u_id, m_id)] += 1

needed_taxa_ids: set[int] = set()
for u_id, m_id in pair_counts:
needed_taxa_ids.add(u_id)
needed_taxa_ids.add(m_id)

taxa_by_id: dict[int, TaxonTuple] = {}
if needed_taxa_ids:
for t in Taxon.objects.filter(pk__in=needed_taxa_ids):
parents = [
{"id": p.id, "rank": p.rank.name if hasattr(p.rank, "name") else p.rank} for p in t.parents_json
]
taxa_by_id[t.pk] = (t.pk, t.rank, parents)

any_rank_disagreement_count = 0
coarser_rank_disagreement_count = 0
for (u_id, m_id), count in pair_counts.items():
u = taxa_by_id.get(u_id)
m = taxa_by_id.get(m_id)
if not u or not m:
continue
lca = lca_rank_between(u, m)
if lca is None:
continue
any_rank_disagreement_count += count
if coarsest_rank is not None and lca >= coarsest_rank:
coarser_rank_disagreement_count += count

agreed_any_rank = agreed_exact + any_rank_disagreement_count
agreed_coarser_rank = agreed_exact + coarser_rank_disagreement_count

# Extra stats over the same verified_rows already in memory — no extra query.
# Wilson 95% CI conveys how shaky each rate is at small n; Cohen's kappa
# (exact-taxon) discounts the agreement you'd get by chance.
exact_ci = wilson_interval(agreed_exact, verified_with_pred)
any_rank_ci = wilson_interval(agreed_any_rank, verified_with_pred)
both_present_pairs = [
(r["best_user_taxon_id"], r["best_machine_prediction_taxon_id"])
for r in verified_rows
if r["best_user_taxon_id"] is not None and r["best_machine_prediction_taxon_id"] is not None
]
kappa = cohens_kappa(both_present_pairs)

def _pct(num: int, denom: int) -> float:
return round(num / denom, 4) if denom else 0.0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Exclude taxon-less verifications from the agreement cohort.

Line 231 uses every verified row with a model prediction as the denominator, but Lines 232-288 drop best_user_taxon_id is None from all agreement numerators and from cohens_kappa(). In this repo Identification.taxon is nullable, so a comment-only verification will depress agreed_exact_pct / agreed_any_rank_pct even though there is no human label to compare.

Either build a separate comparable cohort (best_user_taxon_id and model taxon both present) for the agreement stats, or expose that denominator explicitly in the payload so the rates stay internally consistent.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/main/models_future/occurrence.py` around lines 229 - 291, The agreement
denominator currently uses verified_with_pred (count of verified rows with a
machine prediction) but all numerators and cohens_kappa drop rows where
best_user_taxon_id is None, causing inconsistent rates; create a new cohort
(e.g., verified_both_present or verified_with_both) that filters verified_rows
to rows where both best_user_taxon_id and best_machine_prediction_taxon_id are
not None, compute agreed_exact, agreed_any_rank, agreed_coarser_rank, the wilson
CIs (exact_ci, any_rank_ci) and cohens_kappa over that cohort, and use its
length as the denominator for _pct and CI calls so all agreement metrics and
kappa are computed from the same comparable population (update references to
verified_with_pred where appropriate and ensure pair_counts and
both_present_pairs derive from the same filtered set).

Comment thread ami/utils/stats.py
Comment on lines +29 to +35
if total <= 0:
return None
phat = successes / total
z2 = z * z
denom = 1 + z2 / total
center = (phat + z2 / (2 * total)) / denom
margin = (z / denom) * math.sqrt(phat * (1 - phat) / total + z2 / (4 * total * total))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate successes bounds to prevent math-domain crashes.

At Line 29, total is validated, but successes is not. Invalid inputs (e.g., successes > total) can make the sqrt term negative and crash at Line 35. Add an explicit range check so failures are deterministic and easier to debug.

Proposed fix
 def wilson_interval(successes: int, total: int, z: float = WILSON_Z_95) -> tuple[float, float] | None:
@@
     if total <= 0:
         return None
+    if successes < 0 or successes > total:
+        raise ValueError("successes must be between 0 and total")
     phat = successes / total
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/utils/stats.py` around lines 29 - 35, The function computing the Wilson
interval currently only checks total but not successes, so invalid inputs like
successes < 0 or successes > total can make the sqrt negative; add an explicit
validation in the same scope (before computing phat) that checks successes is an
integer/number within [0, total] and if not, raise a clear ValueError (or
TypeError for wrong type) with a message including the invalid successes and
total values; reference the local variables successes, total, phat, z, denom,
center, margin so you place the check immediately before computing phat =
successes / total to ensure deterministic failure instead of a math-domain
crash.

mihow added a commit that referenced this pull request May 27, 2026
… review fixes

Captures: review findings from Copilot + CodeRabbit, perf bench evidence
(43k rows → 159s timeout on apply_defaults=false), and the planned changes
for the next session (rename to model-agreement, push aggregation into
SQL/ORM, fix UNKNOWN rank LCA + denominator + verified_by_me anon gap +
test gaps).

Co-Authored-By: Claude <noreply@anthropic.com>
mihow added a commit that referenced this pull request May 27, 2026
…ion to SQL

Addresses review feedback on PR #1307:

Rename (drop "human"):
- URL: /occurrences/stats/human-model-agreement/ -> /model-agreement/
- Function: human_model_agreement_for_project -> model_agreement_for_project
- Serializer: HumanModelAgreementSerializer -> ModelAgreementSerializer
- Viewset action + url_path: human_model_agreement -> model_agreement
- FE hook: useHumanModelAgreement -> useModelAgreement (file + symbol)
- FE type: Response -> ModelAgreementResponse (fixes DOM Response shadow)
- Test class: TestHumanModelAgreementForProject -> TestModelAgreementForProject

SQL push-down (Copilot+CodeRabbit perf flag):
- Replace list(qs) full-row materialization with annotated aggregate().
- Annotate best_user_taxon_id via Subquery over Identification
  (BEST_IDENTIFICATION_ORDER). Drop the prefetch + select_related("taxon")
  on identifications since only taxon_id is read.
- aggregate() Count(filter=Q(...)) for total/verified/exact/no-prediction.
- For under-order disagreement: group disagreement set by distinct
  (user_taxon, machine_taxon) pair before LCA. Each pair's LCA runs once.
- Bench against project 18 (43,149 occurrences): pre-rework apply_defaults=false
  curl timed out at 159s; post-rework 1.96s unfiltered / 3.4s with bypass
  (93,019 occurrences post-filter).

Denominator fix (Copilot):
- agreed_*_pct now divides by verified_with_prediction_count instead of
  verified_count. A verified occurrence with no machine prediction can't
  agree or disagree; including it in the denominator drags the rate down
  without representing actual model disagreement.
- Surface no_prediction_count + verified_with_prediction_count as sibling
  fields so consumers can see how many such occurrences exist.

UNKNOWN rank bug (Copilot):
- TaxonRank.UNKNOWN sorts after SPECIES in OrderedEnum definition order,
  so without explicit exclusion UNKNOWN >= ORDER is True and a shared
  UNKNOWN ancestor would wrongly count as under-order agreement. Filter
  UNKNOWN out of lca_rank_between's candidate ranks. Add regression test.

Tests:
- New: test_unknown_rank_excluded_from_lca (LCA regression)
- New: test_agreement_under_order_bucket (HTTP coverage for sister-species
  case, previously only exact-match shortcut was exercised)
- Updated: happy-path asserts verified_with_prediction_count and
  no_prediction_count.

22/22 backend tests green:
  docker compose exec django python manage.py test
    ami.main.tests.TestLcaRankBetween
    ami.main.tests.TestModelAgreementForProject
    ami.main.tests.TestOccurrenceStatsViewSet

Co-Authored-By: Claude <noreply@anthropic.com>
mihow added a commit that referenced this pull request May 27, 2026
useModelAgreement.ts belongs with the frontend consumer (#1308), not the
backend endpoint PR. Keeps #1307 backend-only.

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow mihow force-pushed the feat/human-model-agreement-endpoint branch from 9347277 to e476333 Compare May 27, 2026 01:11
mihow pushed a commit that referenced this pull request May 27, 2026
One-line field rename in the occurrence stats panel to match the backend's
dropped ORDER threshold. Hook type rename + multi-value filter support
landed on the base branch (4a92c0b on #1307).

Co-Authored-By: Claude <noreply@anthropic.com>
mihow added a commit that referenced this pull request May 27, 2026
Typed React Query wrapper for /occurrences/stats/model-agreement/.
Owned by this UI PR (#1308); the backend PR (#1307) is now backend-only.

Co-Authored-By: Claude <noreply@anthropic.com>
mihow and others added 17 commits May 27, 2026 06:25
Pure-Python LCA over (taxon_id, rank, parents_json) tuples. Returns
the deepest shared TaxonRank or None. Used by the upcoming
human-model-agreement stat to bucket agreement at-or-finer-than ORDER.

Plan: docs/claude/planning/2026-05-14-human-model-agreement-endpoint.md
Side-research: docs/claude/planning/occurrence-filter-driven-exports.md

Co-Authored-By: Claude <noreply@anthropic.com>
… queryset

Pure aggregation; caller wires apply_default_filters + OccurrenceFilter.
Annotates best machine prediction, prefetches non-withdrawn identifications,
batches Taxon fetch for parents_json, buckets exact / under-order / above-order.

Co-Authored-By: Claude <noreply@anthropic.com>
Adds HumanModelAgreementSerializer and the human_model_agreement action
on OccurrenceStatsViewSet. Extracts OccurrenceViewSet's filter backends +
filterset_fields into a module-level tuple so OccurrenceStatsViewSet can
reuse the same OccurrenceFilter pass-through (deployment, event, taxa lists,
verified, score thresholds, apply_defaults=false, etc).

The top_identifiers action keeps its current behavior — filter_queryset
is only invoked by actions that opt in.

Co-Authored-By: Claude <noreply@anthropic.com>
Adds 6 HTTP-level tests: missing project_id 400, draft 404, empty zeros,
happy-path exact match, deployment filter pass-through, apply_defaults=false
score-threshold bypass.

Also adds DjangoFilterBackend to OccurrenceStatsViewSet.filter_backends so
filterset_fields (event, deployment, determination__rank, ...) actually take
effect. Without DjangoFilterBackend, filterset_fields are silently ignored
and ?deployment=N returns the unfiltered set.

Co-Authored-By: Claude <noreply@anthropic.com>
Mirrors useTopIdentifiers's useAuthorizedQuery pattern. Accepts an
arbitrary filter map so the occurrence list page can thread its filter
state through unchanged (deployment, event, taxon, score thresholds,
apply_defaults).

Co-Authored-By: Claude <noreply@anthropic.com>
… review fixes

Captures: review findings from Copilot + CodeRabbit, perf bench evidence
(43k rows → 159s timeout on apply_defaults=false), and the planned changes
for the next session (rename to model-agreement, push aggregation into
SQL/ORM, fix UNKNOWN rank LCA + denominator + verified_by_me anon gap +
test gaps).

Co-Authored-By: Claude <noreply@anthropic.com>
…ion to SQL

Addresses review feedback on PR #1307:

Rename (drop "human"):
- URL: /occurrences/stats/human-model-agreement/ -> /model-agreement/
- Function: human_model_agreement_for_project -> model_agreement_for_project
- Serializer: HumanModelAgreementSerializer -> ModelAgreementSerializer
- Viewset action + url_path: human_model_agreement -> model_agreement
- FE hook: useHumanModelAgreement -> useModelAgreement (file + symbol)
- FE type: Response -> ModelAgreementResponse (fixes DOM Response shadow)
- Test class: TestHumanModelAgreementForProject -> TestModelAgreementForProject

SQL push-down (Copilot+CodeRabbit perf flag):
- Replace list(qs) full-row materialization with annotated aggregate().
- Annotate best_user_taxon_id via Subquery over Identification
  (BEST_IDENTIFICATION_ORDER). Drop the prefetch + select_related("taxon")
  on identifications since only taxon_id is read.
- aggregate() Count(filter=Q(...)) for total/verified/exact/no-prediction.
- For under-order disagreement: group disagreement set by distinct
  (user_taxon, machine_taxon) pair before LCA. Each pair's LCA runs once.
- Bench against project 18 (43,149 occurrences): pre-rework apply_defaults=false
  curl timed out at 159s; post-rework 1.96s unfiltered / 3.4s with bypass
  (93,019 occurrences post-filter).

Denominator fix (Copilot):
- agreed_*_pct now divides by verified_with_prediction_count instead of
  verified_count. A verified occurrence with no machine prediction can't
  agree or disagree; including it in the denominator drags the rate down
  without representing actual model disagreement.
- Surface no_prediction_count + verified_with_prediction_count as sibling
  fields so consumers can see how many such occurrences exist.

UNKNOWN rank bug (Copilot):
- TaxonRank.UNKNOWN sorts after SPECIES in OrderedEnum definition order,
  so without explicit exclusion UNKNOWN >= ORDER is True and a shared
  UNKNOWN ancestor would wrongly count as under-order agreement. Filter
  UNKNOWN out of lca_rank_between's candidate ranks. Add regression test.

Tests:
- New: test_unknown_rank_excluded_from_lca (LCA regression)
- New: test_agreement_under_order_bucket (HTTP coverage for sister-species
  case, previously only exact-match shortcut was exercised)
- Updated: happy-path asserts verified_with_prediction_count and
  no_prediction_count.

22/22 backend tests green:
  docker compose exec django python manage.py test
    ami.main.tests.TestLcaRankBetween
    ami.main.tests.TestModelAgreementForProject
    ami.main.tests.TestOccurrenceStatsViewSet

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Replace the .aggregate() over the full filtered queryset with a two-step
approach:
  1. SQL Count('pk') for total_occurrences (no joins, no subqueries).
  2. Fetch the verified set (occurrences with at least one non-withdrawn
     ident) with both best_user_taxon_id and best_machine_prediction_taxon_id
     annotated, then bucket counts + LCA in Python.

Why: the previous version evaluated two correlated subqueries (best user
identification + best machine prediction) on every row of the filtered
queryset. For typical projects, >95% of occurrences have no identification
— those rows ran the user-ident subquery only to discover NULL, then ran
the (much more expensive) machine-prediction subquery on detections that
won't contribute to any agreement bucket. Scoping the subqueries to the
verified set avoids that waste.

Bench (cold, cache invalidated):

  Project                          Total    Verified   Pre      Post
  P#85 SEC-SEQ                     36,253   13,140     —        1.18s
  P#20 BCI                         40,958    1,351     —        0.92s
  P#84 Pennsylvania                18,407      251     —        0.56s
  P#24 Atlantic Forestry            2,797      274     —        0.50s
  P#18 Vermont                     43,149       45     ~928ms   0.35s
  P#23 Insectarium Montreal        20,393       74     —        0.43s

Warm via django-cachalot: 122–343ms across all projects.

For P#85 (highest absolute identification count in the system), the cost
is dominated by apply_default_filters' score-threshold join, not the
subqueries. apply_defaults=false actually runs faster (0.69s cold,
179,466 total / 13,140 verified) because the classification join is
skipped.

Co-Authored-By: Claude <noreply@anthropic.com>
… param

Replaces hardcoded `lca >= TaxonRank.ORDER` agreement gate with two layers:

- Always returned: `agreed_any_rank_*` — exact matches plus any non-null LCA
  at a real rank (UNKNOWN excluded). The upstream filter (e.g. a Lepidoptera
  include list) is what bounds the meaningful scope, not a hardcoded
  threshold in this function.
- Optional `?agreement_coarsest_rank=FAMILY`: when supplied, response also
  includes `agreed_coarser_rank_*` (exact + LCAs at or below the threshold).
  The applied rank is echoed in `agreement_coarsest_rank`; null when absent.

Also addresses CodeRabbit feedback on the existing branch:
- Dedupe base queryset before counting (joins from default-filter chain can
  inflate Occurrence rows).
- Bound `*_pct` FloatFields to [0.0, 1.0] in the serializer.

Param validation: invalid rank → 400; UNKNOWN rejected as not meaningful.
Tests cover any-rank fallback, threshold filtering, invalid + UNKNOWN
rejection, and threshold echo.

Co-Authored-By: Claude <noreply@anthropic.com>
…ry params

- Rename `agreed_under_order_*` → `agreed_any_rank_*` to match the endpoint's
  dropped ORDER threshold (0565f06).
- Add optional `agreement_coarsest_rank` + `agreed_coarser_rank_*` fields to
  the response type (not consumed yet — UI follows in #1308).
- Widen `filters` to accept arrays and append repeated query params so
  multi-value filters (e.g. `algorithm`, `not_algorithm` — backend reads via
  `request.query_params.getlist(...)`) survive. Per CodeRabbit review.

Co-Authored-By: Claude <noreply@anthropic.com>
Session-scratchpad doc — belongs in local notes, not the merged branch.

Co-Authored-By: Claude <noreply@anthropic.com>
- 2026-05-14-human-model-agreement-endpoint.md — design narrative; superseded
  by code + PR description.
- occurrence-filter-driven-exports.md — side-research stub Copilot flagged as
  out-of-scope. Promoted to a PR-description follow-up item.

Co-Authored-By: Claude <noreply@anthropic.com>
create_detections assigns the classification taxon via .order_by("?"),
so the previous test picked a random machine taxon and then required a
sister species under the same genus. Random non-species picks (ORDER /
FAMILY / GENUS) have no sister, flaking ~50% of runs.

Pin both the machine prediction and the human ID to two fixed Vanessa
species, so the LCA is always GENUS (any-rank bucket, not exact) and the
test is deterministic.

Co-Authored-By: Claude <noreply@anthropic.com>
useModelAgreement.ts belongs with the frontend consumer (#1308), not the
backend endpoint PR. Keeps #1307 backend-only.

Co-Authored-By: Claude <noreply@anthropic.com>
Both derive from the verified_rows already in memory — no extra query.

- wilson_interval(): 95% Wilson score CI on agreed_exact_pct and
  agreed_any_rank_pct (agreed_*_ci_low / _ci_high). Wilson stays inside
  [0,1] and is honest at the small n typical of verified sets, where the
  normal approximation breaks down.
- cohens_kappa(): exact-taxon agreement beyond chance (cohens_kappa
  field, range [-1, 1]). Null when no doubly-classified occurrences or
  expected agreement is 1.0. Discounts the agreement you'd get for free
  in a project dominated by one common species.

Adds 5 nullable response fields. Backwards-compatible (additive only).
9 pure-Python unit tests + 2 HTTP field-presence tests.

Co-Authored-By: Claude <noreply@anthropic.com>
Both are generic statistical helpers — they don't depend on Django or any
domain model. Lifting them out of ami/main/models_future/occurrence.py so
other endpoints/jobs that need binomial CIs or chance-corrected agreement
can import them without dragging in the occurrence module.

Same implementations, just relocated. Renamed parameter names on
cohens_kappa from (human, model) to (rater_a, rater_b) so the helper
reads as generic rather than human-vs-model specific.

Tests already use isolated `from ami.utils.stats import …` imports
(updated all 9 sites in ami/main/tests.py).

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow mihow force-pushed the feat/human-model-agreement-endpoint branch from e476333 to 336c1fe Compare May 27, 2026 13:29
mihow pushed a commit that referenced this pull request May 27, 2026
One-line field rename in the occurrence stats panel to match the backend's
dropped ORDER threshold. Hook type rename + multi-value filter support
landed on the base branch (4a92c0b on #1307).

Co-Authored-By: Claude <noreply@anthropic.com>
mihow added a commit that referenced this pull request May 27, 2026
Typed React Query wrapper for /occurrences/stats/model-agreement/.
Owned by this UI PR (#1308); the backend PR (#1307) is now backend-only.

Co-Authored-By: Claude <noreply@anthropic.com>
max_value=1.0,
allow_null=True,
required=False,
help_text="Wilson 95% CI lower bound for agreed_exact_pct. Null when verified_with_prediction_count is 0.",
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we send the help text to the UI to use in tooltips? in the serializer? later in our share types maybe, but right now...?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants