This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
cell-eval is a Python package and CLI tool for evaluating the performance of models that predict cellular responses to perturbations at the single-cell level. Developed by the Arc Research Institute.
Evaluation centers on a pair of AnnData objects, one real (observed) and one predicted, and scores how closely the prediction matches the real data across a range of metrics.
- Python 3.11–3.12, managed with UV and built with hatchling
- CLI entry point: `cell-eval` (defined in `src/cell_eval/__main__.py`)
# Install dependencies
uv sync --all-extras --dev
# Run all tests
uv run pytest -v
# Run a single test
uv run pytest tests/test_eval.py::test_broken_adata_not_normlog -v
# Formatting (check / fix)
uv run ruff format --check
uv run ruff format
# Type checking
uv run ty check
# Verify CLI works
uv run cell-eval --help

CI runs: formatting, typing, pytest, and cli-test (see .github/workflows/CI.yml).
AnnData inputs (predicted + real)
→ MetricsEvaluator (validation, normalization, DE computation)
→ MetricPipeline (profile-based metric selection + execution)
→ metrics_registry (global MetricRegistry instance)
→ individual metric functions
→ polars DataFrames (per-perturbation + aggregated results)
- `MetricsEvaluator` (`src/cell_eval/_evaluator.py`): Main programmatic entry point. Validates input AnnData objects, computes differential expression via `pdex`, and orchestrates the metric pipeline.
- `MetricRegistry` (`src/cell_eval/metrics/_registry.py`): Global singleton `metrics_registry`. Metrics are registered with a name, type (`DE` or `ANNDATA_PAIR`), compute function, and best-value indicator. Supports both plain functions and class-based metrics requiring instantiation.
- `MetricPipeline` (`src/cell_eval/_pipeline/_runner.py`): Selects and runs metrics based on a profile (`full`, `minimal`, `vcc`, `de`, `anndata`, `pds`). Collects per-perturbation results and aggregates them.
- `Metric` protocol (`src/cell_eval/metrics/base.py`): All metric functions take either a `PerturbationAnndataPair` or `DEComparison` and return `float | dict[str, float]`.
- Type system (`src/cell_eval/_types/`): Immutable dataclasses `PerturbationAnndataPair` and `DEComparison`, plus enums `MetricType`, `MetricBestValue`, `DESortBy`.
Metrics are split into two categories registered in src/cell_eval/metrics/_impl.py:
- AnnData metrics (`_anndata.py`): pearson_delta, mse, mae, mse_delta, mae_delta, discrimination_score, clustering_agreement, edistance
- DE metrics (`_de.py`): overlap/precision at N, Spearman correlations, direction match, significant gene recall, ROC/PR AUC
Subcommands in src/cell_eval/_cli/: prep (data preparation for VCC), run (evaluation), baseline (create baseline), score (normalize against baseline). CLI defaults are in _cli/_const.py.
cell_eval.data provides build_random_anndata() and downsample_cells() for generating synthetic AnnData objects in tests.
- Uses `polars` (not pandas) for DataFrames
- Uses `match`/`case` statements (Python 3.10+ syntax)
- Type hints throughout; PEP 561 `py.typed` marker present
- Private modules prefixed with `_` (public API is re-exported from `__init__.py`)