[OPIK-4642] [BE] [FE] [SDK] feat: optimization framework package by itamargolan · Pull Request #5415 · comet-ml/opik

itamargolan · 2026-02-26T13:36:19Z

Details

Adds a new optimization framework package (apps/opik-optimizer) that decouples optimizer algorithms from experiment execution, persistence, and UI concerns. Integrates via the existing optimization studio pipeline: Frontend → Java Backend → Redis Queue → Python Backend → Subprocess → Framework Runner.

Key changes:

- New opik_optimizer_framework package with orchestrator, sampler, candidate validation, evaluation adapter, and result aggregation
- New evaluate_optimization_suite_trial() SDK function combining optimization trial linkage with evaluation suite behavior
- Backend routing: legacy optimizer types go to old queue, new types route to OPTIMIZER_FRAMEWORK queue
- RQ worker now listens on both opik:optimizer-cloud and opik:optimizer-framework queues by default
- Shared optimizer_job_helper.py deduplicates job execution logic between legacy and framework optimizers
- Includes a "SimpleOptimizer" as a 2-step test algorithm
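
The backend routing rule listed above can be sketched as follows. This is a minimal Python illustration, not the Java backend's code: the two queue names come from this description, while `LEGACY_OPTIMIZER_TYPES`, its members, and `queue_for` are hypothetical names, and treating opik:optimizer-cloud as the "old queue" is an assumption.

```python
# Queue names as given in the PR description.
OPTIMIZER_CLOUD_QUEUE = "opik:optimizer-cloud"          # assumed to be the "old queue"
OPTIMIZER_FRAMEWORK_QUEUE = "opik:optimizer-framework"

# Hypothetical legacy optimizer type names, for illustration only.
LEGACY_OPTIMIZER_TYPES = frozenset({"legacy_a", "legacy_b"})


def queue_for(optimizer_type: str) -> str:
    """Return the queue a job for this optimizer type would be sent to."""
    if optimizer_type in LEGACY_OPTIMIZER_TYPES:
        return OPTIMIZER_CLOUD_QUEUE
    return OPTIMIZER_FRAMEWORK_QUEUE
```

Defaulting unknown types to the framework queue is one possible design choice; the real backend may reject unknown types instead.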

Change checklist

User facing
Documentation update

Issues

Resolves OPIK-4642

AI-WATERMARK

AI-WATERMARK: yes

If yes:
- Tools: Claude Code
- Model(s): Claude Opus 4.6
- Scope: Framework architecture, tests, integration
- Human verification: Manual E2E testing against Comet cloud

Testing

- 62 unit tests passing (cd apps/opik-optimizer && pytest tests/unit/ -v)
- End-to-end test against Comet cloud with real LLM calls (5 trials, scores, prompts, progress chart)
- Java backend compiles with Spotless passing
- Frontend TypeScript type-check and lint passing
- Verified UI compatibility: prompt display, progress chart, best prompt identification

Documentation

No documentation updates required for this initial internal framework package.

Implements a new optimization framework (`apps/opik-optimizer`) that decouples optimizer algorithms from experiment execution, persistence, and UI concerns. Integrates via the existing optimization studio pipeline (Redis queue → Python backend → subprocess).

Key components:
- Orchestrator: central lifecycle controller with sampler, validator, materializer, result aggregator, and event emitter
- StupidOptimizer: 2-step test optimizer (3 candidates → best → 2 more)
- EvaluationAdapter: wraps SDK evaluate_optimization_suite_trial()
- Backend integration: new Redis queue, framework_optimizer job processor, framework_runner subprocess entry point

Also adds evaluate_optimization_suite_trial() to the Python SDK, combining optimization trial linkage with evaluation suite behavior (evaluators and execution policy from the dataset).

53 unit + integration tests passing. Verified end-to-end against Comet cloud with real LLM calls, UI progress chart, prompt display, and score tracking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
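
The two-step search described for the test optimizer (3 candidates → best → 2 more) can be sketched as below. This is an illustration under assumptions, not the package's code: `two_step_search`, `propose`, and `score` are hypothetical names, and the real optimizer presumably scores candidates through the framework's evaluation adapter rather than a local function.

```python
from typing import Callable


def two_step_search(
    baseline: str,
    propose: Callable[[str, int], list[str]],
    score: Callable[[str], float],
) -> tuple[str, float]:
    """Step 1: score 3 candidates derived from the baseline.
    Step 2: derive 2 more from the step-1 winner and return the overall best."""
    step1 = [(c, score(c)) for c in propose(baseline, 3)]
    best = max(step1, key=lambda pair: pair[1])
    step2 = [(c, score(c)) for c in propose(best[0], 2)]
    return max([best, *step2], key=lambda pair: pair[1])


# Toy usage: candidates append a suffix and the score is the prompt length.
def propose(parent: str, n: int) -> list[str]:
    return [f"{parent} v{i}" for i in range(n)]


winner, winner_score = two_step_search("Say {question}", propose, len)
```

In this toy run five candidates are scored in total, which matches the "5 trials" figure only by construction of the example.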

github-actions · 2026-02-26T13:40:37Z

📋 PR Linter Failed

❌ Missing Section. The description is missing the ## Details section.

❌ Missing Section. The description is missing the ## Change checklist section.

❌ Missing Section. The description is missing the ## Issues section.

❌ Missing Section. The description is missing the ## Testing section.

❌ Missing Section. The description is missing the ## Documentation section.

github-actions · 2026-02-26T13:44:55Z

Python Backend Tests Results

167 tests 163 ✅ 3m 16s ⏱️
1 suites 3 💤
1 files 1 ❌

For more details on these failures, see this check.

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

apps/opik-backend/src/main/java/com/comet/opik/infrastructure/queues/Queue.java

apps/opik-python-backend/src/opik_backend/jobs/framework_runner.py

sdks/python/src/opik/evaluation/evaluator.py

apps/opik-optimizer/src/opik_optimizer_framework/orchestrator.py

apps/opik-optimizer/tests/integration/test_orchestrator_flow.py

apps/opik-backend/src/main/java/com/comet/opik/domain/OptimizationService.java

apps/opik-python-backend/src/opik_backend/jobs/framework_optimizer.py

github-actions · 2026-02-26T14:18:14Z

Backend Tests Results

439 files 439 suites 1h 2m 13s ⏱️
6 894 tests 6 877 ✅ 13 💤 4 ❌
6 769 runs 6 752 ✅ 13 💤 4 ❌

For more details on these failures, see this check.

Results for commit c3e4b93.

♻️ This comment has been updated with latest results.

Add rich metadata to each experiment so the UI can aggregate and visualize the optimization trajectory.

Key changes:
- step_index increments only when candidate changes (not per eval)
- candidate_id is stable across re-evaluations of the same prompt
- parent_candidate_ids always set correctly for derived candidates
- New metadata fields: batch_index, num_items, capture_traces, eval_purpose
- Refactor optimizer package: protocol + factory pattern for registration
- Add GEPA adapter bridging GEPA callbacks to framework metadata
- Fix BE tests for experimentScores null and queue routing
- Add docs: ADDING_AN_OPTIMIZER.md and GEPA_IMPLEMENTATION.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Remove register_optimizer public API and OptimizerFactory class; replace with a simple dict in _load_registry()
- framework_runner: avoid holding full dataset items in memory
- Update docs and tests to match simplified factory

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
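
A dict-based registry of the kind this commit describes might look like the sketch below. Only the name `_load_registry` comes from the commit message; the `"simple"` key, the stand-in `SimpleOptimizer` class, and `create_optimizer` are assumptions made for illustration.

```python
class SimpleOptimizer:
    """Stand-in optimizer class (hypothetical, for illustration only)."""


def _load_registry() -> dict[str, type]:
    """Map optimizer type names to classes; a plain dict replaces the factory class."""
    return {"simple": SimpleOptimizer}


def create_optimizer(optimizer_type: str) -> object:
    """Look up the optimizer class by name and instantiate it."""
    registry = _load_registry()
    try:
        optimizer_cls = registry[optimizer_type]
    except KeyError:
        known = ", ".join(sorted(registry))
        raise ValueError(
            f"Unknown optimizer type {optimizer_type!r}; known types: {known}"
        ) from None
    return optimizer_cls()
```

With no public registration call, adding an optimizer means adding one entry to the dict, which keeps the set of supported types visible in a single place.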

apps/opik-optimizer/src/opik_optimizer_framework/event_emitter.py

apps/opik-optimizer/src/opik_optimizer_framework/optimizers/simple_optimizer.py

apps/opik-optimizer/src/opik_optimizer_framework/optimizers/gepa/gepa_optimizer.py

apps/opik-optimizer/src/opik_optimizer_framework/experiment_executor.py

apps/opik-optimizer/src/opik_optimizer_framework/optimizers/gepa/gepa_adapter.py

Resolve conflict in CompareTrialsPage.tsx: keep both workspaceName (for useExperimentsList) and canViewDatasets permission guard from main, plus isEvaluationSuite prop from our branch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…iments

- Replace sequential step_index counter with parent-lineage derivation (max parent step + 1), so all re-evaluations of the same candidate share the same step_index
- Ensure every non-baseline experiment carries parent_candidate_ids, enabling the UI to draw lineage graphs
- Pass batch_index, num_items, capture_traces, and eval_purpose through to experiment metadata for richer visualization
- Revert runner scripts to direct invocation (remove runner_common.py)
- Update unit tests to match new metadata contract

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
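
The parent-lineage derivation (max parent step + 1) can be sketched as follows. The field names `candidate_id` and `parent_candidate_ids` come from the commit messages; the `CandidateRecord` type, the `step_index` helper, and the choice of step 0 for a parentless baseline are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateRecord:
    candidate_id: str
    parent_candidate_ids: list[str] = field(default_factory=list)


def step_index(candidate_id: str, records: dict[str, CandidateRecord]) -> int:
    """Baseline (no parents) is step 0 (assumed); otherwise max parent step + 1."""
    parents = records[candidate_id].parent_candidate_ids
    if not parents:
        return 0
    return max(step_index(parent, records) for parent in parents) + 1


records = {
    "base": CandidateRecord("base"),
    "c-1": CandidateRecord("c-1", ["base"]),
    "c-2": CandidateRecord("c-2", ["base"]),
    "c-3": CandidateRecord("c-3", ["c-1", "c-2"]),  # merged from two parents
}
```

Because the value depends only on lineage and not on how many evaluations have run, re-evaluating `"c-1"` on another batch yields the same step_index every time.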

github-actions · 2026-03-02T17:52:37Z

Backend Tests - Unit Tests

1 247 tests 1 245 ✅ 42s ⏱️
137 suites 2 💤
137 files 0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions · 2026-03-02T17:53:45Z

Backend Tests - Integration Group 7

227 tests 227 ✅ 1m 54s ⏱️
21 suites 0 💤
21 files 0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions · 2026-03-02T17:54:03Z

Backend Tests - Integration Group 12

145 tests 145 ✅ 2m 45s ⏱️
21 suites 0 💤
21 files 0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions · 2026-03-02T17:54:04Z

Backend Tests - Integration Group 5

98 tests 98 ✅ 2m 45s ⏱️
24 suites 0 💤
24 files 0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions · 2026-03-02T17:54:26Z

Backend Tests - Integration Group 16

28 files 28 suites 2m 25s ⏱️
141 tests 134 ✅ 3 💤 4 ❌
133 runs 126 ✅ 3 💤 4 ❌

For more details on these failures, see this check.

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions · 2026-03-02T17:54:34Z

Backend Tests - Integration Group 11

21 files 21 suites 1m 37s ⏱️
162 tests 160 ✅ 2 💤 0 ❌
140 runs 138 ✅ 2 💤 0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions · 2026-03-02T17:54:35Z

Backend Tests - Integration Group 8

259 tests 259 ✅ 3m 40s ⏱️
18 suites 0 💤
18 files 0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions · 2026-03-02T17:54:37Z

Backend Tests - Integration Group 13

440 tests 435 ✅ 3m 37s ⏱️
14 suites 5 💤
14 files 0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions · 2026-03-02T17:54:44Z

Backend Tests - Integration Group 14

128 tests 128 ✅ 2m 45s ⏱️
9 suites 0 💤
9 files 0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions · 2026-03-02T17:55:04Z

Backend Tests - Integration Group 6

1 111 tests 1 111 ✅ 5m 5s ⏱️
5 suites 0 💤
5 files 0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions · 2026-03-02T17:55:05Z

Backend Tests - Integration Group 15

169 tests 169 ✅ 5m 3s ⏱️
29 suites 0 💤
29 files 0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions · 2026-03-02T17:56:12Z

Backend Tests - Integration Group 1

390 tests 390 ✅ 6m 49s ⏱️
21 suites 0 💤
21 files 0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions · 2026-03-02T17:56:27Z

Backend Tests - Integration Group 10

176 tests 176 ✅ 7m 23s ⏱️
19 suites 0 💤
19 files 0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions · 2026-03-02T17:56:43Z

Backend Tests - Integration Group 9

323 tests 322 ✅ 8m 1s ⏱️
24 suites 1 💤
24 files 0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions · 2026-03-02T17:56:50Z

Backend Tests - Integration Group 4

4 files 4 suites 2m 52s ⏱️
1 342 tests 1 342 ✅ 0 💤 0 ❌
1 254 runs 1 254 ✅ 0 💤 0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions · 2026-03-02T17:57:30Z

Backend Tests - Integration Group 3

307 tests 307 ✅ 9m 18s ⏱️
28 suites 0 💤
28 files 0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions · 2026-03-02T17:58:55Z

Backend Tests - Integration Group 2

240 tests 240 ✅ 13m 16s ⏱️
17 suites 0 💤
17 files 0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

apps/opik-python-backend/src/opik_backend/jobs/framework_runner.py

apps/opik-optimizer/src/opik_optimizer_framework/optimizers/gepa/gepa_optimizer.py

apps/opik-optimizer/src/opik_optimizer_framework/optimizers/simple_optimizer.py

- Remove canonical_config_hash from Candidate and TrialResult types, candidate_materializer, experiment_executor, and all tests
- Delete util/hashing.py module (unused: GEPA does minibatching, so config-hash dedup would block valid re-evaluations)
- Merge SdkEventEmitter and LoggingEventEmitter into a single EventEmitter class with optional optimization_id
- Update GEPA_IMPLEMENTATION.md to reflect parent_ids tracking fixes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
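
One way a single emitter with an optional optimization_id could behave is sketched below. Only the class name `EventEmitter` and the optional `optimization_id` come from the commit message; the `sink` callable (a stand-in for whatever SDK call the real class makes), the `emit` signature, and the "log always, forward only when an id is set" behavior are assumptions.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("optimizer.events")


class EventEmitter:
    """Always logs events; also forwards them when an optimization_id is set."""

    def __init__(
        self,
        optimization_id: Optional[str] = None,
        sink: Optional[Callable[[str, dict[str, Any]], None]] = None,
    ) -> None:
        self.optimization_id = optimization_id
        self._sink = sink  # hypothetical stand-in for the SDK call

    def emit(self, event: str, payload: dict[str, Any]) -> None:
        logger.info("%s %s", event, payload)
        if self.optimization_id is not None and self._sink is not None:
            self._sink(self.optimization_id, {"event": event, **payload})


# Usage: one emitter bound to an optimization, one logging-only emitter.
sent: list[tuple[str, dict[str, Any]]] = []
EventEmitter("opt-1", sink=lambda oid, body: sent.append((oid, body))).emit(
    "trial_completed", {"score": 0.75}
)
EventEmitter().emit("trial_completed", {"score": 0.5})  # logged, not forwarded
```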

…through context

- Replace CandidateConfig dataclass with dict[str, Any] type alias
- Add baseline_config field to OptimizationContext (caller-provided, opaque)
- Orchestrator passes baseline_config through without knowing its structure
- Optimizers copy baseline_config and override prompt_messages only
- Remove result_aggregator module (inlined into evaluation_adapter)
- Move gepa imports to runtime (lazy) for optional dependency
- Fix protocol.py training_set/validation_set types to list[dict]
- Update ADDING_AN_OPTIMIZER.md to reflect all changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
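
The "copy baseline_config and override prompt_messages only" rule can be illustrated with the sketch below. `CandidateConfig`, `baseline_config`, and `prompt_messages` are names from the commit message; `derive_candidate`, the `"model"` key, and the use of a deep copy are assumptions for illustration.

```python
import copy
from typing import Any

CandidateConfig = dict[str, Any]  # type alias, as described in the commit


def derive_candidate(
    baseline_config: CandidateConfig,
    prompt_messages: list[dict[str, str]],
) -> CandidateConfig:
    """Copy the caller-provided config and replace only prompt_messages."""
    candidate = copy.deepcopy(baseline_config)  # never mutate the baseline
    candidate["prompt_messages"] = prompt_messages
    return candidate


baseline = {
    "model": "example-model",  # hypothetical key the optimizer does not inspect
    "prompt_messages": [{"role": "user", "content": "Say {question}"}],
}
candidate = derive_candidate(
    baseline, [{"role": "user", "content": "Answer briefly: {question}"}]
)
```

Every key other than prompt_messages passes through untouched, so the optimizer does not need to know what else the caller put in the config.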

baz-reviewer · 2026-03-03T01:04:10Z

apps/opik-optimizer/tests/library_integration/gepa/test_gepa_optimizer.py

+    @patch("gepa.core.adapter.EvaluationBatch")
+    def test_evaluate_delegates_to_evaluation_adapter(self, mock_eb_cls):
+        mock_eb_cls.side_effect = lambda **kwargs: SimpleNamespace(**kwargs)
+
+        mock_eval_adapter = MagicMock()
+        trial = _make_trial("c-1", 0.75)
+        raw_result = _make_raw_result([("id-1", 1.0, ["Good"]), ("id-2", 0.0, ["Bad"])])
+        mock_eval_adapter.evaluate_with_details.return_value = (trial, raw_result)
+
+        adapter = _build_adapter(mock_eval_adapter)
+        adapter._base_messages = [{"role": "user", "content": "Say {question}"}]


The tests in tests/unit/test_gepa_optimizer.py patch gepa.core.adapter.EvaluationBatch and gepa.optimize, so simply importing the module already requires the optional gepa package, and the unit suite fails when gepa isn't installed. Can we move these tests into tests/library_integration/gepa (guarded by pytest.importorskip("gepa")) so the unit run stays fast and doesn't depend on that heavy optional dependency?

Finding type: Isolate tests, be fast

Prompt for AI Agents:

In apps/opik-optimizer/tests/unit/test_gepa_optimizer.py around lines 194 to 204, the tests patch gepa.core.adapter.EvaluationBatch and require the optional gepa package at import time, causing unit test failures when gepa is not installed. Refactor by moving all tests that import or patch gepa (starting at ~line 194 and any other blocks using gepa or patch("gepa.optimize")) into a new file under tests/library_integration/gepa/test_gepa_optimizer_integration.py and at the top of that new file call pytest.importorskip("gepa") to skip when gepa is absent; alternatively, if you prefer keeping the tests in-place, wrap each gepa-dependent test/class with pytest.importorskip("gepa") before those tests run. Ensure unit tests no longer import gepa at module import time and update any import paths in the moved tests accordingly.

Commit 3eae9fb addressed this comment by adding pytest.importorskip("gepa") at the top of apps/opik-optimizer/tests/library_integration/gepa/test_gepa_optimizer.py, preventing import-time failures when gepa is not installed by skipping the test module.

…dependency on gepa

The gepa tests patch gepa.core.adapter.EvaluationBatch and gepa.optimize, requiring the optional gepa package at import time. Moving them to tests/library_integration/gepa/ with pytest.importorskip("gepa") keeps the unit suite fast and dependency-free.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

itamargolan and others added 6 commits February 26, 2026 10:40

design doc

82790ae

FE communication and ERD additions

69efe6f

ui reporting events in flow

772eed9

changes

444a002

add reason to TrialItemRun

5bee1ad

itamargolan force-pushed the itamar/new-optimizer-framework branch from f7a2862 to 35d9ec9 Compare February 26, 2026 13:40

github-actions bot removed the documentation, Frontend, tests, typescript, TypeScript SDK, and Optimizer SDK labels Feb 26, 2026

github-actions bot assigned itamargolan Feb 26, 2026

baz-reviewer bot reviewed Feb 26, 2026

itamargolan and others added 2 commits March 2, 2026 10:31

github-actions bot added the documentation label Mar 2, 2026

Merge branch 'main' into itamar/new-optimizer-framework

c3e4b93

baz-reviewer bot reviewed Mar 2, 2026

itamargolan and others added 2 commits March 2, 2026 14:22

baz-reviewer bot reviewed Mar 2, 2026

baz-reviewer bot approved these changes Mar 2, 2026

itamargolan and others added 2 commits March 2, 2026 18:27

baz-reviewer bot reviewed Mar 3, 2026
