[OPIK-4642] [BE] [FE] [SDK] feat: optimization framework package#5415
itamargolan wants to merge 21 commits into main
Implements a new optimization framework (`apps/opik-optimizer`) that decouples optimizer algorithms from experiment execution, persistence, and UI concerns. Integrates via the existing optimization studio pipeline (Redis queue → Python backend → subprocess).

Key components:
- Orchestrator: central lifecycle controller with sampler, validator, materializer, result aggregator, and event emitter
- StupidOptimizer: 2-step test optimizer (3 candidates → best → 2 more)
- EvaluationAdapter: wraps SDK evaluate_optimization_suite_trial()
- Backend integration: new Redis queue, framework_optimizer job processor, framework_runner subprocess entry point

Also adds evaluate_optimization_suite_trial() to the Python SDK, combining optimization trial linkage with evaluation suite behavior (evaluators and execution policy from the dataset).

53 unit + integration tests passing. Verified end-to-end against Comet cloud with real LLM calls, UI progress chart, prompt display, and score tracking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
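The two-step flow described for StupidOptimizer (3 candidates → best → 2 more) can be pictured with a small sketch. This is illustrative only and not the PR's code: `two_step_search`, `propose_variants`, and `toy_score` are hypothetical names, and the scoring function is a stand-in for a real evaluation run.

```python
from typing import Callable

Candidate = str


def two_step_search(
    baseline: Candidate,
    propose: Callable[[Candidate, int], list[Candidate]],
    evaluate: Callable[[Candidate], float],
) -> tuple[Candidate, float]:
    """Evaluate 3 candidates, keep the best, then evaluate 2 more derived from it."""
    # Step 1: three candidates derived from the baseline.
    scored = [(evaluate(c), c) for c in propose(baseline, 3)]
    best_score, best = max(scored, key=lambda pair: pair[0])

    # Step 2: two more candidates derived from the step-1 winner.
    for candidate in propose(best, 2):
        score = evaluate(candidate)
        if score > best_score:
            best_score, best = score, candidate
    return best, best_score


def propose_variants(parent: Candidate, n: int) -> list[Candidate]:
    # Hypothetical generator: append a numbered suffix to the parent prompt.
    return [f"{parent} v{i}" for i in range(1, n + 1)]


def toy_score(candidate: Candidate) -> float:
    # Stand-in metric for illustration only: longer prompts score higher.
    return float(len(candidate))


best, score = two_step_search("Say {question}", propose_variants, toy_score)
```

In the framework itself, the evaluation step would presumably go through the EvaluationAdapter rather than a local function; that wiring is not shown here.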
f7a2862 to 35d9ec9
📋 PR Linter Failed: ❌ Missing Section, reported five times. Each message reads "The description is missing the" and is cut off before the section name.
Python Backend Tests Results: 167 tests, 163 ✅, 3m 16s ⏱️. For more details on these failures, see this check. Results for commit 3eae9fb. ♻️ This comment has been updated with latest results.
apps/opik-backend/src/main/java/com/comet/opik/infrastructure/queues/Queue.java
apps/opik-optimizer/tests/integration/test_orchestrator_flow.py
Outdated
apps/opik-backend/src/main/java/com/comet/opik/domain/OptimizationService.java
apps/opik-python-backend/src/opik_backend/jobs/framework_optimizer.py
Outdated
Backend Tests Results: 439 files, 439 suites, 1h 2m 13s ⏱️. For more details on these failures, see this check. Results for commit c3e4b93.
Add rich metadata to each experiment so the UI can aggregate and visualize the optimization trajectory.

Key changes:
- step_index increments only when candidate changes (not per eval)
- candidate_id is stable across re-evaluations of the same prompt
- parent_candidate_ids always set correctly for derived candidates
- New metadata fields: batch_index, num_items, capture_traces, eval_purpose
- Refactor optimizer package: protocol + factory pattern for registration
- Add GEPA adapter bridging GEPA callbacks to framework metadata
- Fix BE tests for experimentScores null and queue routing
- Add docs: ADDING_AN_OPTIMIZER.md and GEPA_IMPLEMENTATION.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove register_optimizer public API and OptimizerFactory class; replace with a simple dict in _load_registry()
- framework_runner: avoid holding full dataset items in memory
- Update docs and tests to match simplified factory

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
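A dict-based registry of the kind this commit describes might look roughly like the sketch below. Only `_load_registry` is named in the commit message; `create_optimizer`, the two placeholder classes, and the registry keys are assumptions made for illustration.

```python
from typing import Callable


class SimpleOptimizer:
    """Placeholder standing in for a real optimizer implementation."""


class GepaOptimizer:
    """Placeholder standing in for a real optimizer implementation."""


def _load_registry() -> dict[str, Callable[[], object]]:
    # A plain mapping from optimizer name to a zero-argument constructor.
    # The keys used here are hypothetical.
    return {"simple": SimpleOptimizer, "gepa": GepaOptimizer}


def create_optimizer(optimizer_type: str) -> object:
    """Look up an optimizer by name and instantiate it (hypothetical helper)."""
    registry = _load_registry()
    try:
        factory = registry[optimizer_type]
    except KeyError:
        known = ", ".join(sorted(registry))
        raise ValueError(
            f"Unknown optimizer '{optimizer_type}'. Known: {known}"
        ) from None
    return factory()
```

Compared with a public `register_optimizer` API, a private dict keeps the set of optimizers visible in one place, at the cost of requiring a code change to add a new one.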
apps/opik-optimizer/src/opik_optimizer_framework/event_emitter.py
Outdated
apps/opik-optimizer/src/opik_optimizer_framework/optimizers/simple_optimizer.py
apps/opik-optimizer/src/opik_optimizer_framework/optimizers/gepa/gepa_optimizer.py
apps/opik-optimizer/src/opik_optimizer_framework/optimizers/gepa/gepa_adapter.py
apps/opik-optimizer/src/opik_optimizer_framework/optimizers/gepa/gepa_adapter.py
apps/opik-optimizer/src/opik_optimizer_framework/optimizers/gepa/gepa_adapter.py
Outdated
Resolve conflict in CompareTrialsPage.tsx: keep both workspaceName (for useExperimentsList) and canViewDatasets permission guard from main, plus isEvaluationSuite prop from our branch. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…iments
- Replace sequential step_index counter with parent-lineage derivation (max parent step + 1), so all re-evaluations of the same candidate share the same step_index
- Ensure every non-baseline experiment carries parent_candidate_ids, enabling the UI to draw lineage graphs
- Pass batch_index, num_items, capture_traces, and eval_purpose through to experiment metadata for richer visualization
- Revert runner scripts to direct invocation (remove runner_common.py)
- Update unit tests to match new metadata contract

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
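The parent-lineage rule (max parent step + 1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework's code: `StepTracker` is a hypothetical name, the baseline is assumed to sit at step 0, and parents are assumed to be registered before their children.

```python
from typing import Optional


class StepTracker:
    """Derives step_index from candidate lineage (hypothetical helper)."""

    def __init__(self) -> None:
        self._step_by_candidate: dict[str, int] = {}

    def step_index(
        self,
        candidate_id: str,
        parent_candidate_ids: Optional[list[str]] = None,
    ) -> int:
        # Re-evaluations of a known candidate reuse its original step.
        if candidate_id in self._step_by_candidate:
            return self._step_by_candidate[candidate_id]

        parents = parent_candidate_ids or []
        if not parents:
            step = 0  # Assumption: a candidate with no parents is the baseline.
        else:
            # Raises KeyError if a parent was never registered.
            step = max(self._step_by_candidate[p] for p in parents) + 1

        self._step_by_candidate[candidate_id] = step
        return step


tracker = StepTracker()
steps = [
    tracker.step_index("base"),
    tracker.step_index("c1", ["base"]),
    tracker.step_index("c2", ["base"]),
    tracker.step_index("c3", ["c1", "c2"]),
    tracker.step_index("c1", ["base"]),  # re-evaluation of c1
]
```

Because the step depends only on lineage, evaluating the same candidate on several minibatches does not advance the step, which a sequential counter would do.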
Backend test results for commit 3eae9fb:

| Check | Total | Passed ✅ | Duration ⏱️ |
|---|---|---|---|
| Backend Tests - Unit Tests | 1 247 tests | 1 245 | 42s |
| Backend Tests - Integration Group 1 | 390 tests | 390 | 6m 49s |
| Backend Tests - Integration Group 2 | 240 tests | 240 | 13m 16s |
| Backend Tests - Integration Group 3 | 307 tests | 307 | 9m 18s |
| Backend Tests - Integration Group 4 | 4 files, 4 suites | | 2m 52s |
| Backend Tests - Integration Group 5 | 98 tests | 98 | 2m 45s |
| Backend Tests - Integration Group 6 | 1 111 tests | 1 111 | 5m 5s |
| Backend Tests - Integration Group 7 | 227 tests | 227 | 1m 54s |
| Backend Tests - Integration Group 8 | 259 tests | 259 | 3m 40s |
| Backend Tests - Integration Group 9 | 323 tests | 322 | 8m 1s |
| Backend Tests - Integration Group 10 | 176 tests | 176 | 7m 23s |
| Backend Tests - Integration Group 11 | 21 files, 21 suites | | 1m 37s |
| Backend Tests - Integration Group 12 | 145 tests | 145 | 2m 45s |
| Backend Tests - Integration Group 13 | 440 tests | 435 | 3m 37s |
| Backend Tests - Integration Group 14 | 128 tests | 128 | 2m 45s |
| Backend Tests - Integration Group 15 | 169 tests | 169 | 5m 3s |
| Backend Tests - Integration Group 16 | 28 files, 28 suites | | 2m 25s |

Integration Group 16 reported failures; for more details, see its check.
apps/opik-optimizer/src/opik_optimizer_framework/optimizers/gepa/gepa_optimizer.py
apps/opik-optimizer/src/opik_optimizer_framework/optimizers/simple_optimizer.py
- Remove canonical_config_hash from Candidate and TrialResult types, candidate_materializer, experiment_executor, and all tests
- Delete util/hashing.py module (unused: GEPA does minibatching, so config-hash dedup would block valid re-evaluations)
- Merge SdkEventEmitter and LoggingEventEmitter into a single EventEmitter class with optional optimization_id
- Update GEPA_IMPLEMENTATION.md to reflect parent_ids tracking fixes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
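One way a single emitter with an optional `optimization_id` could behave is sketched below. The commit message confirms only the class name and the optional ID; the `publish` callback, the event name, and the rule "always log, publish only when an ID is set" are assumptions made for illustration.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("optimizer_events")

# Hypothetical callback signature: (optimization_id, event, payload).
Publish = Callable[[str, str, dict[str, Any]], None]


class EventEmitter:
    """Logs every event; forwards it only when tied to an optimization."""

    def __init__(
        self,
        optimization_id: Optional[str] = None,
        publish: Optional[Publish] = None,
    ) -> None:
        self.optimization_id = optimization_id
        self._publish = publish

    def emit(self, event: str, payload: dict[str, Any]) -> None:
        # Logging path, available with or without an optimization_id.
        logger.info("event=%s payload=%s", event, payload)
        # Publishing path, taken only when an ID and a sink are both present.
        if self.optimization_id is not None and self._publish is not None:
            self._publish(self.optimization_id, event, payload)


sent: list[tuple[str, str, dict[str, Any]]] = []
EventEmitter("opt-1", lambda oid, ev, p: sent.append((oid, ev, p))).emit(
    "step_started", {"step_index": 1}
)
EventEmitter().emit("step_started", {"step_index": 1})  # log-only
```

Folding both behaviors into one class removes the need to choose an emitter type at construction time; local runs simply omit the ID.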
…through context
- Replace CandidateConfig dataclass with dict[str, Any] type alias
- Add baseline_config field to OptimizationContext (caller-provided, opaque)
- Orchestrator passes baseline_config through without knowing its structure
- Optimizers copy baseline_config and override prompt_messages only
- Remove result_aggregator module (inlined into evaluation_adapter)
- Move gepa imports to runtime (lazy) for optional dependency
- Fix protocol.py training_set/validation_set types to list[dict]
- Update ADDING_AN_OPTIMIZER.md to reflect all changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
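The "copy baseline_config and override prompt_messages only" rule could be sketched like this. `with_prompt_messages` is a hypothetical helper, and every key other than `prompt_messages` is invented to show that unrelated fields pass through untouched.

```python
import copy
from typing import Any

# Type alias as described in the commit: a candidate config is a plain dict.
CandidateConfig = dict[str, Any]


def with_prompt_messages(
    baseline_config: CandidateConfig,
    prompt_messages: list[dict[str, str]],
) -> CandidateConfig:
    """Return a new config that differs from the baseline only in its prompt."""
    # Deep copy so nested values are never shared with the baseline.
    config = copy.deepcopy(baseline_config)
    config["prompt_messages"] = copy.deepcopy(prompt_messages)
    return config


baseline: CandidateConfig = {
    "model": "example-model",  # hypothetical key
    "model_parameters": {"temperature": 0.2},  # hypothetical key
    "prompt_messages": [{"role": "user", "content": "Say {question}"}],
}
candidate = with_prompt_messages(
    baseline, [{"role": "user", "content": "Answer briefly: {question}"}]
)
```

A deep copy is used here rather than `dict(baseline_config)` so that mutating a nested value on a candidate cannot leak back into the caller-provided baseline.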
    @patch("gepa.core.adapter.EvaluationBatch")
    def test_evaluate_delegates_to_evaluation_adapter(self, mock_eb_cls):
        mock_eb_cls.side_effect = lambda **kwargs: SimpleNamespace(**kwargs)

        mock_eval_adapter = MagicMock()
        trial = _make_trial("c-1", 0.75)
        raw_result = _make_raw_result([("id-1", 1.0, ["Good"]), ("id-2", 0.0, ["Bad"])])
        mock_eval_adapter.evaluate_with_details.return_value = (trial, raw_result)

        adapter = _build_adapter(mock_eval_adapter)
        adapter._base_messages = [{"role": "user", "content": "Say {question}"}]
The tests in tests/unit/test_gepa_optimizer.py patch gepa.core.adapter.EvaluationBatch and gepa.optimize, so simply importing the module already requires the optional gepa package and causes the unit suite to fail when gepa isn't installed. Can we move these tests into tests/library_integration/gepa (guarded by pytest.importorskip("gepa")) so the unit run stays fast and doesn't depend on that heavy optional dependency?
Finding type: Isolate tests, be fast
Prompt for AI Agents:
In apps/opik-optimizer/tests/unit/test_gepa_optimizer.py around lines 194 to 204, the
tests patch gepa.core.adapter.EvaluationBatch and require the optional gepa package at
import time, causing unit test failures when gepa is not installed. Refactor by moving
all tests that import or patch gepa (starting at ~line 194 and any other blocks using
gepa or patch("gepa.optimize")) into a new file under
tests/library_integration/gepa/test_gepa_optimizer_integration.py and at the top of that
new file call pytest.importorskip("gepa") to skip when gepa is absent; alternatively, if
you prefer keeping the tests in-place, wrap each gepa-dependent test/class with
pytest.importorskip("gepa") before those tests run. Ensure unit tests no longer import
gepa at module import time and update any import paths in the moved tests accordingly.
Commit 3eae9fb addressed this comment by adding pytest.importorskip("gepa") at the top of apps/opik-optimizer/tests/library_integration/gepa/test_gepa_optimizer.py, preventing import-time failures when gepa is not installed by skipping the test module.
…dependency on gepa
The gepa tests patch gepa.core.adapter.EvaluationBatch and gepa.optimize,
requiring the optional gepa package at import time. Moving them to
tests/library_integration/gepa/ with pytest.importorskip("gepa") keeps
the unit suite fast and dependency-free.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
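The effect of guarding a test module on an optional dependency can be shown with a standard-library analogue. In the PR the guard is `pytest.importorskip("gepa")`; the helper below is a hypothetical stand-in built on `unittest.SkipTest`, not pytest's implementation, so the sketch runs without pytest or gepa installed.

```python
import importlib
import unittest
from types import ModuleType


def import_or_skip(module_name: str) -> ModuleType:
    """Import a module, or signal a skip when it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Converting the import error into a skip keeps the suite green
        # on machines where the optional dependency is absent.
        raise unittest.SkipTest(
            f"optional dependency {module_name!r} is not installed"
        ) from None


# An installed module is returned as usual.
json_module = import_or_skip("json")

# A missing module produces a skip instead of an ImportError.
try:
    import_or_skip("surely_missing_optional_dependency")
    skipped = False
except unittest.SkipTest:
    skipped = True
```

Placing such a guard at the top of a test module means everything below it runs only when the dependency is importable, which is why moving the gepa tests out of the unit suite keeps that suite dependency-free.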
Details
Adds a new optimization framework package (`apps/opik-optimizer`) that decouples optimizer algorithms from experiment execution, persistence, and UI concerns. Integrates via the existing optimization studio pipeline: Frontend → Java Backend → Redis Queue → Python Backend → Subprocess → Framework Runner.

Key changes:
- `opik_optimizer_framework` package with orchestrator, sampler, candidate validation, evaluation adapter, and result aggregation
- `evaluate_optimization_suite_trial()` SDK function combining optimization trial linkage with evaluation suite behavior
- `OPTIMIZER_FRAMEWORK` queue
- `opik:optimizer-cloud` and `opik:optimizer-framework` queues by default
- `optimizer_job_helper.py` deduplicates job execution logic between legacy and framework optimizers

Change checklist
Issues
AI-WATERMARK
AI-WATERMARK: yes
Testing
`cd apps/opik-optimizer && pytest tests/unit/ -v`
Documentation
No documentation updates required for this initial internal framework package.