
[OPIK-4642] [BE] [FE] [SDK] feat: optimization framework package #5415

Draft
itamargolan wants to merge 21 commits into main from itamar/new-optimizer-framework


@itamargolan itamargolan commented Feb 26, 2026

Details

Adds a new optimization framework package (apps/opik-optimizer) that decouples optimizer algorithms from experiment execution, persistence, and UI concerns. Integrates via the existing optimization studio pipeline: Frontend → Java Backend → Redis Queue → Python Backend → Subprocess → Framework Runner.

Key changes:

  • New opik_optimizer_framework package with orchestrator, sampler, candidate validation, evaluation adapter, and result aggregation
  • New evaluate_optimization_suite_trial() SDK function combining optimization trial linkage with evaluation suite behavior
  • Backend routing: legacy optimizer types go to old queue, new types route to OPTIMIZER_FRAMEWORK queue
  • RQ worker now listens on both opik:optimizer-cloud and opik:optimizer-framework queues by default
  • Shared optimizer_job_helper.py deduplicates job execution logic between legacy and framework optimizers
  • Includes a "SimpleOptimizer" as a 2-step test algorithm
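
The routing rule in the list above can be sketched in a few lines. This is only an illustration: the two queue names come from the description, but treating opik:optimizer-cloud as the legacy queue, the `select_queue` helper, and the `simple_optimizer` type key are assumptions. The real routing lives in the Java backend and may look quite different.

```python
# Queue names are taken from the PR description. Treating
# "opik:optimizer-cloud" as the legacy queue is an assumption.
LEGACY_QUEUE = "opik:optimizer-cloud"
FRAMEWORK_QUEUE = "opik:optimizer-framework"

# The worker listens on both queues by default.
WORKER_QUEUES = (LEGACY_QUEUE, FRAMEWORK_QUEUE)

# Hypothetical set of optimizer types served by the new framework.
FRAMEWORK_OPTIMIZER_TYPES = frozenset({"simple_optimizer"})


def select_queue(optimizer_type: str) -> str:
    """Return the queue a job for this optimizer type would be pushed to."""
    if optimizer_type in FRAMEWORK_OPTIMIZER_TYPES:
        return FRAMEWORK_QUEUE
    # Every type not claimed by the framework keeps using the old queue.
    return LEGACY_QUEUE
```

Defaulting unknown types to the legacy queue keeps existing jobs working unchanged while new types are added one at a time.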

Change checklist

  • User facing
  • Documentation update

Issues

  • Resolves OPIK-4642

AI-WATERMARK

AI-WATERMARK: yes

  • If yes:
    • Tools: Claude Code
    • Model(s): Claude Opus 4.6
    • Scope: Framework architecture, tests, integration
    • Human verification: Manual E2E testing against Comet cloud

Testing

  • 62 unit tests passing (cd apps/opik-optimizer && pytest tests/unit/ -v)
  • End-to-end test against Comet cloud with real LLM calls (5 trials, scores, prompts, progress chart)
  • Java backend compiles with Spotless passing
  • Frontend TypeScript type-check and lint passing
  • Verified UI compatibility: prompt display, progress chart, best prompt identification

Documentation

No documentation updates required for this initial internal framework package.

@github-actions github-actions bot added the documentation, dependencies, python, java, Frontend, Backend, Infrastructure, tests, typescript, Python SDK, TypeScript SDK, and Optimizer SDK labels Feb 26, 2026
itamargolan and others added 6 commits February 26, 2026 10:40
Implements a new optimization framework (`apps/opik-optimizer`) that
decouples optimizer algorithms from experiment execution, persistence,
and UI concerns. Integrates via the existing optimization studio pipeline
(Redis queue → Python backend → subprocess).

Key components:
- Orchestrator: central lifecycle controller with sampler, validator,
  materializer, result aggregator, and event emitter
- StupidOptimizer: 2-step test optimizer (3 candidates → best → 2 more)
- EvaluationAdapter: wraps SDK evaluate_optimization_suite_trial()
- Backend integration: new Redis queue, framework_optimizer job processor,
  framework_runner subprocess entry point

Also adds evaluate_optimization_suite_trial() to the Python SDK, combining
optimization trial linkage with evaluation suite behavior (evaluators and
execution policy from the dataset).

53 unit + integration tests passing. Verified end-to-end against Comet cloud
with real LLM calls, UI progress chart, prompt display, and score tracking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
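
For readers who want a feel for the 2-step test optimizer described in this commit message (3 candidates → best → 2 more), here is a minimal sketch. The `propose` and `evaluate` callables and the `two_step_search` name are hypothetical stand-ins; the actual optimizer goes through the orchestrator and the evaluation adapter rather than plain functions.

```python
from typing import Callable


def two_step_search(
    baseline: str,
    propose: Callable[[str, int], list[str]],
    evaluate: Callable[[str], float],
) -> tuple[str, float]:
    """Score 3 candidates, derive 2 more from the best, return the overall best."""
    # Step 1: three candidates derived from the baseline prompt.
    scored = [(c, evaluate(c)) for c in propose(baseline, 3)]
    best_prompt, _ = max(scored, key=lambda pair: pair[1])
    # Step 2: two more candidates derived from the step-1 winner.
    scored += [(c, evaluate(c)) for c in propose(best_prompt, 2)]
    return max(scored, key=lambda pair: pair[1])


# Toy stand-ins so the sketch runs without an LLM (assumptions, not real APIs).
def propose(parent: str, n: int) -> list[str]:
    return [f"{parent} v{i}" for i in range(n)]


def evaluate(prompt: str) -> float:
    return float(len(prompt))


result = two_step_search("p", propose, evaluate)
```

The sketch performs five evaluations in total, one per candidate across the two steps.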
@itamargolan itamargolan force-pushed the itamar/new-optimizer-framework branch from f7a2862 to 35d9ec9 February 26, 2026 13:40
@github-actions github-actions bot removed the documentation, Frontend, tests, typescript, TypeScript SDK, and Optimizer SDK labels Feb 26, 2026
@github-actions

📋 PR Linter Failed

Missing Section. The description is missing the ## Details section.

Missing Section. The description is missing the ## Change checklist section.

Missing Section. The description is missing the ## Issues section.

Missing Section. The description is missing the ## Testing section.

Missing Section. The description is missing the ## Documentation section.

github-actions bot commented Feb 26, 2026

Python Backend Tests Results

167 tests   163 ✅  3m 16s ⏱️
  1 suites    3 💤
  1 files      1 ❌

For more details on these failures, see this check.

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions bot commented Feb 26, 2026

Backend Tests Results

  439 files    439 suites   1h 2m 13s ⏱️
6 894 tests 6 877 ✅ 13 💤 4 ❌
6 769 runs  6 752 ✅ 13 💤 4 ❌

For more details on these failures, see this check.

Results for commit c3e4b93.

♻️ This comment has been updated with latest results.

itamargolan and others added 2 commits March 2, 2026 10:31
Add rich metadata to each experiment so the UI can aggregate and
visualize the optimization trajectory. Key changes:

- step_index increments only when candidate changes (not per eval)
- candidate_id is stable across re-evaluations of the same prompt
- parent_candidate_ids always set correctly for derived candidates
- New metadata fields: batch_index, num_items, capture_traces, eval_purpose
- Refactor optimizer package: protocol + factory pattern for registration
- Add GEPA adapter bridging GEPA callbacks to framework metadata
- Fix BE tests for experimentScores null and queue routing
- Add docs: ADDING_AN_OPTIMIZER.md and GEPA_IMPLEMENTATION.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove register_optimizer public API and OptimizerFactory class;
  replace with a simple dict in _load_registry()
- framework_runner: avoid holding full dataset items in memory
- Update docs and tests to match simplified factory

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
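
The "simple dict in _load_registry()" approach might look roughly like the sketch below. Only the `_load_registry` name comes from the commit message; the `Optimizer` protocol shape, the `EchoOptimizer` class, and the `create_optimizer` helper are hypothetical and exist only to make the example runnable.

```python
from typing import Any, Protocol


class Optimizer(Protocol):
    """Hypothetical protocol shape; the real one is defined in protocol.py."""

    def run(self, context: dict[str, Any]) -> dict[str, Any]: ...


class EchoOptimizer:
    """Toy optimizer that returns a copy of the baseline config."""

    def run(self, context: dict[str, Any]) -> dict[str, Any]:
        return dict(context["baseline_config"])


def _load_registry() -> dict[str, type]:
    # A plain dict replaces the public register_optimizer API and factory class.
    return {"echo": EchoOptimizer}


def create_optimizer(name: str) -> Optimizer:
    registry = _load_registry()
    try:
        return registry[name]()
    except KeyError:
        known = sorted(registry)
        raise ValueError(f"Unknown optimizer {name!r}; known: {known}") from None
```

Building the dict inside a function rather than at module level would also allow optional optimizers to be imported only when the registry is first requested.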
@github-actions github-actions bot added the documentation label Mar 2, 2026
itamargolan and others added 2 commits March 2, 2026 14:22
Resolve conflict in CompareTrialsPage.tsx: keep both workspaceName
(for useExperimentsList) and canViewDatasets permission guard from main,
plus isEvaluationSuite prop from our branch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…iments

- Replace sequential step_index counter with parent-lineage derivation
  (max parent step + 1), so all re-evaluations of the same candidate
  share the same step_index
- Ensure every non-baseline experiment carries parent_candidate_ids,
  enabling the UI to draw lineage graphs
- Pass batch_index, num_items, capture_traces, and eval_purpose through
  to experiment metadata for richer visualization
- Revert runner scripts to direct invocation (remove runner_common.py)
- Update unit tests to match new metadata contract

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
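
The parent-lineage rule from this commit message (max parent step + 1) can be written as a short recursive function. This is a sketch under assumptions: the `step_index` helper, the mapping from candidate IDs to parent IDs, and the choice of 0 for the baseline are illustrative, not the framework's actual data model.

```python
def step_index(candidate_id: str, parents: dict[str, list[str]]) -> int:
    """Derive a candidate's step as max(parent steps) + 1.

    A candidate with no parents is treated as the baseline and gets
    step 0 (an assumption made for this sketch).
    """
    parent_ids = parents.get(candidate_id, [])
    if not parent_ids:
        return 0
    return max(step_index(p, parents) for p in parent_ids) + 1


# Hypothetical lineage: "c" merges "a" and "b"; "d" derives from "c".
lineage = {
    "base": [],
    "a": ["base"],
    "b": ["base"],
    "c": ["a", "b"],
    "d": ["c", "base"],
}
steps = {cid: step_index(cid, lineage) for cid in lineage}
```

Because the value depends only on the candidate's lineage and not on how many evaluations have run, every re-evaluation of the same candidate gets the same step_index.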
github-actions bot commented Mar 2, 2026

Backend Tests - Unit Tests

1 247 tests   1 245 ✅  42s ⏱️
  137 suites      2 💤
  137 files        0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions bot commented Mar 2, 2026

Backend Tests - Integration Group 7

227 tests   227 ✅  1m 54s ⏱️
 21 suites    0 💤
 21 files      0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions bot commented Mar 2, 2026

Backend Tests - Integration Group 12

145 tests   145 ✅  2m 45s ⏱️
 21 suites    0 💤
 21 files      0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions bot commented Mar 2, 2026

Backend Tests - Integration Group 5

98 tests   98 ✅  2m 45s ⏱️
24 suites   0 💤
24 files     0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions bot commented Mar 2, 2026

Backend Tests - Integration Group 16

 28 files   28 suites   2m 25s ⏱️
141 tests 134 ✅ 3 💤 4 ❌
133 runs  126 ✅ 3 💤 4 ❌

For more details on these failures, see this check.

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions bot commented Mar 2, 2026

Backend Tests - Integration Group 11

 21 files   21 suites   1m 37s ⏱️
162 tests 160 ✅ 2 💤 0 ❌
140 runs  138 ✅ 2 💤 0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions bot commented Mar 2, 2026

Backend Tests - Integration Group 8

259 tests   259 ✅  3m 40s ⏱️
 18 suites    0 💤
 18 files      0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions bot commented Mar 2, 2026

Backend Tests - Integration Group 13

440 tests   435 ✅  3m 37s ⏱️
 14 suites    5 💤
 14 files      0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions bot commented Mar 2, 2026

Backend Tests - Integration Group 14

128 tests   128 ✅  2m 45s ⏱️
  9 suites    0 💤
  9 files      0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions bot commented Mar 2, 2026

Backend Tests - Integration Group 6

1 111 tests   1 111 ✅  5m 5s ⏱️
    5 suites      0 💤
    5 files        0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions bot commented Mar 2, 2026

Backend Tests - Integration Group 15

169 tests   169 ✅  5m 3s ⏱️
 29 suites    0 💤
 29 files      0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions bot commented Mar 2, 2026

Backend Tests - Integration Group 1

390 tests   390 ✅  6m 49s ⏱️
 21 suites    0 💤
 21 files      0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions bot commented Mar 2, 2026

Backend Tests - Integration Group 10

176 tests   176 ✅  7m 23s ⏱️
 19 suites    0 💤
 19 files      0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions bot commented Mar 2, 2026

Backend Tests - Integration Group 9

323 tests   322 ✅  8m 1s ⏱️
 24 suites    1 💤
 24 files      0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions bot commented Mar 2, 2026

Backend Tests - Integration Group 4

    4 files      4 suites   2m 52s ⏱️
1 342 tests 1 342 ✅ 0 💤 0 ❌
1 254 runs  1 254 ✅ 0 💤 0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions bot commented Mar 2, 2026

Backend Tests - Integration Group 3

307 tests   307 ✅  9m 18s ⏱️
 28 suites    0 💤
 28 files      0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

github-actions bot commented Mar 2, 2026

Backend Tests - Integration Group 2

240 tests   240 ✅  13m 16s ⏱️
 17 suites    0 💤
 17 files      0 ❌

Results for commit 3eae9fb.

♻️ This comment has been updated with latest results.

itamargolan and others added 2 commits March 2, 2026 18:27
- Remove canonical_config_hash from Candidate and TrialResult types,
  candidate_materializer, experiment_executor, and all tests
- Delete util/hashing.py module (unused — GEPA does minibatching so
  config-hash dedup would block valid re-evaluations)
- Merge SdkEventEmitter and LoggingEventEmitter into a single
  EventEmitter class with optional optimization_id
- Update GEPA_IMPLEMENTATION.md to reflect parent_ids tracking fixes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…through context

- Replace CandidateConfig dataclass with dict[str, Any] type alias
- Add baseline_config field to OptimizationContext (caller-provided, opaque)
- Orchestrator passes baseline_config through without knowing its structure
- Optimizers copy baseline_config and override prompt_messages only
- Remove result_aggregator module (inlined into evaluation_adapter)
- Move gepa imports to runtime (lazy) for optional dependency
- Fix protocol.py training_set/validation_set types to list[dict]
- Update ADDING_AN_OPTIMIZER.md to reflect all changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
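
The "copy baseline_config and override prompt_messages only" contract from this commit message can be illustrated as follows. The `CandidateConfig` alias and the `prompt_messages` key come from the commit message; the `derive_candidate` helper and the `model` key are hypothetical, since the framework treats the rest of the config as opaque.

```python
import copy
from typing import Any

# The commit replaces the CandidateConfig dataclass with this alias.
CandidateConfig = dict[str, Any]


def derive_candidate(
    baseline_config: CandidateConfig,
    prompt_messages: list[dict[str, str]],
) -> CandidateConfig:
    """Copy the opaque baseline config and override only prompt_messages."""
    # Deep copy so nested values in the caller's baseline are never mutated.
    config = copy.deepcopy(baseline_config)
    config["prompt_messages"] = prompt_messages
    return config


# "model" is a made-up key standing in for whatever the caller provides.
baseline = {
    "model": "some-model",
    "prompt_messages": [{"role": "user", "content": "Say {question}"}],
}
candidate = derive_candidate(
    baseline, [{"role": "user", "content": "Answer {question} briefly"}]
)
```

Every key other than prompt_messages passes through untouched, which is what lets the orchestrator forward the config without knowing its structure.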
Comment on lines +194 to +204
    @patch("gepa.core.adapter.EvaluationBatch")
    def test_evaluate_delegates_to_evaluation_adapter(self, mock_eb_cls):
        mock_eb_cls.side_effect = lambda **kwargs: SimpleNamespace(**kwargs)

        mock_eval_adapter = MagicMock()
        trial = _make_trial("c-1", 0.75)
        raw_result = _make_raw_result([("id-1", 1.0, ["Good"]), ("id-2", 0.0, ["Bad"])])
        mock_eval_adapter.evaluate_with_details.return_value = (trial, raw_result)

        adapter = _build_adapter(mock_eval_adapter)
        adapter._base_messages = [{"role": "user", "content": "Say {question}"}]
Tests in tests/unit/test_gepa_optimizer.py patch gepa.core.adapter.EvaluationBatch and gepa.optimize, so simply importing the module already requires the optional gepa package and causes the unit suite to fail when gepa isn't installed. Can we move these tests into tests/library_integration/gepa (guarded by pytest.importorskip("gepa")) so the unit run stays fast and doesn't depend on that heavy optional dependency?

Finding type: Isolate tests, be fast


Prompt for AI Agents:

In apps/opik-optimizer/tests/unit/test_gepa_optimizer.py around lines 194 to 204, the
tests patch gepa.core.adapter.EvaluationBatch and require the optional gepa package at
import time, causing unit test failures when gepa is not installed. Refactor by moving
all tests that import or patch gepa (starting at ~line 194 and any other blocks using
gepa or patch("gepa.optimize")) into a new file under
tests/library_integration/gepa/test_gepa_optimizer_integration.py and at the top of that
new file call pytest.importorskip("gepa") to skip when gepa is absent; alternatively, if
you prefer keeping the tests in-place, wrap each gepa-dependent test/class with
pytest.importorskip("gepa") before those tests run. Ensure unit tests no longer import
gepa at module import time and update any import paths in the moved tests accordingly.

Commit 3eae9fb addressed this comment by adding pytest.importorskip("gepa") at the top of apps/opik-optimizer/tests/library_integration/gepa/test_gepa_optimizer.py, preventing import-time failures when gepa is not installed by skipping the test module.

…dependency on gepa

The gepa tests patch gepa.core.adapter.EvaluationBatch and gepa.optimize,
requiring the optional gepa package at import time. Moving them to
tests/library_integration/gepa/ with pytest.importorskip("gepa") keeps
the unit suite fast and dependency-free.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
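
The availability check behind this kind of guard can be sketched with the standard library alone. This is an illustration of the idea, not pytest's implementation of importorskip, and the `is_installed` and `import_optional` helper names are hypothetical. The same pattern fits the lazy runtime import of gepa mentioned in an earlier commit.

```python
import importlib
import importlib.util
from types import ModuleType


def is_installed(module_name: str) -> bool:
    """Report whether a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def import_optional(module_name: str) -> ModuleType:
    """Import an optional dependency at call time instead of module import time."""
    if not is_installed(module_name):
        raise RuntimeError(f"Optional dependency {module_name!r} is not installed")
    return importlib.import_module(module_name)
```

A test module could call `is_installed("gepa")` to decide whether to skip, while production code would call `import_optional("gepa")` only inside the code path that needs it.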
Labels

Backend, dependencies, documentation, Frontend, Infrastructure, java, Python SDK, python, tests, typescript
