[SDK][Python] Add Weaviate retrieval integration #5424
vincentkoc wants to merge 6 commits into main
📋 PR Linter Failed
❌ Invalid Title Format. Your PR title must include a ticket/issue number and may optionally include component tags (
Example:
❌ Incomplete Issues Section. You must reference at least one GitHub issue (
from opik.integrations._retrieval_tracker import (
    RetrievalTrackingConfig,
    as_tuple,
    patch_retrieval_client,
)
Direct symbol import from _retrieval_tracker violates the Python SDK guideline (sdks/python/AGENTS.md) that new code should prefer module-style imports; can we import the module (e.g. import opik.integrations._retrieval_tracker as retrieval_tracker) and reference retrieval_tracker.RetrievalTrackingConfig/as_tuple/patch_retrieval_client instead?
Finding type: AI Coding Guidelines
Prompt for AI Agents:
In sdks/python/src/opik/integrations/weaviate/opik_tracker.py around lines 5 to 9, the
file uses direct symbol imports from opik.integrations._retrieval_tracker. Refactor the
imports to use a module-style import (for example: import
opik.integrations._retrieval_tracker as retrieval_tracker) and update the function body
in track_weaviate to reference retrieval_tracker.RetrievalTrackingConfig,
retrieval_tracker.as_tuple, and retrieval_tracker.patch_retrieval_client instead of the
directly imported names. Ensure no other behavior changes and run tests/linters to
confirm imports follow the SDK guideline.
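For illustration, the module-style refactor requested above might look roughly like the sketch below. Because `opik.integrations._retrieval_tracker` is introduced by this PR and is not shown in full here, the sketch swaps in a hypothetical stub module (`retrieval_tracker_stub`, `_StubConfig`, `_stub_patch`, and `recorded_configs` are all invented for this example) whose call shapes are inferred only from how the diff uses them. In the real file the stub setup would be replaced by `import opik.integrations._retrieval_tracker as retrieval_tracker`.

```python
import dataclasses
import types
from typing import Any, List, Optional, Tuple

# Hypothetical stand-in for:
#   import opik.integrations._retrieval_tracker as retrieval_tracker
# The attribute names mirror the diff; their behavior here is assumed.
retrieval_tracker = types.ModuleType("retrieval_tracker_stub")
recorded_configs: List[Any] = []  # lets the sketch observe what was passed


@dataclasses.dataclass(frozen=True)
class _StubConfig:
    provider: str
    operation_paths: Tuple[str, ...]
    project_name: Optional[str] = None


def _stub_patch(target: Any, config: Any) -> Any:
    # The real patcher wraps methods; this stub only records the config.
    recorded_configs.append(config)
    return target


retrieval_tracker.RetrievalTrackingConfig = _StubConfig
retrieval_tracker.as_tuple = tuple
retrieval_tracker.patch_retrieval_client = _stub_patch


def track_weaviate(
    weaviate_client_or_collection: Any, project_name: Optional[str] = None
) -> Any:
    """Same body as the diff, but every helper is reached through the module."""
    config = retrieval_tracker.RetrievalTrackingConfig(
        provider="weaviate",
        operation_paths=retrieval_tracker.as_tuple(
            [
                "query.get",
                "query.raw",
                "query.hybrid",
                "query.near_text",
                "query.near_vector",
                "query.fetch_objects",
                "query.bm25",
            ]
        ),
        project_name=project_name,
    )
    return retrieval_tracker.patch_retrieval_client(
        weaviate_client_or_collection, config
    )
```

Only the import line and the three qualified references differ from the version in the diff; the configuration values are unchanged.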
def track_weaviate(weaviate_client_or_collection: Any, project_name: Optional[str] = None) -> Any:
    """Adds Opik tracking wrappers to a Weaviate client or collection/query object."""
    config = RetrievalTrackingConfig(
        provider="weaviate",
        operation_paths=as_tuple(
            [
                "query.get",
                "query.raw",
                "query.hybrid",
                "query.near_text",
                "query.near_vector",
                "query.fetch_objects",
                "query.bm25",
            ]
        ),
        project_name=project_name,
    )
    return patch_retrieval_client(weaviate_client_or_collection, config)
sdks/python/AGENTS.md mandates that behavior changes get tests. The new Weaviate integration wraps client methods via patch_retrieval_client, but no new test under sdks/python/tests/ exercises track_weaviate or the tracking patch and its idempotency, so this new behavior is unverified. Can we add unit/integration coverage that exercises track_weaviate (and the shared patcher) to prove it patches the expected methods and stays idempotent?
Finding type: AI Coding Guidelines
Prompt for AI Agents:
In sdks/python/src/opik/integrations/weaviate/opik_tracker.py around lines 12-29, the
function `track_weaviate` adds a new integration but there are no tests verifying it
patches the expected methods or is idempotent. Add unit tests under sdks/python/tests/
that: create a minimal fake Weaviate client/collection object with the method names
listed in operation_paths (e.g., query.get, query.raw, etc.), monkeypatch or mock
`opik.integrations._retrieval_tracker.patch_retrieval_client` to capture the passed
client and config, call `track_weaviate` and assert patch_retrieval_client was called
once with provider == 'weaviate' and the exact operation_paths; also add a test that
calling `track_weaviate` twice does not double-wrap (idempotency) by asserting
patch_retrieval_client handles already-patched clients or that methods remain callable
only once-wrapped. Include cleanup and clear comments specifying which behavior each
test asserts.
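A minimal sketch of the kind of idempotency test described above, under stated assumptions: the real `patch_retrieval_client` is not shown in this conversation, so the sketch uses a hypothetical stand-in patcher (`patch_paths`, with an invented `WRAPPED_FLAG` marker and an `on_call` callback in place of `opik.track`). The `weaviate.<path>` span naming is also an assumption. It illustrates the test shape (a fake collection, a double patch, and an assertion that the wrapper is not replaced), not the SDK's actual implementation.

```python
import functools
from types import SimpleNamespace
from typing import Any, Callable, Iterable

WRAPPED_FLAG = "_sketch_tracked"  # hypothetical marker set on wrapped methods


def _wrap(method: Callable, span_name: str, on_call: Callable[[str], None]) -> Callable:
    @functools.wraps(method)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        on_call(span_name)  # stand-in for recording a tracked span
        return method(*args, **kwargs)

    setattr(wrapper, WRAPPED_FLAG, True)
    return wrapper


def patch_paths(target: Any, paths: Iterable[str], on_call: Callable[[str], None]) -> Any:
    """Wrap each dotted method path that exists on target, skipping wrapped ones."""
    for path in paths:
        *parents, name = path.split(".")
        owner = target
        for part in parents:
            owner = getattr(owner, part, None)
            if owner is None:
                break
        method = getattr(owner, name, None) if owner is not None else None
        if method is None or getattr(method, WRAPPED_FLAG, False):
            continue  # path missing on this object, or already wrapped
        setattr(owner, name, _wrap(method, f"weaviate.{path}", on_call))
    return target


def make_fake_collection() -> SimpleNamespace:
    """Fake collection exposing only query.hybrid, so query.bm25 is absent."""
    def hybrid(query: str, limit: int = 5) -> list:
        return [f"hit for {query}"][:limit]

    return SimpleNamespace(query=SimpleNamespace(hybrid=hybrid))


def test_second_patch_does_not_rewrap() -> None:
    calls: list = []
    paths = ["query.hybrid", "query.bm25"]
    collection = patch_paths(make_fake_collection(), paths, calls.append)
    wrapped_once = collection.query.hybrid

    patch_paths(collection, paths, calls.append)  # second call must be a no-op

    assert collection.query.hybrid is wrapped_once
    assert collection.query.hybrid("What is Opik?", limit=1) == ["hit for What is Opik?"]
    assert calls == ["weaviate.query.hybrid"]  # exactly one span per call
```

Asserting on object identity after the second patch catches double-wrapping directly, and the single recorded call confirms that one invocation produces one span rather than two.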
Commit 1e24723 addressed this comment by adding new unit tests in sdks/python/tests/unit/integrations/test_weaviate_tracker.py that exercise track_weaviate and verify it wraps nested collection.query.* methods (asserting opik.track is applied and produces the expected weaviate.<operation> names/metadata). It also adds an explicit idempotency test that calls track_weaviate twice on the same object and asserts no additional tracking wrappers are applied on the second call.
```python
import weaviate
from opik.integrations.weaviate import track_weaviate

client = weaviate.connect_to_local()
collection = client.collections.get("docs")
collection = track_weaviate(collection, project_name="retrieval-demo")

result = collection.query.hybrid(
    query="What is Opik?",
    limit=5,
)
```
Hand-pasted track_weaviate snippet has no intent/trigger sentence, no required-vs-optional param notes, and no link to canonical/generated example, so readers can't tell when to use it or how to keep it synced; can we add a one-line intent note, mark which params are required/optional, and link to the autogenerated source with a maintenance note?
Finding types: Keep docs accurate
Prompt for AI Agents:
In apps/opik-documentation/documentation/fern/docs/tracing/integrations/weaviate.mdx
around lines 17-29, the usage snippet for track_weaviate lacks intent/context,
required-vs-optional parameter notes, and a link/maintenance pointer to the canonical
autogenerated SDK example. Edit the doc to: 1) insert a one-line intent/trigger sentence
immediately above the code block explaining when to use track_weaviate (e.g., "Use
track_weaviate to wrap a Weaviate collection so Opik records retrieval tool spans for
query operations"); 2) add a short inline comment or one-sentence note after the import
or before calling track_weaviate that lists which parameters are required vs optional
(for example: "project_name is optional and used for tagging; other client config is
passed through to the Weaviate client"); and 3) add a final sentence after the snippet
linking to the autogenerated SDK example (or noting its path) with a maintenance note
stating the canonical source file/path, who is responsible for refreshing it, and the
update cadence (for example: "This snippet is derived from <path/to/source> and should
be refreshed by the docs team when API changes occur or every 3 months"). Make these
changes concise and in-place so readers know why to use the snippet, what params they
must supply, and where to sync updates.
Details
Adds first-pass Python retrieval integration wrapper for Weaviate.
This PR includes:
track_weaviate__init__.py)
Change checklist
Issues
Testing
cd sdks/python && PYTHONPATH=src pytest tests/unit/integrations/test_weaviate_tracker.py
2 passed
Documentation
docs/tracing/integrations/weaviate.mdx
docs.yml