Add TwelveLabs video integration (Marengo embeddings + Pegasus analysis) #175
Open
mohit-twelvelabs wants to merge 1 commit into
- TwelveLabsEmbedding: Marengo multimodal embeddings (512-dim shared text/image/audio/video space), registered as 'twelvelabs' in EmbeddingFactory
- TwelveLabsVideoAnalyzer: Pegasus video-to-text understanding for grounding
- Example, unit tests (API-key gated), requirements and README update

Opt-in and non-breaking: no existing defaults or behavior change.
Hi! I'm Mohit, and I work at TwelveLabs (@mohit-twelvelabs).
This PR adds an opt-in video modality to TrustRAG via TwelveLabs, so videos become first-class citizens in a RAG pipeline.
What it adds
- `TwelveLabsEmbedding` (`trustrag/modules/vector/embedding.py`): Marengo multimodal embeddings. Marengo embeds text, image, audio and video into one shared 512-dim space, so a plain-text query can be matched directly against video clips with the existing `EmbeddingGenerator.cosine_similarity`. It implements the standard `generate_embeddings` text interface (drop-in `EmbeddingGenerator`) plus `embed_image` / `embed_audio`.
- `TwelveLabsVideoAnalyzer`: Pegasus video understanding (video → text), useful for turning videos into searchable/grounded passages (summaries, Q&A over a clip).
- Registered as `'twelvelabs'` in `EmbeddingFactory` alongside the existing providers.
- An example (`examples/vectors/twelvelabs_embedding_example.py`), API-key-gated unit tests, `twelvelabs>=1.2.8` in `requirements.txt`, and a README update-log entry.

Why it helps
TrustRAG already supports text/image embeddings and multimodal Q&A; this extends retrieval and grounding to video, a modality not currently covered, using the same factory/registry pattern as the other embedding providers.
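
To make the text-to-video matching concrete, here is a minimal sketch of ranking clips against a query by cosine similarity in a shared 512-dim space. It does not call TwelveLabs or TrustRAG: the vectors are random stand-ins, and `cosine_similarity` and `rank_clips` are illustrative helpers written for this sketch, not the repo's `EmbeddingGenerator.cosine_similarity`.

```python
import math
import random

DIM = 512  # embedding size stated in the PR description


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_clips(query_vec, clip_vecs):
    """Return (clip_id, score) pairs sorted from most to least similar."""
    scored = [(clip_id, cosine_similarity(query_vec, vec))
              for clip_id, vec in clip_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


rng = random.Random(0)

# Stand-in clip embeddings; in practice these would come from the embedding provider.
clip_vecs = {
    name: [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    for name in ("clip_a", "clip_b", "clip_c")
}

# Stand-in text-query embedding: clip_b plus small noise, so it should rank first.
query_vec = [x + rng.gauss(0.0, 0.1) for x in clip_vecs["clip_b"]]

ranking = rank_clips(query_vec, clip_vecs)
for clip_id, score in ranking:
    print(f"{clip_id}: {score:.3f}")
```

Because text and video live in the same space, no modality-specific scoring is needed: the same similarity function works whichever modality produced each vector.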
Opt-in & non-breaking
No existing defaults or behavior change. The provider is only used when explicitly selected (`EmbeddingFactory.create_embedding_generator('twelvelabs')` or constructing the class directly), and the SDK is imported lazily inside the classes.

How it was tested
- `generate_embeddings([...])` returns shape `(n, 512)`.
- `analyze()` raises `ValueError` when no video source is given.
- `flake8` clean on all changed files (against the repo's `.flake8`).

You can grab a free API key at https://twelvelabs.io; there's a generous free tier.
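
For reviewers, the combination of lazy SDK import, argument validation, and API-key-gated tests might look roughly like the sketch below. This is not the code in this PR: `StandInVideoAnalyzer`, its `video_url` / `video_path` parameters, and the `TWELVELABS_API_KEY` variable name are all assumptions made for illustration, and no TwelveLabs API call is made.

```python
import importlib
import os
import unittest

# Assumed environment variable name; the PR description does not state it.
API_KEY_ENV = "TWELVELABS_API_KEY"


class StandInVideoAnalyzer:
    """Hypothetical stand-in, not the PR's TwelveLabsVideoAnalyzer."""

    def __init__(self, api_key=None):
        self.api_key = api_key
        self._sdk = None  # nothing is imported at construction time

    def _load_sdk(self):
        # Lazy import: the optional dependency is only required on first real use.
        if self._sdk is None:
            self._sdk = importlib.import_module("twelvelabs")
        return self._sdk

    def analyze(self, video_url=None, video_path=None):
        # Validate arguments before touching the SDK or the network.
        if not video_url and not video_path:
            raise ValueError("a video source (URL or path) is required")
        self._load_sdk()
        raise NotImplementedError("the real API call is out of scope for this sketch")


class AnalyzerTests(unittest.TestCase):
    def test_missing_source_raises(self):
        # Runs everywhere: needs neither the SDK nor an API key.
        with self.assertRaises(ValueError):
            StandInVideoAnalyzer().analyze()

    @unittest.skipUnless(os.environ.get(API_KEY_ENV), "no API key set")
    def test_live_call_placeholder(self):
        # Only runs when a key is present; a real test would call the service here.
        self.assertTrue(os.environ[API_KEY_ENV])
```

Running this with `python -m unittest` executes the validation test unconditionally and reports the gated test as skipped when no key is set, so CI stays green without credentials.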