[Design] Feature Gen AI data ingestion workflow / pipeline #1707

@Benvii

Description

Feature issue: #1706

Write a design proposal for the Gen AI data ingestion workflow using:

  • GitLab pipeline as the data ingestion scheduler
  • OpenSearch as the vector DB provider
  • AWS Lambda to run the ingestion script with access to the database
  • AWS for infrastructure (the design may also include considerations for GCP GKE)
  • Langfuse as the test dataset storage solution
  • Reuse existing Python tooling as much as possible: tock-llm-indexing-tools
  • Optionally, Ragas for evaluators

The design should be reviewed and approved before any development starts, so that we are sure we are building in the right direction.
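
To make the discussion concrete, here is a minimal sketch of what the ingestion script run by AWS Lambda could look like. It is only an illustration under stated assumptions, not a proposed implementation: the index name, the event shape, and the helpers `chunk_text`, `build_bulk_body` and `handler` are hypothetical and are not taken from tock-llm-indexing-tools. It only chunks documents and builds a newline-delimited payload in the format accepted by the OpenSearch `_bulk` endpoint. Embedding computation and the actual HTTP call are left out on purpose.

```python
import json
from typing import Any, Iterable

# Hypothetical index name, for illustration only.
INDEX_NAME = "genai-documents"


def chunk_text(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character windows that overlap slightly."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def build_bulk_body(docs: Iterable[dict[str, Any]], index: str = INDEX_NAME) -> str:
    """Build an NDJSON body: one action line followed by one document line per chunk."""
    lines: list[str] = []
    for doc in docs:
        for n, chunk in enumerate(chunk_text(doc["text"])):
            lines.append(json.dumps({"index": {"_index": index, "_id": f"{doc['id']}-{n}"}}))
            lines.append(json.dumps({"source_id": doc["id"], "chunk": n, "text": chunk}))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""


def handler(event: dict[str, Any], context: Any = None) -> dict[str, Any]:
    """Lambda-style entry point; the event shape {"documents": [...]} is an assumption."""
    body = build_bulk_body(event.get("documents", []))
    # A real function would compute embeddings and send `body` to OpenSearch here.
    return {"actions": body.count("\n") // 2, "bulk_body": body}
```

In this sketch a scheduled GitLab pipeline job would invoke the function with the list of documents to ingest. How the job authenticates and how the Lambda reaches the database are exactly the points the design proposal needs to settle.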
