[Design] Feature Gen AI data ingestion workflow / pipeline

*Feature issue: #1706*

Write a design proposal for the Gen AI data ingestion workflow using:

- GitLab pipeline as the data ingestion scheduler
- OpenSearch as the vector DB provider
- AWS Lambda to run the ingestion script with access to the database
- AWS for infrastructure (this design may also include considerations for GCP GKE)
- Langfuse as the test dataset storage solution
- Reuse existing Python tooling as much as possible: [tock-llm-indexing-tools](https://github.com/theopenconversationkit/tock/blob/tock-24.3.4/gen-ai/orchestrator-server/src/main/python/tock-llm-indexing-tools/README.md)
- **Optional** Ragas for evaluators
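
To give the design review something concrete to react to, here is a minimal sketch of how the Lambda piece of the list above could look. It is illustrative only, not a proposed implementation. It assumes a hypothetical event shape (`index_name`, plus `chunks` that already carry an `id`, a `text`, and a precomputed `embedding`), hypothetical document field names, and an injected `send_bulk` callable standing in for whatever OpenSearch client the design ends up choosing. The only externally defined parts are the standard `handler(event, context)` Lambda entry point and the NDJSON layout of the OpenSearch `_bulk` API.

```python
import json
from typing import Callable, Iterable


def build_bulk_body(index_name: str, chunks: Iterable[dict]) -> str:
    """Serialize chunks as NDJSON for the OpenSearch _bulk API.

    Each chunk becomes two lines: an action line and a document line.
    The field names "text" and "embedding" are assumptions for this sketch.
    """
    lines = []
    for chunk in chunks:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": chunk["id"]}}))
        lines.append(json.dumps({"text": chunk["text"], "embedding": chunk["embedding"]}))
    # The _bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


def handler(event: dict, context: object,
            send_bulk: Callable[[str], None] = print) -> dict:
    """Lambda entry point, e.g. invoked by a scheduled GitLab pipeline job.

    `send_bulk` is a hypothetical hook: in a real deployment it would POST
    the body to the OpenSearch _bulk endpoint; here it defaults to print.
    """
    chunks = event.get("chunks", [])
    if not chunks:
        # Nothing to index, so skip the call entirely.
        return {"indexed": 0}
    send_bulk(build_bulk_body(event["index_name"], chunks))
    return {"indexed": len(chunks)}
```

Keeping the payload construction separate from the transport makes the ingestion logic testable without a running cluster, which may matter if the same script has to run both on AWS Lambda and in a GKE variant.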

*The design should be reviewed and approved before any development starts, to make sure we are heading in the right direction.*

[Design] Feature Gen AI data ingestion workflow / pipeline #1707

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Design] Feature Gen AI data ingestion workflow / pipeline #1707

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions