Adding XTR from Rethinking the Role of Token Retrieval in Multi-Vector Retrieval #30

arthur-75 · 2024-05-04T19:48:41Z


I have implemented XTR from Google's paper, "Rethinking the Role of Token Retrieval in Multi-Vector Retrieval". We still need to optimize the code and add missing similarity imputation. Please let me know if you have any questions.

raphaelsty · 2024-05-31T08:58:38Z

Thank you @arthur-75 for this MR. I think the best approach would be to add an index directory containing a file annoy.py.

The class would be Annoy(), with parameters dedicated to creating the vector database: https://github.com/spotify/annoy

The Annoy index would have an add() method which takes the documents_embeddings parameter as input, in order to upload the document embeddings.

Then it would have a __call__ method which takes as input queries_embeddings: dict[str, torch.Tensor], k: int = 100, batch_size: int = 32 and retrieves, in batches, the top k documents_embeddings for the given set of queries_embeddings.
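To make the proposed interface concrete, here is a minimal sketch of an index with the add() and __call__ methods described above. It is only an illustration under several assumptions: the class name BruteForceIndex and the helper dot are hypothetical, embeddings are plain Python lists rather than torch tensors, and exact dot-product search stands in for Annoy so that the sketch runs without extra dependencies. A real annoy.py would presumably delegate storage and search to annoy's AnnoyIndex (add_item, build, get_nns_by_vector) instead.

```python
from __future__ import annotations

import heapq


def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class BruteForceIndex:
    """Hypothetical stand-in for the proposed Annoy class (exact search, no annoy)."""

    def __init__(self) -> None:
        # One entry per document token: (document key, token vector).
        self.tokens: list[tuple[str, list[float]]] = []

    def add(self, documents_embeddings: dict[str, list[list[float]]]) -> BruteForceIndex:
        """Store every token vector together with the key of its document."""
        for key, token_vectors in documents_embeddings.items():
            self.tokens.extend((key, vector) for vector in token_vectors)
        return self

    def __call__(
        self,
        queries_embeddings: dict[str, list[list[float]]],
        k: int = 100,
        batch_size: int = 32,
    ) -> dict[str, list[list[tuple[str, float]]]]:
        """Return, per query and per query token, the k best (document key, score) pairs."""
        queries = list(queries_embeddings.items())
        results = {}
        # Queries are walked in slices of batch_size; this only illustrates the
        # batching parameter, since the search itself is not vectorized here.
        for start in range(0, len(queries), batch_size):
            for query_key, query_tokens in queries[start : start + batch_size]:
                results[query_key] = [self._top_k(token, k) for token in query_tokens]
        return results

    def _top_k(self, query_token: list[float], k: int) -> list[tuple[str, float]]:
        # Score every stored document token and keep the k highest scores.
        scored = ((dot(query_token, vector), key) for key, vector in self.tokens)
        return [(key, score) for score, key in heapq.nlargest(k, scored)]


index = BruteForceIndex().add({"a": [[1.0, 0.0], [0.0, 1.0]], "b": [[0.5, 0.5]]})
hits = index({"q": [[1.0, 0.0]]}, k=2)
```

Returning the hits per query token, rather than per query, keeps the token-level similarities available for the scoring step that XTR needs afterwards.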

Once we have the index, we can create an XTR object which takes as input an index object such as Annoy, along with key, on, and model.

The XTR object will have an add method, which will simply call the add method of the index.

The XTR object should inherit from the ColBERT retriever.

The __call__ method of XTR will query the index and then post-process the embedding similarities in order to compute the XTR score.
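The post-processing step could look roughly like the sketch below, which also covers the missing similarity imputation mentioned in the MR description. It reflects my reading of the paper rather than a reference implementation: for each query token, a candidate document gets its best retrieved token similarity, or, if none of its tokens were retrieved for that query token, the lowest similarity retrieved for that token; the per-token values are then averaged over the query tokens. The function name xtr_scores and the input layout (one list of (document key, similarity) pairs per query token) are assumptions made for illustration.

```python
from __future__ import annotations


def xtr_scores(token_hits: list[list[tuple[str, float]]]) -> dict[str, float]:
    """Score candidate documents from token-level retrieval results.

    token_hits[i] holds the (document key, similarity) pairs retrieved for
    query token i. This layout is an assumption, not the library's API.
    """
    # Every document with at least one retrieved token is a candidate.
    candidates = {key for hits in token_hits for key, _ in hits}
    totals = {key: 0.0 for key in candidates}

    for hits in token_hits:
        if not hits:
            # Nothing retrieved for this query token: it contributes 0.
            continue
        # Imputed similarity: the lowest score retrieved for this query token.
        imputed = min(score for _, score in hits)
        # Best retrieved similarity per document for this query token.
        best: dict[str, float] = {}
        for key, score in hits:
            best[key] = max(score, best.get(key, score))
        # Documents without a retrieved token fall back to the imputed value.
        for key in candidates:
            totals[key] += best.get(key, imputed)

    n_query_tokens = len(token_hits)
    return {key: total / n_query_tokens for key, total in totals.items()}


# Two query tokens; document "b" has no retrieved token for the second one.
example_hits = [
    [("a", 0.9), ("b", 0.5)],
    [("a", 0.8), ("a", 0.6)],
]
scores = xtr_scores(example_hits)
ranking = sorted(scores, key=scores.get, reverse=True)
```

The normalization by the number of query tokens is a choice worth double-checking against the paper before merging; it does not change the ranking for a single query, only the scale of the scores.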

Also, you should properly set up ruff to format your code; it is really useful 👍

Raphael Sourty and others added 3 commits April 18, 2024 00:41

init explorer

56f3332

intialize explorer

48d66c1

Create xtr.py

198a6b4


arthur-75 changed the title ~~Patch 2~~ Adding XTR from Rethinking the Role of Token Retrieval in Multi-Vector Retrieval May 4, 2024
