Short inputs cause /embed to randomly return empty vectors. #557
Comments
Hey @superkelvint, thanks for reporting! Do you see this happening with a specific model / architecture type? Could you share the model IDs or architecture types of the models that returned empty / null embeddings? To build a reproducer, could you also share which device you are using (GPU, CPU or MPS), which architecture, and the model ID from the Hugging Face Hub? Thanks in advance 🤗 e.g. I tried
@alvarobartt Thanks for your reply. I'm using https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0 on turing-1.6. Unfortunately this is the only GPU I have, otherwise I would try it on another GPU architecture. I can confirm this doesn't happen at all on cpu-1.6.
I just tried using ibm-granite/granite-embedding-125m-english on turing-1.6 and I'm not seeing the issue either.
Hey @superkelvint thanks for the information, I'll try to reproduce on my own before closing the issue then, but please let us know if this ever happens again or if you happen to have a consistent reproducer 🤗 Thanks again!
This is a long-standing known issue, first documented in #53. The solution would be to turn off Flash Attention for that model!
System Info
version: text-embedding-inference 1.6.1
OS: ubuntu 24.04
python: 3.12.3
Embedding short (single-word) inputs randomly returns null vectors.
Reproduction
But the next invocation to the exact same URL yields vectors:
This seems to be happening for all models.
Inputs longer than roughly 5 words consistently return vectors.
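Since the failure is intermittent, a small script that repeats the same request and counts bad responses may be easier to share than a single call. The following is a minimal sketch, not the reporter's original commands: it assumes a server listening at `http://localhost:8080` (a hypothetical address), that `/embed` accepts a JSON body of the form `{"inputs": "..."}` and returns a list of vectors, and that a "null vector" shows up as an empty list or a list containing `null`/NaN entries. The helper names `is_degenerate`, `embed` and `count_failures` are made up for illustration.

```python
import json
import math
import urllib.request
from typing import Callable, List, Optional, Sequence

Vector = Sequence[Optional[float]]


def is_degenerate(vector: Vector) -> bool:
    """Return True if the vector is empty or contains None/NaN entries."""
    if not vector:
        return True
    return any(
        v is None or (isinstance(v, float) and math.isnan(v)) for v in vector
    )


def embed(text: str, url: str = "http://localhost:8080/embed") -> List[Vector]:
    """POST one input to /embed. The URL is an assumption; adjust to your setup."""
    payload = json.dumps({"inputs": text}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())


def count_failures(
    text: str,
    attempts: int = 20,
    send: Callable[[str], List[Vector]] = embed,
) -> int:
    """Send the same input `attempts` times and count degenerate responses."""
    failures = 0
    for _ in range(attempts):
        vectors = send(text)
        if not vectors or any(is_degenerate(v) for v in vectors):
            failures += 1
    return failures


# Example (needs a running server):
#   print(count_failures("hello"))
#   print(count_failures("a considerably longer input with more than five words"))
```

Comparing the failure count for a single-word input against a longer one, and across models and devices, would show whether the behaviour matches the description above.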
Expected behavior
Expected to return vectors regardless of input length.