Intel Arc Support #791
Conversation
- bump tinygrad
Llama 3.2 1B - can't find eos_token?
@joshuacoles: sorry to ping you out of the blue. I noticed you're working on #734, so I feel like you'd have some insight into how EOS token parsing works. Am I out to lunch thinking exo is just not seeing the EOS token, so it keeps going (GPU activity seems pinned, though the token count stops increasing)? I'm thinking of rebasing onto your branch to see if your changes help resolve this behaviour. Grateful if you could share any pointers or directions on what to look at next... 🍻
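For anyone following along, here is a minimal way to inspect what the tokenizer itself reports as EOS (a sketch using Hugging Face transformers; the model repo id below is illustrative, not necessarily the exact one exo pulls):

```python
# Minimal sketch: print what the tokenizer reports as its EOS token.
# The repo id is an assumption for illustration.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
print("eos_token:", tok.eos_token)        # the literal stop string
print("eos_token_id:", tok.eos_token_id)  # the integer id the stop check compares against
```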
I am out and haven't had a chance to give this more than a glance (I will look at it properly when I can), but IIRC EOS determination is done in two places: once when the tokenizer is loaded (in tokenizers.py) and again in ChatGPTAPI.
I think these can happen independently, so it is possible for inference to continue (i.e. GPU usage) past the point when the API has stopped serving new tokens, which looks like it might be your experience. I see in one log an error at the start about not being able to find the EOS id (from tokenizers.py), and in another log further down a reference from ChatGPTAPI containing the eos_token_id, which would lend credence to this theory. So the first port of call would be seeing whether the EOS is determined differently in these different places and how each interacts with your model / inference engine. As I said, I'll have a better look over this later on, probably tomorrow.
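To illustrate the failure mode I mean, a minimal sketch (hypothetical names, not exo's actual code): if the generation loop resolves a different EOS id than the API layer, the loop never breaks even though the API stops emitting tokens.

```python
# Hypothetical sketch of two independent EOS-resolution paths.
def resolve_eos_from_tokenizer(tokenizer):
    # Path 1: trust the loaded tokenizer object directly.
    return tokenizer.eos_token_id

def resolve_eos_from_config(generation_config: dict):
    # Path 2: read it independently from generation_config.json.
    return generation_config.get("eos_token_id")

def generate(step, prompt_ids, eos_id, max_tokens=512):
    ids = list(prompt_ids)
    for _ in range(max_tokens):
        next_id = step(ids)        # one inference step -> next token id
        if next_id == eos_id:      # if eos_id is wrong here, this never fires
            break                  # and the GPU keeps spinning past the API's stop
        ids.append(next_id)
    return ids
```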
Taking a closer look at this, I think something is going wrong during the inference process rather than in EOS token determination. The initial error in your logs seems to occur as we first load the tokenizer, meaning the initial loading attempt in tokenizers.py fails to find the EOS id. If we look at the logs from the ChatGPT API, we see that it has correctly determined the EOS token id; however, it is receiving an invalid token output from the inference process at each iteration (128256 being the vocab size of the model and hence outside the valid token range). Looking at your changes I see you updated the tinygrad version, I presume to support the new hardware, so I would suggest focusing on stepping through the tinygrad inference code to see if you can spot where the issue originates. This is doubly so given the later error in your logs.
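As a sketch of the kind of guard that would surface this earlier (illustrative, not exo's actual code): valid Llama 3 token ids are 0..128255, so anything at or above the vocab size should fail loudly at the sampler boundary rather than flowing onward.

```python
VOCAB_SIZE = 128256  # Llama 3 tokenizer vocab size; valid ids are 0..128255

def check_token(token_id: int) -> int:
    # Fail loudly if the backend hands back an impossible token id,
    # e.g. the repeated 128256 seen in the logs above.
    if not 0 <= token_id < VOCAB_SIZE:
        raise ValueError(
            f"inference produced out-of-range token id {token_id}; "
            "this points at the inference backend, not at EOS handling"
        )
    return token_id
```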
Thank you so much for your guidance, I'll chase that path. 🍻
@joshuacoles - thanks for the assist, I was able to track down the issue... Essentially the render device for Intel doesn't come on unless both are set; I'm not sure why. It's not super fast at ~8 tok/s, but faster than CPU-only (CPU is <1 tok/s).
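For anyone reproducing this, a quick sanity check that tinygrad actually selected the GPU backend rather than silently falling back to CPU (a sketch, assuming a recent tinygrad):

```python
# Print which backend tinygrad will run on; if the Intel render device
# isn't enabled, this falls back to a CPU backend and you get <1 tok/s.
from tinygrad import Device

print(Device.DEFAULT)
```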
Very nice PR. I am not very familiar with this codebase, though: does this PR add support for the newest Battlemage GPUs? I don't know whether the tinygrad version you pinned has support for them. I think we also might need to add other Intel Arc devices to the list of known devices, like the A750, A580, A380, and A310. Also, if the Arc B-series is supported, we need to add the B580 and B570 as well. I will soon have a B580 to test things out, and if needed I have a couple of friends who already have some I can borrow for short tests.
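Something like the following is what I have in mind for the known-devices list (a hypothetical sketch; the actual structure in the codebase may differ):

```python
# Hypothetical known-devices table extension for Intel Arc.
INTEL_ARC_DEVICES = {
    # Alchemist (A-series)
    "A310", "A380", "A580", "A750", "A770",
    # Battlemage (B-series) -- pending confirmation of tinygrad support
    "B570", "B580",
}
```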

Initial attempt at Intel Arc support
Eventually fixes #557
TODO: