Commit 1727aa1
fix eval llama (#4469)
Summary:
Pull Request resolved: #4469
An earlier refactor moved files from `examples/...` to `extension/...`, but llama eval was not covered by CI, so the move broke it. Fix it here.
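The fix adds 11 lines to `extension/llm/export/__init__.py` (see the file summary below; the diff body itself is not preserved in this capture). A plausible shape for those lines, assuming `LLMEdgeManager` is defined in a `builder` submodule, is a plain re-export so that `from executorch.extension.llm.export import LLMEdgeManager` resolves:
```
# extension/llm/export/__init__.py: hypothetical sketch, not the actual
# committed contents. The re-export binds LLMEdgeManager in the package
# namespace so `from executorch.extension.llm.export import LLMEdgeManager`
# succeeds.
from .builder import LLMEdgeManager

__all__ = [
    "LLMEdgeManager",
]
```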
before:
```
(executorch) chenlai@chenlai-mbp executorch % python -m examples.models.llama2.eval_llama -c /Users/chenlai/Documents/stories110M/stories110M/stories110M.pt -p /Users/chenlai/Documents/stories110M/stories110M/params.json -t /Users/chenlai/Documents/stories110M/stories110M/tokenizer.model -d fp32 --max_seq_len 127 --limit 5
/opt/homebrew/anaconda3/envs/executorch/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
'NoneType' object has no attribute 'cadam32bit_grad_fp32'
/opt/homebrew/anaconda3/envs/executorch/lib/python3.10/site-packages/executorch/exir/passes/_quant_patterns_and_replacements.py:106: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
impl_abstract("quantized_decomposed::embedding_byte.out")
/opt/homebrew/anaconda3/envs/executorch/lib/python3.10/site-packages/executorch/exir/passes/_quant_patterns_and_replacements.py:153: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
impl_abstract("quantized_decomposed::embedding_byte.dtype_out")
/opt/homebrew/anaconda3/envs/executorch/lib/python3.10/site-packages/executorch/exir/passes/_quant_patterns_and_replacements.py:228: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
impl_abstract("quantized_decomposed::embedding_4bit.out")
/opt/homebrew/anaconda3/envs/executorch/lib/python3.10/site-packages/executorch/exir/passes/_quant_patterns_and_replacements.py:281: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
impl_abstract("quantized_decomposed::embedding_4bit.dtype_out")
Traceback (most recent call last):
File "/opt/homebrew/anaconda3/envs/executorch/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/homebrew/anaconda3/envs/executorch/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/Users/chenlai/executorch/examples/models/llama2/eval_llama.py", line 13, in <module>
from .eval_llama_lib import build_args_parser, eval_llama
File "/Users/chenlai/executorch/examples/models/llama2/eval_llama_lib.py", line 19, in <module>
from executorch.extension.llm.export import LLMEdgeManager
ImportError: cannot import name 'LLMEdgeManager' from 'executorch.extension.llm.export' (/opt/homebrew/anaconda3/envs/executorch/lib/python3.10/site-packages/executorch/extension/llm/export/__init__.py)
(executorch) chenlai@chenlai-mbp executorch %
```
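The traceback above is plain package-import semantics: `from pkg import name` succeeds only if `name` is a submodule of `pkg` or is bound in `pkg/__init__.py`, and after the refactor the new package's `__init__.py` no longer bound `LLMEdgeManager`. A minimal self-contained reproduction (throwaway package names, not the real ExecuTorch layout):
```
import importlib
import pathlib
import sys
import tempfile

# Build a throwaway package whose __init__.py, like the pre-fix
# extension/llm/export/__init__.py, does not bind the class being imported.
root = pathlib.Path(tempfile.mkdtemp())
pkg_dir = root / "pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")                  # empty package init
(pkg_dir / "impl.py").write_text("class Manager: ...\n")
sys.path.insert(0, str(root))

try:
    from pkg import Manager  # ImportError: name not bound in pkg/__init__.py
except ImportError as exc:
    print(exc)

# The fix pattern: re-export the name from the package __init__.py.
(pkg_dir / "__init__.py").write_text("from .impl import Manager\n")
import pkg
importlib.reload(pkg)
from pkg import Manager  # now resolves
print(Manager)
```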
after:
```
(executorch) chenlai@chenlai-mbp executorch % python -m examples.models.llama2.eval_llama -c /Users/chenlai/Documents/stories110M/stories110M/stories110M.pt -p /Users/chenlai/Documents/stories110M/stories110M/params.json -t /Users/chenlai/Documents/stories110M/stories110M/tokenizer.model -d fp32 --max_seq_len 127 --limit 5
/opt/homebrew/anaconda3/envs/executorch/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
'NoneType' object has no attribute 'cadam32bit_grad_fp32'
/opt/homebrew/anaconda3/envs/executorch/lib/python3.10/site-packages/executorch/exir/passes/_quant_patterns_and_replacements.py:106: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
impl_abstract("quantized_decomposed::embedding_byte.out")
/opt/homebrew/anaconda3/envs/executorch/lib/python3.10/site-packages/executorch/exir/passes/_quant_patterns_and_replacements.py:153: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
impl_abstract("quantized_decomposed::embedding_byte.dtype_out")
/opt/homebrew/anaconda3/envs/executorch/lib/python3.10/site-packages/executorch/exir/passes/_quant_patterns_and_replacements.py:228: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
impl_abstract("quantized_decomposed::embedding_4bit.out")
/opt/homebrew/anaconda3/envs/executorch/lib/python3.10/site-packages/executorch/exir/passes/_quant_patterns_and_replacements.py:281: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
impl_abstract("quantized_decomposed::embedding_4bit.dtype_out")
2024-07-30:12:36:04,260 INFO [tokenizer.py:33] #words: 32000 - BOS ID: 1 - EOS ID: 2
2024-07-30:12:36:04,260 INFO [export_llama_lib.py:419] Applying quantizers: []
2024-07-30:12:36:04,260 INFO [export_llama_lib.py:594] Loading model with checkpoint=/Users/chenlai/Documents/stories110M/stories110M/stories110M.pt, params=/Users/chenlai/Documents/stories110M/stories110M/params.json, use_kv_cache=False, weight_type=WeightType.LLAMA
/Users/chenlai/executorch/examples/models/llama2/model.py:99: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
checkpoint = torch.load(checkpoint_path, map_location=device, mmap=True)
2024-07-30:12:36:04,315 INFO [export_llama_lib.py:616] Loaded model with dtype=torch.float32
2024-07-30:12:36:04,395 INFO [huggingface.py:162] Using device 'cpu'
2024-07-30:12:36:27,262 WARNING [task.py:763] [Task: wikitext] metric word_perplexity is defined, but aggregation is not. using default aggregation=weighted_perplexity
2024-07-30:12:36:27,262 WARNING [task.py:775] [Task: wikitext] metric word_perplexity is defined, but higher_is_better is not. using default higher_is_better=False
2024-07-30:12:36:27,262 WARNING [task.py:763] [Task: wikitext] metric byte_perplexity is defined, but aggregation is not. using default aggregation=weighted_perplexity
2024-07-30:12:36:27,262 WARNING [task.py:775] [Task: wikitext] metric byte_perplexity is defined, but higher_is_better is not. using default higher_is_better=False
2024-07-30:12:36:27,262 WARNING [task.py:763] [Task: wikitext] metric bits_per_byte is defined, but aggregation is not. using default aggregation=bits_per_byte
2024-07-30:12:36:27,262 WARNING [task.py:775] [Task: wikitext] metric bits_per_byte is defined, but higher_is_better is not. using default higher_is_better=False
Repo card metadata block was not found. Setting CardData to empty.
2024-07-30:12:36:29,494 WARNING [repocard.py:107] Repo card metadata block was not found. Setting CardData to empty.
2024-07-30:12:36:30,401 INFO [task.py:395] Building contexts for wikitext on rank 0...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 718.57it/s]
2024-07-30:12:36:30,410 INFO [evaluator.py:362] Running loglikelihood_rolling requests
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:14<00:00, 2.91s/it]
wikitext: {'word_perplexity,none': 10885.215324239069, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 6.144013518032613, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 2.6191813902741017, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
```
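As a sanity check on the logged wikitext metrics: under the standard lm-eval-harness definitions, `bits_per_byte` is the base-2 log of `byte_perplexity`, and the numbers above are consistent:
```
import math

# bits_per_byte = log2(byte_perplexity) under the usual
# lm-eval-harness definitions of these wikitext metrics.
byte_perplexity = 6.144013518032613
print(math.log2(byte_perplexity))  # ~2.61918, matching bits_per_byte above
```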
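Separately, the run surfaces a `torch.load` FutureWarning at `examples/models/llama2/model.py:99`. Following the warning's own recommendation means opting into `weights_only=True`; a minimal sketch of that load pattern (hypothetical temp path, not the stories110M checkpoint):
```
import torch

# Round-trip a plain state dict with weights_only=True, the safer mode
# the FutureWarning recommends. Checkpoints containing arbitrary pickled
# objects would additionally need torch.serialization.add_safe_globals.
path = "/tmp/demo_ckpt.pt"  # hypothetical path for this sketch
torch.save({"w": torch.zeros(2, 2)}, path)
checkpoint = torch.load(path, map_location="cpu", mmap=True, weights_only=True)
print(checkpoint["w"].shape)  # torch.Size([2, 2])
```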
ghstack-source-id: 235865354
exported-using-ghexport
Reviewed By: larryliu0820
Differential Revision: D60466386
fbshipit-source-id: 0032af8b3269f107469fe142382dfacb067518081
1 file changed: extension/llm/export/__init__.py (+11 lines, -0)