Description
The fine-tuning example works with Llama-3.1 (8B) on Transformers version 4.43.3, but only after modifying `rope_scaling` in the model's config.json.

Below is the error log without the `rope_scaling` modification:
```
Traceback (most recent call last):
  File "/root/optimum-tpu/examples/custom/train.py", line 138, in <module>
    model, tokenizer = create_and_prepare_model(args)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/examples/custom/train.py", line 72, in create_and_prepare_model
    model = AutoModelForCausalLM.from_pretrained(args.model_name, use_cache=False)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/optimum/tpu/modeling.py", line 64, in from_pretrained
    model = cls.from_pretrained(pretrained_model_name_or_path, *model_args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/workflow/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3788, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 1180, in __init__
    self.model = LlamaModel(config, rank, world_size)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 956, in __init__
    [LlamaDecoderLayer(config, layer_idx, rank, world_size) for layer_idx in range(config.num_hidden_layers)]
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 956, in <listcomp>
    [LlamaDecoderLayer(config, layer_idx, rank, world_size) for layer_idx in range(config.num_hidden_layers)]
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 746, in __init__
    self.self_attn = LLAMA_ATTENTION_CLASSES[config._attn_implementation](
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 333, in __init__
    self._init_rope()
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 343, in _init_rope
    scaling_type = self.config.rope_scaling["type"]
                   ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
KeyError: 'type'
```
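As the traceback shows, `_init_rope()` in `optimum/tpu/modeling_llama.py` reads `self.config.rope_scaling["type"]`, but Llama-3.1 ships the newer `rope_scaling` schema keyed by `rope_type` (`"rope_type": "llama3"`), so the lookup fails. For context, below is a minimal sketch of the kind of workaround I mean: overriding `rope_scaling` with the legacy two-key layout before loading the model, equivalent to editing config.json by hand. The `"dynamic"` type and the factor fallback are assumptions on my part, not validated settings.

```python
# Sketch only: rewrite Llama-3.1's rope_scaling into the legacy
# {"type": ..., "factor": ...} layout that _init_rope() expects.
# Model id, the "dynamic" type, and the factor fallback are assumptions.
from transformers import AutoConfig, AutoTokenizer

from optimum.tpu import AutoModelForCausalLM  # assumed import path

model_name = "meta-llama/Meta-Llama-3.1-8B"

config = AutoConfig.from_pretrained(model_name)
rope_scaling = getattr(config, "rope_scaling", None) or {}
if "type" not in rope_scaling:
    # Llama-3.1's config uses {"rope_type": "llama3", "factor": 8.0, ...};
    # collapse it to the two-key form so rope_scaling["type"] resolves.
    config.rope_scaling = {
        "type": "dynamic",
        "factor": rope_scaling.get("factor", 8.0),
    }

model = AutoModelForCausalLM.from_pretrained(model_name, config=config, use_cache=False)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

This avoids touching the downloaded config.json, but it still discards Llama-3.1's intended `llama3` RoPE behavior, which is why I would prefer a proper fix.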
I would really appreciate your suggestions/plans on the following:
- How can we handle the `rope_scaling` issue properly?
- Will there be an upgrade of the Transformers version to support Llama-3.1 (8B) in the near future?