
Support for Llama-3.1 (8b) - inference #80


Description

@Bihan

The fine-tuning example works with Llama-3.1 (8B) once Transformers is upgraded to version 4.43.3 and the rope_scaling entry in the model's config.json is modified.
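For concreteness, the config.json change I apply is roughly the following. Treat it as a sketch: the local path and the replacement rope_scaling values are placeholders (not the official Llama-3.1 settings), and the only point is to produce the legacy two-key schema that optimum-tpu's modeling_llama.py currently reads.

```python
import json

# Hypothetical local path to the downloaded Meta-Llama-3.1-8B snapshot.
config_path = "Meta-Llama-3.1-8B/config.json"

with open(config_path) as f:
    config = json.load(f)

# Llama-3.1 ships the new rope_scaling schema ("rope_type": "llama3" plus
# low/high frequency factors). optimum-tpu's modeling_llama.py reads
# rope_scaling["type"], so rewrite the entry into the legacy two-key form.
# The "dynamic"/8.0 values here are assumptions, not validated settings.
config["rope_scaling"] = {"type": "dynamic", "factor": 8.0}

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```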

Below is the error log without the rope_scaling modification:

Traceback (most recent call last):
  File "/root/optimum-tpu/examples/custom/train.py", line 138, in <module>
    model, tokenizer = create_and_prepare_model(args)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/examples/custom/train.py", line 72, in create_and_prepare_model
    model = AutoModelForCausalLM.from_pretrained(args.model_name, use_cache=False)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/optimum/tpu/modeling.py", line 64, in from_pretrained
    model = cls.from_pretrained(pretrained_model_name_or_path, *model_args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/workflow/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3788, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 1180, in __init__
    self.model = LlamaModel(config, rank, world_size)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 956, in __init__
    [LlamaDecoderLayer(config, layer_idx, rank, world_size) for layer_idx in range(config.num_hidden_layers)]
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 956, in <listcomp>
    [LlamaDecoderLayer(config, layer_idx, rank, world_size) for layer_idx in range(config.num_hidden_layers)]
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 746, in __init__
    self.self_attn = LLAMA_ATTENTION_CLASSES[config._attn_implementation](
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 333, in __init__
    self._init_rope()
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 343, in _init_rope
    scaling_type = self.config.rope_scaling["type"]
                   ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
KeyError: 'type'

I would really appreciate your suggestions/plans on the following:

  1. How can we handle the rope_scaling issue properly? (An in-memory workaround I am considering is sketched below.)
  2. Will the Transformers version be upgraded to support Llama-3.1 (8B) in the near future?
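On question 1, one stopgap that avoids editing files on disk is to patch the config object in memory before handing it to from_pretrained. This is only a sketch: the import path is inferred from the traceback, the replacement rope_scaling values are assumptions, and I have not verified that the optimum-tpu wrapper forwards the config= keyword.

```python
from transformers import AutoConfig

# Import path inferred from the traceback (optimum/tpu/modeling.py);
# adjust if the package exposes AutoModelForCausalLM elsewhere.
from optimum.tpu.modeling import AutoModelForCausalLM

model_name = "meta-llama/Meta-Llama-3.1-8B"

# Load the config, then overwrite rope_scaling with the legacy two-key
# schema expected by optimum-tpu's modeling_llama.py. The values are
# placeholders, not the official Llama-3.1 rope settings.
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {"type": "dynamic", "factor": 8.0}

# Assumes the wrapper forwards config= through to transformers' from_pretrained.
model = AutoModelForCausalLM.from_pretrained(model_name, config=config, use_cache=False)
```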
