
Support for Llama-3.1 (8b) - inference #80


Description

@Bihan

The fine-tuning example works with Llama-3.1 (8B) once Transformers is upgraded to version 4.43.3 and the rope_scaling entry in the model's config.json is modified.
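For concreteness, the config.json change I apply is roughly the following. Treat it as a sketch: the local path and the replacement rope_scaling values are placeholders (not the official Llama-3.1 settings), and the only point is to produce the legacy two-key schema that optimum-tpu's modeling_llama.py currently reads.

```python
import json

# Hypothetical local path to the downloaded Meta-Llama-3.1-8B snapshot.
config_path = "Meta-Llama-3.1-8B/config.json"

with open(config_path) as f:
    config = json.load(f)

# Llama-3.1 ships the new rope_scaling schema ("rope_type": "llama3" plus
# low/high frequency factors). optimum-tpu's modeling_llama.py reads
# rope_scaling["type"], so rewrite the entry into the legacy two-key form.
# The "dynamic"/8.0 values here are assumptions, not validated settings.
config["rope_scaling"] = {"type": "dynamic", "factor": 8.0}

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```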

Below is the error log without the rope_scaling modification:

Traceback (most recent call last):
  File "/root/optimum-tpu/examples/custom/train.py", line 138, in <module>
    model, tokenizer = create_and_prepare_model(args)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/examples/custom/train.py", line 72, in create_and_prepare_model
    model = AutoModelForCausalLM.from_pretrained(args.model_name, use_cache=False)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/optimum/tpu/modeling.py", line 64, in from_pretrained
    model = cls.from_pretrained(pretrained_model_name_or_path, *model_args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/workflow/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3788, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 1180, in __init__
    self.model = LlamaModel(config, rank, world_size)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 956, in __init__
    [LlamaDecoderLayer(config, layer_idx, rank, world_size) for layer_idx in range(config.num_hidden_layers)]
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 956, in <listcomp>
    [LlamaDecoderLayer(config, layer_idx, rank, world_size) for layer_idx in range(config.num_hidden_layers)]
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 746, in __init__
    self.self_attn = LLAMA_ATTENTION_CLASSES[config._attn_implementation](
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 333, in __init__
    self._init_rope()
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 343, in _init_rope
    scaling_type = self.config.rope_scaling["type"]
                   ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
KeyError: 'type'

I would really appreciate your suggestions/plans on the following:

  1. How can we handle the rope_scaling issue properly? (An in-memory workaround I am considering is sketched below.)
  2. Will the Transformers version be upgraded to support Llama-3.1 (8B) in the near future?
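On question 1, one stopgap that avoids editing files on disk is to patch the config object in memory before handing it to from_pretrained. This is only a sketch: the import path is inferred from the traceback, the replacement rope_scaling values are assumptions, and I have not verified that the optimum-tpu wrapper forwards the config= keyword.

```python
from transformers import AutoConfig

# Import path inferred from the traceback (optimum/tpu/modeling.py);
# adjust if the package exposes AutoModelForCausalLM elsewhere.
from optimum.tpu.modeling import AutoModelForCausalLM

model_name = "meta-llama/Meta-Llama-3.1-8B"

# Load the config, then overwrite rope_scaling with the legacy two-key
# schema expected by optimum-tpu's modeling_llama.py. The values are
# placeholders, not the official Llama-3.1 rope settings.
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {"type": "dynamic", "factor": 8.0}

# Assumes the wrapper forwards config= through to transformers' from_pretrained.
model = AutoModelForCausalLM.from_pretrained(model_name, config=config, use_cache=False)
```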
