Hello!

I ran into a problem when training the model in unified mode.
First, some context: when I evaluate several of the models in the artifacts (for example bbcc-mean, cccc-lasttoken, and cccc-wmean), I also get `TypeError: MistralForCausalLM.forward() got an unexpected keyword argument 'is_causal'`.
My understanding is that the `is_causal` argument is only accepted when the model is loaded with the `MistralForCausalLM` class from `modeling_gritlm7b.py`. If I do not put `modeling_gritlm7b.py` in the model directory, the model is loaded as the built-in `MistralForCausalLM` from the transformers library, whose `forward()` does not accept `is_causal`. Besides, I think the model's config file also needs to be modified by adding:

```json
"auto_map": {
  "AutoModel": "modeling_gritlm7b.MistralModel",
  "AutoModelForCausalLM": "modeling_gritlm7b.MistralForCausalLM",
  "AutoModelForSequenceClassification": "modeling_gritlm7b.MistralForSequenceClassification"
},
```
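For context, this is how I understand `auto_map` taking effect; a minimal sketch, assuming the patched `config.json` sits next to `modeling_gritlm7b.py` in the model directory:

```python
from transformers import AutoModelForCausalLM

# With trust_remote_code=True, transformers follows the "auto_map" entry in
# config.json and imports the class from modeling_gritlm7b.py in the model
# directory instead of using its built-in MistralForCausalLM.
model = AutoModelForCausalLM.from_pretrained(
    "../models/Mistral-7B", trust_remote_code=True
)
print(type(model))  # should name transformers_modules...modeling_gritlm7b.MistralForCausalLM
```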
I fixed the issue for evaluation with the steps above and it works. However, I hit the same problem when training: I downloaded Mistral-7B, added `modeling_gritlm7b.py`, and modified the config file, but training still fails with `TypeError: MistralForCausalLM.forward() got an unexpected keyword argument 'is_causal'`.
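For reference, a minimal sketch of the call that separates the two classes, assuming the custom class accepts `is_causal` the way the traceback implies:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "../models/Mistral-7B"  # local dir with modeling_gritlm7b.py and the patched config
tok = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, trust_remote_code=True)

batch = tok("hello world", return_tensors="pt")
with torch.no_grad():
    # The built-in MistralForCausalLM raises the TypeError on this call;
    # the copy from modeling_gritlm7b.py accepts is_causal.
    out = model(**batch, is_causal=False)
print(out.logits.shape)
```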
I guessed that maybe the model was not loaded correctly, so I printed the type of the model in run.py after loading it:
```python
model = GritLMTrainModel(
    model_name_or_path=model_args.model_name_or_path,
    normalized=model_args.normalized,
    pooling_method=model_args.pooling_method,
    negatives_cross_device=training_args.negatives_cross_device,
    temperature=training_args.temperature,
    mode=training_args.mode,
    projection=model_args.projection,
    attn=model_args.attn,
    attn_implementation=model_args.attn_implementation,
    torch_dtype=args_to_dtype(training_args),
    loss_gen_type=training_args.loss_gen_type,
    loss_gen_factor=training_args.loss_gen_factor,
    use_cache=False,
    # Critical to make Mixtral work
    low_cpu_mem_usage=True,
    quantization_config=quantization_config,
    load_in_4bit=load_in_4bit,
)
print(type(model.model))
```
The result is `<class 'transformers_modules.Mistral-7B.modeling_gritlm7b.MistralForCausalLM'>`, which is correct. So what is actually going wrong, and how can I modify the code to make it work?
Update: when I remove `--lora`, the error changes to CUDA out of memory, so the problem really does seem to come from LoRA. Maybe I could use more GPUs to get around the OOM, but why would LoRA influence the model type?
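My current guess, as a minimal sketch: PEFT wraps the model, so the object whose `forward()` gets called is no longer the custom class. This assumes `--lora` goes through `peft.get_peft_model`; the `LoraConfig` values below are placeholders:

```python
import inspect

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "../models/Mistral-7B", trust_remote_code=True
)
# Hypothetical LoRA settings, only to demonstrate the wrapping.
peft_model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM", r=16))

print(type(peft_model))                   # a peft.PeftModelForCausalLM wrapper
print(type(peft_model.get_base_model()))  # the custom MistralForCausalLM underneath

# If is_causal is not an explicit parameter of the wrapper's forward(),
# some code path may fail to pass it through to the inner model.
print("is_causal" in inspect.signature(peft_model.forward).parameters)
```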
The training command:
```bash
torchrun --nproc_per_node 1 \
    -m training.run \
    --output_dir output_dir \
    --model_name_or_path ../models/Mistral-7B \
    --train_data ../data/unified_data \
    --learning_rate 1e-5 \
    --num_train_epochs 5 \
    --per_device_train_batch_size 5 \
    --per_device_generative_bs 1 \
    --dataloader_drop_last True \
    --normalized True \
    --temperature 0.02 \
    --query_max_len 32 \
    --passage_max_len 128 \
    --train_group_size 2 \
    --mode unified \
    --max_steps 1253 \
    --attn cccc \
    --overwrite_output_dir \
    --lora
```
Waiting for your kind reply! :)