Hello!

I ran into a problem when training the model in unified mode.
First, some context: when I evaluate several of the models in the artifacts (for example bbcc-mean, cccc-lasttoken, and cccc-wmean), I also get `TypeError: MistralForCausalLM.forward() got an unexpected keyword argument 'is_causal'`.
My understanding is that the `is_causal` argument is only accepted when the model is loaded with the `MistralForCausalLM` class from `modeling_gritlm7b.py`. If I do not put `modeling_gritlm7b.py` in the model directory, the model is loaded as the built-in `MistralForCausalLM` from the transformers library, whose `forward()` does not accept `is_causal`. Besides, I think the model's config file also needs to be modified by adding:

```json
"auto_map": {
  "AutoModel": "modeling_gritlm7b.MistralModel",
  "AutoModelForCausalLM": "modeling_gritlm7b.MistralForCausalLM",
  "AutoModelForSequenceClassification": "modeling_gritlm7b.MistralForSequenceClassification"
},
```
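For context, this is how I understand `auto_map` taking effect; a minimal sketch, assuming the patched `config.json` sits next to `modeling_gritlm7b.py` in the model directory:

```python
from transformers import AutoModelForCausalLM

# With trust_remote_code=True, transformers follows the "auto_map" entry in
# config.json and imports the class from modeling_gritlm7b.py in the model
# directory instead of using its built-in MistralForCausalLM.
model = AutoModelForCausalLM.from_pretrained(
    "../models/Mistral-7B", trust_remote_code=True
)
print(type(model))  # should name transformers_modules...modeling_gritlm7b.MistralForCausalLM
```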
I fixed the issue for evaluation with the steps above and it works. However, I hit the same problem when training: I downloaded Mistral-7B, added `modeling_gritlm7b.py`, and modified the config file, but training still fails with `TypeError: MistralForCausalLM.forward() got an unexpected keyword argument 'is_causal'`.
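For reference, a minimal sketch of the call that separates the two classes, assuming the custom class accepts `is_causal` the way the traceback implies:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "../models/Mistral-7B"  # local dir with modeling_gritlm7b.py and the patched config
tok = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, trust_remote_code=True)

batch = tok("hello world", return_tensors="pt")
with torch.no_grad():
    # The built-in MistralForCausalLM raises the TypeError on this call;
    # the copy from modeling_gritlm7b.py accepts is_causal.
    out = model(**batch, is_causal=False)
print(out.logits.shape)
```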
I guessed that maybe the model was not loaded correctly, so I printed the type of the model in run.py after loading it:
```python
model = GritLMTrainModel(
    model_name_or_path=model_args.model_name_or_path,
    normalized=model_args.normalized,
    pooling_method=model_args.pooling_method,
    negatives_cross_device=training_args.negatives_cross_device,
    temperature=training_args.temperature,
    mode=training_args.mode,
    projection=model_args.projection,
    attn=model_args.attn,
    attn_implementation=model_args.attn_implementation,
    torch_dtype=args_to_dtype(training_args),
    loss_gen_type=training_args.loss_gen_type,
    loss_gen_factor=training_args.loss_gen_factor,
    use_cache=False,
    # Critical to make Mixtral work
    low_cpu_mem_usage=True,
    quantization_config=quantization_config,
    load_in_4bit=load_in_4bit,
)
print(type(model.model))
```
The result is `<class 'transformers_modules.Mistral-7B.modeling_gritlm7b.MistralForCausalLM'>`, which is correct. So what is actually going wrong, and how can I modify the code to make it work?
Update: when I remove `--lora`, the error changes to CUDA out of memory, so the problem really does seem to come from LoRA. Maybe I could use more GPUs to get around the OOM, but why would LoRA influence the model type?
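My current guess, as a minimal sketch: PEFT wraps the model, so the object whose `forward()` gets called is no longer the custom class. This assumes `--lora` goes through `peft.get_peft_model`; the `LoraConfig` values below are placeholders:

```python
import inspect

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "../models/Mistral-7B", trust_remote_code=True
)
# Hypothetical LoRA settings, only to demonstrate the wrapping.
peft_model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM", r=16))

print(type(peft_model))                   # a peft.PeftModelForCausalLM wrapper
print(type(peft_model.get_base_model()))  # the custom MistralForCausalLM underneath

# If is_causal is not an explicit parameter of the wrapper's forward(),
# some code path may fail to pass it through to the inner model.
print("is_causal" in inspect.signature(peft_model.forward).parameters)
```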
The training command:
```bash
torchrun --nproc_per_node 1 \
    -m training.run \
    --output_dir output_dir \
    --model_name_or_path ../models/Mistral-7B \
    --train_data ../data/unified_data \
    --learning_rate 1e-5 \
    --num_train_epochs 5 \
    --per_device_train_batch_size 5 \
    --per_device_generative_bs 1 \
    --dataloader_drop_last True \
    --normalized True \
    --temperature 0.02 \
    --query_max_len 32 \
    --passage_max_len 128 \
    --train_group_size 2 \
    --mode unified \
    --max_steps 1253 \
    --attn cccc \
    --overwrite_output_dir \
    --lora
```
Waiting for your kind reply! :)