Finetune Problem: Various Fine-tuning Questions #270
Replies: 42 comments 81 replies
-
Hello! I'm glad to have discovered CogVLM. I have the following questions about fine-tuning:
-
Hi, I'm fine-tuning by following the example.
My own arguments are (_name_or_path='/mntnlp/common_base_model/cogvqa/cogagent', architectures=['CogAgentForCausalLM'], attention_dropout=0.1, auto_map={'AutoConfig': 'configuration_cogagent.CogAgentConfig', 'AutoModelForCausalLM': 'modeling_cogagent.CogAgentForCausalLM'}, batch_from_same_dataset=False, batch_size=4, bf16=False, block_size=10000, bos_token_id=1, checkpoint_activations=False, checkpoint_num_layers=1, checkpoint_skip_layers=0, cross_compute_hidden_size=1024, cross_hidden_size=1024, cross_image_pix=1120, cross_image_size=1120, cuda=True, deepscale=False, deepscale_config=None, deepspeed=True, deepspeed_activation_checkpointing=False, deepspeed_config={'train_micro_batch_size_per_gpu': 4, 'gradient_accumulation_steps': 1, 'gradient_clipping': 0.1, 'fp16': {'enabled': False, 'loss_scale': 0, 'loss_scale_window': 200, 'hysteresis': 2, 'min_loss_scale': 0.01}, 'bf16': {'enabled': False}, 'optimizer': {'type': 'AdamW', 'params': {'lr': 0.0001, 'weight_decay': 0.01}}}, deepspeed_mpi=False, device=0, distributed_backend='nccl', drop_path=0.0, eos_token_id=2, epochs=None, eva_args={'model_parallel_size': 1}, eval_batch_size=None, eval_interval=None, eval_iters=100, exit_interval=None, experiment_name='finetune-/mntnlp/common_base_model/cogvqa', fp16=False, from_pretrained='/mntnlp/common_base_model/cogvqa', gradient_accumulation_steps=1, hidden_act='silu', hidden_dropout=0.1, hidden_size=4096, hidden_size_per_attention_head=None, ignore_pad_token_for_loss=True, image_length=256, initializer_range=0.02, inner_hidden_size=None, input_source='interactive', intermediate_size=11008, iterable_dataset=False, layer_range=None, layernorm_epsilon=1e-05, layernorm_order='pre', length_penalty=0.0, load=None, local_rank=0, local_tokenizer='/mntnlp/common_base_model/vicuna_v1.5_7b', log_interval=50, lora_rank=50, lr=0.0001, lr_decay_iters=None, lr_decay_ratio=0.1, lr_decay_style='cosine', make_vocab_size_divisible_by=128, master_ip='127.0.0.1', master_port='16666', max_inference_batch_size=12, max_length=400, max_position_embeddings=2048, max_sequence_length=512, min_tgt_length=0, mode='finetune', model_parallel_size=1, no_load_rng=False, no_repeat_ngram_size=0, no_save_rng=False, num_attention_heads=32, num_beams=1, num_hidden_layers=32, num_layers=6, num_multi_query_heads=0, num_workers=1, out_seq_length=256, output_path='./samples', pad_token_id=0, pre_seq_len=8, prefetch_factor=4, rank=0, resume_dataloader=True, rms_norm_eps=1e-05, save=None, save_args=False, save_interval=5000, seed=1234, skip_init=False, split='1000,1,1', strict_eval=False, summary_dir='', temperature=1.0, template_version='chat', test_data=None, tie_word_embeddings=False, tokenizer_type='fake', top_k=0, top_p=0.0, torch_dtype='bfloat16', train_data=['./archive_split/train'], train_data_weights=None, train_iters=2000, transformers_version='4.36.0.dev0', use_cache=True, use_gpu_initialization=False, use_lora=True, use_ptuning=False, use_qlora=False, valid_data=['./archive_split/valid'], version='chat', vision_config={'dropout_prob': 0.0, 'hidden_act': 'gelu', 'hidden_size': 1792, 'image_size': 224, 'in_channels': 3, 'intermediate_size': 15360, 'layer_norm_eps': 1e-06, 'num_heads': 16, 'num_hidden_layers': 63, 'num_positions': 257, 'patch_size': 14}, vit_checkpoint_activations=False, vocab_size=32000, warmup=0.02, weight_decay=0.01, with_id=False, world_size=1, zero_stage=0). How should inner_hidden_size be set by default? Much appreciated.
-
enable = ["encoder", "cross_attention", "linear_proj", 'mlp.vision', 'rotary.vision', 'eoi', 'boi', 'vit']
-
This may be somewhat of a duplicate question, so here are my questions:
-
I'm trying to fine-tune the CogAgent model on 8×3090 GPUs with MP_SIZE=4, but the 500 GB of host memory is exhausted already during model loading. Is there any way to reduce this memory usage?
-
I ran into the same problem as in #268. Apart from MP_SIZE and NUM_GPUS_PER_WORKER I did not change any other parameters, and I'm fine-tuning from the weights pulled via the official SAT; the failure occurs in
-
I'd like to ask: during fine-tuning, can the parameters of each layer of the backend be unfrozen and trained? If so, how exactly is that done?
-
Following the demo example, after downloading the images and setting the paths I ran bash finetune_cogvlm_lora.sh and got the error below. I'd like to know what is causing it.
-
Hello!
-
Is there currently any code for LoRA or QLoRA fine-tuning of the HF model? And why can't the provided finetune script fine-tune a locally downloaded HF model?
-
Are there plans to open-source CogAgent's pretraining data and the QA-format fine-tuning data?
-
bash finetune_demo/finetune_cogvlm_lora.sh
-
The file saved in the final step of fine-tuning CogVLM is almost twice the size of the base model file, 60+ GB. Is this because of float precision? How can I save a smaller file that can be used directly for inference?
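A plausible explanation is that the checkpoint also carries optimizer state and/or fp32 copies of the weights in addition to the model itself. Below is a minimal sketch (assuming a mp_rank_XX_model_states.pt file with the weights stored under a 'module' key, which may not match your exact layout) of stripping everything but the weights and casting them to bf16 before re-saving:

```python
import torch

# Illustrative only: drop optimizer/rng state if present and halve the
# storage by casting floating-point tensors to bf16. The file name and
# the 'module' key are assumptions; verify against your checkpoint.
ckpt = torch.load("mp_rank_00_model_states.pt", map_location="cpu")
state_dict = ckpt.get("module", ckpt)

half_sd = {
    k: v.to(torch.bfloat16) if torch.is_tensor(v) and v.is_floating_point() else v
    for k, v in state_dict.items()
}
torch.save({"module": half_sd}, "mp_rank_00_model_states_bf16.pt")
```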
-
The model is saved periodically during training. Can this be turned off? How?
-
Problem: running finetune_cogvlm_demo.py on a few images for 100 steps, the loss stays at 0 the whole time and the validation pred text never changes at all. The only training parameters I changed from the defaults are batch_size=1 and MP_num=1. Because there isn't enough VRAM, the trainable parameters do not include LoRA, only the ViT MLP and p-tuning. What could be the problem?
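One hedged guess: a loss that is exactly 0 often means every label token is being masked out (commonly with -100), so the cross-entropy has nothing to supervise. A hypothetical sanity check, assuming the -100 ignore-index convention used by most PyTorch LM training code (verify against your dataset.py):

```python
import torch

IGNORE_INDEX = -100  # assumed convention; check what your dataset actually uses

def check_labels(labels: torch.Tensor) -> None:
    """Print how many label tokens actually contribute to the loss."""
    valid = (labels != IGNORE_INDEX).sum().item()
    total = labels.numel()
    print(f"supervised tokens: {valid}/{total}")
    if valid == 0:
        print("WARNING: no supervised tokens in this batch -> loss will be 0")
```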
-
How can I do another round of LoRA fine-tuning on top of a model that was already LoRA fine-tuned? I have LoRA fine-tuned a model from the base model and saved the CKPT; I want to load this CKPT, attach LoRA layers to it, and then fine-tune on another dataset. How should this be configured?
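For reference, one common route (not necessarily SAT's built-in mechanism) is to first merge the existing LoRA delta into the base weights, then attach a fresh LoRA for the next round. Conceptually the merge is just W' = W + (alpha/r)·B·A; the sketch below only illustrates that math and is not the repository's API:

```python
import torch

# Conceptual sketch only: fold a trained LoRA pair (A, B) back into the
# frozen base weight W, after which a new LoRA can be attached to W'.
def merge_lora_weight(W: torch.Tensor,   # (out_features, in_features)
                      A: torch.Tensor,   # (r, in_features)
                      B: torch.Tensor,   # (out_features, r)
                      alpha: float,
                      r: int) -> torch.Tensor:
    return W + (alpha / r) * (B @ A)
```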
-
Running finetune_cogagent_lora.sh on 8 A100 GPUs runs out of GPU memory...
-
Has anyone tried fine-tuning CogAgent with quant4 quantization? I get a dimension error: File "/root/miniconda3/envs/CogVLM/lib/python3.10/site-packages/sat/model/finetune/lora2.py", line 97, in init
-
Could you explain which modules these entries refer to: enable = ["encoder", "cross_attention", "linear_proj", 'mlp.vision', 'rotary.vision', 'eoi', 'boi', 'vit']? If I want to freeze modules, is it enough to just configure them here?
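For illustration only: an enable list like this is usually treated as a set of name patterns, and everything whose parameter name does not match is frozen. A minimal PyTorch sketch of that idea, assuming substring matching over named_parameters() (the actual matching rule in SAT/CogVLM may differ):

```python
import torch

# Assumed example patterns taken from the question above.
enable = ["encoder", "cross_attention", "linear_proj",
          "mlp.vision", "rotary.vision", "eoi", "boi", "vit"]

def freeze_except(model: torch.nn.Module, enable_patterns) -> None:
    """Keep gradients only for parameters whose names match a pattern."""
    trainable, frozen = 0, 0
    for name, param in model.named_parameters():
        if any(pat in name for pat in enable_patterns):
            param.requires_grad = True
            trainable += param.numel()
        else:
            param.requires_grad = False
            frozen += param.numel()
    print(f"trainable params: {trainable:,} | frozen params: {frozen:,}")
```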
-
How do I continue fine-tuning from a checkpoint that has not been merged?
-
For multi-turn dialogue, how do I concatenate multiple question-answer pairs? How should dataset.py be modified?
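As a rough illustration (not the repository's dataset.py), multi-turn samples are typically built by concatenating each question and answer into one token sequence and masking the question tokens in the labels so only answers are supervised. The template strings, tokenizer calls, and -100 ignore index below are all assumptions to adapt to the prompt format your template_version uses:

```python
IGNORE_INDEX = -100  # assumed ignore index for unsupervised positions

def build_multiturn_sample(tokenizer, turns):
    """turns: list of (question, answer) string pairs.

    Hypothetical helper: concatenate all turns, supervising answers only.
    """
    input_ids, labels = [], []
    for question, answer in turns:
        q_ids = tokenizer.encode(f"Question: {question} Answer:",
                                 add_special_tokens=False)
        a_ids = tokenizer.encode(answer, add_special_tokens=False)
        a_ids = a_ids + [tokenizer.eos_token_id]
        input_ids += q_ids + a_ids
        labels += [IGNORE_INDEX] * len(q_ids) + a_ids
    return {"input_ids": input_ids, "labels": labels}
```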
-
In the finetune cogagent _demo code, the data_collator returns the information below, but the parameters accepted by the CogAgent model's forward method are not these. I'm curious where some processing happens in between?
-
How can I merge a model LoRA fine-tuned with 4-GPU model parallelism into a single model? I loaded the open-source cogvlm-base-490 model, did LoRA fine-tuning with MP_SIZE=4, and saved a CKPT consisting of four files (mp_rank_00_model_states.pt to mp_rank_03_model_states.pt). How do I merge this into an MP_SIZE=1 model stored in a single file? Thanks.
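If SAT provides a repartition/merge utility, that is the safer path. Purely as a conceptual sketch: Megatron-style model parallelism splits column-parallel weights along dim 0 and row-parallel weights along dim 1, so a manual merge concatenates each sharded tensor back along the right dimension. The parameter-name patterns and the 'module' key below are assumptions you would need to verify against the actual checkpoints:

```python
import torch

def concat_dim_for(name: str):
    """Assumed mapping from parameter name to concat dimension."""
    if "dense_h_to_4h" in name or "query_key_value" in name:   # column-parallel (assumed)
        return 0
    if "dense_4h_to_h" in name or "attention.dense" in name:   # row-parallel (assumed)
        return 1
    return None  # replicated parameter: identical on every rank

# Load the four MP shards saved by the fine-tuning run.
shards = [torch.load(f"mp_rank_0{i}_model_states.pt", map_location="cpu")["module"]
          for i in range(4)]

merged = {}
for name in shards[0]:
    dim = concat_dim_for(name)
    if dim is None:
        merged[name] = shards[0][name]
    else:
        merged[name] = torch.cat([s[name] for s in shards], dim=dim)

torch.save({"module": merged}, "merged_mp1_model_states.pt")
```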
-
I have a problem but no idea what's wrong. Here is the log: (cogvlm) dhu_mbzhao_1@deeplearning-v191204-deeplearn: Could someone help me?
-
What does this mean? Does this log message affect performance? Keyword arguments {'add_special_tokens': False} not recognized.
-
My test results are as follows; is this considered good or bad? [2024-07-19 09:38:21,589] [INFO] [RANK 0] validation loss at the end of training for test data | loss: 0.000000E+00 | PPL: 1.000000E+00 acc 5.319865E-02 | acc_w/o_case 5.319865E-02 |
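For context, PPL here is just exp(loss), so a loss of exactly 0 forces PPL = 1; combined with ~5% accuracy this usually suggests the validation loss is being computed over no (or fully ignored) label tokens rather than a genuinely perfect model. The relation itself:

```python
import math

# perplexity = exp(mean cross-entropy loss); loss == 0 implies PPL == 1
print(math.exp(0.0))  # 1.0
```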
-
When fine-tuning CogAgent, I noticed that during eval the pred the model outputs is its answer, but the corresponding label is the question that was asked. Is this normal?
-
When I use a dataset I built myself, the loss is very low after 1,000 iterations, but the actual dialogue quality is poor and incorrect detections appear frequently. Is this a problem with the dataset or with the hyperparameter settings? I'm doing LoRA fine-tuning on the base-490 model.
-
Please post any questions about model fine-tuning here; community members and the official team will answer them in their spare time.