I had fine-tuning working with Llama3.1 8B on an earlier version of torchtune, but after upgrading torchtune to 0.5 I can't get it to run again. I tried grabbing the new recipe (this is single GPU, LoRA) and updating my config with the new parameters, but I'm now getting the error below. This is an instruct dataset:
```
INFO:torchtune.utils._logging:Running LoRAFinetuneRecipeSingleDevice with resolved config:

batch_size: 1
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /data/HF-llama3.1-8b-instruct/
  checkpoint_files:
  model_type: LLAMA3
  output_dir: /data/tuned_model
  recipe_checkpoint: recipe_state.pt
compile: false
dataset:
  column_map:
    input: prompt
    output: response
  data_files: /data/torchtune/dataset/instruct/parquet/train-algol-manual-vol1-instruct-0003.parquet
  new_system_prompt: You are an AI assistant who provides helpful and accurate answers
    to questions.
  source: parquet
  split: train
  train_on_input: true
device: cuda
dtype: bf16
enable_activation_checkpointing: true
enable_activation_offloading: true
epochs: 4
gradient_accumulation_steps: 4
log_every_n_steps: 1
log_peak_memory_stats: true
loss:
  _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler:
  _component_: torchtune.training.lr_schedulers.get_cosine_schedule_with_warmup
  num_warmup_steps: 100
max_steps_per_epoch: null
metric_logger:
  _component_: torchtune.training.metric_logging.DiskLogger
  log_dir: /data/tuned-model/logs
model:
  _component_: torchtune.models.llama3_1.lora_llama3_1_8b
  apply_lora_to_mlp: true
  apply_lora_to_output: false
  lora_alpha: 16
  lora_attn_modules:
  lora_dropout: 0.0
  lora_rank: 8
optimizer:
  _component_: torch.optim.AdamW
  lr: 0.0003
  weight_decay: 0.01
optimizer_in_bwd: false
output_dir: /data/tuned-model
profiler:
  _component_: torchtune.training.setup_torch_profiler
  active_steps: 2
  cpu: true
  cuda: true
  enabled: false
  num_cycles: 1
  output_dir: /data/tuned-model/logs
  profile_memory: false
  record_shapes: true
  wait_steps: 5
  warmup_steps: 3
  with_flops: false
  with_stack: false
resume_from_checkpoint: false
save_adapter_weights_only: false
seed: null
shuffle: true
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  max_seq_len: null
  path: /data/HF-llama3.1-8b-instruct/original/tokenizer.model
DEBUG:torchtune.utils._logging:Setting manual seed to local seed 3130938658. Local seed is seed + rank = 3130938658 + 0
Writing logs to /data/tuned-model/logs/log_1742309993.txt
INFO:torchtune.utils._logging:Model is initialized with precision torch.bfloat16.
INFO:torchtune.utils._logging:Memory stats after model init:
    GPU peak memory allocation: 15.06 GiB
    GPU peak memory reserved: 15.18 GiB
    GPU peak memory active: 15.06 GiB
INFO:torchtune.utils._logging:Tokenizer is initialized from file.
INFO:torchtune.utils._logging:Optimizer and loss are initialized.
INFO:torchtune.utils._logging:Loss is initialized.
INFO:torchtune.utils._logging:Dataset and Sampler are initialized.
INFO:torchtune.utils._logging:Learning rate scheduler is initialized.
WARNING:torchtune.utils._logging: Profiling disabled.
INFO:torchtune.utils._logging: Profiler config after instantiation: {'enabled': False}
1|25|Loss: 2.439957618713379:   2%|████      | 25/1049 [00:27<18:16, 1.07s/it]Traceback (most recent call last):
  File "/data/pe/bin/tune", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/data/pe/lib/python3.12/site-packages/torchtune/_cli/tune.py", line 49, in main
    parser.run(args)
  File "/data/pe/lib/python3.12/site-packages/torchtune/_cli/tune.py", line 43, in run
    args.func(args)
  File "/data/pe/lib/python3.12/site-packages/torchtune/_cli/run.py", line 214, in _run_cmd
    self._run_single_device(args, is_builtin=is_builtin)
  File "/data/pe/lib/python3.12/site-packages/torchtune/_cli/run.py", line 108, in _run_single_device
    runpy.run_path(str(args.recipe), run_name="__main__")
  File "<frozen runpy>", line 286, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "/data/pe/lib/python3.12/site-packages/recipes/lora_finetune_single_device.py", line 803, in <module>
    sys.exit(recipe_main())
             ^^^^^^^^^^^^^
  File "/data/pe/lib/python3.12/site-packages/torchtune/config/_parse.py", line 99, in wrapper
    sys.exit(recipe_main(conf))
             ^^^^^^^^^^^^^^^^^
  File "/data/pe/lib/python3.12/site-packages/recipes/lora_finetune_single_device.py", line 798, in recipe_main
    recipe.train()
  File "/data/pe/lib/python3.12/site-packages/recipes/lora_finetune_single_device.py", line 678, in train
    for idx, batch in enumerate(self._dataloader):
  File "/data/pe/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 708, in __next__
    data = self._next_data()
           ^^^^^^^^^^^^^^^^^
  File "/data/pe/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 764, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/pe/lib/python3.12/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
            ~~~~~~~~~~~~^^^^^
  File "/data/pe/lib/python3.12/site-packages/torchtune/datasets/_concat.py", line 90, in __getitem__
    return dataset[index - start]
           ~~~~~~~^^^^^^^^^^^^^^^
  File "/data/pe/lib/python3.12/site-packages/torchtune/datasets/_sft.py", line 118, in __getitem__
    return self._prepare_sample(sample)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/pe/lib/python3.12/site-packages/torchtune/datasets/_sft.py", line 125, in _prepare_sample
    tokenized_dict = self._model_transform(transformed_sample)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/pe/lib/python3.12/site-packages/torchtune/models/llama3/_tokenizer.py", line 345, in __call__
    tokens, mask = self.tokenize_messages(messages, add_end_tokens=not inference)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/pe/lib/python3.12/site-packages/torchtune/models/llama3/_tokenizer.py", line 308, in tokenize_messages
    tokenized_message = self.tokenize_message(
                        ^^^^^^^^^^^^^^^^^^^^^^
  File "/data/pe/lib/python3.12/site-packages/torchtune/models/llama3/_tokenizer.py", line 255, in tokenize_message
    tokenized_body = self._tokenize_body(message)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/pe/lib/python3.12/site-packages/torchtune/models/llama3/_tokenizer.py", line 224, in _tokenize_body
    item["content"].strip(), add_bos=False, add_eos=False
    ^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'strip'
1|25|Loss: 2.439957618713379:   2%|████
```
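For what it's worth, the bottom of the trace says `item["content"].strip()` failed because the content was `None`, so my best guess is that at least one row of the parquet has a null in one of the mapped columns. Here is a sketch of the sanity check I'd run (the `prompt`/`response` column names come from my `column_map`; I haven't confirmed this is actually the cause):

```python
import pandas as pd

# Look for rows where either mapped column is null or empty. A null cell
# would reach the tokenizer as message content of None, which matches the
# AttributeError: 'NoneType' object has no attribute 'strip' above.
df = pd.read_parquet(
    "/data/torchtune/dataset/instruct/parquet/"
    "train-algol-manual-vol1-instruct-0003.parquet"
)
for col in ("prompt", "response"):
    bad = df[df[col].isna() | (df[col].astype(str).str.strip() == "")]
    print(f"{col}: {len(bad)} null/empty rows; first indices: {bad.index[:10].tolist()}")
```

If that flags anything, dropping those rows with `df.dropna(subset=["prompt", "response"])` and writing the cleaned frame back out with `to_parquet` would at least show whether bad data, rather than the torchtune upgrade itself, is triggering the crash.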
-

Just wanted to add that I upgraded to torchtune 0.6 and am getting the same issue, and I can't figure out what's wrong. Any help or suggestions would be greatly appreciated!