❓ Questions & Help
Thank you for all your work. I am a student studying speech recognition, and I ran into a problem while using openspeech, so I have a few questions.

- When I run hydra_lm_train.py, the following error occurs. It says tokenizer is an unexpected keyword, yet if I leave the tokenizer out, I am told to specify one... Could the problem be my environment?
- What is the role of hydra_lm_train.py? Does it train a language model on its own, or does it attach a language model to an acoustic model to build a new model? If it is neither of these, is there a way to attach a separate language model to an acoustic model?
- Could I see example code showing how to use hydra_lm_train.py?

Thank you.
Details
--Input
```
python ./openspeech_cli/hydra_lm_train.py dataset=ksponspeech dataset.dataset_path=C:\Users\lab1080\Desktop\openspeech\KsponSpeech dataset.test_dataset_path=C:\Users\lab1080\Desktop\openspeech\KsponSpeech_eval dataset.test_manifest_dir=C:\Users\lab1080\Desktop\openspeech\KsponSpeech_scripts dataset.manifest_file_path=C:\Users\lab1080\Desktop\openspeech\KSPONSPEECH_AUTO_MANIFEST model=listen_attend_spell lr_scheduler=warmup_reduce_lr_on_plateau trainer=gpu criterion=cross_entropy tokenizer=kspon_character
```
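
For context, this is roughly how a Hydra entry point such as hydra_lm_train.py turns the overrides above into the config tree printed in the log below. This is a minimal sketch, not the actual openspeech source; the config_path and config_name values here are hypothetical:

```python
# Minimal Hydra entry-point sketch (hypothetical config_path/config_name,
# not the real openspeech code).
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(config_path="configs", config_name="lm_train")
def hydra_main(configs: DictConfig) -> None:
    # Hydra merges the groups picked on the command line
    # (dataset=ksponspeech, model=listen_attend_spell, tokenizer=...)
    # into one tree; the startup log below is this tree printed out.
    print(OmegaConf.to_yaml(configs))


if __name__ == "__main__":
    hydra_main()
```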
--Output
```
[2024-01-25 13:34:14,597][openspeech.utils][INFO] - dataset:
  dataset: ksponspeech
  dataset_path: C:\Users\lab1080\Desktop\openspeech\KsponSpeech
  test_dataset_path: C:\Users\lab1080\Desktop\openspeech\KsponSpeech_eval
  manifest_file_path: C:\Users\lab1080\Desktop\openspeech\KSPONSPEECH_AUTO_MANIFEST
  test_manifest_dir: C:\Users\lab1080\Desktop\openspeech\KsponSpeech_scripts
  preprocess_mode: phonetic
criterion:
  criterion_name: cross_entropy
  reduction: mean
lr_scheduler:
  lr: 0.0001
  scheduler_name: warmup_reduce_lr_on_plateau
  lr_patience: 1
  lr_factor: 0.3
  peak_lr: 0.0001
  init_lr: 1.0e-10
  warmup_steps: 4000
model:
  model_name: listen_attend_spell
  num_encoder_layers: 3
  num_decoder_layers: 2
  hidden_state_dim: 512
  encoder_dropout_p: 0.3
  encoder_bidirectional: true
  rnn_type: lstm
  joint_ctc_attention: false
  max_length: 128
  num_attention_heads: 1
  decoder_dropout_p: 0.2
  decoder_attn_mechanism: dot
  teacher_forcing_ratio: 1.0
  optimizer: adam
trainer:
  seed: 1
  accelerator: dp
  accumulate_grad_batches: 1
  num_workers: 4
  batch_size: 32
  check_val_every_n_epoch: 1
  gradient_clip_val: 5.0
  logger: wandb
  max_epochs: 20
  save_checkpoint_n_steps: 10000
  auto_scale_batch_size: binsearch
  sampler: else
  name: gpu
  device: gpu
  use_cuda: true
  auto_select_gpus: true
tokenizer:
  sos_token: <s>
  eos_token: </s>
  pad_token: <pad>
  blank_token: <blank>
  encoding: utf-8
  unit: kspon_character
  vocab_path: ../../../aihub_labels.csv
[2024-01-25 13:34:14,606][openspeech.utils][INFO] - Operating System : Windows 10
[2024-01-25 13:34:14,606][openspeech.utils][INFO] - Processor : Intel64 Family 6 Model 85 Stepping 4, GenuineIntel
[2024-01-25 13:34:14,607][openspeech.utils][INFO] - CUDA is available : False
[2024-01-25 13:34:14,607][openspeech.utils][INFO] - PyTorch version : 1.13.1+cpu
wandb: Currently logged in as: apg0001 (dguyanglab). Use wandb login --relogin to force relogin
wandb: wandb version 0.16.2 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.15.12
wandb: Run data is saved locally in C:\Users\lab1080\Desktop\openspeech\outputs\2024-01-25\13-34-14\wandb\run-20240125_133417-5gndlwut
wandb: Run wandb offline to turn off syncing.
wandb: Syncing run listen_attend_spell-ksponspeech
wandb: View project at https://wandb.ai/dguyanglab/listen_attend_spell-ksponspeech
wandb: View run at https://wandb.ai/dguyanglab/listen_attend_spell-ksponspeech/runs/5gndlwut
Traceback (most recent call last):
File "./openspeech_cli/hydra_lm_train.py", line 45, in hydra_main
data_module.setup(tokenizer=tokenizer)
TypeError: setup() got an unexpected keyword argument 'tokenizer'
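
As far as I can tell, the traceback itself is a plain Python signature mismatch: hydra_lm_train.py calls data_module.setup(tokenizer=tokenizer), while the data module selected by dataset=ksponspeech seems to define setup() without a tokenizer parameter. A minimal, self-contained sketch that reproduces the same TypeError (the class names are hypothetical stand-ins, not openspeech's actual data modules):

```python
# Hypothetical stand-ins that reproduce the error above; not the real
# openspeech classes.
class AsrDataModule:
    # setup() with no 'tokenizer' parameter, like a plain
    # LightningDataModule.setup(stage).
    def setup(self, stage=None):
        pass


class LmDataModule:
    # A language-model data module would take the tokenizer to build
    # its vocabulary.
    def setup(self, stage=None, tokenizer=None):
        self.tokenizer = tokenizer


data_module = AsrDataModule()
data_module.setup(tokenizer=object())
# -> TypeError: setup() got an unexpected keyword argument 'tokenizer'
```

If that reading is right, the error would come from combining an acoustic-model configuration (dataset=ksponspeech, model=listen_attend_spell) with the language-model training script, rather than from the Windows environment itself.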