Hi there!
This is my first time reporting an issue to this repo, so please let me know if anything is missing! ;)
I hit the following error when running userbenchmark/distributed:
$ python run_benchmark.py distributed \
--ngpus 1 \
--nodes 1 \
--model torchbenchmark.e2e_models.hf_bert.Model \
--trainer torchbenchmark.util.distributed.trainer.Trainer \
--distributed ddp \
--job_dir $PWD/.userbenchmark/distributed/e2e_hf_bert \
--profiler False
/home/aztecher/benchmark/.userbenchmark/distributed/e2e_hf_bert/ad50a4d731e440c6ac57b8122b2143ce_init
Traceback (most recent call last):
File "/home/aztecher/benchmark/run_benchmark.py", line 48, in <module>
run()
File "/home/aztecher/benchmark/run_benchmark.py", line 41, in run
benchmark.run(bm_args)
File "/home/aztecher/benchmark/userbenchmark/distributed/run.py", line 28, in run
result = slurm_run(args, model_args)
File "/home/aztecher/benchmark/userbenchmark/distributed/run.py", line 92, in slurm_run
result = job.results()
File "/home/aztecher/bench/lib64/python3.9/site-packages/submitit/core/core.py", line 294, in results
raise job_exception # pylint: disable=raising-bad-type
submitit.core.utils.FailedJobError: Job (task=0) failed during processing with trace:
----------------------
Traceback (most recent call last):
File "/home/aztecher/bench/lib64/python3.9/site-packages/submitit/core/submission.py", line 55, in process_job
result = delayed.result()
File "/home/aztecher/bench/lib64/python3.9/site-packages/submitit/core/utils.py", line 137, in result
self._result = self.function(*self.args, **self.kwargs)
File "/home/aztecher/benchmark/torchbenchmark/util/distributed/submit.py", line 134, in __call__
return trainer_class(
File "/home/aztecher/benchmark/torchbenchmark/util/distributed/trainer.py", line 33, in __init__
self.e2e_benchmark: E2EBenchmarkModel = model_class(
File "/home/aztecher/benchmark/torchbenchmark/util/e2emodel.py", line 9, in __call__
obj = type.__call__(cls, *args, **kwargs)
File "/home/aztecher/benchmark/torchbenchmark/e2e_models/hf_bert/__init__.py", line 114, in __init__
self.prep(hf_args)
File "/home/aztecher/benchmark/torchbenchmark/e2e_models/hf_bert/__init__.py", line 175, in prep
tokenizer, pad_to_multiple_of=(8 if accelerator.use_fp16 else None)
AttributeError: 'Accelerator' object has no attribute 'use_fp16'
When I checked the implementation of huggingface/accelerate, I found that the Accelerator class no longer has the use_fp16 attribute
(huggingface/accelerate#3098).
And since the version of the accelerate
module is currently not pinned in requirements.txt, anyone who runs this benchmark will hit the same issue.
I think accelerator.state.mixed_precision can be used as an alternative to that property, like below:
self.data_collator = DataCollatorWithPadding(
    tokenizer, pad_to_multiple_of=(8 if accelerator.state.mixed_precision == "fp16" else None)
)
In my environment, this works fine.
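For what it's worth, the check could also be wrapped in a small helper that tolerates both old accelerate releases (which exposed Accelerator.use_fp16) and current ones (which only expose state.mixed_precision). This is just a sketch; the uses_fp16 helper and the Accelerator stub below are my own illustration, not part of accelerate or torchbenchmark:

```python
# Minimal stand-ins for accelerate's Accelerator/AcceleratorState, so this
# sketch runs without the library installed. In real code you would import
# Accelerator from accelerate instead.
class _State:
    def __init__(self, mixed_precision):
        self.mixed_precision = mixed_precision

class Accelerator:
    def __init__(self, mixed_precision="no"):
        self.state = _State(mixed_precision)

def uses_fp16(accelerator):
    # Older accelerate releases exposed a use_fp16 attribute; prefer it if
    # present, otherwise fall back to state.mixed_precision (current API).
    if hasattr(accelerator, "use_fp16"):
        return bool(accelerator.use_fp16)
    return getattr(accelerator.state, "mixed_precision", "no") == "fp16"

# Same logic as the proposed fix, expressed through the helper:
pad_multiple = 8 if uses_fp16(Accelerator(mixed_precision="fp16")) else None
```

That said, since torchbenchmark already requires a recent accelerate, the simple state.mixed_precision check above is probably enough.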
What do you think of this fix? If it looks good, I'll open a PR for it.
Thanks.
How to reproduce
- OS: Rocky Linux 9.5
- Python: 3.9.21
- (NVIDIA Driver version: 560.28.03)
- (CUDA version: 12.6)
- (Slurm version: 24.05.3-1)
# Suppose this node is already configured as a slurm worker.
# Setup env
$ python -m venv bench
$ source bench/bin/activate
# Clone pytorch/benchmark and install requirements
(bench) $ git clone https://github.com/pytorch/benchmark.git
(bench) $ cd benchmark
(bench) $ pip install -r requirements.txt
(bench) $ pip install -r torchbenchmark/e2e_models/hf_bert/requirements.txt
# Tool / module versions
(bench) $ python --version
Python 3.9.21
(bench) $ pip --version
pip 21.3.1 from /home/mmichish/bench/lib64/python3.9/site-packages/pip (python 3.9)
(bench) $ pip list |grep accelerate
accelerate 1.3.0
# Run benchmark
(bench) $ python run_benchmark.py distributed \
--ngpus 1 \
--nodes 1 \
--model torchbenchmark.e2e_models.hf_bert.Model \
--trainer torchbenchmark.util.distributed.trainer.Trainer \
--distributed ddp \
--job_dir $PWD/.userbenchmark/distributed/e2e_hf_bert \
--profiler False