[release-test] A100 mnist_hogwild-latency increase 10x on linux.aws.a100 vs [a100-runner] #2573

@atalman

Description

Looking at the results of the 2.6.0 vs 2.5.1 run:
https://github.com/pytorch/benchmark/actions/runs/12878326305/job/35904096937
| Benchmark | pytorch-2.5.1-cuda-12.4 | pytorch-2.6.0-cuda-12.4 |
|---|---|---|
| mnist-cpu_memory | 1118.67 | 1146.76 |
| mnist-gpu_memory | 0.0 | 0.0 |
| mnist-latency | 42.46 | 40.00 |
| mnist_hogwild-cpu_memory | 556.57 | 601.289 |
| mnist_hogwild-gpu_memory | 0.0 | 0.0 |
| mnist_hogwild-latency | 671.28 | 586.02 |
| wlm_cpu_lstm-cpu_memory | 885.141 | 907.066 |
| wlm_cpu_lstm-gpu_memory | 0.0 | 0.0 |
| wlm_cpu_lstm-latency | 1266.83 | 1079.37 |
| wlm_cpu_trans-cpu_memory | 852.113 | 899.531 |
| wlm_cpu_trans-gpu_memory | 0.0 | 0.0 |
| wlm_cpu_trans-latency | 1081.98 | 1078.99 |
| wlm_gpu_lstm-cpu_memory | 995.402 | 954.391 |
| wlm_gpu_lstm-gpu_memory | 0.0 | 0.0 |
| wlm_gpu_lstm-latency | 54.78 | 52.76 |
| wlm_gpu_trans-cpu_memory | 1007.86 | 993.949 |
| wlm_gpu_trans-gpu_memory | 0.0 | 0.0 |
| wlm_gpu_trans-latency | 56.41 | 55.54 |
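
For quick triage, here is a minimal sketch (not part of the benchmark harness; the file name is hypothetical) that diffs a CSV export from one of these runs and prints the per-benchmark change between the two PyTorch builds:

```python
import csv

def percent_change(old: float, new: float) -> float:
    # Guard against the 0.0 gpu_memory rows.
    return (new - old) / old * 100 if old else 0.0

# Hypothetical path: a CSV export from one of the runs above,
# with columns Benchmark,<old build>,<new build>.
with open("a100_2.5.1_vs_2.6.0.csv", newline="") as f:
    rows = list(csv.reader(f))

header, data = rows[0], rows[1:]
print(f"Comparing {header[1]} -> {header[2]}")
for name, old, new in data:
    old, new = float(old), float(new)
    print(f"{name}: {old} -> {new} ({percent_change(old, new):+.1f}%)")
```

On the numbers above, mnist_hogwild-latency actually improves from 2.5.1 to 2.6.0 (671.28 -> 586.02, about -12.7%), so the version bump itself does not look like the source of the regression.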

Run 2.4.1 vs 2.5.0 (mnist_hogwild only):
https://github.com/pytorch/benchmark/actions/runs/12895573722
| Benchmark | pytorch-2.4.1-cuda-12.4 | pytorch-2.5.0-cuda-12.4 |
|---|---|---|
| mnist_hogwild-cpu_memory | 561.797 | 556.758 |
| mnist_hogwild-gpu_memory | 0.0 | 0.0 |
| mnist_hogwild-latency | 613.91 | 610.53 |

Run 2.5.1 vs 2.6.0 (mnist_hogwild only):
https://github.com/pytorch/benchmark/actions/runs/12894636482
| Benchmark | pytorch-2.5.1-cuda-12.4 | pytorch-2.6.0-cuda-12.4 |
|---|---|---|
| mnist_hogwild-cpu_memory | 561.73 | 579.324 |
| mnist_hogwild-gpu_memory | 0.0 | 0.0 |
| mnist_hogwild-latency | 592.67 | 599.23 |

Comparing the mnist_hogwild-latency numbers with a run on the A100 runner hosted on GCP, I see a ~10x difference:

Run 2.4.1 vs 2.5.0:

| Benchmark | 2.4.1 | 2.5.0 |
|---|---|---|
| mnist_hogwild-latency | 61.42 | 62.19 |
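
As a sanity check on the ~10x figure, a short sketch using only the matching 2.4.1/2.5.0 latencies pasted above:

```python
# Back-of-the-envelope check of the ~10x gap, using the mnist_hogwild-latency
# numbers pasted above (same benchmark, same PyTorch versions, different runners).
aws_a100 = {"2.4.1": 613.91, "2.5.0": 610.53}  # linux.aws.a100 run
gcp_a100 = {"2.4.1": 61.42, "2.5.0": 62.19}    # a100-runner (GCP) run

for version, aws_value in aws_a100.items():
    ratio = aws_value / gcp_a100[version]
    print(f"{version}: {aws_value} vs {gcp_a100[version]} -> {ratio:.1f}x slower on linux.aws.a100")

# Both versions come out at roughly 10x, so the gap tracks the runner,
# not the PyTorch release.
```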
