Open
Description
Hello, thank you for this wonderful tool.
I am trying to simulate the cycle count of an H100 performing GEMM, but I only have access to an A40 GPU.
So I traced gemm (tensor core and normal) from Deepbench_nvidia on the A40 GPU, then tried to run the traces with the H100-SASS config.
I gathered the H100 configs from these posts:
#344
accel-sim/gpgpu-sim_distribution#80
The simulation ultimately failed with a segmentation fault (status SEGF), and I learned from #161 that this was to be expected.
The error log:
gemm_bench-inference_half_10_10_10_0_0--H100-SASS. Status=SEGF
Last 10 line of /home/gaohengsiang/accelsim/accel-sim-framework/util/job_launching/../../sim_run_12.0/gemm_bench/inference_half_10_10_10_0_0/H100-SASS/gemm_bench-inference_half_10_10_10_0_0.accelsim-commit-3c96d32_modified_0.0_25-11-25-22-09-15gpgpu-sim_git-commit-b18ee397_modified_0.0.o150
------------------
thread block = 0,0,1
GPGPU-Sim uArch: Shader 64 bind to kernel 7 '_ZN7cutlass6KernelI58cutlass_80_wmma_tensorop_h161616gemm_16x16_128x2_nn_align2EEvNT_6ParamsE'
thread block = 0,0,0
GPGPU-Sim: Reconfigure L1 cache to 96KB
GPGPU-Sim uArch: Shader 63 bind to kernel 7 '_ZN7cutlass6KernelI58cutlass_80_wmma_tensorop_h161616gemm_16x16_128x2_nn_align2EEvNT_6ParamsE'
launching kernel name: _ZN7cutlass6KernelI58cutlass_80_wmma_tensorop_h161616gemm_16x16_128x2_nn_align2EEvNT_6ParamsE uid: 7 cuda_stream_id: 0
Header info loaded for kernel command : ./traces/kernel-7-ctx_0x6245c0f86880.traceg.xz
-enable lineinfo = 0
-accelsim tracer version = 5
-nvbit version = 1.7.6
------------------
Contents of /home/gaohengsiang/accelsim/accel-sim-framework/util/job_launching/../../sim_run_12.0/gemm_bench/inference_half_10_10_10_0_0/H100-SASS/gemm_bench-inference_half_10_10_10_0_0.accelsim-commit-3c96d32_modified_0.0_25-11-25-22-09-15gpgpu-sim_git-commit-b18ee397_modified_0.0.e150
------------------
/home/gaohengsiang/accelsim/accel-sim-framework/util/job_launching/../../sim_run_12.0/gemm_bench/inference_half_10_10_10_0_0/H100-SASS/slurm.sim: line 54: 3984007 Segmentation fault (core dumped)
/home/gaohengsiang/accelsim/accel-sim-framework/util/job_launching/../../sim_run_12.0/gpgpu-sim-builds/accelsim-commit-3c96d32_modified_0.0_25-11-25-22-09-15gpgpu-sim_git-commit-b18ee397_modified_0.0/accel-sim.out -config ./gpgpusim.config -trace ./traces/kernelslist.g
All jobs seemed to SEGF at kernel-7, and the log for that kernel mentions wmma (cutlass_80_wmma_tensorop_h161616gemm_16x16_128x2_nn_align2).
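To check this across all job logs rather than by eye, a small script along these lines could help. It is only a sketch: it assumes every log contains "launching kernel name: ... uid: N" lines in the same shape as the excerpt above, and the helper names `last_launched_kernel` and `uses_wmma` are made up for illustration (they are not part of Accel-Sim).

```python
import re

# Matches log lines shaped like the one in the excerpt above:
#   launching kernel name: <mangled name> uid: 7 cuda_stream_id: 0
LAUNCH_RE = re.compile(r"launching kernel name:\s*(\S+)\s+uid:\s*(\d+)")


def last_launched_kernel(log_text):
    """Return (uid, name) of the last kernel launch found in the log, or None."""
    matches = LAUNCH_RE.findall(log_text)
    if not matches:
        return None
    name, uid = matches[-1]
    return int(uid), name


def uses_wmma(kernel_name):
    """Heuristic: treat a kernel as WMMA-based if its name contains 'wmma'."""
    return "wmma" in kernel_name.lower()


if __name__ == "__main__":
    # Sample line copied from the log excerpt in this issue.
    sample = (
        "launching kernel name: _ZN7cutlass6KernelI58cutlass_80_wmma_tensorop_"
        "h161616gemm_16x16_128x2_nn_align2EEvNT_6ParamsE uid: 7 cuda_stream_id: 0\n"
    )
    uid, name = last_launched_kernel(sample)
    print(uid, uses_wmma(name))
```

Running it over each job's `.o` file would show whether the last kernel launched before the crash is always a wmma kernel, which would narrow the problem down to the tensor core path.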
This led me to two questions:
- Since the h100-test branch was closed, is the H100 config usable at all, or is it currently unsalvageable?
- Is there a way to approximate H100 performance by modifying the A100 config? (The simulation with the A100 config ran fine.)