[Chipyard/Gemmini] Simulation Fails with "ReservationStation.scala" Assertion for Softmax/Layernorm Workloads #395

@xsw632

Background Work

Setup

  • Chipyard Version: 1.10.0
  • Chipyard Commit Hash: 00853c
  • Gemmini Commit Hash: f13847e
  • OS: Ubuntu 22.04.5 LTS (Linux 6.8.0-79-generic, x86_64)
  • Toolchain: Default Chipyard setup as per documentation

Issue Description

Standard Gemmini bare-metal tests (e.g., tiled_matmul_ws-baremetal) run successfully, but workloads involving more complex operations (e.g., softmax, layernorm) consistently fail in simulation with an assertion failure in ReservationStation.scala.

The failing assertion checks that the reservation station entry indexed by issue_id is valid, so the failure means an invalid entry is being accessed:

assert(entries_st(issue_id).valid)

It seems the reservation station is attempting to issue an entry that is not valid, possibly because of how its scheduling/queueing logic handles the micro-ops these workloads generate.
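To make the invariant concrete, here is a toy Python model of a queue of entries with valid bits. It is only a sketch of the general pattern behind a check shaped like assert(entries_st(issue_id).valid), not Gemmini's implementation: the names ToyQueue, allocate, and release are hypothetical, and the sketch makes no claim about which path in ReservationStation.scala actually triggers the failure.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    valid: bool = False  # True while the slot holds a live command


class ToyQueue:
    """Hypothetical stand-in for a reservation-station queue (not Gemmini code)."""

    def __init__(self, size):
        self.entries = [Entry() for _ in range(size)]

    def allocate(self):
        # Claim the first free slot and return its index (the "issue_id").
        for i, entry in enumerate(self.entries):
            if not entry.valid:
                entry.valid = True
                return i
        raise RuntimeError("queue full")

    def release(self, issue_id):
        # Same shape as the hardware check: the indexed slot must be live.
        assert self.entries[issue_id].valid, f"entry {issue_id} is not valid"
        self.entries[issue_id].valid = False


def double_release_fails():
    """Return True if referencing the same issue_id twice trips the assertion."""
    q = ToyQueue(4)
    issue_id = q.allocate()
    q.release(issue_id)      # first reference: slot is valid, check passes
    try:
        q.release(issue_id)  # second reference: slot was already cleared
    except AssertionError:
        return True
    return False
```

In this model the check fails whenever an issue_id is used after its slot has been cleared, or when the issue_id itself is wrong (stale or mis-decoded). Either case would look the same at the assertion.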

Steps to Reproduce

Successful Run

make CONFIG=GemminiRocketConfig run-binary \
  BINARY=../../generators/gemmini/software/gemmini-rocc-tests/build/bareMetalC/tiled_matmul_ws-baremetal

Failing Run

make CONFIG=GemminiRocketConfig run-binary \
  BINARY=../../generators/gemmini/software/gemmini-rocc-tests/build/bareMetalC/tiled_matmul_ws_softmax-baremetal

Error Log (Failing Case)

/home/mingzhenjia/Desktop/chipyard/sims/vcs/generated-src/chipyard.harness.TestHarness.GemminiRocketConfig/gen-collateral/ReservationStation.sv", 9827:
TestDriver.testHarness.chiptop0.system.tile_prci_domain.tile_reset_domain_tile.gemmini.reservation_station: at time 2877727000 ps
Assertion failed at ReservationStation.scala:479
assert(entries_st(issue_id).valid)

Fatal: .../ReservationStation.sv", 9829:
$finish called at time 2877727000 ps
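For narrowing down where to look in a waveform, the failure time and instance path can be pulled out of the log automatically. The following is a minimal sketch that assumes the log lines have the same shape as the ones above. The function name parse_assertion and the returned keys are my own choices, not part of Chipyard or Gemmini.

```python
import re


def parse_assertion(log_text):
    """Return details of the first assertion failure in a simulator log, or None."""
    src = re.search(r"Assertion failed at (\S+):(\d+)", log_text)
    if src is None:
        return None
    # The instance path and timestamp appear on the line before the message.
    where = re.search(r"^([\w.]+): at time (\d+) ps", log_text, re.MULTILINE)
    return {
        "source_file": src.group(1),
        "source_line": int(src.group(2)),
        "instance": where.group(1) if where else None,
        "time_ps": int(where.group(2)) if where else None,
    }


# Sample taken from the failing log in this issue.
SAMPLE = """\
TestDriver.testHarness.chiptop0.system.tile_prci_domain.tile_reset_domain_tile.gemmini.reservation_station: at time 2877727000 ps
Assertion failed at ReservationStation.scala:479
assert(entries_st(issue_id).valid)
"""

info = parse_assertion(SAMPLE)
```

With the time in picoseconds and the instance path in hand, a waveform viewer can be pointed at the cycles just before the failure in the reservation_station instance.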

Log (Successful Case)

Starting gemmini matmul
Cycles taken: 2392
Starting slow CPU matmul
Cycles taken: 3227174
Fatal: ".../TestDriver.v", 147:
$finish called at time 10000000500 ps

Expected Behavior

The simulation should complete normally and print the cycle counts, as it does for the tiled_matmul_ws-baremetal test.

Request for Help

  1. What conditions might cause ReservationStation.scala to issue an invalid entry for workloads like softmax/layernorm?
  2. Any suggestions for signals to trace or configuration/debugging strategies to narrow down the issue?
