AmpereOne A192-32X (Supermicro) #52

Description

[Image: DSC01611]

Basic information

Linux/system information

# output of `screenfetch`
ubuntu@ubuntu:~$ screenfetch 
                          ./+o+-       ubuntu@ubuntu
                  yyyyy- -yyyyyy+      OS: Ubuntu 24.04 noble
               ://+//////-yyyyyyo      Kernel: aarch64 Linux 6.8.0-39-generic-64k
           .++ .:/++++++/-.+sss/`      Uptime: 23m
         .:++o:  /++++++++/:--:/-      Packages: 810
        o:+o+:++.`..```.-/oo+++++/     Shell: bash 5.2.21
       .:+o:+o/.          `+sssoo+/    Disk: 19G / 101G (20%)
  .++/+:+oo+o:`             /sssooo.   CPU: Ampere Ampere-1a @ 192x 3.2GHz
 /+++//+:`oo+o               /::--:.   GPU: ASPEED Technology, Inc. ASPEED Graphics Family (rev 52)
 \+/+o+++`o++o               ++////.   RAM: 31390MiB / 522867MiB
  .++.o+++oo+:`             /dddhhh.  
       .+.o+oo:.          `oddhhhh+   
        \+.++o+o``-````.:ohdhhhhh+    
         `:o+++ `ohhhhhhhhyo++os:     
           .o:`.syhhhhhhh/.oo++o`     
               /osyyyyyyo++ooo+++/    
                   ````` +oo+++o\:    
                          `oo++.     

# output of `uname -a`
Linux ubuntu 6.8.0-39-generic-64k #39-Ubuntu SMP PREEMPT_DYNAMIC Sat Jul  6 11:08:16 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

Benchmark results

CPU

Power

  • Idle power draw (at wall): 199 W (sensors report 30 W CPU + 78 W IO = 108 W SoC package power)
  • Maximum simulated power draw (stress-ng --matrix 0): 500 W (see the example invocation after this list)
  • During Geekbench multicore benchmark: 300-600 W (depending on Geekbench version)
  • During top500 HPL benchmark: 692 W
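
For reference, a minimal way to reproduce the synthetic max-power test while watching SoC package power, a sketch using stress-ng and lm-sensors (sensor labels vary by platform/BMC):

# one matrix stress worker per core (0 = all CPUs), run for 5 minutes
sudo apt install -y stress-ng lm-sensors
stress-ng --matrix 0 --timeout 5m &

# watch reported package power while the stressor runs
watch -n 2 sensors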

Disk

Samsung NVMe SSD - 983 DCT M.2 960GB

Benchmark                    Result
iozone 4K random read        50.35 MB/s
iozone 4K random write       216.04 MB/s
iozone 1M random read        2067.82 MB/s
iozone 1M random write       1295.13 MB/s
iozone 1M sequential read    2098.31 MB/s
iozone 1M sequential write   1291.07 MB/s

Benchmarks run with:
wget https://raw.githubusercontent.com/geerlingguy/pi-cluster/master/benchmarks/disk-benchmark.sh
chmod +x disk-benchmark.sh
sudo MOUNT_PATH=/ TEST_SIZE=1g ./disk-benchmark.sh
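
The script wraps iozone; a roughly equivalent direct invocation (an approximation of the script's flags, not copied from it) testing 4K and 1M records with O_DIRECT on a 1 GB file would be:

iozone -e -I -a -s 1g -r 4k -r 1024k -i 0 -i 1 -i 2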

Samsung NVMe SSD - MZQL21T9HCJR-00A07

Specs: https://semiconductor.samsung.com/ssd/datacenter-ssd/pm9a3/mzql21t9hcjr-00a07/

Single disk
Benchmark                    Result
iozone 4K random read        60.19 MB/s
iozone 4K random write       284.72 MB/s
iozone 1M random read        3777.29 MB/s
iozone 1M random write       2686.80 MB/s
iozone 1M sequential read    3773.44 MB/s
iozone 1M sequential write   2680.90 MB/s

RAID 0 (mdadm)
Benchmark                    Result
iozone 4K random read        58.05 MB/s
iozone 4K random write       250.06 MB/s
iozone 1M random read        5444.03 MB/s
iozone 1M random write       4411.07 MB/s
iozone 1M sequential read    7120.75 MB/s
iozone 1M sequential write   4458.30 MB/s
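
The RAID 0 numbers are from an mdadm stripe across the PM9A3 drives; a sketch of how such an array could be assembled (device names and drive count here are placeholders, not necessarily what was used on this system):

# stripe two NVMe namespaces into one md device, then format and mount it
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
sudo mkfs.ext4 /dev/md0
sudo mount /dev/md0 /mnt/raid0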

Network

iperf3 results:

  • iperf3 -c $SERVER_IP: 21.4 Gbps
  • iperf3 -c $SERVER_IP --reverse: 18.8 Gbps
  • iperf3 -c $SERVER_IP --bidir: 8.08 Gbps up, 22.2 Gbps down

Tested on one of the two built-in Broadcom BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller interfaces, connected to my HL15 Arm NAS (see: geerlingguy/arm-nas#16) and routed through a MikroTik 25G Cloud Router.
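
Basic test layout, for reference (server side runs on the NAS; the client flags are the ones listed above, plus -P for parallel streams if you want to push closer to 25 GbE line rate):

# on the NAS (server side)
iperf3 -s

# on the AmpereOne (client side)
iperf3 -c $SERVER_IP
iperf3 -c $SERVER_IP --reverse
iperf3 -c $SERVER_IP --bidir
iperf3 -c $SERVER_IP -P 4    # multiple parallel streams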

GPU

Did not test - this server doesn't have a discrete GPU, just the ASPEED integrated BMC VGA graphics, which aren't suitable for GPU-accelerated gaming or LLMs, lol. Just render it on CPU!

Memory

tinymembench results:

tinymembench v0.4.10 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                     :  14199.7 MB/s (0.3%)
 C copy backwards (32 byte blocks)                    :  13871.7 MB/s
 C copy backwards (64 byte blocks)                    :  13879.6 MB/s (0.2%)
 C copy                                               :  13890.6 MB/s (0.2%)
 C copy prefetched (32 bytes step)                    :  14581.4 MB/s
 C copy prefetched (64 bytes step)                    :  14613.8 MB/s
 C 2-pass copy                                        :  10819.4 MB/s
 C 2-pass copy prefetched (32 bytes step)             :  11313.6 MB/s
 C 2-pass copy prefetched (64 bytes step)             :  11417.4 MB/s
 C fill                                               :  31260.2 MB/s
 C fill (shuffle within 16 byte blocks)               :  31257.1 MB/s
 C fill (shuffle within 32 byte blocks)               :  31263.1 MB/s
 C fill (shuffle within 64 byte blocks)               :  31260.9 MB/s
 NEON 64x2 COPY                                       :  14464.3 MB/s (0.9%)
 NEON 64x2x4 COPY                                     :  13694.9 MB/s
 NEON 64x1x4_x2 COPY                                  :  12444.6 MB/s
 NEON 64x2 COPY prefetch x2                           :  14886.9 MB/s
 NEON 64x2x4 COPY prefetch x1                         :  14954.4 MB/s
 NEON 64x2 COPY prefetch x1                           :  14892.3 MB/s
 NEON 64x2x4 COPY prefetch x1                         :  14955.5 MB/s
 ---
 standard memcpy                                      :  14141.9 MB/s
 standard memset                                      :  31268.0 MB/s
 ---
 NEON LDP/STP copy                                    :  13775.1 MB/s (0.7%)
 NEON LDP/STP copy pldl2strm (32 bytes step)          :  14267.3 MB/s
 NEON LDP/STP copy pldl2strm (64 bytes step)          :  14340.9 MB/s
 NEON LDP/STP copy pldl1keep (32 bytes step)          :  14670.0 MB/s
 NEON LDP/STP copy pldl1keep (64 bytes step)          :  14644.7 MB/s
 NEON LD1/ST1 copy                                    :  13756.1 MB/s
 NEON STP fill                                        :  31262.2 MB/s
 NEON STNP fill                                       :  31265.7 MB/s
 ARM LDP/STP copy                                     :  14454.0 MB/s (0.6%)
 ARM STP fill                                         :  31265.6 MB/s
 ARM STNP fill                                        :  31266.0 MB/s

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read, [MADV_NOHUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns 
      2048 :    0.0 ns          /     0.0 ns 
      4096 :    0.0 ns          /     0.0 ns 
      8192 :    0.0 ns          /     0.0 ns 
     16384 :    0.0 ns          /     0.0 ns 
     32768 :    0.0 ns          /     0.0 ns 
     65536 :    0.0 ns          /     0.0 ns 
    131072 :    1.1 ns          /     1.6 ns 
    262144 :    1.7 ns          /     2.0 ns 
    524288 :    1.9 ns          /     2.2 ns 
   1048576 :    2.1 ns          /     2.2 ns 
   2097152 :    3.0 ns          /     3.3 ns 
   4194304 :   22.6 ns          /    33.9 ns 
   8388608 :   33.7 ns          /    44.3 ns 
  16777216 :   39.3 ns          /    48.0 ns 
  33554432 :   42.1 ns          /    49.4 ns 
  67108864 :   49.0 ns          /    60.2 ns 

block size : single random read / dual random read, [MADV_HUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns 
      2048 :    0.0 ns          /     0.0 ns 
      4096 :    0.0 ns          /     0.0 ns 
      8192 :    0.0 ns          /     0.0 ns 
     16384 :    0.0 ns          /     0.0 ns 
     32768 :    0.0 ns          /     0.0 ns 
     65536 :    0.0 ns          /     0.0 ns 
    131072 :    1.1 ns          /     1.6 ns 
    262144 :    1.7 ns          /     2.0 ns 
    524288 :    1.9 ns          /     2.2 ns 
   1048576 :    2.1 ns          /     2.2 ns 
   2097152 :    3.0 ns          /     3.3 ns 
   4194304 :   22.6 ns          /    33.9 ns 
   8388608 :   33.7 ns          /    44.3 ns 
  16777216 :   39.3 ns          /    47.9 ns 
  33554432 :   42.1 ns          /    49.4 ns 
  67108864 :   49.9 ns          /    61.9 ns 
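
tinymembench isn't packaged in Ubuntu; building and running it from source (upstream repo shown, any recent GCC works) is roughly:

git clone https://github.com/ssvb/tinymembench.git
cd tinymembench
make
./tinymembench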

sbc-bench results

sbc-bench results: https://0x0.st/X0gc.bin

See: ThomasKaiser/sbc-bench#105
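
For reference, a typical way to fetch and run sbc-bench (a sketch, not necessarily the exact invocation used here; check the project README for current options):

wget https://raw.githubusercontent.com/ThomasKaiser/sbc-bench/master/sbc-bench.sh
chmod +x sbc-bench.sh
sudo ./sbc-bench.sh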

Phoronix Test Suite

Results from pi-general-benchmark.sh:

  • pts/encode-mp3: 11.248 sec
  • pts/x264 4K: 69.49 fps
  • pts/x264 1080p: 160.75 fps
  • pts/phpbench: 567108
  • pts/build-linux-kernel (defconfig): 50.101 sec
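
The same profiles can also be run individually with a stock Phoronix Test Suite install (profile names as listed above; this is the generic PTS invocation, not the wrapper script):

phoronix-test-suite benchmark pts/encode-mp3 pts/x264 pts/phpbench pts/build-linux-kernel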

Additional benchmarks

QEMU Coremark

The Ampere team suggested running this; it boots many Arm VMs and runs CoreMark inside each one, a good proxy for the kind of performance you can expect from VMs/containers on this system: https://github.com/AmpereComputing/qemu-coremark

ubuntu@ubuntu:~/qemu-coremark$ ./run_pts.sh 2
47 instances of pts/coremark running in parallel in arm64 VMs!
Round 1 - Total CoreMark Score is: 4697344
Round 2 - Total CoreMark Score is: 4684524
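
To reproduce (repo linked above; judging by the output, the numeric argument to run_pts.sh appears to be the number of rounds):

git clone https://github.com/AmpereComputing/qemu-coremark.git
cd qemu-coremark
./run_pts.sh 2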

llama.cpp (Ampere-optimized)

See: https://github.com/AmpereComputingAI/llama.cpp (I also have an email from Ampere with some testing notes).

Ollama (generic LLMs)

See: https://github.com/geerlingguy/ollama-benchmark?tab=readme-ov-file#findings

System                                   CPU/GPU   Model           Eval Rate
AmpereOne A192-32X (192 core - 512GB)    CPU       llama3.2:3b     23.52 Tokens/s
AmpereOne A192-32X (192 core - 512GB)    CPU       llama3.1:8b     17.47 Tokens/s
AmpereOne A192-32X (192 core - 512GB)    CPU       llama3.1:70b    3.86 Tokens/s
AmpereOne A192-32X (192 core - 512GB)    CPU       llama3.1:405b   0.90 Tokens/s
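
The eval rates above are the "eval rate" figure Ollama itself reports; with Ollama installed you can read the same number off a verbose run (one model shown; the linked ollama-benchmark repo automates runs like this across models):

ollama pull llama3.1:8b
ollama run llama3.1:8b --verbose "Why is the sky blue?"
# stats printed after the response include "eval rate: ... tokens/s"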

yolo-v5

See: https://github.com/AmpereComputingAI/yolov5-demo (maybe test it on a 4K60 video to see how it fares).
