Commit 18d231a

GPA 3.17 updates
1 parent ebba99f commit 18d231a

153 files changed: +5468 -15322 lines changed

README.md

Lines changed: 10 additions & 12 deletions
Original file line number | Diff line number | Diff line change
@@ -31,18 +31,16 @@ Prebuilt binaries can be downloaded from the Releases page: https://github.com/G
3131
* Provides access to some raw hardware counters. See [Raw Hardware Counters](#raw-hardware-counters) for more information.
3232

3333
## What's New
34-
### Version 3.16 (07/01/2024)
35-
* Added support for additional RDNA 3 based APUs.
36-
* GPA's OpenCL support has been temporarily disabled on RDNA 3 hardware.
37-
* Updated error checking in counter splitting to report error if counter group max is zero.
38-
* Disabled the following counters on RDNA 3 based hardware due to inconsistent results:
39-
* CBMemRead, CBColorAndMaskRead, CBMemWritten, CBColorAndMaskWritten
40-
* Disabled the following counters on RDNA 2 based hardware due to inconsistent results:
41-
* VsGsVerticesIn, VsGsPrimsIn
42-
* Disabled the following counters on RDNA based hardware due to inconsistent results:
43-
* VsGsSALUBusy, VsGsSALUBusyCycles, VsGsVALUBusy, VsGsVALUBusyCycles, VsGsVALUInstCount, VsGsSALUInstCount, PSVALUBusy, PSVALUBusyCycles, PSVALUInstCount, PSSALUBusy, PSSALUBusyCycles, PSSALUInstCount
44-
* Output from pre_build.py script is now generated into build\|win,linux|\ directory.
45-
* Compiled binaries are now generated into build\output\ directory.
34+
### Version 3.17 (09/20/2024)
35+
* OpenGL: GPA is no longer supporting Adrenalin 19.6.3 and older drivers.
36+
* On all hardware and APIs, the following counters were renamed for clarity:
37+
* CSWavefronts was renamed to CSWavefrontsLaunched
38+
* CSThreads was renamed to CSThreadsLaunched
39+
* CSThreadGroups was renamed to CSThreadGroupsLaunched
40+
* On all hardware and APIs, the following counters were removed because matching counters already exist in the GlobalMemory group:
41+
* CSMemUnitBusy, CSMemUnitBusyCycles, CSMemUnitStalled, CSMemUnitStalledCycles, CSWriteUnitStalled, CSWriteUnitStalledCycles
42+
* CSALUStalledByLDS and CSALUStalledByLDSCycles are now based on per-wave cycle counts.
43+
* On Radeon RX 5000 Series and newer hardware, counters in the ComputeShader group now have simplified equations.
4644

4745
## System Requirements
4846
* An AMD Radeon GPU or APU based on Graphics IP version 8 and newer.
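
For scripts that request counters by name, the 3.17 renames and removals listed above can be applied mechanically. The following is a minimal sketch, not part of GPA: the function name `migrate_counter_names` and the two constants are hypothetical, and only the counter names themselves come from the release notes.

```python
# Hypothetical migration helper; only the counter names are taken from the
# 3.17 release notes. Nothing here is part of the GPA API.
RENAMED = {
    "CSWavefronts": "CSWavefrontsLaunched",
    "CSThreads": "CSThreadsLaunched",
    "CSThreadGroups": "CSThreadGroupsLaunched",
}

REMOVED = frozenset({
    "CSMemUnitBusy", "CSMemUnitBusyCycles",
    "CSMemUnitStalled", "CSMemUnitStalledCycles",
    "CSWriteUnitStalled", "CSWriteUnitStalledCycles",
})


def migrate_counter_names(names):
    """Return (migrated, dropped) for a list of pre-3.17 counter names.

    Renamed counters are mapped to their new names, removed counters are
    collected in `dropped`, and all other names pass through unchanged.
    """
    migrated, dropped = [], []
    for name in names:
        if name in REMOVED:
            dropped.append(name)
        else:
            migrated.append(RENAMED.get(name, name))
    return migrated, dropped
```

Counters reported in `dropped` have no one-to-one replacement in this sketch; the release notes only say that matching counters exist in the GlobalMemory group, so the substitute has to be chosen by hand.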

RELEASE_NOTES.txt

Lines changed: 11 additions & 0 deletions
@@ -1,5 +1,16 @@
11
# GPU Performance API Release Notes
22
---
3+
# Version 3.17 (09/20/2024)
4+
* OpenGL: GPA is no longer supporting Adrenalin 19.6.3 and older drivers.
5+
* On all hardware and APIs, the following counters were renamed for clarity:
6+
* CSWavefronts was renamed to CSWavefrontsLaunched
7+
* CSThreads was renamed to CSThreadsLaunched
8+
* CSThreadGroups was renamed to CSThreadGroupsLaunched
9+
* On all hardware and APIs, the following counters were removed because matching counters already exist in the GlobalMemory group:
10+
* CSMemUnitBusy, CSMemUnitBusyCycles, CSMemUnitStalled, CSMemUnitStalledCycles, CSWriteUnitStalled, CSWriteUnitStalledCycles
11+
* CSALUStalledByLDS and CSALUStalledByLDSCycles are now based on per-wave cycle counts.
12+
* On Radeon RX 5000 Series and newer hardware, counters in the ComputeShader group now have simplified equations.
13+
314
# Version 3.16 (07/01/2024)
415
* Added support for additional RDNA 3 based APUs.
516
* GPA's OpenCL support has been temporarily disabled on RDNA 3 hardware.

build/cmake_modules/build_flags.cmake

Lines changed: 12 additions & 2 deletions
@@ -33,12 +33,22 @@ endif()
3333

3434
if(${build-32bit})
3535
set(CMAKE_SIZEOF_VOID_P 4)
36-
set(OUTPUT_SUFFIX ${OUTPUT_SUFFIX}_x86)
36+
set(OUTPUT_SUFFIX ${OUTPUT_SUFFIX}_x86)
3737
else()
3838
set(CMAKE_SIZEOF_VOID_P 8)
39-
set(OUTPUT_SUFFIX ${OUTPUT_SUFFIX}_x64)
39+
set(OUTPUT_SUFFIX ${OUTPUT_SUFFIX}_x64)
4040
endif()
4141

42+
if(${BUILD_ANDROID})
43+
set(OUTPUT_SUFFIX ${OUTPUT_SUFFIX}_android)
44+
endif()
45+
46+
# START_REMOVE_PIX_DURING_SANITIZATION
47+
if (${GPA_PIX_BUILD})
48+
set(GPA_PIX_BUILD ON)
49+
set(OUTPUT_SUFFIX ${OUTPUT_SUFFIX}_pix)
50+
endif()
51+
# END_REMOVE_PIX_DURING_SANITIZATION
4252

4353
# DX11 variable
4454
if(NOT DEFINED skipdx11)
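
The suffix logic in this hunk appends to `OUTPUT_SUFFIX` in a fixed order: architecture first, then `_android`, then `_pix`. A small Python model of that order, assuming whatever value `OUTPUT_SUFFIX` held before line 33 is passed in as `base` (the function and parameter names are hypothetical):

```python
def compose_output_suffix(base="", build_32bit=False,
                          build_android=False, pix_build=False):
    """Model the order in which build_flags.cmake appends to OUTPUT_SUFFIX.

    `base` stands in for the value set earlier in the file, which this
    hunk does not show.
    """
    suffix = base + ("_x86" if build_32bit else "_x64")
    if build_android:
        suffix += "_android"
    if pix_build:
        suffix += "_pix"
    return suffix
```

Under these assumptions a 64-bit Android build yields `_x64_android`, and enabling the PIX build on top of that yields `_x64_android_pix`.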

build/cmake_modules/common.cmake

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
1-
## Copyright (c) 2018-2023 Advanced Micro Devices, Inc. All rights reserved.
2-
cmake_minimum_required(VERSION 3.5.1)
1+
## Copyright (c) 2018-2024 Advanced Micro Devices, Inc. All rights reserved.
2+
cmake_minimum_required(VERSION 3.10)
33

44
include (${GPA_CMAKE_MODULES_DIR}/utils.cmake)
55

build/cmake_modules/defs.cmake

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ cmake_minimum_required(VERSION 3.19)
33

44
## Define the GPA version
55
set(GPA_MAJOR_VERSION 3)
6-
set(GPA_MINOR_VERSION 16)
6+
set(GPA_MINOR_VERSION 17)
77
set(GPA_UPDATE_VERSION 0)
88

99
if(NOT DEFINED GPA_BUILD_NUMBER)

build/cmake_modules/targets.cmake

Lines changed: 1 addition & 4 deletions
@@ -1,4 +1,4 @@
1-
## Copyright (c) 2018-2023 Advanced Micro Devices, Inc. All rights reserved.
1+
## Copyright (c) 2018-2024 Advanced Micro Devices, Inc. All rights reserved.
22
cmake_minimum_required(VERSION 3.10)
33

44
## GPA has only Debug and Release
@@ -98,6 +98,3 @@ endif()
9898
if(NOT ${skipdocs})
9999
add_subdirectory(${GPA_SPHINX_DOCS} ${CMAKE_BINARY_DIR}/${GPA_SPHINX_DOCS_REL_PATH})
100100
endif()
101-
102-
103-

build/dependencies_map.py

Lines changed: 3 additions & 3 deletions
@@ -1,4 +1,4 @@
1-
# Copyright (c) 2018-2024 Advanced Micro Devices, Inc. All rights reserved.
1+
# Copyright (c) 2018-2023 Advanced Micro Devices, Inc. All rights reserved.
22
# dependencies_map.py
33
#
44
# Map of GitHub project names to clone target paths, relative to the GPUPerfAPI
@@ -14,10 +14,10 @@
1414
"appsdk" : ["external/Lib/AMD/APPSDK", "55a6940ebc963daec69152314a1bb94943287d4c"],
1515
"opengl" : ["external/Lib/Ext/OpenGL", "792c2291a4443ebef17ca5a7e3e24a1f854f0d1d"],
1616
"windows_kits" : ["external/Lib/Ext/Windows-Kits", "51845a3771122a9dc1406b8617e9a67d9a2f55b6"],
17-
"googletest" : ["external/Lib/Ext/GoogleTest", "191f9336bc9212b5f5410ab663176f685cafed2a"],
17+
"googletest" : ["external/Lib/Ext/GoogleTest", "542e057c6c5bf45454b43764b881397b71164d62"],
1818
# Src.
1919
"adl_util" : ["external/Src/ADLUtil", "d62c94514326775c83fc129bb89d299c8749ebd1"],
20-
"device_info" : ["external/Src/DeviceInfo", "00b23198e748e3d235f249cfee6604fce0d43c29"],
20+
"device_info" : ["external/Src/DeviceInfo", "7379d082f1d8d64c9d1168b84b7f6b2a9702c82f"],
2121
"dynamic_library_module" : ["external/Src/DynamicLibraryModule", "e6451ce26b8509cf724c7cf5d007878791143a58"],
2222
"tsingleton" : ["external/Src/TSingleton", "02e8fa7d98f33cdbd0e1f77d1a8a403a32e35882"],
2323
}
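
Because each entry pins a dependency to a full commit hash, a quick sanity check after bumping pins is to confirm every hash is a 40-character lowercase hex string. This is a sketch only: `find_invalid_pins` and `sample_map` are hypothetical names, and the sample reuses the two entries updated in this commit rather than the real map variable, whose name the diff does not show.

```python
import re

# A full SHA-1 commit id is 40 lowercase hexadecimal characters.
SHA1_RE = re.compile(r"[0-9a-f]{40}")


def find_invalid_pins(dep_map):
    """Return the sorted names whose pinned commit is not a full SHA-1."""
    return sorted(
        name
        for name, (_path, commit) in dep_map.items()
        if not SHA1_RE.fullmatch(commit)
    )


# Entries copied from the updated lines of this diff.
sample_map = {
    "googletest": ["external/Lib/Ext/GoogleTest",
                   "542e057c6c5bf45454b43764b881397b71164d62"],
    "device_info": ["external/Src/DeviceInfo",
                    "7379d082f1d8d64c9d1168b84b7f6b2a9702c82f"],
}

invalid = find_invalid_pins(sample_map)
```

The check only validates the format of a pin; it does not confirm that the commit exists in the referenced repository.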

docs/doxygen/DoxyfilePublic

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ PROJECT_NAME = "GPU Perf API"
3131
# This could be handy for archiving the generated documentation or
3232
# if some version control system is used.
3333

34-
PROJECT_NUMBER = 3.16
34+
PROJECT_NUMBER = 3.17
3535

3636
# The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute)
3737
# base path where the generated documentation will be put.

docs/sphinx/source/conf.py

Lines changed: 2 additions & 2 deletions
@@ -61,9 +61,9 @@
6161
# built documents.
6262
#
6363
# The short X.Y version.
64-
version = u'3.16'
64+
version = u'3.17'
6565
# The full version, including alpha/beta/rc tags.
66-
release = u'3.16'
66+
release = u'3.17'
6767

6868
# The language for content autogenerated by Sphinx. Refer to documentation
6969
# for a list of supported languages.
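
The version bump touches three places in this commit: `defs.cmake`, `DoxyfilePublic`, and `conf.py`. A consistency check along the following lines could catch a missed file. It is a sketch under assumptions: the helper names are hypothetical, and the regular expressions match only the line shapes visible in this diff.

```python
import re


def cmake_version(text):
    """Extract 'MAJOR.MINOR' from set(GPA_MAJOR_VERSION N) / set(GPA_MINOR_VERSION N)."""
    major = re.search(r"set\(GPA_MAJOR_VERSION\s+(\d+)\)", text).group(1)
    minor = re.search(r"set\(GPA_MINOR_VERSION\s+(\d+)\)", text).group(1)
    return f"{major}.{minor}"


def doxygen_version(text):
    """Extract the value of PROJECT_NUMBER from a Doxyfile."""
    return re.search(r"^PROJECT_NUMBER\s*=\s*(\S+)", text, re.MULTILINE).group(1)


def sphinx_version(text):
    """Extract the short version string from a Sphinx conf.py."""
    return re.search(r"^version\s*=\s*u?'([^']+)'", text, re.MULTILINE).group(1)


def versions_agree(cmake_text, doxy_text, conf_text):
    """True when all three files report the same MAJOR.MINOR version."""
    found = {cmake_version(cmake_text),
             doxygen_version(doxy_text),
             sphinx_version(conf_text)}
    return len(found) == 1
```

In practice the three arguments would be the contents of the files read from disk; the helpers raise `AttributeError` if a pattern is not found, which is acceptable for a quick pre-release check.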

docs/sphinx/source/graphics_counter_tables_gfx10.rst

Lines changed: 4 additions & 10 deletions
@@ -113,9 +113,9 @@ ComputeShader Group
113113
:header: "Counter Name", "Usage", "Brief Description"
114114
:widths: 15, 10, 75
115115

116-
"CSThreadGroups", "Items", "Total number of thread groups."
117-
"CSWavefronts", "Items", "The total number of wavefronts used for the CS."
118-
"CSThreads", "Items", "The number of CS threads processed by the hardware."
116+
"CSThreadGroupsLaunched", "Items", "Total number of thread groups launched."
117+
"CSWavefrontsLaunched", "Items", "The total number of wavefronts launched for the CS."
118+
"CSThreadsLaunched", "Items", "The number of CS threads launched and processed by the hardware."
119119
"CSThreadGroupSize", "Items", "The number of CS threads within each thread group."
120120
"CSVALUInsts", "Items", "The average number of vector ALU instructions executed per work-item (affected by flow control)."
121121
"CSVALUUtilization", "Percentage", "The percentage of active vector ALU threads in a wave. A lower number can mean either more thread divergence in a wave or that the work-group size is not a multiple of the wave size. Value range: 0% (bad), 100% (ideal - no thread divergence)."
@@ -127,16 +127,10 @@ ComputeShader Group
127127
"CSVALUBusyCycles", "Cycles", "Number of GPU cycles where vector ALU instructions are processed."
128128
"CSSALUBusy", "Percentage", "The percentage of GPUTime scalar ALU instructions are processed. Value range: 0% (bad) to 100% (optimal)."
129129
"CSSALUBusyCycles", "Cycles", "Number of GPU cycles where scalar ALU instructions are processed."
130-
"CSMemUnitBusy", "Percentage", "The percentage of GPUTime the memory unit is active. The result includes the stall time (MemUnitStalled). This is measured with all extra fetches and writes and any cache or memory effects taken into account. Value range: 0% to 100% (fetch-bound)."
131-
"CSMemUnitBusyCycles", "Cycles", "Number of GPU cycles the memory unit is active. The result includes the stall time (MemUnitStalled). This is measured with all extra fetches and writes and any cache or memory effects taken into account."
132-
"CSMemUnitStalled", "Percentage", "The percentage of GPUTime the memory unit is stalled. Try reducing the number or size of fetches and writes if possible. Value range: 0% (optimal) to 100% (bad)."
133-
"CSMemUnitStalledCycles", "Cycles", "Number of GPU cycles the memory unit is stalled. Try reducing the number or size of fetches and writes if possible."
134-
"CSWriteUnitStalled", "Percentage", "The percentage of GPUTime the write unit is stalled."
135-
"CSWriteUnitStalledCycles", "Cycles", "Number of GPU cycles the write unit is stalled."
136130
"CSGDSInsts", "Items", "The average number of GDS read or GDS write instructions executed per work item (affected by flow control)."
137131
"CSLDSInsts", "Items", "The average number of LDS read/write instructions executed per work-item (affected by flow control)."
138132
"CSALUStalledByLDS", "Percentage", "The percentage of GPUTime ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible. Value range: 0% (optimal) to 100% (bad)."
139-
"CSALUStalledByLDSCycles", "Cycles", "Number of GPU cycles the ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible."
133+
"CSALUStalledByLDSCycles", "Cycles", "Number of GPU cycles each wavefront's ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible."
140134
"CSLDSBankConflict", "Percentage", "The percentage of GPUTime LDS is stalled by bank conflicts. Value range: 0% (optimal) to 100% (bad)."
141135
"CSLDSBankConflictCycles", "Cycles", "Number of GPU cycles the LDS is stalled by bank conflicts. Value range: 0 (optimal) to GPUBusyCycles (bad)."
142136

docs/sphinx/source/graphics_counter_tables_gfx103.rst

Lines changed: 22 additions & 10 deletions
@@ -59,6 +59,12 @@ PreTessellation Group
5959
:header: "Counter Name", "Usage", "Brief Description"
6060
:widths: 15, 10, 75
6161

62+
"PreTessVALUInstCount", "Items", "Average number of vector ALU instructions executed for the VS and HS in a pipeline that uses tessellation. Affected by flow control."
63+
"PreTessSALUInstCount", "Items", "Average number of scalar ALU instructions executed for the VS and HS in a pipeline that uses tessellation. Affected by flow control."
64+
"PreTessVALUBusy", "Percentage", "The percentage of GPUTime vector ALU instructions are being processed for the VS and HS in a pipeline that uses tessellation."
65+
"PreTessVALUBusyCycles", "Cycles", "Number of GPU cycles where vector ALU instructions are being processed for the VS and HS in a pipeline that uses tessellation."
66+
"PreTessSALUBusy", "Percentage", "The percentage of GPUTime scalar ALU instructions are being processed for the VS and HS in a pipeline that uses tessellation."
67+
"PreTessSALUBusyCycles", "Cycles", "Number of GPU cycles where scalar ALU instructions are being processed for the VS and HS in a pipeline that uses tessellation."
6268
"PreTessVerticesIn", "Items", "The number of vertices processed by the VS and HS when using tessellation."
6369

6470
PostTessellation Group
@@ -69,6 +75,12 @@ PostTessellation Group
6975
:widths: 15, 10, 75
7076

7177
"PostTessPrimsOut", "Items", "The number of primitives output by the DS and GS when using tessellation."
78+
"PostTessVALUInstCount", "Items", "Average number of vector ALU instructions executed for the DS and GS in a pipeline that uses tessellation. Affected by flow control."
79+
"PostTessSALUInstCount", "Items", "Average number of scalar ALU instructions executed for the DS and GS in a pipeline that uses tessellation. Affected by flow control."
80+
"PostTessVALUBusy", "Percentage", "The percentage of GPUTime vector ALU instructions are being processed for the DS and GS in a pipeline that uses tessellation."
81+
"PostTessVALUBusyCycles", "Cycles", "Number of GPU cycles where vector ALU instructions are being processed for the DS and GS in a pipeline that uses tessellation."
82+
"PostTessSALUBusy", "Percentage", "The percentage of GPUTime scalar ALU instructions are being processed for the DS and GS in a pipeline that uses tessellation."
83+
"PostTessSALUBusyCycles", "Cycles", "Number of GPU cycles where scalar ALU instructions are being processed for the DS and GS in a pipeline that uses tessellation."
7284

7385
PrimitiveAssembly Group
7486
%%%%%%%%%%%%%%%%%%%%%%%
@@ -101,20 +113,20 @@ ComputeShader Group
101113
:header: "Counter Name", "Usage", "Brief Description"
102114
:widths: 15, 10, 75
103115

104-
"CSThreadGroups", "Items", "Total number of thread groups."
105-
"CSWavefronts", "Items", "The total number of wavefronts used for the CS."
106-
"CSThreads", "Items", "The number of CS threads processed by the hardware."
116+
"CSThreadGroupsLaunched", "Items", "Total number of thread groups launched."
117+
"CSWavefrontsLaunched", "Items", "The total number of wavefronts launched for the CS."
118+
"CSThreadsLaunched", "Items", "The number of CS threads launched and processed by the hardware."
107119
"CSThreadGroupSize", "Items", "The number of CS threads within each thread group."
108-
"CSMemUnitBusy", "Percentage", "The percentage of GPUTime the memory unit is active. The result includes the stall time (MemUnitStalled). This is measured with all extra fetches and writes and any cache or memory effects taken into account. Value range: 0% to 100% (fetch-bound)."
109-
"CSMemUnitBusyCycles", "Cycles", "Number of GPU cycles the memory unit is active. The result includes the stall time (MemUnitStalled). This is measured with all extra fetches and writes and any cache or memory effects taken into account."
110-
"CSMemUnitStalled", "Percentage", "The percentage of GPUTime the memory unit is stalled. Try reducing the number or size of fetches and writes if possible. Value range: 0% (optimal) to 100% (bad)."
111-
"CSMemUnitStalledCycles", "Cycles", "Number of GPU cycles the memory unit is stalled. Try reducing the number or size of fetches and writes if possible."
112-
"CSWriteUnitStalled", "Percentage", "The percentage of GPUTime the write unit is stalled."
113-
"CSWriteUnitStalledCycles", "Cycles", "Number of GPU cycles the write unit is stalled."
120+
"CSVALUInsts", "Items", "The average number of vector ALU instructions executed per work-item (affected by flow control)."
121+
"CSVALUUtilization", "Percentage", "The percentage of active vector ALU threads in a wave. A lower number can mean either more thread divergence in a wave or that the work-group size is not a multiple of the wave size. Value range: 0% (bad), 100% (ideal - no thread divergence)."
122+
"CSSALUInsts", "Items", "The average number of scalar ALU instructions executed per work-item (affected by flow control)."
123+
"CSVFetchInsts", "Items", "The average number of vector fetch instructions from the video memory executed per work-item (affected by flow control)."
124+
"CSSFetchInsts", "Items", "The average number of scalar fetch instructions from the video memory executed per work-item (affected by flow control)."
125+
"CSVWriteInsts", "Items", "The average number of vector write instructions to the video memory executed per work-item (affected by flow control)."
114126
"CSGDSInsts", "Items", "The average number of GDS read or GDS write instructions executed per work item (affected by flow control)."
115127
"CSLDSInsts", "Items", "The average number of LDS read/write instructions executed per work-item (affected by flow control)."
116128
"CSALUStalledByLDS", "Percentage", "The percentage of GPUTime ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible. Value range: 0% (optimal) to 100% (bad)."
117-
"CSALUStalledByLDSCycles", "Cycles", "Number of GPU cycles the ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible."
129+
"CSALUStalledByLDSCycles", "Cycles", "The average number of GPU cycles each wavefront's ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible."
118130
"CSLDSBankConflict", "Percentage", "The percentage of GPUTime LDS is stalled by bank conflicts. Value range: 0% (optimal) to 100% (bad)."
119131
"CSLDSBankConflictCycles", "Cycles", "Number of GPU cycles the LDS is stalled by bank conflicts. Value range: 0 (optimal) to GPUBusyCycles (bad)."
120132
