Commit 4940121

Update numactl-utility.md (#58) (#59)
1 parent f194663 commit 4940121

1 file changed: +11 -4 lines changed

docs/source/debugging-optimizing/numactl-utility.md

Lines changed: 11 additions & 4 deletions
@@ -16,14 +16,20 @@ limitations under the License.
# Using the numactl Utility to Control Resource Utilization with the DeepSparse Engine

19-
The DeepSparse Engine works best when run on a single socket and with hyper-threading disabled. One standard way of controlling compute/memory resources when running processes is to use the **numactl** utility. **numactl** can be used when multiple processes need to run on the same hardware but require their own CPU/memory resources to run optimally.
19+
The DeepSparse Engine achieves better performance on multiple-socket systems as well as with hyperthreading disabled; models with larger batch sizes are likely to see an improvement. One standard way of controlling compute/memory resources when running processes is to use the **numactl** utility. **numactl** can be used when multiple processes need to run on the same hardware but require their own CPU/memory resources to run optimally.

To run the DeepSparse Engine on a single socket (N) of a multi-socket system, you would start the DeepSparse Engine using **numactl**. For example:

```bash
numactl --cpunodebind N <deepsparseengine-process>
```

27+
To run the DeepSparse Engine on multiple sockets (N,M), run:
28+
29+
```bash
30+
numactl --cpunodebind N,M <deepsparseengine-process>
31+
```
32+
It is advised to also allocate memory from the same socket on which the engine is running. So, `--membind` or `--preferred` should be used when using `--cpunodebind`. For example:

```bash
@@ -44,7 +50,10 @@ Given the architecture above, to run the DeepSparse Engine on the first four CPU
numactl --physcpubind 8-11 --preferred 1 <deepsparseengine-process>
```

47-
Note that `--preferred 1` is needed here since the DeepSparse Engine is being bound to CPUs on the second socket.
53+
Appending `--preferred 1` is needed here since the DeepSparse Engine is being bound to CPUs on the second socket.
54+
55+
Note that using more than two sockets may not improve performance over two sockets; if you have the option, try different configurations to see which setup is ideal for your use case. Choose a batch size that is evenly divisible by the number of sockets you intend to use.
56+
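As a minimal sketch of the batch-size guidance above, the following helper checks whether a batch size is evenly divisible by the number of sockets you plan to bind to. The function name `check_batch_split` is hypothetical and is not part of DeepSparse or numactl; it only performs the arithmetic check and prints the quotient.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of DeepSparse or numactl): succeeds and
# prints batch_size / num_sockets when the division is exact.
check_batch_split() {
  local batch_size=$1 num_sockets=$2
  if (( num_sockets <= 0 )); then
    echo "number of sockets must be positive" >&2
    return 1
  fi
  if (( batch_size % num_sockets != 0 )); then
    echo "batch size ${batch_size} is not evenly divisible by ${num_sockets}" >&2
    return 1
  fi
  echo $(( batch_size / num_sockets ))
}

check_batch_split 64 2   # prints 32
check_batch_split 63 2 || echo "pick a different batch size"
```

A check like this could run before launching the engine under `numactl --cpunodebind N,M`, so that an uneven batch size is caught early.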

## DeepSparse Engine and Thread Pinning

@@ -60,8 +69,6 @@ However, the engine works best when threads are pinned (i.e., not allowed to mig
`NM_BIND_THREADS_TO_CORES` should be used with care since it forces the DeepSparse Engine to run on only the threads it has been allocated at startup. If any other process ends up running on the same threads, it could result in a major degradation of performance.

63-
When using server mode with multiple engines, it is advisable to keep thread pinning disabled.
64-
**Note:** The threads-to-cores mappings described above are specific to Intel. AMD uses a different mapping: all the threads of a single core are consecutive, i.e., if each core has two threads and there are N cores, the threads for a particular core K are `2*K` and `2*K+1`. The mapping of cores to sockets is also straightforward: for an N-socket system with C cores per socket, the cores for a particular socket S are numbered `S*C` to `((S+1)*C)-1`.
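The AMD numbering described in the note can be worked through with a small sketch. The function names `amd_core_threads` and `amd_socket_cores` are hypothetical, and generalizing from two threads per core to T threads per core is an assumption that extends the rule stated above. Each function prints an inclusive `first-last` range.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the AMD numbering in the note above.

# Thread IDs for core K with T threads per core: K*T through K*T + T - 1.
# T defaults to 2; other values are an assumed generalization.
amd_core_threads() {
  local core=$1 threads_per_core=${2:-2}
  local first=$(( core * threads_per_core ))
  echo "${first}-$(( first + threads_per_core - 1 ))"
}

# Core IDs for socket S with C cores per socket: S*C through (S+1)*C - 1.
amd_socket_cores() {
  local socket=$1 cores_per_socket=$2
  echo "$(( socket * cores_per_socket ))-$(( (socket + 1) * cores_per_socket - 1 ))"
}

amd_core_threads 3 2    # prints 6-7
amd_socket_cores 1 8    # prints 8-15
```

It is worth comparing the computed ranges against the topology reported by `numactl --hardware` or `lscpu` on the target machine before using them in a binding.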

## Additional Notes
