@678098 commented on Jun 23, 2025

Problem

Multi-threaded access to object pools can become slow due to the thread-safety mechanisms used in the pool implementation.

Batching

One way to deal with this problem is to acquire/release multiple objects from the pool at a time (batches of objects). This PR introduces the bmqc::BatchedObjectPool class, which uses bdlcc::ObjectPool under the hood but gets objects from it in batches. The batches are hidden from library users, who keep working with individual objects as before. As a result, batching significantly reduces thread contention while still benefiting from the thread-safety mechanisms already present in bdlcc::ObjectPool.
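
For illustration, here is a minimal sketch of the batching idea. The class below is hypothetical and does not necessarily match the bmqc::BatchedObjectPool interface added in this PR; it only shows the mechanism: a per-thread cache refills itself from the shared bdlcc::ObjectPool one batch at a time, so the synchronized pool is touched once per batch instead of once per object.

```cpp
#include <bdlcc_objectpool.h>
#include <bsl_vector.h>

// Illustrative sketch only; the real 'bmqc::BatchedObjectPool' in this PR
// may differ in interface and in how per-thread state is managed.
template <class TYPE>
class BatchedPoolSketch {
  private:
    bdlcc::ObjectPool<TYPE> d_pool;       // shared, thread-safe pool
    const int               d_batchSize;  // objects moved per pool access

    // Objects cached by the calling thread.  Only that thread touches its
    // cache, so no synchronization is needed on this path.  A function-local
    // 'thread_local' is a simplification for the sketch: it is shared across
    // pool instances of the same 'TYPE' and is not drained on thread exit.
    static bsl::vector<TYPE *>& localCache()
    {
        thread_local bsl::vector<TYPE *> cache;
        return cache;
    }

  public:
    explicit BatchedPoolSketch(int batchSize)
    : d_batchSize(batchSize)
    {
    }

    TYPE *getObject()
    {
        bsl::vector<TYPE *>& cache = localCache();
        if (cache.empty()) {
            // One synchronized round-trip refills a whole batch, so the
            // contended pool is hit once per 'd_batchSize' acquisitions.
            for (int i = 0; i < d_batchSize; ++i) {
                cache.push_back(d_pool.getObject());
            }
        }
        TYPE *object = cache.back();
        cache.pop_back();
        return object;
    }

    void releaseObject(TYPE *object)
    {
        bsl::vector<TYPE *>& cache = localCache();
        cache.push_back(object);
        if (static_cast<int>(cache.size()) >= 2 * d_batchSize) {
            // Return a full batch to the shared pool in one synchronized go.
            for (int i = 0; i < d_batchSize; ++i) {
                d_pool.releaseObject(cache.back());
                cache.pop_back();
            }
        }
    }
};
```

With batch=1 this degenerates to one pool round-trip per object, which is why the batch=1 rows in the benchmarks below roughly match the original pool; larger batches amortize the synchronization cost across many objects.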

Benchmarks
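
The numbers below come from a Google Benchmark style harness. As a rough illustration of the setup (the payload type, operation count, and thread registration here are assumptions, not the actual benchmark in this PR), each of N threads repeatedly acquires and releases objects from a single shared pool:

```cpp
#include <benchmark/benchmark.h>
#include <bdlcc_objectpool.h>

namespace {

// Placeholder payload and operation count; the real benchmark may differ.
struct Payload {
    char d_data[128];
};

const int k_OPS_PER_THREAD = 1000000;

void poolGetRelease(benchmark::State& state)
{
    // One pool shared by every benchmark thread.
    static bdlcc::ObjectPool<Payload> s_pool;

    for (auto _ : state) {
        for (int i = 0; i < k_OPS_PER_THREAD; ++i) {
            Payload *obj = s_pool.getObject();
            benchmark::DoNotOptimize(obj);
            s_pool.releaseObject(obj);
        }
    }
}

}  // close unnamed namespace

// 'Threads(n)' runs the body concurrently on 'n' threads; the harness in
// this PR may instead spawn and name its threads explicitly.
BENCHMARK(poolGetRelease)->Threads(4)->Iterations(1)->Unit(benchmark::kMillisecond);

BENCHMARK_MAIN();
```

The bmqc::BatchedPool rows presumably run the same kind of loop against the batched pool with varying batch sizes.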

amd64

Run on (48 X 3000.24 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x24)
  L1 Instruction 32 KiB (x24)
  L2 Unified 256 KiB (x24)
  L3 Unified 30720 KiB (x2)
Load Average: 11.57, 6.59, 3.37
------------------------------------------------------------------------------------------------
Benchmark                                                      Time             CPU   Iterations
------------------------------------------------------------------------------------------------
bdlcc::ObjectPool threads=1             /iterations:1       49.9 ms        0.010 ms            1
bdlcc::ObjectPool threads=4             /iterations:1       2100 ms        0.014 ms            1
bdlcc::ObjectPool threads=10            /iterations:1       5376 ms        0.018 ms            1
bmqc::BatchedPool threads=1   batch=1   /iterations:1       62.4 ms        0.008 ms            1
bmqc::BatchedPool threads=4   batch=1   /iterations:1       2052 ms        0.015 ms            1
bmqc::BatchedPool threads=10  batch=1   /iterations:1       7899 ms        0.016 ms            1
bmqc::BatchedPool threads=1   batch=32  /iterations:1       11.3 ms        0.007 ms            1
bmqc::BatchedPool threads=4   batch=32  /iterations:1       75.6 ms        0.007 ms            1
bmqc::BatchedPool threads=10  batch=32  /iterations:1        256 ms        0.017 ms            1
bmqc::BatchedPool threads=1   batch=128 /iterations:1       10.2 ms        0.007 ms            1
bmqc::BatchedPool threads=4   batch=128 /iterations:1       26.1 ms        0.007 ms            1
bmqc::BatchedPool threads=10  batch=128 /iterations:1       65.8 ms        0.009 ms            1
bmqc::BatchedPool threads=64  batch=128 /iterations:1       1039 ms        0.016 ms            1
bmqc::BatchedPool threads=128 batch=128 /iterations:1       1837 ms        0.026 ms            1
bmqc::BatchedPool threads=256 batch=128 /iterations:1       3655 ms        0.040 ms            1

Mac M2 Darwin

Run on (12 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x12)
Load Average: 16.91, 13.45, 7.10
------------------------------------------------------------------------------------------------
Benchmark                                                      Time             CPU   Iterations
------------------------------------------------------------------------------------------------
bdlcc::ObjectPool threads=1             /iterations:1       31.8 ms        0.009 ms            1
bdlcc::ObjectPool threads=4             /iterations:1        922 ms        0.009 ms            1
bdlcc::ObjectPool threads=10            /iterations:1       4913 ms        0.027 ms            1
bmqc::BatchedPool threads=1   batch=1   /iterations:1       61.2 ms        0.007 ms            1
bmqc::BatchedPool threads=4   batch=1   /iterations:1        661 ms        0.010 ms            1
bmqc::BatchedPool threads=10  batch=1   /iterations:1       3734 ms        0.031 ms            1
bmqc::BatchedPool threads=1   batch=32  /iterations:1       21.3 ms        0.007 ms            1
bmqc::BatchedPool threads=4   batch=32  /iterations:1       29.1 ms        0.007 ms            1
bmqc::BatchedPool threads=10  batch=32  /iterations:1        161 ms        0.012 ms            1
bmqc::BatchedPool threads=1   batch=128 /iterations:1       19.1 ms        0.005 ms            1
bmqc::BatchedPool threads=4   batch=128 /iterations:1       21.1 ms        0.007 ms            1
bmqc::BatchedPool threads=10  batch=128 /iterations:1       54.1 ms        0.016 ms            1
bmqc::BatchedPool threads=64  batch=128 /iterations:1        256 ms        0.099 ms            1
bmqc::BatchedPool threads=128 batch=128 /iterations:1        471 ms        0.101 ms            1
bmqc::BatchedPool threads=256 batch=128 /iterations:1        931 ms        0.360 ms            1

Conclusions

  1. bmqc::BatchedPool with threads=1 batch=1 is about 2x slower than the original object pool, as expected: we pay the batching overhead introduced by the wrapper but get no benefit from it, since with a batch size of 1 it is equivalent to the original object pool.

  2. bmqc::BatchedPool with threads=1 batch=32 is about 1.5x faster than the original object pool: we still pay the batching overhead, but we go through the thread-safety mechanisms of the original object pool far less often.

  3. bmqc::BatchedPool with threads=4 batch=32 is about 30x faster than the corresponding run of the original object pool. The same holds for batch=128.

  4. bmqc::BatchedPool with threads=10 batch=128 is about 90x faster than the corresponding run of the original object pool.

  5. Note that the CPU cannot actually run 64, 128, or 256 threads simultaneously in the last benchmarks, so the total time grows roughly linearly with the number of threads. Even under this oversubscription, the batched pool suffers far less from thread contention than the original object pool does.

@678098 requested a review from a team as a code owner on June 23, 2025 15:17

@bmq-oss-ci bot left a comment

Build 2781 of commit b5613d7 has completed with FAILURE

@678098 force-pushed the 250623_BatchedPool branch 3 times, most recently from 813e080 to db18498 on June 23, 2025 21:19

@bmq-oss-ci bot left a comment

Build 2784 of commit db18498 has completed with FAILURE

@678098 force-pushed the 250623_BatchedPool branch from db18498 to a56df7a on June 25, 2025 21:04

@bmq-oss-ci bot left a comment

Build 2808 of commit a56df7a has completed with FAILURE

Signed-off-by: Evgeny Malygin <[email protected]>
@678098 force-pushed the 250623_BatchedPool branch from a56df7a to f8f5e51 on July 18, 2025 16:55

@bmq-oss-ci bot left a comment

Build 2913 of commit f8f5e51 has completed with FAILURE

@678098 force-pushed the 250623_BatchedPool branch from f8f5e51 to ad082bb on July 21, 2025 14:34

@bmq-oss-ci bot left a comment

Build 2917 of commit ad082bb has completed with FAILURE
