MGET support for Redis protocol with OSS Cluster slot-aware routing by fcostaoliveira · Pull Request #397 · redis/memtier_benchmark

fcostaoliveira · 2026-05-27T14:56:26Z

Summary

Implements `write_command_multi_get()` in the Redis protocol layer (was an `assert(0)` stub) — emits valid RESP array wire format for N-key MGET
Enables `--multi-key-get` in OSS Cluster mode: removes the cluster rejection guard and overrides `cluster_client::create_mget_request()` with a slot-aware key selection algorithm
Cluster routing: pre-builds a map from hash slot → key indices, then on each MGET picks all N keys from a single slot. Redis Cluster rejects MGET with a CROSSSLOT error if keys span different slots (even on the same node), so same-slot selection is the correct constraint. No `{tag}` hash-tag prefix needed — keys are identical to those written by a normal SET preload.
Adds four integration test files: `test_mget_standalone.py` (9 tests), `test_mget_cluster.py` (4 tests), `test_mget_protocol.py` (RESP wire-format verification), `test_mget_validation.py` (CLI rejection + regression tests)

Test plan

`TEST=test_mget_standalone.py OSS_STANDALONE=1 ./tests/run_tests.sh` — 9 standalone tests pass
`TEST=test_mget_cluster.py OSS_CLUSTER=1 ./tests/run_tests.sh` — 4 cluster tests pass (no CROSSSLOT errors)
`TEST=test_mget_validation.py OSS_STANDALONE=1 ./tests/run_tests.sh` — CLI rejection, keylist buffer-realloc, partial ratio cycle, and narrow key-range regression tests pass
Full CI suite green (64/64: ASAN, TSAN, UBSan, cluster, TLS, all platform builds)
Verify `--multi-key-get=72` and `--multi-key-get=1000` both work in cluster mode without CROSSSLOT errors
Confirm hit/miss accounting correct in JSON output (`Gets.Hits/sec`, `Gets.Misses/sec`)

Key design notes

Cluster MGET key selection: Redis Cluster returns CROSSSLOT when keys in a multi-key command hash to different slots — even if those slots are on the same node. The implementation pre-builds `m_mget_slot_keys[slot]` (key indices per slot) and `m_mget_conn_slots[conn_id]` (slots owned by each connection). Each MGET picks a single `target_slot` (round-robin over the connection's slots) and draws all N keys from that one slot, satisfying the same-slot constraint without hash tags.

Standalone mode: unchanged — no slot filtering, keys follow the normal key iterator.

🤖 Generated with Claude Code

Note

Medium Risk
Touches cluster request generation, shared cross-thread cache, and hot-path keylist/protocol code; mitigated by extensive new tests and pipeline idle guards.

Overview
This PR turns --multi-key-get from a stub into a working MGET path for Redis and enables it in OSS cluster mode (the old “not supported in cluster” guard is removed).

Protocol and client core: redis_protocol::write_command_multi_get() now emits proper RESP MGET arrays. Standalone client::create_mget_request() skips unavailable keys, refuses empty batches, and advances the GET ratio by the actual key count sent. Failed cluster MGET attempts bump the ratio counter so the pipeline cannot spin forever.

Cluster routing: A shared mget_slot_cache (mutex + one-time slot→key-index scan) plus per-thread cursors and m_mget_conn_slots ensure every MGET uses N keys from one hash slot (same-slot rule, no hash tags required). hold_pipeline() idles connections with no eligible slots; topology refresh rebuilds the cache and schedule_fill() wakes shards.

Config / safety: New validation blocks memcache_binary, --command, and --multi-key-get=0. Run lifecycle allocates/frees mget_cache. keylist::add_key fixes realloc pointer rebasing; a small evbuffer_remove assert fix.

Tests: Large new suites for standalone, cluster, RESP wire format, and CLI/regression; debugPrintMemtierOnError tolerates missing log files.

^{Reviewed by Cursor Bugbot for commit 7af41f8. Bugbot is set up for automated code reviews on this repo. Configure here.}

jit-ci · 2026-05-27T14:57:56Z

🛡️ Jit Security Scan Results

✅ No security findings were detected in this PR

^{Security scan by Jit}

Implement write_command_multi_get() in the Redis protocol layer (was an assert(0) stub), add slot-aware MGET key generation for OSS Cluster, and remove the cluster-mode rejection guard that previously blocked --multi-key-get. Cluster mode: each MGET request wraps all N keys in a rotating "{TAG}" hash tag whose CRC16 slot maps to the owning connection, so every wire MGET stays on one shard with no CROSSSLOT errors. Standalone mode is unchanged. Note that cluster MGET keys carry the "{TAG}" prefix, making them incompatible with plain SET-loaded keys; preload with matching format for hit-rate benchmarking. Also adds three integration test files: - test_mget_standalone.py (9 standalone tests: commandstats, 72-key, hit/miss) - test_mget_cluster.py (4 cluster tests: basic, 72-key, cross-shard, no-crossslot) - test_mget_protocol.py (RESP wire-format verification via MONITOR) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… MGET Instead of wrapping every key in "{TAG}" (which breaks compatibility with plain SET-loaded keys), probe the key space for indices whose crc16 slot naturally maps to the target connection. Keys remain in normal memtier-{N} format — compatible with any preloaded data. Computation cost is O(N * num_shards) crc16 calls per MGET request, all inside create_mget_request() which is only entered when --multi-key-get is active. Remove the now-unused generate_key_with_hashtag helper. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ter probe The help text still referred to the old hash-tag approach; update it to describe the actual slot-probing behaviour (no tag prefix added). Switch the probe loop from raw crc16() to calc_hslot_crc16_with_hash_tag() so that a --key-prefix containing {braces} is hashed the same way Redis itself hashes it, avoiding slot mismatch and spurious MOVED errors. Also guard range==0 (key_maximum < key_minimum misconfiguration) to prevent a divide-by-zero in the modulo. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Redis Cluster enforces same hash-SLOT (not just same-node) for MGET. The previous probe loop collected keys by shard, which allowed different slots on the same node to end up in one MGET → guaranteed CROSSSLOT errors on any real cluster. New approach: build_mget_slot_cache() scans [key_minimum, key_maximum] once after handle_cluster_slots() completes (only when --multi-key-get > 0), grouping key indices by their exact hash slot. create_mget_request() then round-robins over the slots owned by conn_id, picking N keys from the same slot per MGET wire command — zero CROSSSLOT risk, O(N) request time instead of O(N × num_shards), and keys remain in plain memtier-NNN format (no tag prefix) for full compatibility with SET-preloaded data. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- cluster_client.h: replace >> with > > in nested vector templates to fix compilation on macOS with OpenSSL 1.0.2/1.1 (older clang C++03 mode) - tests/test_mget_protocol.py: pass message= as keyword arg to env.assertTrue/assertEqual to avoid TypeError from RLTest treating the message string as the depth integer parameter Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

RLTest env.assertTrue/assertEqual take depth as the third positional arg, not message. Passing a string as the third positional arg sets depth to a string, making depth+1 crash with TypeError on every assertion call (whether it passes or fails). Fix by using the message= keyword. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…efresh The expensive key-range scan (O(key_max−key_min) CRC16 hashes) was repeated on every CLUSTER SLOTS response. Split build_mget_slot_cache() so the slot→key map is built once (guarded by m_mget_slot_keys_built) and only the topology-dependent conn→slot map is rebuilt on refresh. Also guard client::create_mget_request against an empty keylist reaching send_mget_command when all get_key_for_conn calls return !available. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The cluster test docstring incorrectly described hash-tag prefixes; the actual implementation uses a pre-built per-slot key cache (slot-probing). Also remove the redundant `override` on create_mget_request to match the virtual-only style of sibling declarations in cluster_client.h. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Nested templates like vector<vector<T> > require a space between the closing ">>" on macOS builds that compile in C++03 mode. Without Standard: Cpp03, clang-format collapses "> >" to ">>" causing a hard parse error on those builds. Setting Cpp03 makes the format check and the macOS build agree. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

memtier_benchmark always emits ALL STATS.Sets in the JSON output even when --ratio=0:N (no SET commands issued). The test incorrectly asserted the key was absent; updated to assert Count == 0 instead. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…_request Mirror the zero-key guard already present in the base class override so cluster_client::create_mget_request returns false rather than sending an empty MGET when keys_count reaches 0 at a ratio boundary. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Cluster mode rejects arbitrary commands with keys_count > 1 (the startup guard at memtier_benchmark.cpp:2963). Multi-key MGET via --command is a future feature; tests that verify routing correctness work equally well with a single hash-tagged key, so switch to --command=MGET {tag}-__key__. Also harden debugPrintMemtierOnError to skip files that don't exist (memtier exits before writing output when it detects a config error, leaving mb.stdout/mb.stderr absent and causing FileNotFoundError). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test_mget_protocol.py: _get_redis_conn() now passes ssl/cert params when env.useTLS is set, so MONITOR captures commands in TLS test runs instead of silently connecting to nothing. test_mget_cluster.py: remove "Totals" from the hits_sec search list; arbitrary-command workloads always emit Hits/sec=0 in Totals (hit tracking is not wired up for --command path), causing the assertGreater to fire as a false negative. Routing correctness is already covered by cmdstat_mget checks. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

_get_redis_conn now uses ssl_cert_reqs="none" instead of supplying the CA cert when in TLS mode. The test cert's CN/SAN rarely covers "127.0.0.1", so hostname verification caused a silent SSL handshake failure: the MONITOR thread caught the exception and returned empty results, giving 0 MGET entries on every assertion. The client still presents ssl_certfile/ssl_keyfile so the server-side tls-auth-clients check is satisfied. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- protocol.cpp: fix use-after-free in keylist::add_key() — save/restore m_buffer_ptr offset across realloc so the pointer is valid after the buffer may have moved - client.cpp: use m_keylist->get_keys_count() instead of m_config->multi_key_get when advancing m_get_ratio_count so the ratio stays accurate when fewer keys than the configured max are available - memtier_benchmark.cpp: change <= 0 to < 1 for the unsigned multi_key_get validation so the comparison is well-defined; add protocol guard (redis only) and --command incompatibility check - .clang-format: remove duplicate Standard: Cpp03 key that broke make format Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Covers four classes of checks: - CLI rejection: --multi-key-get=0, memcache_text, memcache_binary, and --command combos must exit non-zero with a clear error message - keylist buffer-realloc regression: a 512-char key prefix forces keylist::add_key() to grow its buffer; catches the use-after-free fixed in the companion commit (m_buffer_ptr must be updated after realloc) - Partial ratio cycle: ratio=1:5 multi-key-get=4 exercises the path where the final MGET in a cycle carries fewer keys than the configured max, validating that m_get_ratio_count advances by actual keys sent - Narrow key range: key-maximum - key-minimum + 1 < multi_key_get ensures the benchmark completes cleanly when the range is smaller than the batch Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…lloc If realloc() moves the buffer, m_keys[0..m_keys_count-1].key_ptr entries stored before the overflow still point into the freed allocation. Save the old buffer pointer before realloc and, when the address changes, walk the already-stored entries to re-base each key_ptr offset. Also update tests/test_mget_validation.py: - fix: use multi_key_get=3 with 340-char prefix so realloc fires when m_keys_count==2, exercising the key_ptr re-basing path - fix: replace env.assertIn() (not in RLTest API) with env.assertTrue(needle in value) - docs: clarify which bugs the partial-cycle and narrow-range tests can and cannot detect at the integration level Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

keylist::add_key() capacity check used >= instead of > after the +1 byte needed for the NUL terminator, allowing a one-byte overwrite at the buffer boundary. Rewrite the key_ptr re-basing loop to derive pointers from m_keys[k].key_len (keys are packed contiguously) instead of subtracting old_buffer after realloc freed it, which was technically undefined behaviour and triggered -Wuse-after-free. In client::create_request(), when create_mget_request() returns false (e.g. a cluster connection owns no slots in the configured key range), m_get_ratio_count was left below ratio.b causing fill_pipeline() to retry the same call immediately and busy-spin. Advance m_get_ratio_count to ratio.b on a false return so the ratio counters reset on the next call. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ache_binary memcache_text_protocol::write_command_multi_get() is fully implemented (emits 'get k1 k2 k3\r\n') and parse_response() handles multi-VALUE replies. Only memcache_binary has the assert(0) stub, so the validation now rejects memcache_binary only. Remove the test for memcache_text rejection; update the memcache_binary test to expect the new error message. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The inner loop iterates over the full key range (key_minimum..key_maximum) computing a CRC16 hash per key, which can take several seconds for large ranges. Log before and after so the user knows the benchmark is not hung. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The slot→key index table was rebuilt identically by every cluster_client thread (one per --threads), wasting O(key_range * 8B * threads) memory. For 1B keys at --threads=4 that was 32 GB. Three changes together make large key ranges feasible: 1. Share: move slot_keys into mget_slot_cache (owned by benchmark_config, allocated per run_benchmark call). A pthread_mutex guards the one-time build; subsequent threads skip straight to rebuilding their per-thread conn→slot map. Per-thread state (m_mget_slot_cursor, m_mget_conn_slots) stays on the cluster_client instance. 2. Cap: store at most min(multi_key_get * 4, 4096) key indices per slot. That is enough for several full MGET batches of variety. Memory is now O(16384 * cap * 8B) regardless of key range. 3. Early exit: break the scan once every slot has reached its cap. For a uniform CRC16 distribution this fires after ~cap * 16384 * 2 keys, so build time is also O(cap) not O(key_range). Memory comparison for --multi-key-get=72: before: 8B * key_range * threads (1B keys, 4 threads → 32 GB) after: 8B * 288 * 16384 shared (~36 MB, regardless of range or threads) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…he parser Three fixes from the 7-subagent review round: 1. mget_slot_cache::built: change plain bool to std::atomic<bool> with memory_order_release on the store (inside mutex, after slot_keys is fully populated) and memory_order_relaxed on the load (also inside mutex, where relaxed is sufficient). Each thread that reads slot_keys without the mutex has already acquired and released cache->mutex in a prior build_mget_slot_cache() call, providing the necessary happens-before edge; but using atomic<bool> makes this formally correct by the C++ memory model and silences any future audit or TSAN concern. 2. exit(1) path in run_benchmark(): add delete cfg->mget_cache before exit(1) so the pthread_mutex_t inside mget_slot_cache is destroyed cleanly rather than leaked when thread preparation fails. 3. protocol.cpp memcache_text rs_read_value: fix wrong assert on evbuffer_remove() return value. evbuffer_remove() returns the number of bytes removed on success (not 0); the old assert((unsigned int)ret==0) would fire on every MGET hit in a debug build. Changed to assert(ret!=-1). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 3f180f4. Configure here.}

In a GET-only workload (--ratio=0:N) with cluster MGET, a connection whose slots own no keys in the configured key range can never generate a request. The create_request() call advances m_get_ratio_count to ratio.b (causing the else-branch to reset both counters) but never adds anything to m_pipeline, so fill_pipeline's while-loop spins consuming 100% CPU on that connection's event-loop thread. Fix: return true from hold_pipeline() for such connections when the slot cache is fully built and the connection has no eligible slots. This breaks the fill_pipeline loop cleanly; connections with eligible slots continue to operate normally. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…itor cmds Two issues found by post-review analysis: 1. After build_mget_slot_cache() sets built=true in handle_cluster_slots, connections that had been bufferevent_disable()'d before the cache existed would never re-run fill_pipeline() and stay permanently idle. Fix: call schedule_fill() on all connected shards after every cache build, mirroring the pattern used by the transaction-pin guard. 2. The GET-only MGET hold_pipeline guard did not check m_staged_monitor_commands. A connection with no MGET-eligible slots but with staged --monitor-input commands would be silenced, dropping those commands silently. Fix: add m_staged_monitor_commands[conn_id].empty() to the guard predicate. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fcostaoliveira requested a review from paulorsousa May 27, 2026 14:56