Skip to content

MGET support for Redis protocol with OSS Cluster slot-aware routing#397

Merged
fcostaoliveira merged 25 commits into
redis:masterfrom
filipecosta90:feat/mget-redis-protocol
May 28, 2026
Merged

MGET support for Redis protocol with OSS Cluster slot-aware routing#397
fcostaoliveira merged 25 commits into
redis:masterfrom
filipecosta90:feat/mget-redis-protocol

Conversation

@fcostaoliveira
Copy link
Copy Markdown
Collaborator

@fcostaoliveira fcostaoliveira commented May 27, 2026

Summary

  • Implements `write_command_multi_get()` in the Redis protocol layer (was an `assert(0)` stub) — emits valid RESP array wire format for N-key MGET
  • Enables `--multi-key-get` in OSS Cluster mode: removes the cluster rejection guard and overrides `cluster_client::create_mget_request()` with a slot-aware key selection algorithm
  • Cluster routing: pre-builds a map from hash slot → key indices, then on each MGET picks all N keys from a single slot. Redis Cluster rejects MGET with a CROSSSLOT error if keys span different slots (even on the same node), so same-slot selection is the correct constraint. No `{tag}` hash-tag prefix needed — keys are identical to those written by a normal SET preload.
  • Adds four integration test files: `test_mget_standalone.py` (9 tests), `test_mget_cluster.py` (4 tests), `test_mget_protocol.py` (RESP wire-format verification), `test_mget_validation.py` (CLI rejection + regression tests)

Test plan

  • `TEST=test_mget_standalone.py OSS_STANDALONE=1 ./tests/run_tests.sh` — 9 standalone tests pass
  • `TEST=test_mget_cluster.py OSS_CLUSTER=1 ./tests/run_tests.sh` — 4 cluster tests pass (no CROSSSLOT errors)
  • `TEST=test_mget_validation.py OSS_STANDALONE=1 ./tests/run_tests.sh` — CLI rejection, keylist buffer-realloc, partial ratio cycle, and narrow key-range regression tests pass
  • Full CI suite green (64/64: ASAN, TSAN, UBSan, cluster, TLS, all platform builds)
  • Verify `--multi-key-get=72` and `--multi-key-get=1000` both work in cluster mode without CROSSSLOT errors
  • Confirm hit/miss accounting correct in JSON output (`Gets.Hits/sec`, `Gets.Misses/sec`)

Key design notes

Cluster MGET key selection: Redis Cluster returns CROSSSLOT when keys in a multi-key command hash to different slots — even if those slots are on the same node. The implementation pre-builds `m_mget_slot_keys[slot]` (key indices per slot) and `m_mget_conn_slots[conn_id]` (slots owned by each connection). Each MGET picks a single `target_slot` (round-robin over the connection's slots) and draws all N keys from that one slot, satisfying the same-slot constraint without hash tags.

Standalone mode: unchanged — no slot filtering, keys follow the normal key iterator.

🤖 Generated with Claude Code


Note

Medium Risk
Touches cluster request generation, shared cross-thread cache, and hot-path keylist/protocol code; mitigated by extensive new tests and pipeline idle guards.

Overview
This PR turns --multi-key-get from a stub into a working MGET path for Redis and enables it in OSS cluster mode (the old “not supported in cluster” guard is removed).

Protocol and client core: redis_protocol::write_command_multi_get() now emits proper RESP MGET arrays. Standalone client::create_mget_request() skips unavailable keys, refuses empty batches, and advances the GET ratio by the actual key count sent. Failed cluster MGET attempts bump the ratio counter so the pipeline cannot spin forever.

Cluster routing: A shared mget_slot_cache (mutex + one-time slot→key-index scan) plus per-thread cursors and m_mget_conn_slots ensure every MGET uses N keys from one hash slot (same-slot rule, no hash tags required). hold_pipeline() idles connections with no eligible slots; topology refresh rebuilds the cache and schedule_fill() wakes shards.

Config / safety: New validation blocks memcache_binary, --command, and --multi-key-get=0. Run lifecycle allocates/frees mget_cache. keylist::add_key fixes realloc pointer rebasing; a small evbuffer_remove assert fix.

Tests: Large new suites for standalone, cluster, RESP wire format, and CLI/regression; debugPrintMemtierOnError tolerates missing log files.

Reviewed by Cursor Bugbot for commit 7af41f8. Bugbot is set up for automated code reviews on this repo. Configure here.

@jit-ci
Copy link
Copy Markdown

jit-ci Bot commented May 27, 2026

🛡️ Jit Security Scan Results

CRITICAL HIGH MEDIUM

✅ No security findings were detected in this PR


Security scan by Jit

Comment thread cluster_client.cpp
@fcostaoliveira fcostaoliveira changed the title feat(mget): MGET support for Redis protocol with OSS Cluster slot-aware routing MGET support for Redis protocol with OSS Cluster slot-aware routing May 27, 2026
Comment thread client.cpp
Comment thread cluster_client.cpp
Comment thread cluster_client.cpp
Comment thread cluster_client.cpp
Comment thread client.cpp
fcostaoliveira and others added 15 commits May 28, 2026 08:59
Implement write_command_multi_get() in the Redis protocol layer (was an
assert(0) stub), add slot-aware MGET key generation for OSS Cluster, and
remove the cluster-mode rejection guard that previously blocked --multi-key-get.

Cluster mode: each MGET request wraps all N keys in a rotating "{TAG}"
hash tag whose CRC16 slot maps to the owning connection, so every wire
MGET stays on one shard with no CROSSSLOT errors. Standalone mode is
unchanged. Note that cluster MGET keys carry the "{TAG}" prefix, making
them incompatible with plain SET-loaded keys; preload with matching format
for hit-rate benchmarking.

Also adds three integration test files:
- test_mget_standalone.py  (9 standalone tests: commandstats, 72-key, hit/miss)
- test_mget_cluster.py     (4 cluster tests: basic, 72-key, cross-shard, no-crossslot)
- test_mget_protocol.py    (RESP wire-format verification via MONITOR)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… MGET

Instead of wrapping every key in "{TAG}" (which breaks compatibility with
plain SET-loaded keys), probe the key space for indices whose crc16 slot
naturally maps to the target connection. Keys remain in normal
memtier-{N} format — compatible with any preloaded data.

Computation cost is O(N * num_shards) crc16 calls per MGET request, all
inside create_mget_request() which is only entered when --multi-key-get is
active. Remove the now-unused generate_key_with_hashtag helper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ter probe

The help text still referred to the old hash-tag approach; update it to
describe the actual slot-probing behaviour (no tag prefix added).

Switch the probe loop from raw crc16() to calc_hslot_crc16_with_hash_tag()
so that a --key-prefix containing {braces} is hashed the same way Redis
itself hashes it, avoiding slot mismatch and spurious MOVED errors.

Also guard range==0 (key_maximum < key_minimum misconfiguration) to prevent
a divide-by-zero in the modulo.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Redis Cluster enforces same hash-SLOT (not just same-node) for MGET. The
previous probe loop collected keys by shard, which allowed different slots
on the same node to end up in one MGET → guaranteed CROSSSLOT errors on
any real cluster.

New approach: build_mget_slot_cache() scans [key_minimum, key_maximum]
once after handle_cluster_slots() completes (only when --multi-key-get > 0),
grouping key indices by their exact hash slot. create_mget_request() then
round-robins over the slots owned by conn_id, picking N keys from the same
slot per MGET wire command — zero CROSSSLOT risk, O(N) request time instead
of O(N × num_shards), and keys remain in plain memtier-NNN format (no tag
prefix) for full compatibility with SET-preloaded data.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- cluster_client.h: replace >> with > > in nested vector templates to
  fix compilation on macOS with OpenSSL 1.0.2/1.1 (older clang C++03 mode)
- tests/test_mget_protocol.py: pass message= as keyword arg to
  env.assertTrue/assertEqual to avoid TypeError from RLTest treating
  the message string as the depth integer parameter

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
RLTest env.assertTrue/assertEqual take depth as the third positional arg,
not message. Passing a string as the third positional arg sets depth to a
string, making depth+1 crash with TypeError on every assertion call
(whether it passes or fails). Fix by using the message= keyword.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…efresh

The expensive key-range scan (O(key_max−key_min) CRC16 hashes) was
repeated on every CLUSTER SLOTS response.  Split build_mget_slot_cache()
so the slot→key map is built once (guarded by m_mget_slot_keys_built)
and only the topology-dependent conn→slot map is rebuilt on refresh.

Also guard client::create_mget_request against an empty keylist reaching
send_mget_command when all get_key_for_conn calls return !available.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The cluster test docstring incorrectly described hash-tag prefixes; the
actual implementation uses a pre-built per-slot key cache (slot-probing).
Also remove the redundant `override` on create_mget_request to match the
virtual-only style of sibling declarations in cluster_client.h.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Nested templates like vector<vector<T> > require a space between the
closing ">>" on macOS builds that compile in C++03 mode.  Without
Standard: Cpp03, clang-format collapses "> >" to ">>" causing a hard
parse error on those builds.  Setting Cpp03 makes the format check and
the macOS build agree.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
memtier_benchmark always emits ALL STATS.Sets in the JSON output even
when --ratio=0:N (no SET commands issued).  The test incorrectly asserted
the key was absent; updated to assert Count == 0 instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…_request

Mirror the zero-key guard already present in the base class override so
cluster_client::create_mget_request returns false rather than sending an
empty MGET when keys_count reaches 0 at a ratio boundary.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cluster mode rejects arbitrary commands with keys_count > 1 (the startup
guard at memtier_benchmark.cpp:2963).  Multi-key MGET via --command is a
future feature; tests that verify routing correctness work equally well
with a single hash-tagged key, so switch to --command=MGET {tag}-__key__.

Also harden debugPrintMemtierOnError to skip files that don't exist
(memtier exits before writing output when it detects a config error,
leaving mb.stdout/mb.stderr absent and causing FileNotFoundError).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
test_mget_protocol.py: _get_redis_conn() now passes ssl/cert params when
env.useTLS is set, so MONITOR captures commands in TLS test runs instead
of silently connecting to nothing.

test_mget_cluster.py: remove "Totals" from the hits_sec search list;
arbitrary-command workloads always emit Hits/sec=0 in Totals (hit tracking
is not wired up for --command path), causing the assertGreater to fire as a
false negative.  Routing correctness is already covered by cmdstat_mget checks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_get_redis_conn now uses ssl_cert_reqs="none" instead of supplying the
CA cert when in TLS mode.  The test cert's CN/SAN rarely covers
"127.0.0.1", so hostname verification caused a silent SSL handshake
failure: the MONITOR thread caught the exception and returned empty
results, giving 0 MGET entries on every assertion.

The client still presents ssl_certfile/ssl_keyfile so the server-side
tls-auth-clients check is satisfied.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@fcostaoliveira fcostaoliveira force-pushed the feat/mget-redis-protocol branch from 813b0d2 to c43292e Compare May 28, 2026 07:59
fcostaoliveira and others added 6 commits May 28, 2026 09:12
- protocol.cpp: fix use-after-free in keylist::add_key() — save/restore
  m_buffer_ptr offset across realloc so the pointer is valid after the
  buffer may have moved
- client.cpp: use m_keylist->get_keys_count() instead of
  m_config->multi_key_get when advancing m_get_ratio_count so the ratio
  stays accurate when fewer keys than the configured max are available
- memtier_benchmark.cpp: change <= 0 to < 1 for the unsigned
  multi_key_get validation so the comparison is well-defined; add
  protocol guard (redis only) and --command incompatibility check
- .clang-format: remove duplicate Standard: Cpp03 key that broke
  make format

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Covers four classes of checks:

- CLI rejection: --multi-key-get=0, memcache_text, memcache_binary, and
  --command combos must exit non-zero with a clear error message
- keylist buffer-realloc regression: a 512-char key prefix forces
  keylist::add_key() to grow its buffer; catches the use-after-free fixed
  in the companion commit (m_buffer_ptr must be updated after realloc)
- Partial ratio cycle: ratio=1:5 multi-key-get=4 exercises the path where
  the final MGET in a cycle carries fewer keys than the configured max,
  validating that m_get_ratio_count advances by actual keys sent
- Narrow key range: key-maximum - key-minimum + 1 < multi_key_get ensures
  the benchmark completes cleanly when the range is smaller than the batch

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…lloc

If realloc() moves the buffer, m_keys[0..m_keys_count-1].key_ptr entries
stored before the overflow still point into the freed allocation.
Save the old buffer pointer before realloc and, when the address changes,
walk the already-stored entries to re-base each key_ptr offset.

Also update tests/test_mget_validation.py:
- fix: use multi_key_get=3 with 340-char prefix so realloc fires when
  m_keys_count==2, exercising the key_ptr re-basing path
- fix: replace env.assertIn() (not in RLTest API) with
  env.assertTrue(needle in value)
- docs: clarify which bugs the partial-cycle and narrow-range tests
  can and cannot detect at the integration level

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
keylist::add_key() capacity check used >= instead of > after the +1 byte
needed for the NUL terminator, allowing a one-byte overwrite at the buffer
boundary.  Rewrite the key_ptr re-basing loop to derive pointers from
m_keys[k].key_len (keys are packed contiguously) instead of subtracting
old_buffer after realloc freed it, which was technically undefined behaviour
and triggered -Wuse-after-free.

In client::create_request(), when create_mget_request() returns false (e.g.
a cluster connection owns no slots in the configured key range),
m_get_ratio_count was left below ratio.b causing fill_pipeline() to retry
the same call immediately and busy-spin.  Advance m_get_ratio_count to
ratio.b on a false return so the ratio counters reset on the next call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ache_binary

memcache_text_protocol::write_command_multi_get() is fully implemented
(emits 'get k1 k2 k3\r\n') and parse_response() handles multi-VALUE
replies.  Only memcache_binary has the assert(0) stub, so the validation
now rejects memcache_binary only.

Remove the test for memcache_text rejection; update the memcache_binary
test to expect the new error message.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The inner loop iterates over the full key range (key_minimum..key_maximum)
computing a CRC16 hash per key, which can take several seconds for large
ranges.  Log before and after so the user knows the benchmark is not hung.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fcostaoliveira and others added 2 commits May 28, 2026 10:52
The slot→key index table was rebuilt identically by every cluster_client
thread (one per --threads), wasting O(key_range * 8B * threads) memory.
For 1B keys at --threads=4 that was 32 GB.

Three changes together make large key ranges feasible:

1. Share: move slot_keys into mget_slot_cache (owned by benchmark_config,
   allocated per run_benchmark call).  A pthread_mutex guards the one-time
   build; subsequent threads skip straight to rebuilding their per-thread
   conn→slot map.  Per-thread state (m_mget_slot_cursor, m_mget_conn_slots)
   stays on the cluster_client instance.

2. Cap: store at most min(multi_key_get * 4, 4096) key indices per slot.
   That is enough for several full MGET batches of variety.  Memory is now
   O(16384 * cap * 8B) regardless of key range.

3. Early exit: break the scan once every slot has reached its cap.  For a
   uniform CRC16 distribution this fires after ~cap * 16384 * 2 keys, so
   build time is also O(cap) not O(key_range).

Memory comparison for --multi-key-get=72:
  before: 8B * key_range * threads  (1B keys, 4 threads → 32 GB)
  after:  8B * 288 * 16384 shared   (~36 MB, regardless of range or threads)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…he parser

Three fixes from the 7-subagent review round:

1. mget_slot_cache::built: change plain bool to std::atomic<bool> with
   memory_order_release on the store (inside mutex, after slot_keys is
   fully populated) and memory_order_relaxed on the load (also inside
   mutex, where relaxed is sufficient).  Each thread that reads slot_keys
   without the mutex has already acquired and released cache->mutex in a
   prior build_mget_slot_cache() call, providing the necessary
   happens-before edge; but using atomic<bool> makes this formally correct
   by the C++ memory model and silences any future audit or TSAN concern.

2. exit(1) path in run_benchmark(): add delete cfg->mget_cache before
   exit(1) so the pthread_mutex_t inside mget_slot_cache is destroyed
   cleanly rather than leaked when thread preparation fails.

3. protocol.cpp memcache_text rs_read_value: fix wrong assert on
   evbuffer_remove() return value.  evbuffer_remove() returns the number
   of bytes removed on success (not 0); the old assert((unsigned int)ret==0)
   would fire on every MGET hit in a debug build.  Changed to assert(ret!=-1).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 3f180f4. Configure here.

Comment thread client.cpp
fcostaoliveira and others added 2 commits May 28, 2026 11:40
In a GET-only workload (--ratio=0:N) with cluster MGET, a connection
whose slots own no keys in the configured key range can never generate
a request.  The create_request() call advances m_get_ratio_count to
ratio.b (causing the else-branch to reset both counters) but never
adds anything to m_pipeline, so fill_pipeline's while-loop spins
consuming 100% CPU on that connection's event-loop thread.

Fix: return true from hold_pipeline() for such connections when the
slot cache is fully built and the connection has no eligible slots.
This breaks the fill_pipeline loop cleanly; connections with eligible
slots continue to operate normally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…itor cmds

Two issues found by post-review analysis:

1. After build_mget_slot_cache() sets built=true in handle_cluster_slots,
   connections that had been bufferevent_disable()'d before the cache existed
   would never re-run fill_pipeline() and stay permanently idle.  Fix: call
   schedule_fill() on all connected shards after every cache build, mirroring
   the pattern used by the transaction-pin guard.

2. The GET-only MGET hold_pipeline guard did not check m_staged_monitor_commands.
   A connection with no MGET-eligible slots but with staged --monitor-input
   commands would be silenced, dropping those commands silently.  Fix: add
   m_staged_monitor_commands[conn_id].empty() to the guard predicate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@fcostaoliveira fcostaoliveira merged commit 659752c into redis:master May 28, 2026
57 of 58 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants