Revise Save/Restore for true pit snapshot. #401

allenss-amazon · 2025-10-01T04:46:29Z

Revises the save/restore logic to support saving all index data and the pending mutation queue, as discussed earlier.

This new logic is fully backward and forward compatible. RDB files written using the previous binaries (aka V1) will still be loaded and subject to the same stale data correction and forced backfill logic as existed before this PR. In other words, for RDB files written with the old code, the new code handles them exactly the same.

New format files (aka V2) contain additional data that old code ignores. In other words, V2 RDB files can still be loaded by prior code, which treats them like V1 files and subjects them to stale data correction and forced backfill.

V2 files which are loaded into the new code are not subjected to stale data correction and forced backfill. Once a V2 file is loaded it can "open for business" immediately.

The V2 format contains an additional section for the contents of non-vector indexes (Tags, Numeric). These sections contain the key and attribute-value for each entry in the index. This external format differs from the internal format, allowing the internal data structures to be freely optimized in the future.
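To illustrate the idea of an external format that is independent of the in-memory layout, here is a minimal sketch, not the PR's actual encoding. It assumes a plain byte buffer in place of the RDB chunk stream and a hypothetical `IndexContents` map from key to raw attribute value. All names (`Buffer`, `PutString`, `SaveIndex`, `LoadIndex`) are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Illustrative stand-in for the RDB chunk stream (assumption, not the real API).
using Buffer = std::vector<std::uint8_t>;
// Hypothetical internal representation: key -> raw attribute value.
using IndexContents = std::map<std::string, std::string>;

// Append a 64-bit count in native byte order (good enough for a sketch).
inline void PutU64(Buffer& out, std::uint64_t n) {
  const auto* p = reinterpret_cast<const std::uint8_t*>(&n);
  out.insert(out.end(), p, p + sizeof(n));
}

inline bool GetU64(const Buffer& in, std::size_t& pos, std::uint64_t& n) {
  if (in.size() - pos < sizeof(n)) return false;  // truncated input
  std::memcpy(&n, in.data() + pos, sizeof(n));
  pos += sizeof(n);
  return true;
}

// Strings are written as a length prefix followed by the bytes.
inline void PutString(Buffer& out, const std::string& s) {
  PutU64(out, s.size());
  out.insert(out.end(), s.begin(), s.end());
}

inline bool GetString(const Buffer& in, std::size_t& pos, std::string& s) {
  std::uint64_t n = 0;
  if (!GetU64(in, pos, n) || in.size() - pos < n) return false;
  s.assign(reinterpret_cast<const char*>(in.data()) + pos, n);
  pos += n;
  return true;
}

// External form: entry count, then one (key, attribute-value) pair per entry.
inline Buffer SaveIndex(const IndexContents& index) {
  Buffer out;
  PutU64(out, index.size());
  for (const auto& [key, value] : index) {
    PutString(out, key);
    PutString(out, value);
  }
  return out;
}

// Rebuilds whatever internal structure the loader prefers from the pairs.
inline bool LoadIndex(const Buffer& in, IndexContents& index) {
  std::size_t pos = 0;
  std::uint64_t count = 0;
  if (!GetU64(in, pos, count)) return false;
  for (std::uint64_t i = 0; i < count; ++i) {
    std::string key, value;
    if (!GetString(in, pos, key) || !GetString(in, pos, value)) return false;
    index.emplace(std::move(key), std::move(value));
  }
  return pos == in.size();  // reject trailing garbage
}
```

Because only keys and attribute values cross the file boundary, the loader is free to rebuild any internal structure it likes from them.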

The V2 format also has a section for the mutation queue and index backfill status. This section records the keys that were in the mutation queue at the time of the save, as well as the index backfill status (complete or incomplete). This allows the mutation queue and backfill status (backfill needed or not) to be recreated to match the state at the time of the save. (One small quirk is that the saved mutation queue doesn't record whether a saved key is in the queue due to a backfill or not. This should not affect the client-visible semantics, but it would affect internal counters for the ingestion pipeline, as they would be misled about the source of the mutation.)
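A rough sketch of what this section captures, under the assumption that the queue can be modeled as a set of key names. `SchemaState`, `MutationQueueSection`, `Save`, and `Restore` are hypothetical names for illustration and do not correspond to the classes in the PR.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-index state that matters here.
struct SchemaState {
  std::set<std::string> mutation_queue;  // keys with pending index updates
  bool backfilling = false;              // true while a backfill is incomplete
};

// What the supplemental section records: key names plus the backfill flag.
// Note that it does not record why a key was queued (backfill or mutation).
struct MutationQueueSection {
  bool backfilling;
  std::vector<std::string> keys;
};

inline MutationQueueSection Save(const SchemaState& s) {
  return MutationQueueSection{
      s.backfilling,
      std::vector<std::string>(s.mutation_queue.begin(),
                               s.mutation_queue.end())};
}

inline SchemaState Restore(const MutationQueueSection& section) {
  SchemaState s;
  s.backfilling = section.backfilling;
  // Only key names are re-queued; the values live in the main database.
  s.mutation_queue.insert(section.keys.begin(), section.keys.end());
  return s;
}
```

Restoring from a saved section yields the same queue membership and backfill flag as at save time, which is the property the description relies on.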

Two controls are provided to override the normal behavior of this code.

The configurable: search.rdb_write_v2 (default "yes") controls whether a generated RDB file is in the V1 or V2 format.

The configurable: search.rdb_read_v2 (default "yes") controls whether the V2 data is used. Setting this to "no" will force the code to treat a V2 file as a V1 file, i.e., to ignore the extra V2 data.
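The compatibility rules and the two configurables can be summarized as a small decision function. This is only a sketch of the behavior described above. The enum and function names are invented, and `read_v2` / `write_v2` stand in for search.rdb_read_v2 and search.rdb_write_v2.

```cpp
#include <cassert>

// Hypothetical names for illustration only; not taken from the PR.
enum class RdbFormat { kV1, kV2 };
enum class Binary { kOld, kNew };

struct LoadPlan {
  bool use_v2_sections;               // restore index contents and mutation queue
  bool stale_fix_and_forced_backfill; // fall back to the V1 recovery path
};

// Decide how a file is handled on load. read_v2 only matters for a new
// binary; an old binary always ignores the extra V2 data.
inline LoadPlan PlanLoad(Binary binary, RdbFormat file, bool read_v2) {
  const bool use_v2 =
      binary == Binary::kNew && file == RdbFormat::kV2 && read_v2;
  return LoadPlan{use_v2, !use_v2};
}

// write_v2 selects the format that a new binary generates.
inline RdbFormat PlanSave(bool write_v2) {
  return write_v2 ? RdbFormat::kV2 : RdbFormat::kV1;
}
```

Only one combination (new binary, V2 file, read_v2 enabled) skips stale data correction and forced backfill; every other combination takes the V1 path.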

Fixes #41

Aksha1812 · 2025-10-07T07:57:33Z

src/rdb_serialization.h

+  absl::Status SaveString(const std::string_view s) {
+    return SaveChunk(s.data(), s.size());
+  }
+  template <typename T, std::enable_if_t<std::is_trivial<T>::value &&


Do we need enable_if_t? I see it limits this function to certain types. Which types does it limit the template to? Are we expected to serialize any complex information in the future?

This implementation won't handle complex information. If the need for a more sophisticated serialization framework comes up we'll tackle it then. The enable_if_t is seat belts for developers.
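For readers unfamiliar with the pattern, here is a minimal sketch of the "seat belt". It uses only the `std::is_trivial` part of the condition that is visible in the snippet (the rest of the condition is cut off in the diff and is not reproduced here). `ByteSink` and `CanSave` are hypothetical helpers for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical sink that appends raw bytes to a std::string.
struct ByteSink {
  std::string bytes;

  void SaveChunk(const char* data, std::size_t len) { bytes.append(data, len); }

  // Participates in overload resolution only for trivial types, whose
  // object representation can safely be copied byte for byte.
  template <typename T,
            std::enable_if_t<std::is_trivial<T>::value, bool> = true>
  void SaveObject(const T& t) {
    SaveChunk(reinterpret_cast<const char*>(&t), sizeof(T));
  }
};

// Detection helper: true if ByteSink::SaveObject accepts a T.
template <typename T, typename = void>
struct CanSave : std::false_type {};

template <typename T>
struct CanSave<T, std::void_t<decltype(std::declval<ByteSink&>().SaveObject(
                      std::declval<const T&>()))>> : std::true_type {};

// Plain scalars pass; types that own memory are rejected at compile time.
static_assert(CanSave<int>::value, "int is trivial");
static_assert(!CanSave<std::string>::value, "std::string is rejected");
```

Passing a `std::string` or a container to `SaveObject` then fails to compile instead of silently writing pointers into the file.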

Aksha1812 · 2025-10-07T07:59:24Z

testing/common.h

      : vmsdk::ThreadPool(name, num_threads) {
    ON_CALL(*this, Schedule(testing::_, testing::_))
-        .WillByDefault(testing::Invoke(
+        .WillByDefault(


I had seen this was causing build warnings; glad this is being fixed. What was the actual issue?

Not sure why this wasn't caught earlier. I believe that testing::Invoke predates lambda functions and isn't needed for them.

Aksha1812 · 2025-10-07T08:01:01Z

src/index_schema.cc

+                             << tracked_mutated_records_.size();
+  VMSDK_RETURN_IF_ERROR(out.SaveObject(tracked_mutated_records_.size()));
+  for (const auto &[key, value] : tracked_mutated_records_) {
+    VMSDK_RETURN_IF_ERROR(out.SaveString(key->Str()));


We don't save the value of records in the mutation queue?

The values are not needed because they are "by definition" duplicates of what is in the main database.

Aksha1812 · 2025-10-07T08:28:50Z

src/indexes/numeric.cc

+    //
+    if (tracked_keys_.contains(key_ptr)) {
+      DCHECK(false);
+      return absl::InternalError("Numeric field save duplicate key");


What happens when tracked/untracked keys overlap between loading different indexes? I assume it would be handled automatically while backfilling?

In theory tracked/untracked are disjoint sets. In the save/restore world it's better to crash the save than the restore. In other words, let's not write an invalid RDB file.
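A small sketch of the "fail the save, not the restore" idea, assuming the two key sets can be modeled as `std::set<std::string>`. `SaveKeys` is a hypothetical function, not the code in numeric.cc.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical save routine: refuses to emit anything if a key appears in
// both sets, so an invalid file is never written.
inline bool SaveKeys(const std::set<std::string>& tracked,
                     const std::set<std::string>& untracked,
                     std::vector<std::string>& out) {
  for (const auto& key : untracked) {
    if (tracked.count(key) != 0) {
      return false;  // invariant violated: the sets must be disjoint
    }
  }
  out.insert(out.end(), tracked.begin(), tracked.end());
  out.insert(out.end(), untracked.begin(), untracked.end());
  return true;
}
```

Validating before writing means a violated invariant surfaces on the node that still has the live data, rather than on a node that is trying to recover from the file.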

Aksha1812 · 2025-10-07T09:03:21Z

integration/test_saverestore.py

+import threading
+from ft_info_parser import FTInfoParser
+
+index = Index("index", [Vector("v", 3, type="HNSW", m=2, efc=1), Numeric("n"), Tag("t")])


Maybe we could add a flat vector field as well, just to make testing more extensive. Also, adding some deletes and updates and then saving and restoring would remove any doubts.

The PR doesn't change the save/restore of a vector index. It does suppress the post load scan for stale keys. But this logic isn't affected by HNSW vs FLAT.

Aksha1812 · 2025-10-07T09:11:47Z

src/index_schema.cc

+                  supplemental_content->mutation_queue_header().backfilling();
+              if (!backfilling) {
+                VMSDK_LOG(DEBUG, ctx) << "Backfill suppressed.";
+                index_schema->backfill_job_.Get() = std::nullopt;


I imagine some consistency check here, but I guess that is a separate task.

Yes, in a perfect world we'd have additional checking code that could be enabled/disabled.

Signed-off-by: Allen Samuels <[email protected]>

yairgott · 2025-10-27T22:41:45Z

src/rdb_serialization.h

+  absl::StatusOr<T> LoadObject() {
+    VMSDK_ASSIGN_OR_RETURN(auto buffer, LoadChunk());
+    if (buffer->size() != sizeof(T)) {
+      return absl::InternalError("Mismatched size protocol error");


Consider asserting using CHECK or DCHECK

I don't think crashing is the correct response to an invalid RDB file.
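As an illustration of reporting a malformed chunk instead of aborting, here is a sketch that uses `std::optional` where the real code returns an `absl::StatusOr<T>`. The free function `LoadObject` below is a simplified assumption that takes the chunk as a `std::string`.

```cpp
#include <cassert>
#include <cstring>
#include <optional>
#include <string>
#include <type_traits>

// Simplified stand-in: std::nullopt plays the role of absl::InternalError.
template <typename T,
          std::enable_if_t<std::is_trivial<T>::value, bool> = true>
std::optional<T> LoadObject(const std::string& chunk) {
  if (chunk.size() != sizeof(T)) {
    // The file is untrusted input: report the problem to the caller
    // rather than taking the whole process down.
    return std::nullopt;
  }
  T value;
  std::memcpy(&value, chunk.data(), sizeof(T));
  return value;
}
```

The caller can then fail the load cleanly, whereas a CHECK would crash the server on any corrupted or truncated file.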

yairgott · 2025-10-27T23:30:07Z

src/index_schema.cc

+    auto idx = attr.GetIndex();
+    size_t cnt = idx->GetTrackedKeyCount() + idx->GetUnTrackedKeyCount();
+    if (cnt != oracle_key_count) {
+      if (IsVectorIndex(idx) && cnt < oracle_key_count) {


shouldn't this be:

if ((IsVectorIndex(idx) && cnt <= oracle_key_count) || (!IsVectorIndex(idx) && cnt == oracle_key_count)) { continue; }

Well, they are equivalent. How about this?

if (IsVectorIndex(idx) ? cnt <= oracle_key_count : cnt == oracle_key_count) { continue; }
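A quick way to convince yourself that the long form and the ternary form accept the same cases is to compare them exhaustively over a small range. The function names below are made up for this check, and `is_vector` stands in for `IsVectorIndex(idx)`.

```cpp
#include <cassert>
#include <cstddef>

// The condition as written in the review suggestion.
inline bool LongForm(bool is_vector, std::size_t cnt, std::size_t oracle) {
  return (is_vector && cnt <= oracle) || (!is_vector && cnt == oracle);
}

// The ternary rewrite proposed in the reply.
inline bool TernaryForm(bool is_vector, std::size_t cnt, std::size_t oracle) {
  return is_vector ? cnt <= oracle : cnt == oracle;
}

// Returns true if both forms agree for every count below `limit`.
inline bool FormsAgree(std::size_t limit) {
  for (int v = 0; v < 2; ++v) {
    for (std::size_t cnt = 0; cnt < limit; ++cnt) {
      for (std::size_t oracle = 0; oracle < limit; ++oracle) {
        if (LongForm(v != 0, cnt, oracle) != TernaryForm(v != 0, cnt, oracle)) {
          return false;
        }
      }
    }
  }
  return true;
}
```

In both forms a vector index may hold fewer keys than the oracle count, while a non-vector index must match it exactly.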

src/indexes/numeric.h

yairgott · 2025-10-28T01:13:16Z

src/index_schema.cc

+            if (index_schema) {
+              VMSDK_RETURN_IF_ERROR(index_schema->LoadIndexExtension(
+                  ctx, RDBChunkInputStream(supplemental_iter.IterateChunks())));
+              bool backfilling =


nit: there is no point in defining a variable if used only once.

Which variable? backfilling is used in the next line.

yairgott · 2025-10-28T01:19:35Z

src/index_schema.cc

+      break;
+    }
+  }
+  size_t oracle_key_count =


Shouldn't we break if a non-vector index is found?

Isn't that exactly what this does?

yairgott · 2025-10-28T01:24:48Z

src/index_schema.cc

+      }
+      auto status2 = larger_index->ForEachUnTrackedKey(key_check);
+      if (!status2.ok()) {
+        status = status1;


should this be:
status = status2; ?

good catch.

yairgott · 2025-10-28T01:29:38Z

src/index_schema.cc

+  //
+  // Need to find an attribute index that has the right tracked/untracked
+  // keys. Any non-vector index will do. But if there are only vector
+  // indexes we will use that.


what is the value of doing so if there are no non-vector indexes?

Hmm, good point. Worse, the list will be wrong if there are multiple vector indexes across disjoint sets of keys. I think you're correct that it can be skipped.

yairgott · 2025-10-28T03:10:26Z

I might have missed it, but I haven't seen the queued multi/exec being serialized as well.

allenss-amazon · 2025-10-29T01:02:07Z

I might have missed it, but I haven't seen the queued multi/exec being serialized as well.

Good catch.

Signed-off-by: Allen Samuels <[email protected]>

allenss-amazon force-pushed the saverestore branch from 0a657fd to 450e3b3 Compare October 4, 2025 18:08

allenss-amazon requested review from murphyjacob4 and yairgott October 4, 2025 18:27

allenss-amazon changed the title ~~Initial wiring~~ Revise Save/Restore for true pit snapshot. Oct 4, 2025

allenss-amazon requested a review from zvi-code October 4, 2025 18:50

allenss-amazon marked this pull request as ready for review October 4, 2025 18:50

Aksha1812 reviewed Oct 7, 2025

allenss-amazon added 12 commits October 16, 2025 03:09

Initial wiring

c23b19e

Signed-off-by: Allen Samuels <[email protected]>

formating

ab61e93

Signed-off-by: Allen Samuels <[email protected]>

Code complete

0116204

Signed-off-by: Allen Samuels <[email protected]>

Finished testing

230682e

Signed-off-by: Allen Samuels <[email protected]>

add missing file

d52dcaa

Signed-off-by: Allen Samuels <[email protected]>

Revert

6344430

Signed-off-by: Allen Samuels <[email protected]>

Cleanup

9c0d6f6

Signed-off-by: Allen Samuels <[email protected]>

fix spelling

667d77e

Signed-off-by: Allen Samuels <[email protected]>

fix bad merge

c7bd274

Signed-off-by: Allen Samuels <[email protected]>

bad merge

233c675

Signed-off-by: Allen Samuels <[email protected]>

experiment

047f043

Signed-off-by: Allen Samuels <[email protected]>

Cleanup

923c3d4

Signed-off-by: Allen Samuels <[email protected]>

allenss-amazon force-pushed the saverestore branch from 7b0fe82 to 923c3d4 Compare October 16, 2025 03:30

allenss-amazon added 3 commits October 16, 2025 06:09

Revised

e4ac641

Signed-off-by: Allen Samuels <[email protected]>

hopefully fix valkey server version

d535c91

Signed-off-by: Allen Samuels <[email protected]>

Merge branch 'main' into saverestore

386686c

allenss-amazon mentioned this pull request Oct 26, 2025

Avoid Backfill After RDB is Loaded #41

Open

Merge branch 'main' into saverestore

9b8ee70

yairgott reviewed Oct 27, 2025

yairgott reviewed Oct 28, 2025

Revise per review. Add save/restore of multi/exec

df9cd25

Signed-off-by: Allen Samuels <[email protected]>

3 participants
