Skip to content

Conversation

@uriyage
Copy link
Contributor

@uriyage uriyage commented Jul 29, 2025

Adds detailed memory usage tracking for different index types in valkey-search, for better visibility into memory usage patterns.

Main Changes

Global Metrics System: Tracks memory usage across vectors, HNSW nodes/edges, tags, numeric records, and interned strings
INFO Command Integration: Metrics exposed through INFO SEARCH
Lifecycle Tracking: Separates active vs marked-deleted memory

Key Metrics Added

• Vector memory (active + marked deleted)
• HNSW nodes and edges (active + marked deleted)
• Flat index nodes
• Tag memory and count
• Numeric records
• Interned strings memory
• Keys memory

Technical Details

• Uses atomic counters for thread safety
• Updated all index implementations to report metrics
• Enhanced string interning with metadata tracking

Files Modified

• Core: global_metrics.h, metric_types.h
• Indexes: vector, tag, numeric implementations
• Utils: string interning enhancements
• Tests: unit and integration coverage

Signed-off-by: Uri Yagelnik <[email protected]>
@zvi-code zvi-code requested a review from allenss-amazon July 29, 2025 09:38
Copy link
Member

@allenss-amazon allenss-amazon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only reviewed a part of the code.

I think the modifications to the StringInterning facility aren't sufficient for our needs. In particular if a string needs to belong to more than one category, then it will break as there isn't room to record more than one usage.

I would recommend that we move to a world of multiple StringInterning pools, one per type. This eliminates the need for the metadata per string, which will ultimately save more space, even though strings which are part of more than one pool would get duplicated.

// Packed metadata structure for InternedString (32 bits total)
struct MetaData {
uint32_t metric_type : 7; // 7 bits for metric type
uint32_t use_count : 16; // 16 bits for use count
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move use-count to the top, so it can be incremented as a u16

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


MetaData metadata{};
metadata.metric_type = static_cast<uint8_t>(metric_type);
metadata.use_count = 0; // Start with 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why start with 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed implementation to use INIT_USE_COUNT we need to differentiate between a temp vector string that was never added to an index to a vector that was used in and index.

Signed-off-by: Uri Yagelnik <[email protected]>
@uriyage
Copy link
Contributor Author

uriyage commented Aug 4, 2025

I only reviewed a part of the code.

I would recommend that we move to a world of multiple StringInterning pools, one per type. This eliminates the need for the metadata per string, which will ultimately save more space, even though strings which are part of more than one pool would get duplicated.

Changed per your suggestion to use multiple pools.

Copy link
Member

@allenss-amazon allenss-amazon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two issues with the overall design.

  1. It seems like this design may fail whenever you have the same string with different type marks. KEY and TAG for example. Depending on the order of creation, the two different objects will either be classified as a TAG or as a KEY. And we charge differently for them, right? I believe there are two ways to fix this. a/ Instantiate different InternStringPools, one for each type. This would require changing every instance of InternStringPtr to the correctly typed one.

There's a less intrusive hack which is b/ Incorporate the MetricType into intern table. In other words change:

absl::flat_hash_map<absl::string_view, std::weak_ptr<InternedString>>
      str_to_interned_ ABSL_GUARDED_BY(mutex_);

into

      str_to_interned_ ABSL_GUARDED_BY(mutex_);

This logically ends creating different pools for different MetricTypes, but without the need to change all references to InternStringPtrs.

  1. Why is vector treated differently? If it has fundamentally different counting characteristics that suggests to me that perhaps it should be a different type.

const auto index = static_cast<size_t>(metric_type);
if (index >= static_cast<size_t>(MetricType::kMetricTypeCount)) {
return ""; // Invalid metric type
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like this should be an assert.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, feels like the body of this and the table should be in .cc file, isn't "kMetricStringTypes" really supposed to be hidden behind "GetMetricTypeString" ??

{MetricType::kKeysMemory, "keys_memory"}
constexpr absl::string_view kMetricTypeStrings[] = {
#define METRIC_ENTRY(name, str) str,
METRIC_TYPES_TABLE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of macro with-in a macro is a very powerful thing. But I believe it would be clearer if you passed in the "METRIC_ENTRY" as a parameter to METRIC_TYPES_TABLE rather than having an externally known name that's not obvious from the usage. By passing in the "macro to invoke" you eliminate odd-ball global dependencies.


void Decr(MetricType metric_type, uint64_t value = 1) {
GetOrCreateMetric(metric_type).count.fetch_sub(value, std::memory_order_relaxed);
GetMetric(metric_type).count.fetch_sub(value, std::memory_order_relaxed);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be some insulation against going negative (large positive). This is a common failure in these kinds of tracking operations. When you detect a negative counter, you should clamp it to zero and put out a log message -- be sure to use the LOG_EVERY_N .... macro to rate-limit the log messages, because you could get a lot of them here. Optionally, we might want to have a configurable to make this be a hard crash in order to help debug any account errors.

Comment on lines +57 to +59
uint64_t total = 0;
if (metric_type == MetricType::kInternedStrings) {
for (size_t j = 0; j < static_cast<size_t>(::valkey_search::StringType::kStringTypeCount); ++j) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switch statement please.

Comment on lines +80 to +82
if (index >= static_cast<size_t>(MetricType::kMetricTypeCount)) {
return 0; // Invalid metric type
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feels like an assert would be more appropriate here on the "default" label on the switch.


namespace valkey_search::indexes {

#define METRIC_TYPES_TABLE \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as describe above make METRIC_ENTRY be a parameter to this macro

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants