Skip to content

Bug: index_gt::update releases an unacquired conditional lock under specific conditions, causing a data race #680

@yoonseok-kim

Description

@yoonseok-kim

Describe the bug

Hi, I have found another data race issue in index_gt::update.

// The trickiest part of update-heavy workloads is mitigating dead-locks
// in connected nodes during traversal. A "good enough" solution would be
// to skip concurrent access, assuming the other "close" node is gonna add
// this one when forming reverse connections.
bool failed_to_acquire = false;
node_conditional_lock_t candidate_lock =
node_try_conditional_lock_(candidate_slot, updated_slot != candidate_slot, failed_to_acquire);
if (failed_to_acquire)
continue;

index_gt::update uses conditional lock(node_try_conditional_lock_) instead of the hard lock(node_lock_) that index_gt::add use. When candidate_slot == updated_slot, the function deliberately skips atomic_set, but it still returns a node_conditional_lock_t whose destructor unconditionally calls atomic_reset.

struct node_conditional_lock_t {
nodes_mutexes_t& mutexes;
std::size_t slot;
inline ~node_conditional_lock_t() noexcept {
if (slot != std::numeric_limits<std::size_t>::max())
mutexes.atomic_reset(slot);
}
};
inline node_conditional_lock_t node_try_conditional_lock_(std::size_t slot, bool condition,
bool& failed_to_acquire) const noexcept {
failed_to_acquire = condition ? nodes_mutexes_.atomic_set(slot) : false;
return {nodes_mutexes_, failed_to_acquire ? std::numeric_limits<std::size_t>::max() : slot};
}

It clears the bit for a slot we never locked and can even unlock another thread’s live lock on the same node.
Eventually, two threads think they own the slot, they both mutate the same neighbors_ref_t buffer. it is data race.

For reference, index_gt::add never happen same problem because it always uses node_lock_, so only update suffers.

I’ve tried resolving this issue and submitted a bugfix PR.
I would appreciate it if you could take a look. :)

Steps to reproduce

  1. Add 10,000 nodes (index_gt::add).
  2. Update the same 10,000 nodes in parallel, modifying only their vector values (index_gt::update).
    The issue was reproduced in my environment with 16 threads, M = 32, and efConstruction = 128.
  3. TSan sometimes detects a data race (non-deterministic, timing-dependent).

Expected behavior

index_gt::update is thread-safe.

USearch version

v2.21.2

Operating System

Red Hat Enterprise Linux release 8.6

Hardware architecture

x86

Which interface are you using?

C++ implementation

Contact Details

[email protected]

Are you open to being tagged as a contributor?

  • I am open to being mentioned in the project .git history as a contributor

Is there an existing issue for this?

  • I have searched the existing issues

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions