-
Notifications
You must be signed in to change notification settings - Fork 249
Description
Describe the bug
Hi, I have found another data race issue in index_gt::update.
USearch/include/usearch/index.hpp
Lines 4146 to 4154 in 13c3fd9
| // The trickiest part of update-heavy workloads is mitigating dead-locks | |
| // in connected nodes during traversal. A "good enough" solution would be | |
| // to skip concurrent access, assuming the other "close" node is gonna add | |
| // this one when forming reverse connections. | |
| bool failed_to_acquire = false; | |
| node_conditional_lock_t candidate_lock = | |
| node_try_conditional_lock_(candidate_slot, updated_slot != candidate_slot, failed_to_acquire); | |
| if (failed_to_acquire) | |
| continue; |
index_gt::update uses conditional lock(node_try_conditional_lock_) instead of the hard lock(node_lock_) that index_gt::add use. When candidate_slot == updated_slot, the function deliberately skips atomic_set, but it still returns a node_conditional_lock_t whose destructor unconditionally calls atomic_reset.
USearch/include/usearch/index.hpp
Lines 3828 to 3841 in 13c3fd9
| struct node_conditional_lock_t { | |
| nodes_mutexes_t& mutexes; | |
| std::size_t slot; | |
| inline ~node_conditional_lock_t() noexcept { | |
| if (slot != std::numeric_limits<std::size_t>::max()) | |
| mutexes.atomic_reset(slot); | |
| } | |
| }; | |
| inline node_conditional_lock_t node_try_conditional_lock_(std::size_t slot, bool condition, | |
| bool& failed_to_acquire) const noexcept { | |
| failed_to_acquire = condition ? nodes_mutexes_.atomic_set(slot) : false; | |
| return {nodes_mutexes_, failed_to_acquire ? std::numeric_limits<std::size_t>::max() : slot}; | |
| } |
It clears the bit for a slot we never locked and can even unlock another thread’s live lock on the same node.
Eventually, two threads think they own the slot, they both mutate the same neighbors_ref_t buffer. it is data race.
For reference, index_gt::add never happen same problem because it always uses node_lock_, so only update suffers.
I’ve tried resolving this issue and submitted a bugfix PR.
I would appreciate it if you could take a look. :)
Steps to reproduce
- Add 10,000 nodes (
index_gt::add). - Update the same 10,000 nodes in parallel, modifying only their vector values (
index_gt::update).
The issue was reproduced in my environment with 16 threads, M = 32, and efConstruction = 128. - TSan sometimes detects a data race (non-deterministic, timing-dependent).
Expected behavior
index_gt::update is thread-safe.
USearch version
v2.21.2
Operating System
Red Hat Enterprise Linux release 8.6
Hardware architecture
x86
Which interface are you using?
C++ implementation
Contact Details
Are you open to being tagged as a contributor?
- I am open to being mentioned in the project
.githistory as a contributor
Is there an existing issue for this?
- I have searched the existing issues
Code of Conduct
- I agree to follow this project's Code of Conduct