-
Notifications
You must be signed in to change notification settings - Fork 24
Description
Proposal
Problem statement
When we need to merge two BTreeMaps into one BTreeMap (preserving Ord behavior) in an efficient manner, we do this through BTreeMap::append (an O(n + m) operation or just linear time). This method is more efficient than grabbing an owned (K, V) pair from other BTreeMap and using BTreeMap::insert to insert it into self BTreeMap, which is an O(mlg(n)) operation (m is the number of (K, V) pairs in other BTreeMap and n is the number of (K, V) pairs in self BTreeMap) .
In using BTreeMap::append though, we opt in for the behavior that:
If a key from other is already present in self, the respective value from self will be overwritten with the respective value from other.
In some cases, however, we want explicit control on the value that the conflicting key should hold.
Motivating examples or use cases
- As mentioned by Mozhewen, Kpreid, and Vorpal here, they have cases where they would desire control on the value of the conflicting key:
I'm currently working on a project that makes heavy use of the BTreeMap of the standard library. One of my needs is to merge two maps into one where values of the same key are combined using a user-specified function f:
// pseudo-code
m1 = {"key1": "foo", "key2": "baz", ...}
m2 = {"key1": "bar", "key3": "dog", ...}
mergeWith(f, m1, m2) = {
"key1": f("foo", "bar"),
"key2": "baz",
"key3": "dog",
...
}
- Mozhewen
I needed to know specifically if things existed in the left, right or both map (and then also compare the values to generate diffs of those values describing what needs to be done to go from left to right).
- Vorpal
My use case is much the same, except that my maps are already patches — or rather, “transactions” — and the goal is to produce a single transaction which has the simultaneous effects of both inputs.
- Kpreid (replying to what Vorpal said above)
- Another example of when we may need this is when our values for a conflicting key in two different
BTreeMaps represent frequencies. For example, I may have twoBTreeMaps contain (String,usize) key-value pairs that I want to combine the twoBTreeMapto collect the frequencies of theStrings while preserving alphabetical or reverse-alphabetical order of the keys for various reasons.
Solution sketch
We can introduce a function like BTreeMap::append_with() that takes in an impl FnMut to decide on what value should be held in a conflicting key between two BTreeMaps.
Changes that could be made in rust/library/alloc/src/collections/btree/map.rs (implementation same as BTreeMap::append() except for root.append_from_sorted_iters_with()).
impl <K, V, A: Allocator + Clone> BTreeMap<K, V, A> {
pub fn append_with(&mut self, other: &mut BTreeMap<K, V, A>, conflict: impl FnMut(&K, V, V) -> V) {
// Do we have to append anything at all?
if other.is_empty() {
return;
}
// We can just swap `self` and `other` if `self` is empty.
if self.is_empty() {
mem::swap(self, other);
return;
}
let self_iter = mem::replace(self, Self::new_in((*self.alloc).clone())).into_iter();
let other_iter = mem::replace(other, Self::new_in((*self.alloc).clone())).into_iter();
let root = self.root.get_or_insert_with(|| Root::new((*self.alloc).clone()));
root.append_from_sorted_iters_with(self_iter, other_iter, &mut self.length, (*self.alloc).clone(), conflict)
};
}Changes that could be made in rust/library/alloc/src/collections/btree/append.rs (as a consequence of introducing FnMut conflict)
impl<K, V> Root<K, V> {
pub(super) fn append_from_sorted_iters_with<I, A: Allocator + Clone>(
&mut self,
left: I,
right: I,
length: &mut usize,
alloc: A,
f: impl FnMut(&K, V, V) -> V,
) where
K: Ord,
I: Iterator<Item = (K, V)> + FusedIterator,
{
// We prepare to merge `left` and `right` into a sorted sequence in linear time.
let iter = MergeIterWith { inner: MergeIterInner::new(left, right), f };
// Meanwhile, we build a tree from the sorted sequence in linear time.
self.bulk_push(iter, length, alloc)
}
}
struct MergeIterWith<F, K, V, I: Iterator<Item = (K, V)>> {
inner: MergeIterInner<I>,
f: F,
}
impl<F, K: Ord, V, I> Iterator for MergeIterWith<F, K, V, I>
where
F: FnMut(&K, V, V) -> V,
I: Iterator<Item = (K, V)> + FusedIterator,
{
type Item = (K, V);
/// If two keys are equal, returns the key from the left and uses `f` to return
/// a value given knowledge of conflicting key and values from left and right
fn next(&mut self) -> Option<(K, V)> {
let (a_next, b_next) = self.inner.nexts(|a: &(K, V), b: &(K, V)| K::cmp(&a.0, &b.0));
match (a_next, b_next) {
(Some((a_k, a_v)), Some((_, b_v))) => Some({
let next_val = (self.f)(&a_k, a_v, b_v);
(a_k, next_val)
}),
(Some(a), None) => Some(a),
(None, Some(b)) => Some(b),
(None, None) => None,
}
}
}You can see a solution implementation of what BTreeMap::append_with() could look like in this PR I made.
Alternatives
Alternatives to doing something akin to BTreeMap::append_with() with existing methods could be to use a combination of an owned iterator of other BTreeMap, and using BTreeMap::get and BTreeMap::insert on self BTreeMap to decide on what value the conflicting key should hold:
use std::collections::BTreeMap;
let mut a = BTreeMap::new();
a.insert(1, String::from("a"));
a.insert(2, String::from("b"));
a.insert(3, String::from("c")); // Note: Key (3) also present in b.
let mut b = BTreeMap::new();
b.insert(3, String::from("d")); // Note: Key (3) also present in a.
b.insert(4, String::from("e"));
b.insert(5, String::from("f"));
// concatenate a's value and b's value
for (b_key, b_val) in b { // takes O(m)
let a_val = a.get(&b_key); // takes O(lg(n))
// below takes O(lg(n))
if let Some(a_val) = a_val {
// concatenation occurs here
a.insert(b_key, format!("{a_val}{b_val}"));
} else {
a.insert(b_key, b_val);
}
}Doing this however leads to a time complexity of O(m(2lg(n))), where m is the number of (K, V) pairs in other BTreeMap and n is the number of (K, V) pairs in self BTreeMap. This would be a suboptimal approach to an append_with() equivalent.
We can also consider alternate impls of FnMut for the solution sketch I made like:
impl FnMut(K, &mut V, V)as suggested by Simonas in hereimpl FnMut(V, V) -> Vimpl FnMut(&mut V, V) -> V
The last two impl FnMut provides no context to the user on what the conflicting key is and only gives them access to self's and other's V. In some cases, I think having context about the conflicting key may be desirable and would warrant different behaviors on what V should be for that key, so I would not opt for this callback. As for impl FnMut(K, &mut V, V) vs. impl FnMut(&K, V, V) -> V, I mentioned this in my solution implementation PR:
I think having the
conflictfunction ownself's value and returning an owned value might give a lot more flexibility to the user. It also feels a bit more intuitive to me as the user that a value of typeVis being put intoselfwhen there's conflicting keys betweenselfandother.
For example, if my values are
Stringtypes and I want to combine my values when keys are conflicting, then they can either mutateselfs value with.push_str()passing inother's value and returnself's value, or they can useformat!()to return a concatenation ofself's andother's value.
Now, a consequence for this approach is that it's very well possible that the user decides to return a value
Vthat has nothing to do withselforotherwhen it comes to a conflicting key. However, should we decide to doconflict: impl FnMut(&K, &mut V, V), I could have mutatedself's value easily to produce a whole different value that has nothing to do with the original value ofselforother.
Links and related work
BTreeMap::append_withthat allows custom value merges rust#147700- https://internals.rust-lang.org/t/btreemap-methods-insert-append-to-combine-values-instead-of-overwriting/21353/4
What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
- We think this problem seems worth solving, and the standard library might be the right place to solve it.
- We think that this probably doesn't belong in the standard library.
Second, if there's a concrete solution:
- We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
- We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.