Deep match amb child memoization #2310
base: feat/error-tree-support
Conversation
// There will only be 3 matches if deep matches are memoized, 6 if they are not.
    return count == 3;
}
pls add a test where we match on the amb node itself (0 | it + 1 | /amb(alts) <- ambTree), because it's a corner case.
private void pushAmb(ITree amb) {
    if (debug) System.err.println("pushAmb: " + amb);
    spine.push(amb);
    for (IValue alt : TreeAdapter.getAlternatives(amb)) {
why not hash on the entire amb cluster? then there are 50% (or more) fewer nodes to cache.
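For concreteness, a rough sketch of that suggestion against the method above. The visitedAmbs field (shown as a plain java.util.Set<IValue> to keep the sketch simple) and the loop body are assumptions, not code from this PR:

// Sketch only: remember whole amb clusters rather than every individual alternative.
// `visitedAmbs` is a hypothetical field, e.g. a java.util.Set<IValue>.
private void pushAmb(ITree amb) {
    if (debug) System.err.println("pushAmb: " + amb);
    if (!visitedAmbs.add(amb)) {
        return; // this exact cluster was already expanded elsewhere in the tree
    }
    spine.push(amb);
    for (IValue alt : TreeAdapter.getAlternatives(amb)) {
        pushConcreteSyntaxNode((ITree) alt); // assumption about the original loop body
    }
}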
@@ -123,7 +147,12 @@ private void pushConcreteSyntaxNode(ITree tree){
if (TreeAdapter.isAmb(tree)) {
    // only recurse
    for (IValue alt : TreeAdapter.getAlternatives(tree)) {
        pushConcreteSyntaxNode((ITree) alt);
        if (!visitedAmbChildren.contains(alt)) {
same here, we could factor "visitedAmbs.contains(tree)" for the entire cluster and not recurse into the children if we did this already before.
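The same cluster-level guard, sketched for the amb branch of pushConcreteSyntaxNode; again, visitedAmbs is a hypothetical set, not part of this PR:

// Sketch only: skip the whole cluster if it was expanded before.
if (TreeAdapter.isAmb(tree)) {
    if (visitedAmbs.add(tree)) {   // hypothetical cluster-level set; add() is false on a repeat
        // only recurse
        for (IValue alt : TreeAdapter.getAlternatives(tree)) {
            pushConcreteSyntaxNode((ITree) alt);
        }
    }
}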
@@ -35,6 +36,7 @@
public class DescendantReader implements Iterator<IValue> {

    Stack<Object> spine = new Stack<Object>();
    private Set.Transient<IValue> visitedAmbChildren = Set.Transient.of();
nice!
Why would a normal HashSet not be enough? We're not sharing these sets, and we're not doing any immutable updates on them.
The hash set has a huge footprint of empty array cells compared to this trie set. We have a very sparse set here, so the trie set is the optimal data structure.
Also, this is a mutable trie set.
I get that this is the transient set; I was just wondering why it's used (I see the value when it's used to eventually make an immutable version).
If it's only for the empty case, we could special-case it to start with null and initialize it the first time we encounter an amb node?
Anyway, I was just wondering why this set was used in this specific case, while the interpreter is mostly full of standard JDK containers (or vallang, of course).
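For illustration only, a minimal sketch of that null-initialization idea using a plain JDK HashSet; the field and helper names are hypothetical, and the replies below explain why the transient trie set makes this trick unnecessary:

// Sketch of the lazy-initialization alternative: pay nothing until the first amb node shows up.
private java.util.Set<IValue> visitedAmbChildren = null;  // hypothetical replacement field

private boolean alreadyVisited(IValue alt) {
    return visitedAmbChildren != null && visitedAmbChildren.contains(alt);
}

private void markVisited(IValue alt) {
    if (visitedAmbChildren == null) {
        visitedAmbChildren = new java.util.HashSet<>();  // allocated only on the first amb cluster
    }
    visitedAmbChildren.add(alt);
}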
It's simply a conscious choice for the most optimal hash set implementation; we need a hash set, but we don't need to pay for all the zeroes. The empty hash-trie set is already one or more orders of magnitude (10 to 100 times) smaller than its array-based counterpart.
So the trick with a null table is not even necessary if we use this transient set. The chances are high that we encounter only one amb node, which is exactly when the trie set outshines the table implementation.
I think this works, but it could be optimized by caching the entire cluster instead of each individual alternative within the cluster. Wdyt @PieterOlivier ?
I remember seeing cases where multiple ambiguity clusters had children that were mostly identical, but not all of them. In that case, caching the children allows more memoization. I do not know how common this is. Maybe we should focus on what is most natural for the user? The "cache the ambiguity cluster itself" solution probably wins here. I will make some measurements with the current solution and then implement memoization at the amb node level and measure again.
This PR implements amb child node memoization for deep (descendant) matches.
This is needed because trees with error nodes often have nested ambiguity nodes with a lot of identical nodes.
By using memoization during deep matches, identical nodes are only visited once.
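To make the effect concrete, here is a small, self-contained Java toy (not the interpreter's actual traversal, and with made-up names): two ambiguity alternatives share the same child subtree, and a visited set ensures the shared subtree is only walked once.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy illustration: two "ambiguity alternatives" share the same child subtree,
// so without memoization the shared subtree is visited twice.
class MemoDemo {
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
    }

    static int countVisits(Node node, Set<Node> visited) {
        if (visited != null && !visited.add(node)) {
            return 0; // already visited: skip the whole subtree
        }
        int visits = 1;
        for (Node child : node.children) {
            visits += countVisits(child, visited);
        }
        return visits;
    }

    public static void main(String[] args) {
        Node shared = new Node("shared-subtree");
        Node alt1 = new Node("alt1");
        Node alt2 = new Node("alt2");
        alt1.children.add(shared);
        alt2.children.add(shared);
        Node amb = new Node("amb");
        amb.children.add(alt1);
        amb.children.add(alt2);

        System.out.println("without memoization: " + countVisits(amb, null));            // 5 visits
        System.out.println("with memoization:    " + countVisits(amb, new HashSet<>())); // 4 visits
    }
}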