[BugFix] Fix PushDownContext shallow copy bug #5199

Open
songkant-aws wants to merge 8 commits into opensearch-project:main from songkant-aws:pushdown-action-shallow-copy-bugfix

Conversation


@songkant-aws songkant-aws commented Mar 4, 2026

Description

PushDownContext.clone() / cloneWithoutSort() / cloneForAggregate() replay add(operation) to construct the new context. Because this is a shallow copy, aggregation operations end up reusing the same AggPushDownAction instance in both the old and the new context. Subsequent pushdown actions from other equivalent RelSubsets then modify this shared mutable action object, which can contaminate the previous context's aggregation state.
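The hazard can be reduced to a minimal, self-contained sketch. The names below (Context, MutableAggAction, shallowClone) are hypothetical stand-ins for PushDownContext and AggPushDownAction, not the actual classes in this PR:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal sketch of the shallow-copy hazard: clone() replays the queued
// operations into a new context, but each operation still holds the SAME
// mutable action object, so mutations through the clone leak back into
// the original context.
public class ShallowCopyDemo {
  // Stand-in for AggPushDownAction: mutable push-down state.
  static class MutableAggAction {
    int bucketSize = 100;
  }

  static class Context {
    final Deque<MutableAggAction> operations = new ArrayDeque<>();

    void add(MutableAggAction op) {
      operations.add(op);
    }

    // Shallow clone: replays add(op) with the same action instances.
    Context shallowClone() {
      Context copy = new Context();
      for (MutableAggAction op : operations) {
        copy.add(op); // shared mutable reference, not a deep copy
      }
      return copy;
    }
  }

  public static void main(String[] args) {
    Context original = new Context();
    original.add(new MutableAggAction());

    Context clone = original.shallowClone();
    // A later push-down on the clone mutates the shared action...
    clone.operations.peek().bucketSize = 10;

    // ...and the original context is contaminated too.
    System.out.println(original.operations.peek().bucketSize); // prints 10, not 100
  }
}
```

An immutable spec (the AggSpec approach this PR introduces) removes the problem by construction: every "mutation" returns a new object, so no state is shared between contexts.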

Related Issues

Resolves #5125

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Songkan Tang <songkant@amazon.com>

github-actions bot commented Mar 4, 2026

PR Reviewer Guide 🔍

(Review updated until commit 05a2ed2)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
📝 TODO sections

🔀 Multiple PR themes

Sub-PR theme: Introduce immutable AggSpec and lazy build to fix shallow copy bug

Relevant files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/AggSpec.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
  • opensearch/src/test/java/org/opensearch/sql/opensearch/storage/scan/CalciteIndexScanCostTest.java

Sub-PR theme: Refactor AggPushDownAction copy helpers and histogram builder options

Relevant files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/AggPushDownAction.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/request/OpenSearchRequestBuilder.java

⚡ Recommended focus areas for review

Null Dereference

In pushDownSortAggregateMeasure (CalciteLogicalIndexScan.java), pushDownContext.getAggSpec() is null-checked before calling withSortMeasure. In AggSpec.withSortMeasure, measureSortTarget is @Nullable, and the method throws PushDownUnSupportedException when it is null; that is expected behavior. Likewise, AggSpec.withBucketSort throws PushDownUnSupportedException when bucketNames does not contain the bucket name from the collation. The concern: withoutBucketSort resets bucketNames to initialBucketNames, but if withBucketSort is then called with a name not in initialBucketNames, the error message says only "Cannot pushdown sort into aggregation bucket" without naming the missing bucket, which makes debugging harder. Minor, but worth noting.

public AggSpec withBucketSort(List<RelFieldCollation> collations, List<String> fieldNames) {
  if (kind != AggKind.COMPOSITE && kind != AggKind.TERMS) {
    throw new OpenSearchRequestBuilder.PushDownUnSupportedException(
        "Cannot pushdown sort into aggregation bucket");
  }
  List<String> newBucketNames = bucketNames;
  if (kind == AggKind.COMPOSITE) {
    List<String> reordered = new ArrayList<>(bucketNames.size());
    List<String> selected = new ArrayList<>(collations.size());
    for (RelFieldCollation collation : collations) {
      String bucketName = fieldNames.get(collation.getFieldIndex());
      if (!bucketNames.contains(bucketName)) {
        throw new OpenSearchRequestBuilder.PushDownUnSupportedException(
            "Cannot pushdown sort into aggregation bucket");
      }
      reordered.add(bucketName);
      selected.add(bucketName);
    }
    for (String name : bucketNames) {
      if (!selected.contains(name)) {
        reordered.add(name);
      }
    }
    newBucketNames = reordered;
  }
Bucket Mutation

In pushDownSortIntoCompositeBucket, the code calls bucket.order(order) and bucket.missingOrder(missingOrder) directly on the existing CompositeValuesSourceBuilder instances obtained from compositeAggBuilder.sources() before adding them to newBuckets. These mutations affect the original bucket objects shared with the existing CompositeAggregationBuilder. Since AggSpec.build() re-runs AggregateAnalyzer.analyze to obtain a fresh builder, and build() creates a new AggPushDownAction each time, this is safe in the new design; reusing the same AggPushDownAction instance across multiple build() calls would be a problem. The current design appears safe, but mutating the shared source objects is fragile.

List<CompositeValuesSourceBuilder<?>> buckets = compositeAggBuilder.sources();
List<CompositeValuesSourceBuilder<?>> newBuckets = new ArrayList<>(buckets.size());
List<String> newBucketNames = new ArrayList<>(buckets.size());
List<String> selected = new ArrayList<>(collations.size());

// Have to put the collation required buckets first, then the rest of buckets.
collations.forEach(
    collation -> {
      /*
       Must find the bucket by field name because:
         1. The sequence of buckets may have changed after sort push-down.
         2. The schema of scan operator may be inconsistent with the sequence of buckets
         after project push-down.
      */
      String bucketName = fieldNames.get(collation.getFieldIndex());
      CompositeValuesSourceBuilder<?> bucket = buckets.get(bucketNames.indexOf(bucketName));
      RelFieldCollation.Direction direction = collation.getDirection();
      RelFieldCollation.NullDirection nullDirection = collation.nullDirection;
      SortOrder order =
          RelFieldCollation.Direction.DESCENDING.equals(direction)
              ? SortOrder.DESC
              : SortOrder.ASC;
      if (bucket.missingBucket()) {
        MissingOrder missingOrder =
            switch (nullDirection) {
              case FIRST -> MissingOrder.FIRST;
              case LAST -> MissingOrder.LAST;
              default -> MissingOrder.DEFAULT;
            };
        bucket.missingOrder(missingOrder);
      }
      newBuckets.add(bucket.order(order));
      newBucketNames.add(bucketName);
      selected.add(bucketName);
    });

buckets.stream()
    .map(CompositeValuesSourceBuilder::name)
    .filter(name -> !selected.contains(name))
    .forEach(
        name -> {
          newBuckets.add(buckets.get(bucketNames.indexOf(name)));
          newBucketNames.add(name);
        });

AggregationBuilder finalBuilder =
    AggregationBuilders.composite(compositeAggBuilder.getName(), newBuckets)
        .subAggregations(copySubAggregations(compositeAggBuilder))
        .size(compositeAggBuilder.size());
replaceRootBuilder(original, finalBuilder);
bucketNames = newBucketNames;
Redundant Build Call

In AggSpec.create, a temporary AggPushDownAction is constructed solely to call getScriptCount(). This re-analyzes the aggregation just to count scripts, which is wasteful since builderAndParser is already available. The AggPushDownAction constructor is called with the full builder just to extract scriptCount, and the action object is discarded. This could be simplified by computing scriptCount directly from the builder without constructing a full AggPushDownAction.

new AggPushDownAction(builderAndParser, extendedTypeMapping, bucketNames).getScriptCount(),


github-actions bot commented Mar 4, 2026

PR Code Suggestions ✨

Latest suggestions up to 05a2ed2

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Prevent duplicate bucket names in sort pushdown

The withBucketSort method adds bucketName to both reordered and selected without
checking for duplicates. If the same field appears in multiple collations, it will
be added to reordered multiple times, leading to duplicate bucket names and
potentially incorrect query generation. Add a duplicate check before adding to
reordered.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/AggSpec.java [192-215]

 public AggSpec withBucketSort(List<RelFieldCollation> collations, List<String> fieldNames) {
     if (kind != AggKind.COMPOSITE && kind != AggKind.TERMS) {
       throw new OpenSearchRequestBuilder.PushDownUnSupportedException(
           "Cannot pushdown sort into aggregation bucket");
     }
     List<String> newBucketNames = bucketNames;
     if (kind == AggKind.COMPOSITE) {
       List<String> reordered = new ArrayList<>(bucketNames.size());
       List<String> selected = new ArrayList<>(collations.size());
       for (RelFieldCollation collation : collations) {
         String bucketName = fieldNames.get(collation.getFieldIndex());
         if (!bucketNames.contains(bucketName)) {
           throw new OpenSearchRequestBuilder.PushDownUnSupportedException(
               "Cannot pushdown sort into aggregation bucket");
         }
-        reordered.add(bucketName);
-        selected.add(bucketName);
+        if (!selected.contains(bucketName)) {
+          reordered.add(bucketName);
+          selected.add(bucketName);
+        }
       }
Suggestion importance[1-10]: 5


Why: The suggestion is logically valid - duplicate collation fields could cause duplicate entries in reordered. However, in practice, SQL/PPL query planners typically don't generate duplicate collation fields, making this an edge case. The fix is straightforward and correct.

Low
General
Preserve root cause in exception chaining

In the second catch block, the variable ignored shadows the outer ignored variable
and then immediately throws a new exception, discarding the original cause. The
original exception should be chained to preserve the root cause for debugging. Pass
the caught exception as the cause to PushDownUnSupportedException.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/AggPushDownAction.java [288-305]

 private static void copyDateHistogramInterval(
     DateHistogramValuesSourceBuilder source,
     Consumer<DateHistogramInterval> fixedIntervalSetter,
     Consumer<DateHistogramInterval> calendarIntervalSetter) {
   try {
     fixedIntervalSetter.accept(source.getIntervalAsFixed());
     return;
   } catch (IllegalArgumentException | IllegalStateException ignored) {
     // Fallback to calendar interval.
   }
   try {
     calendarIntervalSetter.accept(source.getIntervalAsCalendar());
     return;
-  } catch (IllegalArgumentException | IllegalStateException ignored) {
+  } catch (IllegalArgumentException | IllegalStateException cause) {
     throw new OpenSearchRequestBuilder.PushDownUnSupportedException(
-        "Cannot copy interval for date histogram bucket " + source.name());
+        "Cannot copy interval for date histogram bucket " + source.name(), cause);
   }
 }
Suggestion importance[1-10]: 5


Why: The suggestion correctly identifies that the caught exception in the second catch block is discarded, losing the root cause. The PushDownUnSupportedException(String, Throwable) constructor was added in this PR specifically to support this pattern, making the fix directly applicable and improving debuggability.

Low
Avoid unexpected exception in feasibility check method

The canPushDownLimitIntoBucketSize method throws an exception for UNSUPPORTED mode,
but callers may reasonably expect a boolean return to check feasibility without
catching exceptions. The UNSUPPORTED case should return false instead of throwing,
and the exception should only be thrown in withLimit where the actual push-down is
attempted.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/AggSpec.java [181-190]

 public boolean canPushDownLimitIntoBucketSize(int size) {
     return switch (limitPushdownMode) {
       case BUCKET_SIZE -> bucketSize != null && size < bucketSize;
       case LEAF_METRIC -> true;
-      case ESTIMATE_ONLY -> false;
-      case UNSUPPORTED ->
-          throw new OpenSearchRequestBuilder.PushDownUnSupportedException(
-              "Cannot pushdown limit into aggregation bucket");
+      case ESTIMATE_ONLY, UNSUPPORTED -> false;
     };
   }
Suggestion importance[1-10]: 4


Why: The suggestion has merit - a method named canPushDownLimitIntoBucketSize implies a boolean check, and throwing from it is surprising. However, the current callers in pushDownLimit already handle this via try-catch, so the practical impact is limited. The withLimit method also throws for UNSUPPORTED, so the behavior is somewhat consistent.

Low
Strengthen test assertions for sort correctness

The test verifies that consecutive sorts honor the latest sort direction (sort
gender | sort - gender), expecting descending order. However, the test only checks
the order of rows but doesn't verify that the final sort is indeed descending (M
before F). Consider also asserting that the count values are correct per gender to
make the test more robust and meaningful.

integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5125.yml [66]

+- match: { total: 2 }
+- match: { schema: [ { name: c, type: bigint }, { name: gender, type: string } ] }
 - match: { datarows: [ [ 4, "M" ], [ 3, "F" ] ] }
+- gt: { datarows.0.0: 0 }
Suggestion importance[1-10]: 2


Why: The test already verifies the exact datarows content including both count values and gender ordering ([ [ 4, "M" ], [ 3, "F" ] ]), which implicitly confirms descending sort order and correct counts. The improved_code adds a redundant gt assertion that doesn't meaningfully improve coverage, and the suggestion mischaracterizes what the existing assertions already validate.

Low

Previous suggestions

Suggestions up to commit af6c695
Category | Suggestion | Impact
Possible issue
Fix unreachable throw inside catch block

The throw sits inside the catch block for the calendar interval, so a failure to obtain the calendar interval correctly raises PushDownUnSupportedException. The structure is nonetheless confusing: the return after calendarIntervalSetter.accept(...) and the throw in the adjacent catch obscure the control flow of AggPushDownAction.copyDateHistogramInterval. Moving the throw outside both catch blocks makes the "both intervals failed" case explicit.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/AggSpec.java [298-304]

 try {
   calendarIntervalSetter.accept(source.getIntervalAsCalendar());
   return;
 } catch (IllegalArgumentException | IllegalStateException ignored) {
-  throw new OpenSearchRequestBuilder.PushDownUnSupportedException(
-      "Cannot copy interval for date histogram bucket " + source.name());
+  // Both fixed and calendar interval failed.
 }
+throw new OpenSearchRequestBuilder.PushDownUnSupportedException(
+    "Cannot copy interval for date histogram bucket " + source.name());
Suggestion importance[1-10]: 7


Why: The throw inside the catch block for the calendar interval is indeed inside the catch, meaning it correctly handles the failure case. However, the return before the closing brace of the inner try is unreachable after the throw. The suggestion correctly identifies that the throw should be outside both catch blocks for clarity and correctness, though the current code does functionally throw when both intervals fail.

Medium
General
Ensure plugin setting reset on teardown failure

The teardown disables the Calcite plugin after deleting the index, but if the index
deletion fails, the plugin setting may not be reset. Consider placing the settings
reset before the index deletion to ensure the plugin is always disabled regardless
of index deletion outcome.

integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5125.yml [38-47]

 teardown:
-  - do:
-      indices.delete:
-        index: issue5125
-        ignore_unavailable: true
   - do:
       query.settings:
         body:
           transient:
             plugins.calcite.enabled: false
+  - do:
+      indices.delete:
+        index: issue5125
+        ignore_unavailable: true
Suggestion importance[1-10]: 4


Why: The suggestion to reorder teardown steps so plugins.calcite.enabled is reset before index deletion is a reasonable defensive practice, but the impact is low since ignore_unavailable: true already handles index deletion failures gracefully, and the plugin setting reset is unlikely to be critical in test teardown scenarios.

Low
Cache aggregation build result to avoid repeated recomputation

aggSpec.build() is called every time createRequestBuilder() is invoked, which
re-runs AggregateAnalyzer.analyze and all push-down operations on each call. If
createRequestBuilder() is called multiple times (e.g., during planning), this could
produce inconsistent results or be expensive. Consider caching the result of
aggSpec.build() or ensuring createRequestBuilder() is only called once.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java [213-216]

 if (aggSpec != null) {
-  newRequestBuilder.pushDownAggregation(aggSpec.build());
+  Pair<List<AggregationBuilder>, OpenSearchAggregationResponseParser> built = aggSpec.build();
+  newRequestBuilder.pushDownAggregation(built);
   newRequestBuilder.pushTypeMapping(aggSpec.getExtendedTypeMapping());
 }
Suggestion importance[1-10]: 3


Why: The suggestion raises a valid concern about repeated calls to aggSpec.build(), but the improved_code only stores the result in a local variable within the same method call — it doesn't actually cache across multiple createRequestBuilder() invocations. The improvement is minimal and doesn't address the root concern.

Low
Avoid unnecessary object instantiation for script count

In AggSpec.create, a new AggPushDownAction is instantiated solely to call
getScriptCount(). This is wasteful and couples AggSpec to AggPushDownAction's
internal state. The script count should be computed directly from the
builderAndParser using a static utility method or inline logic, avoiding the
unnecessary object creation.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/AggSpec.java [160]

-new AggPushDownAction(builderAndParser, extendedTypeMapping, bucketNames).getScriptCount(),
+AggPushDownAction.getScriptCount(builderAndParser),
Suggestion importance[1-10]: 2


Why: The suggestion proposes calling a static getScriptCount method on AggPushDownAction, but no such static method exists in the PR diff — getScriptCount is an instance method. The improved_code references a non-existent API, making this suggestion incorrect as written.

Low
Suggestions up to commit 11db491
Category | Suggestion | Impact
General
Preserve root cause in exception re-throw

In the second catch block, the ignored variable shadows the outer ignored variable
from the first catch block. While this compiles, it is confusing and may hide the
actual exception. The inner catch should re-throw the PushDownUnSupportedException
with the caught exception as the cause so the root cause is not lost.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/AggPushDownAction.java [287-304]

 private static void copyDateHistogramInterval(
     DateHistogramValuesSourceBuilder source,
     Consumer<DateHistogramInterval> fixedIntervalSetter,
     Consumer<DateHistogramInterval> calendarIntervalSetter) {
   try {
     fixedIntervalSetter.accept(source.getIntervalAsFixed());
     return;
   } catch (IllegalArgumentException | IllegalStateException ignored) {
     // Fallback to calendar interval.
   }
   try {
     calendarIntervalSetter.accept(source.getIntervalAsCalendar());
-    return;
-  } catch (IllegalArgumentException | IllegalStateException ignored) {
+  } catch (IllegalArgumentException | IllegalStateException e) {
     throw new OpenSearchRequestBuilder.PushDownUnSupportedException(
-        "Cannot copy interval for date histogram bucket " + source.name());
+        "Cannot copy interval for date histogram bucket " + source.name(), e);
   }
 }
Suggestion importance[1-10]: 5


Why: The suggestion correctly identifies that the caught exception in the second catch block is discarded, losing the root cause. Preserving the cause via PushDownUnSupportedException(message, e) improves debuggability, and the new constructor was added in this PR. The variable shadowing concern is also valid.

Low
Reset stale sort collations after rare-top rewrite

When withRareTop is called on an AggSpec that already has kind == AggKind.RARE_TOP
(e.g., if called twice), the guard kind != AggKind.COMPOSITE will throw an exception
even though the transition might be valid. Additionally, the bucketSortCollations
from the current spec are passed through, but after a RARE_TOP rewrite the composite
bucket structure changes, making the previous sort collations potentially invalid.
Consider resetting bucketSortCollations and bucketSortFieldNames to null in the
returned spec.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/AggSpec.java [306-333]

 public AggSpec withRareTop(RareTopDigest digest) {
   if (kind != AggKind.COMPOSITE || !rareTopSupported) {
     throw new OpenSearchRequestBuilder.PushDownUnSupportedException("Cannot pushdown " + digest);
   }
   return new AggSpec(
-      ...
+      aggregate,
+      project,
+      outputFields,
+      rowType,
+      fieldTypes,
+      cluster,
+      bucketNullable,
+      queryBucketSize,
+      extendedTypeMapping,
+      initialBucketNames,
+      bucketNames,
+      scriptCount,
       AggKind.RARE_TOP,
       LimitPushdownMode.BUCKET_SIZE,
-      ...
+      null,
+      rareTopSupported,
+      null,
+      null,
+      measureSortCollations,
+      measureSortFieldNames,
+      digest,
       digest.byList().isEmpty() ? digest.number() : DEFAULT_MAX_BUCKETS);
 }
Suggestion importance[1-10]: 4


Why: The suggestion raises a valid concern that bucketSortCollations from a composite aggregation may be stale after a RARE_TOP rewrite. However, the improved_code doesn't match the existing_code accurately (it uses ... placeholders in the existing code), making it hard to verify the exact change. The concern about resetting collations is reasonable but the impact depends on whether withRareTop can be called after a bucket sort has been applied.

Low
Possible issue
Handle missing enum case in switch expression

The LEAF_METRIC case is missing from the inferLimitPushdownMode(AggKind) switch
expression. If AggKind.LEAF_METRIC is ever passed to this method (e.g., via
withSortMeasure), the switch will throw an IllegalArgumentException at runtime
because the enum value is not handled. Add a case for LEAF_METRIC to make the switch
exhaustive.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/AggSpec.java [429-434]

 private static LimitPushdownMode inferLimitPushdownMode(AggKind kind) {
   return switch (kind) {
     case COMPOSITE, TERMS, MULTI_TERMS, TOP_HITS, RARE_TOP -> LimitPushdownMode.BUCKET_SIZE;
     case OTHER, DATE_HISTOGRAM, HISTOGRAM -> LimitPushdownMode.UNSUPPORTED;
+    case LEAF_METRIC -> LimitPushdownMode.LEAF_METRIC;
   };
 }
Suggestion importance[1-10]: 2


Why: The AggKind enum does not have a LEAF_METRIC value - looking at the enum definition in the file (lines 41-50), the values are OTHER, COMPOSITE, TERMS, MULTI_TERMS, DATE_HISTOGRAM, HISTOGRAM, TOP_HITS, and RARE_TOP. The switch is already exhaustive over all defined enum values, so the suggestion's premise is incorrect.

Low
Suggestions up to commit 84892bc
Suggestions up to commit f36c20a
Category | Suggestion | Impact
General
Fix unreachable return after calendar interval setter

The throw is placed inside the catch for the calendar interval, so it only executes when getIntervalAsCalendar() throws; that part of the logic is correct. The real issue is that if getIntervalAsCalendar() succeeds but calendarIntervalSetter.accept() then throws, the exception is swallowed and re-thrown as PushDownUnSupportedException. The return after the setter call should sit outside the catch block, and the method should be restructured so the throw only happens when both intervals genuinely fail.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/AggPushDownAction.java [297-303]

 try {
   calendarIntervalSetter.accept(source.getIntervalAsCalendar());
-  return;
 } catch (IllegalArgumentException | IllegalStateException ignored) {
   throw new OpenSearchRequestBuilder.PushDownUnSupportedException(
       "Cannot copy interval for date histogram bucket " + source.name());
 }
Suggestion importance[1-10]: 5


Why: The suggestion correctly identifies that the return statement inside the try block after calendarIntervalSetter.accept() is redundant/confusing, and the improved code removes it. However, the logic is functionally equivalent since the return after the setter is only reached if no exception is thrown, and the catch block only executes on exception. The improved code is slightly cleaner but the original is not actually broken.

Low
Avoid repeated aggregation rebuild on each request creation

The aggSpec.build() call in createRequestBuilder() re-runs AggregateAnalyzer.analyze
and replays all push-down operations every time a request builder is created. If
createRequestBuilder() is called multiple times (e.g., for retries or pagination),
this could cause performance issues or subtle bugs if the analysis is
non-deterministic. Consider caching the result of aggSpec.build() or ensuring it is
only called once.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java [213-216]

 if (aggSpec != null) {
-  newRequestBuilder.pushDownAggregation(aggSpec.build());
+  Pair<List<AggregationBuilder>, OpenSearchAggregationResponseParser> builtAgg = aggSpec.build();
+  newRequestBuilder.pushDownAggregation(builtAgg);
   newRequestBuilder.pushTypeMapping(aggSpec.getExtendedTypeMapping());
 }
Suggestion importance[1-10]: 4


Why: The suggestion raises a valid concern about aggSpec.build() being called multiple times, potentially causing performance issues. However, the improved_code only extracts the result to a local variable within the same method call, which doesn't actually cache it across multiple createRequestBuilder() invocations, making the fix incomplete for the stated concern.

Low
Ensure plugin setting reset runs first in teardown

The teardown disables the Calcite plugin after deleting the index, but if the index
deletion fails, the plugin setting may not be reset. Consider placing the settings
reset before the index deletion to ensure the plugin is always disabled regardless
of index deletion success.

integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5125.yml [38-47]

 teardown:
-  - do:
-      indices.delete:
-        index: issue5125
-        ignore_unavailable: true
   - do:
       query.settings:
         body:
           transient:
             plugins.calcite.enabled: false
+  - do:
+      indices.delete:
+        index: issue5125
+        ignore_unavailable: true
Suggestion importance[1-10]: 4


Why: The suggestion has merit as resetting the plugins.calcite.enabled setting before deleting the index ensures cleanup happens regardless of index deletion success. However, since ignore_unavailable: true is already set on the index deletion, failures are already handled gracefully, making this a minor ordering improvement rather than a critical fix.

Low
Document ordering constraint for withoutBucketSort usage

In withoutBucketSort(), the bucketNames field is reset to initialBucketNames, which
correctly reverts the bucket ordering. However, measureSortCollations and
measureSortFieldNames are preserved in the new AggSpec. If a measure sort was
applied after a bucket sort, removing the bucket sort should not affect the measure
sort, so this is fine. But the kind field is not reset — if withSortMeasure was
called and changed kind to measureSortTarget, calling withoutBucketSort afterward
would retain the changed kind. Verify that withoutBucketSort is only called before
withSortMeasure in the planning pipeline, or reset kind appropriately.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/AggSpec.java [223-249]

 public AggSpec withoutBucketSort() {
   if (bucketSortCollations == null) {
     return this;
   }
+  // Note: kind is not reset here; withoutBucketSort should only be called
+  // before withSortMeasure to avoid stale kind state.
   return new AggSpec(
-      ...
+      aggregate,
+      project,
+      outputFields,
+      rowType,
+      fieldTypes,
+      cluster,
+      bucketNullable,
+      queryBucketSize,
+      extendedTypeMapping,
       initialBucketNames,
       initialBucketNames,
-      ...
+      scriptCount,
+      kind,
+      measureSortTarget,
+      rareTopSupported,
+      null,
+      null,
+      measureSortCollations,
+      measureSortFieldNames,
+      rareTopDigest,
+      bucketSize);
+}
Suggestion importance[1-10]: 2


Why: The suggestion only adds a comment to the existing code without changing any logic, and the improved_code is essentially the same as the existing_code with a comment added. This is a documentation-only suggestion with minimal impact.

Low
Suggestions up to commit 7df5f94
Category | Suggestion | Impact
General
Clear bucket sort when transitioning to rare-top aggregation

In withRareTop, the new AggSpec is constructed with kind set to AggKind.RARE_TOP and
measureSortTarget set to null. However, bucketSortCollations and
bucketSortFieldNames are preserved from the current spec. If a bucket sort was
previously applied, it would be carried over into the rare-top spec and re-applied
during build(). This may produce incorrect query behavior since rare-top
aggregations don't support composite bucket sort ordering. Consider clearing
bucketSortCollations and bucketSortFieldNames when transitioning to RARE_TOP.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/AggSpec.java [285-311]

```java
public AggSpec withRareTop(RareTopDigest digest) {
  if (kind != AggKind.COMPOSITE || !rareTopSupported) {
    throw new OpenSearchRequestBuilder.PushDownUnSupportedException("Cannot pushdown " + digest);
  }
  return new AggSpec(
      aggregate,
      project,
      outputFields,
      rowType,
      fieldTypes,
      cluster,
      bucketNullable,
      queryBucketSize,
      extendedTypeMapping,
      initialBucketNames,
      bucketNames,
      scriptCount,
      AggKind.RARE_TOP,
      null,
      rareTopSupported,
      null, // clear bucket sort collations for rare-top
      null, // clear bucket sort field names for rare-top
      measureSortCollations,
      measureSortFieldNames,
      digest,
      digest.byList().isEmpty() ? digest.number() : DEFAULT_MAX_BUCKETS);
}
```
Suggestion importance[1-10]: 5

Why: The suggestion raises a valid concern that bucketSortCollations and bucketSortFieldNames are preserved when transitioning to RARE_TOP, which could cause incorrect behavior during build() since rare-top aggregations don't support composite bucket sort. Indeed, the actual code in withRareTop does pass bucketSortCollations and bucketSortFieldNames through, so this is a real potential issue worth addressing.

Impact: Low
Set aggSpec before adding operations in clone

The clone() method iterates over this (which uses the queue iterator) and calls
newContext.add(operation) for each operation. The add method calls
operation.action().pushOperation(this, operation), which may have side effects on
newContext (e.g., setting isAggregatePushed, updating startFrom). However, aggSpec
is set after all operations are added, which means that if any operation's
pushOperation tries to access aggSpec on newContext, it would be null. Since aggSpec
is now set separately from the AGGREGATION operation's add() call (the action is a
no-op lambda), this ordering should be safe, but it is fragile; consider setting
aggSpec before adding operations, or documenting this dependency.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java [51-58]

```java
@Override
public PushDownContext clone() {
  PushDownContext newContext = new PushDownContext(osIndex);
  newContext.aggSpec = aggSpec; // Set before adding operations to avoid null access
  for (PushDownOperation operation : this) {
    newContext.add(operation);
  }
  return newContext;
}
```
Suggestion importance[1-10]: 4

Why: The suggestion to set aggSpec before iterating operations in clone() is a valid defensive improvement to avoid potential null access if any pushOperation side effect ever accesses aggSpec. However, since the AGGREGATION action is a no-op lambda in the new design, the risk is currently low, making this a minor maintainability improvement.

Impact: Low
Fix teardown order to reset settings before deleting index

The teardown disables the Calcite plugin, but if the test fails mid-execution, the
plugin setting may not be reset. Consider also adding the settings reset to the
setup block, or ensuring the teardown order is correct (reset settings first, then
delete the index) to avoid leaving the cluster in a bad state for subsequent tests.

integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5125.yml [38-47]

```yaml
teardown:
  - do:
      query.settings:
        body:
          transient:
            plugins.calcite.enabled: false
  - do:
      indices.delete:
        index: issue5125
        ignore_unavailable: true
```
Suggestion importance[1-10]: 3

Why: The suggestion to reorder teardown steps (reset settings before deleting the index) is a minor improvement for test hygiene, but the original order (delete index first, then reset settings) is also valid and commonly used. The teardown block runs regardless of test failure in YAML REST tests, so the concern about failing mid-execution is less critical here.

Impact: Low
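The shared-mutable-state hazard behind both the reported bug and the clone-ordering suggestion can be reproduced in miniature. The sketch below uses hypothetical names (`Context`, `cloneSafely`; not the plugin's actual `PushDownContext`): a clone that copies the spec field before replaying operations guarantees that replayed operations never observe a null spec.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical miniature of a context whose clone() replays recorded operations.
class Context {
    final List<Consumer<Context>> ops = new ArrayList<>();
    String spec; // set separately from the replayed operations

    void add(Consumer<Context> op) {
        ops.add(op);
        op.accept(this); // replay side effects on this context
    }

    // Copying `spec` before replaying avoids operations reading a null field.
    Context cloneSafely() {
        Context copy = new Context();
        copy.spec = this.spec;
        for (Consumer<Context> op : ops) {
            copy.add(op);
        }
        return copy;
    }
}

public class CloneOrderDemo {
    public static void main(String[] args) {
        Context ctx = new Context();
        ctx.spec = "agg-spec";
        // An operation that reads spec during replay.
        ctx.add(c -> {
            if (c.spec == null) throw new IllegalStateException("spec not set yet");
        });
        Context copy = ctx.cloneSafely(); // replay sees spec already copied
        System.out.println(copy.spec);    // prints agg-spec
    }
}
```

If `cloneSafely` copied `spec` after the replay loop instead, the replayed operation above would throw, which is exactly the fragility the suggestion points out.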

Signed-off-by: Songkan Tang <songkant@amazon.com>
@github-actions
Contributor

github-actions bot commented Mar 6, 2026

Persistent review updated to latest commit 21f8940

@LantaoJin
Member

cc @qianheng-aws as proper reviewer

@qianheng-aws (Collaborator) left a comment

Have you ever tried lazy push-down for AggregationBuilderAction? If that approach works, we don't need to create the AggregationBuilder in the planning phase, nor do we need to deep-copy the AggregationBuilder.

```java
if (builder instanceof NestedAggregationBuilder nested) {
  return copyNestedAggregationBuilder(nested);
}
return builder;
```
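A lazy variant of that idea might look like the following sketch (class and method names are hypothetical, not the plugin's actual API): the action records only a recipe for the builder during planning, and each request-creation call materializes a fresh builder, so no mutable builder instance is ever shared between cloned contexts.

```java
import java.util.function.Supplier;

// Hypothetical sketch of lazy push-down: planning records a builder recipe,
// and the builder itself is only materialized at request-creation time.
final class LazyAggAction {
    // StringBuilder stands in for a mutable AggregationBuilder here.
    private final Supplier<StringBuilder> builderFactory;

    LazyAggAction(Supplier<StringBuilder> builderFactory) {
        this.builderFactory = builderFactory;
    }

    // Each call produces an independent builder; mutating one result
    // cannot contaminate another context's request.
    StringBuilder buildForRequest() {
        return builderFactory.get();
    }
}

public class LazyPushDownDemo {
    public static void main(String[] args) {
        LazyAggAction action = new LazyAggAction(() -> new StringBuilder("terms-agg"));
        StringBuilder a = action.buildForRequest();
        StringBuilder b = action.buildForRequest();
        a.append("+mutated");  // only affects this request's builder
        System.out.println(b); // prints terms-agg
    }
}
```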
A reviewer (Collaborator) commented:
What if there are other missed or upcoming agg builders?

```java
  }
}

private static final class TermsAggregationBuilderCopy extends TermsAggregationBuilder {
```
A reviewer (Collaborator) commented:

Can this be replaced with

source.shallowCopy(source, copySubAggregations(source), copyMetadataOrNull(source))

instead of defining these extended classes?

Signed-off-by: Songkan Tang <songkant@amazon.com>
@songkant-aws songkant-aws requested a review from ahkcs as a code owner March 16, 2026 09:08
@github-actions
Contributor

Persistent review updated to latest commit 9fe5192

Signed-off-by: Songkan Tang <songkant@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 7df5f94

Signed-off-by: Songkan Tang <songkant@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit f36c20a

Signed-off-by: Songkan Tang <songkant@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 84892bc

@github-actions
Contributor

Persistent review updated to latest commit 11db491

@github-actions
Contributor

Persistent review updated to latest commit af6c695

@songkant-aws
Contributor Author

@qianheng-aws The fix now lazily pushes down the agg request at the createRequest phase. It introduces an intermediate immutable spec object to record the logical pushdown operations. This touches more logic, but I think it mitigates the current problem of AggPushDownAction being mutated during planning.
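The immutable-spec approach described here can be sketched as a value object whose with-style methods return fresh instances (field and method names below are illustrative, not the actual AggSpec API), so two contexts holding the same spec can never contaminate each other:

```java
// Illustrative immutable spec: every "mutation" returns a new instance,
// so contexts sharing a spec can never observe each other's changes.
final class Spec {
    final String kind;
    final int bucketSize;

    Spec(String kind, int bucketSize) {
        this.kind = kind;
        this.bucketSize = bucketSize;
    }

    Spec withKind(String newKind) {
        return new Spec(newKind, bucketSize);
    }

    Spec withBucketSize(int newSize) {
        return new Spec(kind, newSize);
    }
}

public class ImmutableSpecDemo {
    public static void main(String[] args) {
        Spec shared = new Spec("COMPOSITE", 1000);
        Spec rareTop = shared.withKind("RARE_TOP"); // derived copy
        // The original spec is untouched; cloned contexts stay consistent.
        System.out.println(shared.kind + " " + rareTop.kind); // prints COMPOSITE RARE_TOP
    }
}
```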

Signed-off-by: Songkan Tang <songkant@amazon.com>
@songkant-aws songkant-aws force-pushed the pushdown-action-shallow-copy-bugfix branch from af6c695 to 05a2ed2 Compare March 23, 2026 08:19
@github-actions
Contributor

Persistent review updated to latest commit 05a2ed2

Successfully merging this pull request may close these issues.

Reverse Optimization fails when creating consecutive sorts due to Calcite physical optimizer merging
