
[BUG] Dedup aggregation pushdown nullifies renamed non-dedup fields via top_hits response mapping #5150

@penghuo

Description


Query Information

PPL Command/Query:

source=bounty-numbers | rename value as val | dedup category | fields category, val

Expected Result:
val should retain original value data after rename + dedup:

{
  "datarows": [["A", 10.5], ["B", 100.0], ["C", 50.0]]
}

Actual Result:
All val values are null. The dedup correctly filters by category but nullifies the renamed field:

{
  "datarows": [["A", null], ["B", null], ["C", null]]
}

Dataset Information

Dataset/Schema Type

  • OpenTelemetry (OTEL)
  • Simple Schema for Observability (SS4O)
  • Open Cybersecurity Schema Framework (OCSF)
  • Custom (details below) -- selected

Index Mapping

{
  "mappings": {
    "properties": {
      "count": {"type": "integer"},
      "category": {"type": "keyword"},
      "subcategory": {"type": "keyword"},
      "value": {"type": "double"},
      "ts": {"type": "date"}
    }
  }
}

Sample Data

{"count":1,"category":"A","subcategory":"X","value":10.5,"ts":"2024-01-01"}
{"count":2,"category":"A","subcategory":"Y","value":20.3,"ts":"2024-01-02"}
{"count":10,"category":"B","subcategory":"X","value":100.0,"ts":"2024-01-03"}
{"count":5,"category":"C","subcategory":"Z","value":50.0,"ts":"2024-01-04"}

Bug Description

Issue Summary:
When a field is renamed via the rename command and a subsequent dedup command operates on a different field, the renamed field returns null values. More broadly, ANY field whose project name differs from the original index field name (via rename or eval column reference) becomes null after dedup aggregation pushdown. The dedup correctly deduplicates by the specified field, but all non-dedup fields with mismatched names produce null.

This bug specifically affects the aggregation-based DedupPushdownRule path. The collapse-based pushdown path was fixed by PR #4670 (which resolved #4563 and #4664), but the aggregation path was not addressed.

Steps to Reproduce:

  1. Create the index:
curl -XPUT 'http://localhost:9200/bounty-numbers' -H 'Content-Type: application/json' -d '{
  "mappings": {
    "properties": {
      "count": {"type": "integer"},
      "category": {"type": "keyword"},
      "subcategory": {"type": "keyword"},
      "value": {"type": "double"},
      "ts": {"type": "date"}
    }
  }
}'
  2. Insert sample data:
curl -XPOST 'http://localhost:9200/bounty-numbers/_bulk' -H 'Content-Type: application/x-ndjson' -d '
{"index":{}}
{"count":1,"category":"A","subcategory":"X","value":10.5,"ts":"2024-01-01"}
{"index":{}}
{"count":2,"category":"A","subcategory":"Y","value":20.3,"ts":"2024-01-02"}
{"index":{}}
{"count":10,"category":"B","subcategory":"X","value":100.0,"ts":"2024-01-03"}
{"index":{}}
{"count":5,"category":"C","subcategory":"Z","value":50.0,"ts":"2024-01-04"}
'
  3. Run the PPL query:
curl -XPOST 'http://localhost:9200/_plugins/_ppl' -H 'Content-Type: application/json' \
  -d '{"query": "source=bounty-numbers | rename value as val | dedup category | fields category, val"}'
  4. Observe that val is null in all rows while category is correct.

Differential Testing Results:

  • rename value as val | fields category, val -> val has correct values (works)
  • dedup category | fields category, value -> value has correct values (works)
  • rename value as val | sort val | fields category, val -> val correct and sorted (works)
  • rename value as val | dedup category | fields category, val -> val = null (BUG)
  • rename category as cat | dedup cat | fields cat, value -> cat and value both correct (works)
  • eval newval = value + 1 | dedup category | fields category, newval -> newval correct (works -- script expr)
  • rename value as val, category as cat | dedup cat | fields cat, val -> cat correct, val = null (BUG)
  • rename name as employee, salary as pay | dedup dept_id -> employee=null, pay=null (BUG -- multiple renames)
  • eval pay2 = salary | rename salary as pay | dedup dept_id -> pay=null, pay2=null (BUG -- eval column ref also null)
  • eval extra = 42 | rename name as employee | dedup dept_id -> employee=null, extra=42, salary=correct (eval constant survives)
  • rename value as val | eval doubled = val * 2 | dedup category | fields category, val, doubled -> val=null, doubled=correct (eval of renamed field survives as RexCall)

Pattern summary:

  • Renamed fields -> null after dedup (unless dedup is on that renamed field itself)
  • Eval'd column references (e.g., eval pay2 = salary) -> null after dedup when resolved as RexInputRef
  • Eval'd constants (e.g., eval extra = 42) -> survive (added as ScriptField with correct name)
  • Eval'd expressions (e.g., eval newval = value + 1) -> survive (added as ScriptField with correct name)
  • Unrenamed original fields -> survive (project name == index name)
  • The dedup key field itself -> always survives (becomes composite bucket key with correct name)
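The name disconnect behind all of these patterns can be modeled in a few lines. This is a simplified sketch, not the plugin's actual code (the class and method names here are hypothetical): the enumerator resolves the projected schema names against a response tuple that is keyed by the original index field names.

```java
import java.util.List;
import java.util.Map;

// Simplified model of the failure: the enumerator walks the projected
// schema names ("val"), but the top_hits tuple is keyed by original
// index names ("value"), so the lookup misses and yields null.
class DedupNullSketch {
    // Projected schema after `rename value as val | dedup category | fields category, val`.
    static final List<String> SCHEMA = List.of("category", "val");

    // Resolve a row the way the enumerator does: one map lookup per schema name.
    static Object[] currentRow(Map<String, Object> tuple) {
        return SCHEMA.stream().map(tuple::get).toArray();
    }
}
```

With a response tuple of {"category": "A", "value": 10.5}, currentRow returns ["A", null], matching the observed datarows.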

Impact:
This bug makes it impossible to use rename (or eval column references) with dedup on a different field in the same query pipeline when the aggregation pushdown path is taken. This is a common data transformation pattern.

Root Cause Analysis

The bug is in the DedupPushdownRule aggregation pushdown path. When dedup is pushed down as a composite aggregation with top_hits:

  1. DedupPushdownRule.apply() (opensearch/.../planner/rules/DedupPushdownRule.java:51-127): Creates an aggregate with LITERAL_AGG and calls scan.pushDownAggregate(aggregate, targetChildProject). targetChildProject has the renamed field names (e.g., val).

  2. AggregateAnalyzer.createTopHitsBuilder() (opensearch/.../request/AggregateAnalyzer.java:803-868): For the LITERAL_AGG case, iterates over all project args. For RexInputRef fields, calls helper.inferNamedField(rex.getKey()) which resolves using the original scan's rowType -- gets original index field name value. Correctly builds top_hits with fetchField("value") / _source.includes: ["category", "value"].

  3. OpenSearch response returns top_hits with _source: {"category": "A", "value": 10.5} -- using original field names.

  4. TopHitsParser.parse() (opensearch/.../response/agg/TopHitsParser.java:123-134): Extracts hit source as-is: Map<String, Object> with key "value" (original name).

  5. OpenSearchResponse.handleAggregationResponse() (opensearch/.../response/OpenSearchResponse.java:251-268): Creates ExprTupleValue with key "value" from the response.

  6. OpenSearchIndexEnumerator.current() (opensearch/.../storage/scan/OpenSearchIndexEnumerator.java:84-92): Uses getRowType().getFieldNames() which has the renamed names from copyWithNewSchema(dedup.getRowType()). Tries to resolve "val" from the tuple that only has "value" -> returns null.

The disconnect: The schema expects val (renamed name), but the response tuple contains value (original index name). No rename mapping is applied during response parsing.

Why renaming the dedup field itself works: The dedup field becomes the composite aggregation bucket key. The bucket key name comes from outputFields which uses the renamed name. So the bucket key is correctly named.

Why eval expressions survive but eval column refs don't: eval newval = value + 1 creates a RexCall (script expression), and createTopHitsBuilder() adds it as a ScriptField with name newval. The response fields section has the correct name. However, eval pay2 = salary creates a RexInputRef (just a column reference), which is resolved to fetchField("salary") -- the original index name. The enumerator then tries to resolve "pay2" from a tuple that only has "salary" -> null.
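The asymmetry between RexCall and RexInputRef projections comes down to which name ends up keying the response entry. A minimal sketch of that rule (a hypothetical helper, not the plugin's actual API):

```java
// Sketch of the naming asymmetry described above (hypothetical helper,
// not the plugin's API): script expressions keep the projected name,
// while plain column references fall back to the original index name.
class ResponseKeySketch {
    static String responseKey(String projectedName, String originalName,
                              boolean isScriptExpression) {
        // RexCall -> ScriptField named after the projection (e.g. "newval");
        // RexInputRef -> fetchField on the original name (e.g. "salary").
        return isScriptExpression ? projectedName : originalName;
    }
}
```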

Explain API Output

{
  "calcite": {
    "logical": "LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT])\n  LogicalProject(category=[$1], val=[$3])\n    LogicalFilter(condition=[<=($11, 1)])\n      LogicalProject(..., val=[$3], ..., _row_number_dedup_=[ROW_NUMBER() OVER (PARTITION BY $1)])\n        LogicalFilter(condition=[IS NOT NULL($1)])\n          LogicalProject(count=[$0], category=[$1], subcategory=[$2], val=[$3], ts=[$4], ...)\n            CalciteLogicalIndexScan(table=[[OpenSearch, bounty-numbers]])\n",
    "physical": "CalciteEnumerableIndexScan(table=[[OpenSearch, bounty-numbers]], PushDownContext=[[AGGREGATION->..., LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={...\"_source\":{\"includes\":[\"category\",\"value\"],\"excludes\":[]},\"script_fields\":{}}...)])"
  }
}

Note: The logical plan correctly shows val=[$3] (renamed), but the physical plan's _source.includes uses ["category","value"] (original names). The response parser does not map value back to val.

Execute API Output

{
  "schema": [
    {"name": "category", "type": "string"},
    {"name": "val", "type": "double"}
  ],
  "datarows": [["A", null], ["B", null], ["C", null]],
  "total": 3,
  "size": 3
}

Environment Information

OpenSearch Version: OpenSearch + SQL Plugin (main branch, Calcite enabled)

Suggested Fix

The DedupPushdownRule needs to account for field renames when building the response mapping. Possible approaches:

  1. Build a rename mapping in TopHitsParser: When LITERAL_AGG is used for dedup, pass the project's field name mapping (original -> renamed) to the parser. The TopHitsParser can then rename response keys before returning.

  2. Add a post-processing project: After the aggregation scan returns results with original field names, insert a LogicalProject that renames fields to match the expected schema.

  3. Use fetchField with alias support: Instead of builder.fetchField(originalName), map the response fields using the project's named expressions.

Approach 1 is the most targeted fix -- pass the Map<String, String> of {originalName -> renamedName} to TopHitsParser and apply it when constructing the result maps.
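A minimal sketch of Approach 1 (the class and method names are hypothetical, not the plugin's API): apply the {originalName -> renamedName} map to the hit source keys while parsing the top_hits response, before the tuple reaches the enumerator.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of Approach 1 (hypothetical helper, not the plugin's API):
// rewrite response keys from original index names to projected names
// before the tuple is handed to the enumerator.
class TopHitsRenameSketch {
    static Map<String, Object> applyRenames(Map<String, Object> source,
                                            Map<String, String> renames) {
        Map<String, Object> out = new LinkedHashMap<>();
        source.forEach((key, value) ->
                out.put(renames.getOrDefault(key, key), value));
        return out;
    }
}
```

Applied with renames = {"value" -> "val"}, the tuple {"category": "A", "value": 10.5} becomes {"category": "A", "val": 10.5}, which the enumerator can then resolve against the renamed schema.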
