Conversation

@snnn (Contributor) commented Oct 8, 2025

Description

Fixes #26261

This PR resolves a regression introduced in v1.23.0 where models with Constant nodes containing tensors larger than 127 bytes fail to load with a shape inference error.

Root Cause

Commit 3b97d79 (PR #25320) introduced an optimization to convert large Constant node tensors (> 127 bytes) into OrtValues with in-memory external data references for better memory management. However, ONNX shape inference cannot distinguish between in-memory and file-based external data, and rejects any TensorProto with data_location = EXTERNAL.
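To make the distinction concrete, here is a minimal sketch using the ONNX protobuf C++ API (illustrative only, not ORT's internal code): shape inference refuses to read the first proto whether its location entry points at a file or at memory ORT already owns, while it happily reads the second.

```cpp
#include <cstdint>
#include "onnx/onnx_pb.h"  // ONNX generated protobuf headers (ORT wraps these as core/graph/onnx_protobuf.h)

// Marked EXTERNAL: rejected by ONNX shape inference, even when the bytes live in memory.
onnx::TensorProto MakeExternalSplitSizes() {
  onnx::TensorProto t;
  t.set_name("split_sizes");
  t.set_data_type(onnx::TensorProto_DataType_INT64);
  t.add_dims(16);  // 16 * sizeof(int64_t) = 128 bytes, just over the 127-byte threshold
  t.set_data_location(onnx::TensorProto_DataLocation_EXTERNAL);
  auto* location = t.add_external_data();
  location->set_key("location");
  location->set_value("<file path, or ORT's in-memory tag>");  // illustrative value only
  return t;
}

// The same tensor with embedded raw_data: ONNX shape inference can read this.
onnx::TensorProto MakeEmbeddedSplitSizes(const int64_t (&values)[16]) {
  onnx::TensorProto t;
  t.set_name("split_sizes");
  t.set_data_type(onnx::TensorProto_DataType_INT64);
  t.add_dims(16);
  t.set_raw_data(values, sizeof(values));  // embeds the 128 bytes directly in the proto
  return t;
}
```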

The Fix

Modified InferenceContextImpl::getInputData() to:

  1. Detect tensors with in-memory external data using utils::HasExternalDataInMemory()
  2. Retrieve the corresponding OrtValue
  3. Create a temporary TensorProto with embedded data (not external reference)
  4. Provide this temporary proto to ONNX shape inference

This allows ONNX shape inference to access the actual tensor data without rejecting it as external.
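A condensed sketch of that logic follows. It is a paraphrase rather than the actual diff: the OrtValue is passed in directly here, the free-function wrapper is hypothetical, and the `utils::TensorToTensorProto(..., /*use_tensor_buffer=*/false)` call mirrors what the commit message describes, so treat the exact signatures as assumptions.

```cpp
#include <memory>
#include <string>
#include <vector>
// ORT-internal headers from the graph.cc context are assumed here.

// Returns a TensorProto that ONNX shape inference can read. If the initializer's data
// lives in an in-memory OrtValue (and is therefore marked EXTERNAL), a temporary
// embedded copy is made and kept alive in temp_tensor_protos, mirroring the
// temp_tensor_protos_ member added by this PR.
const ONNX_NAMESPACE::TensorProto* GetProtoForShapeInference(
    const ONNX_NAMESPACE::TensorProto* initializer,
    const OrtValue* ort_value,  // OrtValue owning the tensor bytes, if one exists
    const std::string& name,
    std::vector<std::unique_ptr<ONNX_NAMESPACE::TensorProto>>& temp_tensor_protos) {
  if (initializer == nullptr) return nullptr;

  // 1. Detect the in-memory external-data case introduced by PR #25320.
  if (onnxruntime::utils::HasExternalDataInMemory(*initializer) && ort_value != nullptr) {
    // 2./3. Materialize a temporary proto with the bytes embedded rather than referenced;
    //       use_tensor_buffer=false asks for a real copy of the data.
    const onnxruntime::Tensor& tensor = ort_value->Get<onnxruntime::Tensor>();
    auto temp = std::make_unique<ONNX_NAMESPACE::TensorProto>(
        onnxruntime::utils::TensorToTensorProto(tensor, name, /*use_tensor_buffer=*/false));
    temp_tensor_protos.push_back(std::move(temp));
    // 4. Hand the embedded-data proto to ONNX shape inference; it outlives the call
    //    because the container (the class member in the real code) owns it.
    return temp_tensor_protos.back().get();
  }

  // Embedded or file-based external data: unchanged behavior.
  return initializer;
}
```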

Memory Impact

This fix introduces a minor and temporary increase in memory usage during the model loading phase.

  • When: The additional memory is allocated only when the shape inference engine needs to access the data of a constant tensor that is larger than 127 bytes. This is a one-time event during the initial analysis of the model.
  • What: The fix creates a temporary in-memory copy of the tensor data.
  • Duration: This temporary copy is released as soon as shape inference is complete.

The impact on the overall peak memory usage of the application is expected to be negligible. The memory usage during inference is not affected. While it is theoretically possible for the temporary tensor to be large if a multi-gigabyte constant tensor is used for shape inference, this is a highly unlikely scenario in practice for well-designed models.

Testing

  • Tested with the problematic model from issue #26261
  • All optimization levels now work correctly (DISABLE_ALL, BASIC, EXTENDED, ALL)
  • Unit tests to be added

Changes

  • onnxruntime/core/graph/graph.cc:
    • Modified getInputData() method in InferenceContextImpl class
    • Added temp_tensor_protos_ member to store temporary TensorProtos during shape inference

TODO

  • Add unit tests
  • Run full test suite

snnn added 2 commits October 8, 2025 10:48
When Constant nodes have tensors larger than 127 bytes, they are converted
to OrtValues with in-memory external data for efficiency. However, ONNX
shape inference rejects TensorProtos with data_location=EXTERNAL, as it
cannot distinguish between in-memory and file-based external data.

This fix modifies InferenceContextImpl::getInputData() to detect in-memory
external data and materialize it into a temporary TensorProto with embedded
data that ONNX shape inference can process.

Fixes #26261

The issue was introduced in commit 3b97d79 (PR #25320) which converted
large initializers to OrtValues. This regression caused models with Constant
nodes having tensors just over 127 bytes to fail loading with shape
inference errors.

Changes:
- Modified getInputData() to check for in-memory external data using
  utils::HasExternalDataInMemory()
- When detected, retrieves the OrtValue and creates a temporary TensorProto
  with embedded data (use_tensor_buffer=false)
- Added temp_tensor_protos_ member to store these temporary protos so they
  outlive the shape inference call

Tests the fix for issue #26261 where Constant nodes with tensors larger
than 127 bytes fail shape inference when stored as in-memory external data.

The test:
- Creates a Constant node with a 128-byte tensor (16 INT64 values)
- Uses this constant as input to a Split node
- Verifies that shape inference succeeds and the graph resolves correctly
- Confirms that the constant was properly converted to an initializer

Before the fix, this test would fail with:
'Cannot parse data from external tensors. Please load external data
 into raw data for tensor'

With the fix, the test passes as the in-memory external data is
materialized for shape inference.
@snnn (Contributor Author) commented Oct 8, 2025

Update: Unit Test Added

I've added a comprehensive unit test in onnxruntime/test/ir/graph_test.cc:

Test: GraphTest.ShapeInferenceWithInMemoryExternalData

What it tests:

  • Creates a Constant node with a 128-byte tensor (16 INT64 values)
  • This is just over the 127-byte threshold that triggers in-memory externalization
  • Uses this constant as input to a Split node (which requires constant data for shape inference)
  • Verifies that the model loads, shape inference succeeds, and graph resolves correctly
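For reference, a rough sketch of how such a model can be assembled with the ONNX protobuf C++ API follows (the actual test in graph_test.cc uses ORT's own model-building helpers, so names and opset here are illustrative; graph outputs are omitted):

```cpp
#include <cstdint>
#include <string>
#include <vector>
#include "onnx/onnx_pb.h"

// Builds a minimal graph: Constant (128-byte INT64 tensor) feeding Split.
// The 16 int64 values total 128 bytes, one byte over the 127-byte threshold
// that makes ORT externalize the tensor into an in-memory OrtValue.
onnx::ModelProto MakeTestModel() {
  onnx::ModelProto model;
  model.set_ir_version(onnx::IR_VERSION);
  auto* opset = model.add_opset_import();
  opset->set_domain("");
  opset->set_version(18);  // illustrative opset

  onnx::GraphProto* graph = model.mutable_graph();
  graph->set_name("split_sizes_test");

  // Graph input: a float tensor of length 16 to be split.
  onnx::ValueInfoProto* x = graph->add_input();
  x->set_name("input");
  auto* ttype = x->mutable_type()->mutable_tensor_type();
  ttype->set_elem_type(onnx::TensorProto_DataType_FLOAT);
  ttype->mutable_shape()->add_dim()->set_dim_value(16);

  // Constant node producing the 16-element INT64 tensor "split_sizes".
  onnx::NodeProto* constant = graph->add_node();
  constant->set_op_type("Constant");
  constant->add_output("split_sizes");
  onnx::AttributeProto* value = constant->add_attribute();
  value->set_name("value");
  value->set_type(onnx::AttributeProto_AttributeType_TENSOR);
  onnx::TensorProto* tensor = value->mutable_t();
  tensor->set_data_type(onnx::TensorProto_DataType_INT64);
  tensor->add_dims(16);
  std::vector<int64_t> values(16, 1);  // sixteen chunks of size 1
  tensor->set_raw_data(values.data(), values.size() * sizeof(int64_t));

  // Split node that needs the constant's data during shape inference.
  onnx::NodeProto* split = graph->add_node();
  split->set_op_type("Split");
  split->add_input("input");
  split->add_input("split_sizes");
  for (int i = 0; i < 16; ++i) split->add_output("out_" + std::to_string(i));

  return model;
}
```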

Before the fix: Test would fail with:

Cannot parse data from external tensors. Please load external data into raw data for tensor

With the fix: Test passes ✅

Test Execution

cd build_dir
./onnxruntime_test_all --gtest_filter="GraphTest.ShapeInferenceWithInMemoryExternalData"

Result:

[  PASSED  ] 1 test.

Next Steps

The PR now includes:

  • ✅ Bug fix implementation
  • ✅ Unit test coverage
  • 🔄 Awaiting CI results and review

@github-actions bot left a comment

You can commit the suggested changes from lintrunner.

- Added ShapeInferenceWithInMemoryExternalDataViaSession test that uses
  InferenceSession to more closely match the real-world scenario
- Kept the original Model::Load test for basic coverage
- Added includes for fstream, InferenceSession, and Environment
- Both tests verify that models with Constant nodes > 127 bytes load correctly

Note: The test passes even without the fix in the test environment due to
the specific optimization paths taken. The fix is fully validated with the
real-world model test (BiRefNet-COD-epoch_125.onnx) which fails without
the fix and succeeds with it.
@snnn (Contributor Author) commented Oct 8, 2025

Summary: Investigation Complete

I've completed a thorough investigation and fix for issue #26261.

What Was Done:

  1. Root Cause Analysis - Used GDB to trace the exact failure point
  2. Fix Implementation - Modified InferenceContextImpl::getInputData() to handle in-memory external data
  3. Unit Tests Added - Two test cases for regression prevention
  4. Real-World Validation - Confirmed fix works with the actual problematic model

Test Validation:

Real-World Model (BiRefNet-COD-epoch_125.onnx):

  • WITH fix: Loads successfully with ALL optimization levels
  • WITHOUT fix: Fails with Cannot parse data from external tensors error

Unit Tests:

  • GraphTest.ShapeInferenceWithInMemoryExternalData - Basic coverage via Model::Load
  • GraphTest.ShapeInferenceWithInMemoryExternalDataViaSession - Coverage via InferenceSession

Note: The unit tests pass even without the fix due to optimization paths in the test environment, but provide regression coverage for the code path. The real validation comes from the actual model test which definitively fails without the fix and succeeds with it.

Ready for Review

The PR includes:

  • ✅ Bug fix with detailed comments
  • ✅ Unit test coverage
  • ✅ Validation with real-world model
  • ✅ Documentation of root cause

All changes are minimal and surgical, focused solely on fixing the regression introduced in v1.23.0.

@github-actions bot left a comment

You can commit the suggested changes from lintrunner.

The new test ShapeInferenceAfterInitializerExternalization explicitly:
1. Creates a model with a 128-byte initializer (> 127 threshold)
2. Calls ConvertInitializersIntoOrtValues() to externalize it
3. Forces a second Resolve() to trigger shape inference
4. Verifies shape inference can access the externalized data

This test definitively fails without the fix and passes with it:
- WITHOUT fix: Fails with 'Cannot parse data from external tensors'
- WITH fix: Passes as getInputData() materializes the external data

The key insight is that SetGraphResolveNeeded() must be called after
externalization to force shape inference to run again on the modified graph.
@snnn (Contributor Author) commented Oct 8, 2025

✅ Success: Unit Test Now Properly Catches the Bug!

I've successfully created a unit test that definitively fails without the fix and passes with it.

Test: GraphTest.ShapeInferenceAfterInitializerExternalization

What it does:

  1. Creates a model with a 128-byte initializer (just over the 127-byte threshold)
  2. Calls ConvertInitializersIntoOrtValues() to trigger in-memory externalization
  3. Calls SetGraphResolveNeeded() to force re-resolution
  4. Calls Resolve() again which runs shape inference on the externalized data
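
In rough outline, the heart of the test looks like this (a paraphrased sketch; whether ConvertInitializersIntoOrtValues() is a Graph method and what it returns are assumptions, and the real test may use different helpers):

```cpp
// Paraphrased core of GraphTest.ShapeInferenceAfterInitializerExternalization.
Graph& graph = model->MainGraph();

// Initial resolve: the 128-byte initializer still carries its data embedded.
ASSERT_STATUS_OK(graph.Resolve());

// Externalize large initializers into in-memory OrtValues, as InferenceSession
// does after its first Resolve(). (Placement on Graph is assumed here.)
ASSERT_STATUS_OK(graph.ConvertInitializersIntoOrtValues());

// Force shape inference to run again on the now-externalized initializer.
graph.SetGraphResolveNeeded();

// Without the fix this second Resolve() fails with:
//   [ShapeInferenceError] Cannot parse data from external tensors. ...
ASSERT_STATUS_OK(graph.Resolve());
```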

Results:

  • WITHOUT the fix: FAILED with exact error:

    Node (split_node) Op (Split) [ShapeInferenceError] Cannot parse data from 
    external tensors. Please load external data into raw data for tensor: split_sizes
    
  • WITH the fix: PASSED - shape inference successfully accesses the externalized data

Why This Test Works

The key insight was understanding that:

  1. ConvertInitializersIntoOrtValues() is called AFTER initial Resolve() in InferenceSession
  2. We need to call SetGraphResolveNeeded() to force shape inference to run again
  3. This recreates the exact scenario where transformers/optimizations modify the graph after externalization

This test provides robust regression coverage and will catch any similar issues in the future.

@github-actions bot left a comment

You can commit the suggested changes from lintrunner.

@yuslepukhin (Member) commented:

One more thing. ORT has several implementations of the InferenceContext interface. Please check them all.

@yuslepukhin dismissed their stale review October 11, 2025 01:29

Addressed

@yuslepukhin marked this pull request as ready for review October 11, 2025 01:30
@chilo-ms (Contributor) left a comment

LGTM

@yuslepukhin merged commit aafdb3a into main Oct 14, 2025
92 of 93 checks passed
@yuslepukhin deleted the fix-external-data-shape-inference branch October 14, 2025 18:10
apsonawane pushed a commit that referenced this pull request Oct 17, 2025
apsonawane pushed a commit that referenced this pull request Oct 20, 2025
apsonawane added a commit that referenced this pull request Oct 21, 2025
Adds the following commits to the release-1.23.2 branch for ORT 1.23.2:

- [TensorRT] Fix DDS output bug during engine update
  - PR: #26272
  - commit id: 00e85dd
- Fix shape inference failure with in-memory external data
   - PR: #26263
   - commit id: d955476
- [CUDA] replace 90a-virtual by 90-virtual for forward compatible 
  - PR: #26230
  - commit id: b58911f
- [QNN-EP] Fix logic flow bug
  - PR: #26148
  - commit id: b282379
- Internal Dupe of #25255 - [MLAS] Optimize MlasConv using thread
partition opt
  - PR: #26103
  - commit id: 7362518
- Update qMoE spec to support block quantization
  - PR: #25641
  - commit id: 7a8ffa8
- [VitisAI] add new api to VitisAI to save graph as a string
  - PR: #25602
  - commit id: 3361d72
- [Build] Lock torch, onnxscript and onnx-ir versions to latest
  - PR: #26315
  - commit id: ea69c4d

---------

Co-authored-by: Hariharan Seshadri <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: Yateng Hong <[email protected]>
Co-authored-by: Changming Sun <[email protected]>
Co-authored-by: Dmitri Smirnov <[email protected]>
Co-authored-by: Tianlei Wu <[email protected]>
Co-authored-by: quic-calvnguy <[email protected]>
Co-authored-by: quic_calvnguy <quic_calvnguy@quic_inc.com>
Co-authored-by: yifei410 <[email protected]>
Co-authored-by: yifei <[email protected]>
@apsonawane added the cherry-picked label and removed the release:1.23.2 label Oct 21, 2025
@apsonawane (Contributor) commented:

Cherry-picked for 1.23.2. Removing the release tag and adding cherry-pick tag

fs-eire pushed a commit that referenced this pull request Oct 24, 2025
yuslepukhin added a commit that referenced this pull request Oct 30, 2025
…lues early (#26345)

### Description
Converts weights early and reverts "Properly remove in-memory references" (#25652).
This reverts commit 3ca49d8 and makes
appropriate adjustments for the current state of the code.

This PR is made possible by, and comes on the heels of:
#26263
#25833.

Previous history:
#23979
#25320
#25626
#25652

The first change (#26263)
allows us to convert initializers to OrtValues early and save lots of
memory at model loading time.

Specifically, for the Phi-4-mini-instruct-INT4 model, the before and after look like this:

**Before**
<img width="1204" height="124" alt="Before change DEBUG 2025-10-16
144819"
src="https://github.com/user-attachments/assets/674ff75b-057f-498a-a906-0140d59d46e6"
/>

**After**

<img width="997" height="114" alt="After change DEBUG 2025-10-16 144819"
src="https://github.com/user-attachments/assets/df1783af-7f50-4cd2-b3ad-6868f23be53f"
/>

The two peaks represent memory usage at optimization time (8.1 GB before) and after weight memory mapping (6.5 GB). After this change, the corresponding numbers are 3.5 GB and 4.7 GB respectively.
Most of the savings during the optimization phase come from `ConstantFolding`, where we are able to reuse the resulting OrtValues directly for the new initializers.

This PR concludes a series of PRs converting initializers to OrtValues.

Memory consumption before the conversion work began was 9.3 GB and 6.7 GB respectively. We are saving almost 6 GB during optimization and 2 GB in the steady state.
 
 
<img width="1175" height="139" alt="image"
src="https://github.com/user-attachments/assets/80e7d228-8a8e-4316-8e04-b02c2be30f04"
/>

The model also loads about 12 seconds faster.

An example of ConstantFolding being one of the top contributors: memory is duplicated, raising the peak, until Resolve removes the no-longer-used initializers.
<img width="1100" height="558" alt="Sanpshot 3 Peak on ConstantFolding
Transpose Optimizer"
src="https://github.com/user-attachments/assets/95545abd-3f99-46d9-862e-bbf27cbb5b40"
/>

<img width="1060" height="600" alt="Snapshot 4 Peak AddInitializer from
ConstantFolding"
src="https://github.com/user-attachments/assets/dd457ec6-23ee-4efd-8c60-625d5faad61e"
/>

<img width="325" height="160" alt="image"
src="https://github.com/user-attachments/assets/37c1194d-f683-49a7-afb1-073dfbb9bbfc"
/>


### Motivation and Context
Reduce memory usage.
naomiOvad pushed a commit to naomiOvad/onnxruntime that referenced this pull request Nov 2, 2025
…6263)

Labels

cherry-picked (Cherry-picked for a cherrypicks branch)

Development

Successfully merging this pull request may close these issues.

🐛 Bug Report: ONNXRuntime 1.23 fails to load model with external data (works fine in 1.22.x)

5 participants