Fix shape inference failure with in-memory external data #26263
Conversation
When Constant nodes have tensors larger than 127 bytes, they are converted to OrtValues with in-memory external data for efficiency. However, ONNX shape inference rejects TensorProtos with `data_location=EXTERNAL` because it cannot distinguish between in-memory and file-based external data. This fix modifies `InferenceContextImpl::getInputData()` to detect in-memory external data and materialize it into a temporary TensorProto with embedded data that ONNX shape inference can process.

Fixes #26261

The issue was introduced in commit 3b97d79 (PR #25320), which converted large initializers to OrtValues. This regression caused models with Constant nodes holding tensors just over 127 bytes to fail loading with shape inference errors.

Changes:
- Modified `getInputData()` to check for in-memory external data using `utils::HasExternalDataInMemory()`
- When detected, retrieves the OrtValue and creates a temporary TensorProto with embedded data (`use_tensor_buffer=false`)
- Added a `temp_tensor_protos_` member to store these temporary protos so they outlive the shape inference call
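For context, a minimal C++ sketch of the materialization step described above: it builds an embedded-data TensorProto from bytes that currently live only in an OrtValue, so shape inference never sees `data_location=EXTERNAL`. The helper name and parameter layout are assumptions for illustration; the actual change lives in `InferenceContextImpl::getInputData()` in `onnxruntime/core/graph/graph.cc`.

```cpp
// Sketch only: mirrors the approach described above, not the exact ORT code.
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

#include "onnx/onnx_pb.h"  // ONNX_NAMESPACE::TensorProto

// Owned by the shape-inference context so the temporary protos outlive the
// ONNX InferShapes() call (mirrors the new temp_tensor_protos_ member).
using TempProtoStore = std::deque<ONNX_NAMESPACE::TensorProto>;

// Build a TensorProto with embedded raw data from bytes held in an OrtValue,
// instead of handing shape inference a proto marked data_location=EXTERNAL.
const ONNX_NAMESPACE::TensorProto& MaterializeForShapeInference(
    const std::string& name, int32_t elem_type, const std::vector<int64_t>& dims,
    const void* data, size_t byte_size, TempProtoStore& store) {
  ONNX_NAMESPACE::TensorProto proto;
  proto.set_name(name);
  proto.set_data_type(elem_type);
  for (int64_t d : dims) proto.add_dims(d);
  // Embed the bytes directly; data_location stays DEFAULT, so ONNX shape
  // inference can parse the tensor without rejecting it as external.
  proto.set_raw_data(data, byte_size);
  store.push_back(std::move(proto));
  return store.back();
}
```

In the actual fix, the bytes and type information come from the OrtValue backing the in-memory external data (via `utils::HasExternalDataInMemory()` and `use_tensor_buffer=false` as noted above); the sketch just shows why storing the temporary protos in a member keeps them alive for the duration of shape inference.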
Tests the fix for issue #26261, where Constant nodes with tensors larger than 127 bytes fail shape inference when stored as in-memory external data.

The test:
- Creates a Constant node with a 128-byte tensor (16 INT64 values)
- Uses this constant as input to a Split node
- Verifies that shape inference succeeds and the graph resolves correctly
- Confirms that the constant was properly converted to an initializer

Before the fix, this test would fail with: 'Cannot parse data from external tensors. Please load external data into raw data for tensor'. With the fix, the test passes because the in-memory external data is materialized for shape inference.
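For illustration, a hedged C++ sketch of the kind of graph the test constructs: a 16-element INT64 Constant feeding a Split. The real test uses ORT's Graph/Model test utilities rather than raw protobufs; the node and output names here are made up.

```cpp
#include <cstdint>

#include "onnx/onnx_pb.h"

// Build a small GraphProto with a Constant node whose tensor is 16 * 8 = 128
// bytes, just over the 127-byte threshold that triggers the OrtValue
// conversion, and a Split node that consumes it.
ONNX_NAMESPACE::GraphProto MakeConstantSplitGraph() {
  ONNX_NAMESPACE::GraphProto graph;
  graph.set_name("constant_split_repro");

  // Constant node producing 16 INT64 values (128 bytes).
  auto* constant = graph.add_node();
  constant->set_op_type("Constant");
  constant->set_name("const_node");
  constant->add_output("const_out");
  auto* attr = constant->add_attribute();
  attr->set_name("value");
  attr->set_type(ONNX_NAMESPACE::AttributeProto::TENSOR);
  auto* t = attr->mutable_t();
  t->set_data_type(ONNX_NAMESPACE::TensorProto::INT64);
  t->add_dims(16);
  for (int64_t i = 0; i < 16; ++i) t->add_int64_data(i + 1);

  // Split node that takes the constant as an input; resolving this graph
  // drives shape inference over the constant's data.
  auto* split = graph.add_node();
  split->set_op_type("Split");
  split->set_name("split_node");
  split->add_input("const_out");
  split->add_output("split_out_0");
  split->add_output("split_out_1");

  return graph;
}
```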
Update: Unit Test Added

I've added a comprehensive unit test. Test:
You can commit the suggested changes from lintrunner.
- Added a ShapeInferenceWithInMemoryExternalDataViaSession test that uses InferenceSession to more closely match the real-world scenario
- Kept the original Model::Load test for basic coverage
- Added includes for fstream, InferenceSession, and Environment
- Both tests verify that models with Constant nodes > 127 bytes load correctly

Note: The test passes even without the fix in the test environment due to the specific optimization paths taken. The fix is fully validated with the real-world model test (BiRefNet-COD-epoch_125.onnx), which fails without the fix and succeeds with it.
Summary: Investigation Complete

I've completed a thorough investigation and fix for issue #26261.

What Was Done:
Test Validation:

Real-World Model (BiRefNet-COD-epoch_125.onnx):
Unit Tests:
Note: The unit tests pass even without the fix due to optimization paths in the test environment, but they provide regression coverage for the code path. The real validation comes from the actual model test, which definitively fails without the fix and succeeds with it.

Ready for Review

The PR includes:
All changes are minimal and surgical, focused solely on fixing the regression introduced in v1.23.0.
You can commit the suggested changes from lintrunner.
The new test ShapeInferenceAfterInitializerExternalization explicitly:
1. Creates a model with a 128-byte initializer (> 127-byte threshold)
2. Calls ConvertInitializersIntoOrtValues() to externalize it
3. Forces a second Resolve() to trigger shape inference
4. Verifies shape inference can access the externalized data

This test definitively fails without the fix and passes with it:
- WITHOUT fix: fails with 'Cannot parse data from external tensors'
- WITH fix: passes because getInputData() materializes the external data

The key insight is that SetGraphResolveNeeded() must be called after externalization to force shape inference to run again on the modified graph. A hedged sketch of this flow is shown below.
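As a rough illustration of this flow (not the test's actual code): `ConvertInitializersIntoOrtValues()` and `SetGraphResolveNeeded()` are the internal Graph calls named above, while `BuildModelWithLargeInitializer` is a hypothetical helper standing in for the model-construction boilerplate, and the exact signatures may differ from what is shown.

```cpp
#include "gtest/gtest.h"
// ASSERT_STATUS_OK is ORT's test macro for checking a Status; other test
// scaffolding is omitted for brevity.

TEST(GraphTest, ShapeInferenceAfterInitializerExternalization) {
  // 1. Build a model whose initializer is 128 bytes (> the 127-byte threshold).
  //    BuildModelWithLargeInitializer is a hypothetical helper.
  auto model = BuildModelWithLargeInitializer(/*num_int64_elements=*/16);
  onnxruntime::Graph& graph = model->MainGraph();
  ASSERT_STATUS_OK(graph.Resolve());

  // 2. Externalize large initializers into OrtValues backed by in-memory
  //    "external" data.
  ASSERT_STATUS_OK(graph.ConvertInitializersIntoOrtValues());

  // 3. Force shape inference to run again on the modified graph.
  graph.SetGraphResolveNeeded();

  // 4. Without the fix, this Resolve() fails with
  //    "Cannot parse data from external tensors ...".
  ASSERT_STATUS_OK(graph.Resolve());
}
```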
✅ Success: Unit Test Now Properly Catches the Bug!

I've successfully created a unit test that definitively fails without the fix and passes with it. Test:
You can commit the suggested changes from lintrunner.
One more thing: ORT has several implementations of the InferenceContext interface. Please check them all.
chilo-ms
left a comment
LGTM
## Description

Fixes #26261

This PR resolves a regression introduced in v1.23.0 where models with Constant nodes containing tensors larger than 127 bytes fail to load with a shape inference error.

### Root Cause

Commit 3b97d79 (PR #25320) introduced an optimization to convert large Constant node tensors (> 127 bytes) into OrtValues with in-memory external data references for better memory management. However, ONNX shape inference cannot distinguish between in-memory and file-based external data, and rejects any TensorProto with `data_location = EXTERNAL`.

### The Fix

Modified `InferenceContextImpl::getInputData()` to:
1. Detect tensors with in-memory external data using `utils::HasExternalDataInMemory()`
2. Retrieve the corresponding OrtValue
3. Create a temporary TensorProto with embedded data (not an external reference)
4. Provide this temporary proto to ONNX shape inference

This allows ONNX shape inference to access the actual tensor data without rejecting it as external.

### Memory Impact

This fix introduces a minor and temporary increase in memory usage during the model loading phase.

- **When:** The additional memory is allocated only when the shape inference engine needs to access the data of a constant tensor that is larger than 127 bytes. This is a one-time event during the initial analysis of the model.
- **What:** The fix creates a temporary in-memory copy of the tensor data.
- **Duration:** This temporary copy is released as soon as shape inference is complete.

The impact on the overall peak memory usage of the application is expected to be negligible. The memory usage during inference is not affected. While it is theoretically possible for the temporary tensor to be large if a multi-gigabyte constant tensor is used for shape inference, this is a highly unlikely scenario in practice for well-designed models.

### Testing

- Tested with the problematic model from issue #26261
- All optimization levels now work correctly (DISABLE_ALL, BASIC, EXTENDED, ALL)
- Unit tests to be added

### Changes

- **onnxruntime/core/graph/graph.cc**:
  - Modified `getInputData()` method in the `InferenceContextImpl` class
  - Added `temp_tensor_protos_` member to store temporary TensorProtos during shape inference

## TODO

- [ ] Add unit tests
- [ ] Run full test suite

---

Co-authored-by: Dmitri Smirnov <[email protected]>
Adds the following commits to the release-1.23.2 branch for ORT 1.23.2:

- [TensorRT] Fix DDS output bug during engine update - PR: #26272 - commit id: 00e85dd
- Fix shape inference failure with in-memory external data - PR: #26263 - commit id: d955476
- [CUDA] replace 90a-virtual by 90-virtual for forward compatible - PR: #26230 - commit id: b58911f
- [QNN-EP] Fix logic flow bug - PR: #26148 - commit id: b282379
- Internal Dupe of #25255 - [MLAS] Optimize MlasConv using thread partition opt - PR: #26103 - commit id: 7362518
- Update qMoE spec to support block quantization - PR: #25641 - commit id: 7a8ffa8
- [VitisAI] add new api to VitisAI to save graph as a string - PR: #25602 - commit id: 3361d72
- [Build] Lock torch, onnxscript and onnx-ir versions to latest - PR: #26315 - commit id: ea69c4d

---

Co-authored-by: Hariharan Seshadri <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: Yateng Hong <[email protected]>
Co-authored-by: Changming Sun <[email protected]>
Co-authored-by: Dmitri Smirnov <[email protected]>
Co-authored-by: Tianlei Wu <[email protected]>
Co-authored-by: quic-calvnguy <[email protected]>
Co-authored-by: quic_calvnguy <quic_calvnguy@quic_inc.com>
Co-authored-by: yifei410 <[email protected]>
Co-authored-by: yifei <[email protected]>
Cherry-picked for 1.23.2. Removing the release tag and adding the cherry-pick tag.
…lues early (#26345)

### Description

Converts weights early and reverts "Properly remove in-memory references (#25652)". This reverts commit 3ca49d8 and makes appropriate adjustments for the current state of the code.

This PR is made possible by, and is on the heels of, #26263 and #25833. Previous history: #23979, #25320, #25626, #25652.

The first change (#26263) allows us to convert initializers to OrtValues early and save lots of memory at model loading time. Specifically, for the Phi-4-mini-instruct-INT4 model, before and after look like this:

**Before**

<img width="1204" height="124" alt="Before change DEBUG 2025-10-16 144819" src="https://github.com/user-attachments/assets/674ff75b-057f-498a-a906-0140d59d46e6" />

**After**

<img width="997" height="114" alt="After change DEBUG 2025-10-16 144819" src="https://github.com/user-attachments/assets/df1783af-7f50-4cd2-b3ad-6868f23be53f" />

The two peaks represent memory usage at optimization time (8.1 GB before) and after weights memory mapping (6.5 GB). After this change, the corresponding numbers are 3.5 GB and 4.7 GB respectively. Most of the savings during the optimization phase come from `ConstantFolding`, where we are able to reuse the resulting OrtValues directly for the new initializers.

This PR concludes a series of PRs converting initializers to OrtValues. Memory consumption before the conversion began was 9.3 GB and 6.7 GB respectively. We are saving almost 6 GB during optimization and 2 GB for the steady state.

<img width="1175" height="139" alt="image" src="https://github.com/user-attachments/assets/80e7d228-8a8e-4316-8e04-b02c2be30f04" />

The model also loads about 12 seconds faster.

Example of ConstantFolding being one of the top contributors, where we duplicate memory for a higher peak before Resolve takes care of no-longer-used initializers:

<img width="1100" height="558" alt="Snapshot 3 Peak on ConstantFolding Transpose Optimizer" src="https://github.com/user-attachments/assets/95545abd-3f99-46d9-862e-bbf27cbb5b40" />
<img width="1060" height="600" alt="Snapshot 4 Peak AddInitializer from ConstantFolding" src="https://github.com/user-attachments/assets/dd457ec6-23ee-4efd-8c60-625d5faad61e" />
<img width="325" height="160" alt="image" src="https://github.com/user-attachments/assets/37c1194d-f683-49a7-afb1-073dfbb9bbfc" />

### Motivation and Context

Reduce memory usage.