Fix shape inference failure with in-memory external data #26263
Conversation
When Constant nodes have tensors larger than 127 bytes, they are converted to OrtValues with in-memory external data for efficiency. However, ONNX shape inference rejects TensorProtos with `data_location=EXTERNAL` because it cannot distinguish between in-memory and file-based external data. This fix modifies `InferenceContextImpl::getInputData()` to detect in-memory external data and materialize it into a temporary TensorProto with embedded data that ONNX shape inference can process.

Fixes #26261

The issue was introduced in commit 3b97d79 (PR #25320), which converted large initializers to OrtValues. This regression caused models with Constant nodes holding tensors just over 127 bytes to fail loading with shape inference errors.

Changes:
- Modified `getInputData()` to check for in-memory external data using `utils::HasExternalDataInMemory()`
- When detected, retrieves the OrtValue and creates a temporary TensorProto with embedded data (`use_tensor_buffer=false`)
- Added a `temp_tensor_protos_` member to store these temporary protos so they outlive the shape inference call
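For context, a minimal C++ sketch of the materialization step described above: it builds an embedded-data TensorProto from bytes that currently live only in an OrtValue, so shape inference never sees `data_location=EXTERNAL`. The helper name and parameter layout are assumptions for illustration; the actual change lives in `InferenceContextImpl::getInputData()` in `onnxruntime/core/graph/graph.cc`.

```cpp
// Sketch only: mirrors the approach described above, not the exact ORT code.
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

#include "onnx/onnx_pb.h"  // ONNX_NAMESPACE::TensorProto

// Owned by the shape-inference context so the temporary protos outlive the
// ONNX InferShapes() call (mirrors the new temp_tensor_protos_ member).
using TempProtoStore = std::deque<ONNX_NAMESPACE::TensorProto>;

// Build a TensorProto with embedded raw data from bytes held in an OrtValue,
// instead of handing shape inference a proto marked data_location=EXTERNAL.
const ONNX_NAMESPACE::TensorProto& MaterializeForShapeInference(
    const std::string& name, int32_t elem_type, const std::vector<int64_t>& dims,
    const void* data, size_t byte_size, TempProtoStore& store) {
  ONNX_NAMESPACE::TensorProto proto;
  proto.set_name(name);
  proto.set_data_type(elem_type);
  for (int64_t d : dims) proto.add_dims(d);
  // Embed the bytes directly; data_location stays DEFAULT, so ONNX shape
  // inference can parse the tensor without rejecting it as external.
  proto.set_raw_data(data, byte_size);
  store.push_back(std::move(proto));
  return store.back();
}
```

In the actual fix, the bytes and type information come from the OrtValue backing the in-memory external data (via `utils::HasExternalDataInMemory()` and `use_tensor_buffer=false` as noted above); the sketch just shows why storing the temporary protos in a member keeps them alive for the duration of shape inference.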
Tests the fix for issue #26261, where Constant nodes with tensors larger than 127 bytes fail shape inference when stored as in-memory external data.

The test:
- Creates a Constant node with a 128-byte tensor (16 INT64 values)
- Uses this constant as input to a Split node
- Verifies that shape inference succeeds and the graph resolves correctly
- Confirms that the constant was properly converted to an initializer

Before the fix, this test would fail with: 'Cannot parse data from external tensors. Please load external data into raw data for tensor'. With the fix, the test passes because the in-memory external data is materialized for shape inference.
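For illustration, a hedged C++ sketch of the kind of graph the test constructs: a 16-element INT64 Constant feeding a Split. The real test uses ORT's Graph/Model test utilities rather than raw protobufs; the node and output names here are made up.

```cpp
#include <cstdint>

#include "onnx/onnx_pb.h"

// Build a small GraphProto with a Constant node whose tensor is 16 * 8 = 128
// bytes, just over the 127-byte threshold that triggers the OrtValue
// conversion, and a Split node that consumes it.
ONNX_NAMESPACE::GraphProto MakeConstantSplitGraph() {
  ONNX_NAMESPACE::GraphProto graph;
  graph.set_name("constant_split_repro");

  // Constant node producing 16 INT64 values (128 bytes).
  auto* constant = graph.add_node();
  constant->set_op_type("Constant");
  constant->set_name("const_node");
  constant->add_output("const_out");
  auto* attr = constant->add_attribute();
  attr->set_name("value");
  attr->set_type(ONNX_NAMESPACE::AttributeProto::TENSOR);
  auto* t = attr->mutable_t();
  t->set_data_type(ONNX_NAMESPACE::TensorProto::INT64);
  t->add_dims(16);
  for (int64_t i = 0; i < 16; ++i) t->add_int64_data(i + 1);

  // Split node that takes the constant as an input; resolving this graph
  // drives shape inference over the constant's data.
  auto* split = graph.add_node();
  split->set_op_type("Split");
  split->set_name("split_node");
  split->add_input("const_out");
  split->add_output("split_out_0");
  split->add_output("split_out_1");

  return graph;
}
```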
Update: Unit Test Added

I've added a comprehensive unit test. Test:
You can commit the suggested changes from lintrunner.
- Added a ShapeInferenceWithInMemoryExternalDataViaSession test that uses InferenceSession to more closely match the real-world scenario
- Kept the original Model::Load test for basic coverage
- Added includes for fstream, InferenceSession, and Environment
- Both tests verify that models with Constant nodes > 127 bytes load correctly

Note: The test passes even without the fix in the test environment due to the specific optimization paths taken. The fix is fully validated with the real-world model test (BiRefNet-COD-epoch_125.onnx), which fails without the fix and succeeds with it.
Summary: Investigation Complete

I've completed a thorough investigation and fix for issue #26261.

What Was Done:
Test Validation:

Real-World Model (BiRefNet-COD-epoch_125.onnx):
Unit Tests:
Note: The unit tests pass even without the fix due to optimization paths in the test environment, but they provide regression coverage for the code path. The real validation comes from the actual model test, which definitively fails without the fix and succeeds with it.

Ready for Review

The PR includes:
All changes are minimal and surgical, focused solely on fixing the regression introduced in v1.23.0.
You can commit the suggested changes from lintrunner.
The new test ShapeInferenceAfterInitializerExternalization explicitly:
1. Creates a model with a 128-byte initializer (> 127-byte threshold)
2. Calls ConvertInitializersIntoOrtValues() to externalize it
3. Forces a second Resolve() to trigger shape inference
4. Verifies shape inference can access the externalized data

This test definitively fails without the fix and passes with it:
- WITHOUT fix: fails with 'Cannot parse data from external tensors'
- WITH fix: passes because getInputData() materializes the external data

The key insight is that SetGraphResolveNeeded() must be called after externalization to force shape inference to run again on the modified graph. A hedged sketch of this flow is shown below.
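As a rough illustration of this flow (not the test's actual code): `ConvertInitializersIntoOrtValues()` and `SetGraphResolveNeeded()` are the internal Graph calls named above, while `BuildModelWithLargeInitializer` is a hypothetical helper standing in for the model-construction boilerplate, and the exact signatures may differ from what is shown.

```cpp
#include "gtest/gtest.h"
// ASSERT_STATUS_OK is ORT's test macro for checking a Status; other test
// scaffolding is omitted for brevity.

TEST(GraphTest, ShapeInferenceAfterInitializerExternalization) {
  // 1. Build a model whose initializer is 128 bytes (> the 127-byte threshold).
  //    BuildModelWithLargeInitializer is a hypothetical helper.
  auto model = BuildModelWithLargeInitializer(/*num_int64_elements=*/16);
  onnxruntime::Graph& graph = model->MainGraph();
  ASSERT_STATUS_OK(graph.Resolve());

  // 2. Externalize large initializers into OrtValues backed by in-memory
  //    "external" data.
  ASSERT_STATUS_OK(graph.ConvertInitializersIntoOrtValues());

  // 3. Force shape inference to run again on the modified graph.
  graph.SetGraphResolveNeeded();

  // 4. Without the fix, this Resolve() fails with
  //    "Cannot parse data from external tensors ...".
  ASSERT_STATUS_OK(graph.Resolve());
}
```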
✅ Success: Unit Test Now Properly Catches the Bug!

I've successfully created a unit test that definitively fails without the fix and passes with it. Test:
You can commit the suggested changes from lintrunner.
One more thing: ORT has several implementations of the InferenceContext interface. Please check them all.
chilo-ms
left a comment
LGTM
## Description

Fixes #26261

This PR resolves a regression introduced in v1.23.0 where models with Constant nodes containing tensors larger than 127 bytes fail to load with a shape inference error.

### Root Cause

Commit 3b97d79 (PR #25320) introduced an optimization to convert large Constant node tensors (> 127 bytes) into OrtValues with in-memory external data references for better memory management. However, ONNX shape inference cannot distinguish between in-memory and file-based external data, and rejects any TensorProto with `data_location = EXTERNAL`.

### The Fix

Modified `InferenceContextImpl::getInputData()` to:
1. Detect tensors with in-memory external data using `utils::HasExternalDataInMemory()`
2. Retrieve the corresponding OrtValue
3. Create a temporary TensorProto with embedded data (not an external reference)
4. Provide this temporary proto to ONNX shape inference

This allows ONNX shape inference to access the actual tensor data without rejecting it as external.

### Memory Impact

This fix introduces a minor and temporary increase in memory usage during the model loading phase.

- **When:** The additional memory is allocated only when the shape inference engine needs to access the data of a constant tensor that is larger than 127 bytes. This is a one-time event during the initial analysis of the model.
- **What:** The fix creates a temporary in-memory copy of the tensor data.
- **Duration:** This temporary copy is released as soon as shape inference is complete.

The impact on the overall peak memory usage of the application is expected to be negligible. The memory usage during inference is not affected. While it is theoretically possible for the temporary tensor to be large if a multi-gigabyte constant tensor is used for shape inference, this is a highly unlikely scenario in practice for well-designed models.

### Testing

- Tested with the problematic model from issue #26261
- All optimization levels now work correctly (DISABLE_ALL, BASIC, EXTENDED, ALL)
- Unit tests to be added

### Changes

- **onnxruntime/core/graph/graph.cc**:
  - Modified `getInputData()` method in the `InferenceContextImpl` class
  - Added `temp_tensor_protos_` member to store temporary TensorProtos during shape inference

## TODO

- [ ] Add unit tests
- [ ] Run full test suite

---

Co-authored-by: Dmitri Smirnov <[email protected]>
Adds the following commits to the release-1.23.2 branch for ORT 1.23.2:

- [TensorRT] Fix DDS output bug during engine update - PR: #26272 - commit id: 00e85dd
- Fix shape inference failure with in-memory external data - PR: #26263 - commit id: d955476
- [CUDA] replace 90a-virtual by 90-virtual for forward compatible - PR: #26230 - commit id: b58911f
- [QNN-EP] Fix logic flow bug - PR: #26148 - commit id: b282379
- Internal Dupe of #25255 - [MLAS] Optimize MlasConv using thread partition opt - PR: #26103 - commit id: 7362518
- Update qMoE spec to support block quantization - PR: #25641 - commit id: 7a8ffa8
- [VitisAI] add new api to VitisAI to save graph as a string - PR: #25602 - commit id: 3361d72
- [Build] Lock torch, onnxscript and onnx-ir versions to latest - PR: #26315 - commit id: ea69c4d

---

Co-authored-by: Hariharan Seshadri <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: Yateng Hong <[email protected]>
Co-authored-by: Changming Sun <[email protected]>
Co-authored-by: Dmitri Smirnov <[email protected]>
Co-authored-by: Tianlei Wu <[email protected]>
Co-authored-by: quic-calvnguy <[email protected]>
Co-authored-by: quic_calvnguy <quic_calvnguy@quic_inc.com>
Co-authored-by: yifei410 <[email protected]>
Co-authored-by: yifei <[email protected]>
Cherry-picked for 1.23.2. Removing the release tag and adding the cherry-pick tag.
…lues early (#26345)

### Description

Converts weights early and reverts "Properly remove in-memory references (#25652)". This reverts commit 3ca49d8 and makes appropriate adjustments for the current state of the code.

This PR is made possible by, and is on the heels of, #26263 and #25833. Previous history: #23979, #25320, #25626, #25652.

The first change (#26263) allows us to convert initializers to OrtValues early and save lots of memory at model loading time. Specifically, for the Phi-4-mini-instruct-INT4 model, before and after look like this:

**Before**

<img width="1204" height="124" alt="Before change DEBUG 2025-10-16 144819" src="https://github.com/user-attachments/assets/674ff75b-057f-498a-a906-0140d59d46e6" />

**After**

<img width="997" height="114" alt="After change DEBUG 2025-10-16 144819" src="https://github.com/user-attachments/assets/df1783af-7f50-4cd2-b3ad-6868f23be53f" />

The two peaks represent memory usage at optimization time (8.1 GB before) and after weights memory mapping (6.5 GB). After this change, the corresponding numbers are 3.5 GB and 4.7 GB respectively. Most of the savings during the optimization phase come from `ConstantFolding`, where we are able to reuse the resulting OrtValues directly for the new initializers.

This PR concludes a series of PRs converting initializers to OrtValues. Memory consumption before the conversion began was 9.3 GB and 6.7 GB respectively. We are saving almost 6 GB during optimization and 2 GB for the steady state.

<img width="1175" height="139" alt="image" src="https://github.com/user-attachments/assets/80e7d228-8a8e-4316-8e04-b02c2be30f04" />

The model also loads about 12 seconds faster.

Example of ConstantFolding being one of the top contributors, where we duplicate memory for a higher peak before Resolve takes care of no-longer-used initializers:

<img width="1100" height="558" alt="Snapshot 3 Peak on ConstantFolding Transpose Optimizer" src="https://github.com/user-attachments/assets/95545abd-3f99-46d9-862e-bbf27cbb5b40" />
<img width="1060" height="600" alt="Snapshot 4 Peak AddInitializer from ConstantFolding" src="https://github.com/user-attachments/assets/dd457ec6-23ee-4efd-8c60-625d5faad61e" />
<img width="325" height="160" alt="image" src="https://github.com/user-attachments/assets/37c1194d-f683-49a7-afb1-073dfbb9bbfc" />

### Motivation and Context

Reduce memory usage.