Description
Is there an existing issue for this problem?
- I have searched the existing issues
Install method
Invoke's Launcher
Operating system
Windows 11 64bit
GPU vendor
Nvidia (CUDA)
GPU model
RTX 4070 Ti Super 16GB VRAM
GPU VRAM
16GB
Version number
6.0.2
Browser
No response
System Information
InvokeAI Version: 6.0.2
OS: Windows 11 Dev Build
GPU: RTX 4070 Ti Super, 16GB VRAM
RAM: 64GB
Python: (please specify your Python version)
CUDA: 12.8
Key Dependencies:
torch: 2.7.1+cu128
torchvision: 0.22.1+cu128
numpy: 1.26.3
gguf: 0.17.1
diffusers: 0.33.0
onnxruntime-gpu: 1.22.0
bitsandbytes: 0.46.1
What happened
Loading GGUF models with InvokeAI on Windows 11 and an RTX 4070 Ti Super causes unexpected VRAM spikes and PyTorch warnings when the GGUF loader passes non-writable NumPy arrays to torch.from_numpy. This can crash the GPU driver or halt generation entirely.
PyTorch logs this warning:
The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor.
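To illustrate where such arrays come from, here is a minimal sketch (not InvokeAI code), assuming the GGUF loader memory-maps the model file read-only, which is the typical way quantized weights are exposed as NumPy views:

```python
import tempfile
import numpy as np

# Sketch (assumption): a read-only memory mapping yields arrays with
# writeable=False -- exactly the kind of array that makes
# torch.from_numpy emit the warning quoted above.
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as f:
    np.arange(8, dtype=np.float32).tofile(f)
    path = f.name

view = np.memmap(path, dtype=np.float32, mode="r")  # read-only mapping
print(view.flags.writeable)  # -> False
```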
What you expected to happen
VRAM should stay within a normal range, and no warnings about non-writable NumPy arrays should appear in logs.
How to reproduce the problem
- Start InvokeAI with any GGUF or ggml quantized model.
- Load the model and monitor VRAM usage in Task Manager or nvidia-smi.
- Observe the VRAM spike and PyTorch warning in the logs.
- Apply the patch shown below and confirm that VRAM use stabilizes and the warning no longer appears.
Additional context
The root cause is passing a non-writable NumPy array to torch.from_numpy in loaders.py. When the tensor is written to later, PyTorch may allocate duplicate or temporary buffers on the GPU, leading to excessive VRAM usage.
Patch to fix the issue
Replace all torch.from_numpy(tensor.data) calls with:
torch_tensor = torch.from_numpy(tensor.data.copy() if not tensor.data.flags.writeable else tensor.data)
This was tested and confirmed on Windows 11 with an RTX 4070 Ti Super and PyTorch 2.7.1 (see System Information above). After applying the patch, VRAM usage remains stable and the warning is resolved.
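The one-line patch can be exercised in isolation. The helper name below is hypothetical; InvokeAI's loaders.py inlines the same conditional expression:

```python
import numpy as np

def to_writable(a: np.ndarray) -> np.ndarray:
    # Hypothetical helper mirroring the patch: copy only when the
    # array is read-only, otherwise pass it through unchanged.
    return a.copy() if not a.flags.writeable else a

data = np.arange(4, dtype=np.float32)
data.flags.writeable = False           # simulate a read-only GGUF tensor view
safe = to_writable(data)
print(safe.flags.writeable)            # -> True; the original view is untouched
```

Because the copy is taken only for non-writable arrays, already-writable tensors incur no extra allocation.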
What changes
Before the patch:
torch.from_numpy(tensor.data) was called directly, regardless of whether the underlying NumPy array was writable. If tensor.data was not writable, PyTorch issued a warning and sometimes allocated duplicate or temporary GPU buffers, leading to excessive VRAM usage and instability.
After the patch:
The code checks whether tensor.data is writable. If not, it creates a writable copy with tensor.data.copy() and passes that to torch.from_numpy. This ensures PyTorch always receives a writable array, preventing unnecessary buffer allocations, eliminating the warning, and stabilizing VRAM usage.
Why it matters
- Prevents VRAM spikes and out-of-memory errors when loading GGUF models
- Eliminates PyTorch undefined behavior warnings related to non-writable tensors
- Stabilizes image generation and model inference on large VRAM GPUs
Discord username
No response