[bug][FIX]: Patch for non-writable NumPy arrays in GGUF loader to prevent PyTorch undefined behavior and VRAM spikes #8280

@MK-986123

Description

Is there an existing issue for this problem?

  • I have searched the existing issues

Install method

Invoke's Launcher

Operating system

Windows 11 64bit

GPU vendor

Nvidia (CUDA)

GPU model

RTX 4070 Ti Super 16GB VRAM

GPU VRAM

16GB

Version number

6.0.2

Browser

No response

System Information

InvokeAI Version: 6.0.2
OS: Windows 11 Dev Build
GPU: RTX 4070 Ti Super, 16GB VRAM
RAM: 64GB
Python: (please specify your Python version)
CUDA: 12.8

Key Dependencies:
torch: 2.7.1+cu128
torchvision: 0.22.1+cu128
numpy: 1.26.3
gguf: 0.17.1
diffusers: 0.33.0
onnxruntime-gpu: 1.22.0
bitsandbytes: 0.46.1

What happened

Loading GGUF models with InvokeAI on Windows 11 and an RTX 4070 Ti Super causes unexpected VRAM spikes and PyTorch warnings because the GGUF loader passes non-writable NumPy arrays to torch.from_numpy. This can exhaust GPU memory or halt generation entirely.

PyTorch logs this warning:

The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor.

What you expected to happen

VRAM should stay within a normal range, and no warnings about non-writable NumPy arrays should appear in logs.

How to reproduce the problem

  1. Start InvokeAI with any GGUF or ggml quantized model.
  2. Load the model and monitor VRAM usage in Task Manager or nvidia-smi.
  3. Observe the VRAM spike and PyTorch warning in the logs.
  4. Apply the patch shown below and confirm that VRAM use stabilizes and the warning no longer appears.

Additional context

The root cause is passing a non-writable NumPy array to torch.from_numpy in loaders.py. When the tensor is written to later, PyTorch may allocate duplicate or temporary buffers on the GPU, leading to excessive VRAM usage.
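The root cause is easy to demonstrate in isolation. This minimal sketch (independent of the InvokeAI codebase) marks an array read-only, as memory-mapped GGUF tensor data typically is, and captures the exact warning quoted above:

```python
import warnings

import numpy as np
import torch

# Simulate the loader's situation: tensor data backed by read-only memory.
arr = np.ones(4, dtype=np.float32)
arr.flags.writeable = False

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    t = torch.from_numpy(arr)  # triggers the "not writable" UserWarning

# The warning fires even though the conversion itself succeeds;
# undefined behavior only occurs if the tensor is later written to.
```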

Patch to fix the issue

Replace all torch.from_numpy(tensor.data) calls with:

torch_tensor = torch.from_numpy(tensor.data.copy() if not tensor.data.flags.writeable else tensor.data)
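Equivalently, the guarded call can be factored into a small helper. This is a sketch only; the function name to_writable_tensor is illustrative and does not exist in the InvokeAI codebase:

```python
import numpy as np
import torch

def to_writable_tensor(data: np.ndarray) -> torch.Tensor:
    # Copy only when the array is read-only (e.g. backed by a read-only
    # mmap), so already-writable arrays still convert zero-copy.
    if not data.flags.writeable:
        data = data.copy()
    return torch.from_numpy(data)

# A read-only array is copied; the resulting tensor is safely writable
# and the original buffer is left untouched.
ro = np.arange(6, dtype=np.float32)
ro.flags.writeable = False
t = to_writable_tensor(ro)
t[0] = 42.0  # safe: backed by the writable copy
```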

This was tested and confirmed on Windows 11 with an RTX 4070 Ti Super and PyTorch 2.7.1 (the version listed in the dependencies above). After applying the patch, VRAM usage remains stable and the warning no longer appears.

What changes

Before the patch:
torch.from_numpy(tensor.data) was called directly, regardless of whether the underlying NumPy array was writable.
If tensor.data was not writable, PyTorch issued a warning and sometimes allocated duplicate or temporary GPU buffers, leading to excessive VRAM usage and instability.

After the patch:
The code checks if tensor.data is writable. If not, it creates a writable copy using tensor.data.copy(), and passes this to torch.from_numpy.
This ensures PyTorch always receives a writable array, preventing unnecessary buffer allocations, eliminating the warning, and stabilizing VRAM usage.
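A quick sanity check that the guard does not cost anything on the common path: for a writable array, torch.from_numpy remains zero-copy, so the tensor and the array share memory (sketch, independent of the loader code):

```python
import numpy as np
import torch

# Writable array: the guard takes no copy, so from_numpy shares memory.
w = np.zeros(3, dtype=np.float32)
t = torch.from_numpy(w)
t[0] = 1.0  # visible through the NumPy array as well, since memory is shared
```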

Why it matters

  • Prevents VRAM spikes and out-of-memory errors when loading GGUF models
  • Eliminates PyTorch undefined behavior warnings related to non-writable tensors
  • Stabilizes image generation and model inference on large VRAM GPUs

Discord username

No response

Metadata

Assignees: no one assigned
Labels: bug (Something isn't working)
Type: no type
Projects: no projects
Milestone: no milestone
Relationships: none yet
Development: no branches or pull requests