[bug][FIX]: Patch for non-writable NumPy arrays in GGUF loader to prevent PyTorch undefined behavior and VRAM spikes #8280

@MK-986123

Description

Is there an existing issue for this problem?

  • I have searched the existing issues

Install method

Invoke's Launcher

Operating system

Windows 11 64bit

GPU vendor

Nvidia (CUDA)

GPU model

RTX 4070 Ti Super 16GB VRAM

GPU VRAM

16GB

Version number

6.0.2

Browser

No response

System Information

InvokeAI Version: 6.0.2
OS: Windows 11 Dev Build
GPU: RTX 4070 Ti Super, 16GB VRAM
RAM: 64GB
Python: (please specify your Python version)
CUDA: 12.8

Key Dependencies:
torch: 2.7.1+cu128
torchvision: 0.22.1+cu128
numpy: 1.26.3
gguf: 0.17.1
diffusers: 0.33.0
onnxruntime-gpu: 1.22.0
bitsandbytes: 0.46.1

What happened

Loading GGUF models with InvokeAI on Windows 11 and an RTX 4070 Ti Super causes unexpected VRAM spikes and PyTorch warnings because the GGUF loader passes non-writable NumPy arrays to torch.from_numpy. This can exhaust GPU memory or halt generation entirely.

PyTorch logs this warning:

The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor.

What you expected to happen

VRAM should stay within a normal range, and no warnings about non-writable NumPy arrays should appear in logs.

How to reproduce the problem

  1. Start InvokeAI with any GGUF or ggml quantized model.
  2. Load the model and monitor VRAM usage in Task Manager or nvidia-smi.
  3. Observe the VRAM spike and PyTorch warning in the logs.
  4. Apply the patch shown below and confirm that VRAM use stabilizes and the warning no longer appears.

Additional context

The root cause is passing a non-writable NumPy array to torch.from_numpy in loaders.py. When the tensor is written to later, PyTorch may allocate duplicate or temporary buffers on the GPU, leading to excessive VRAM usage.
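The root cause is easy to demonstrate in isolation. This minimal sketch (independent of the InvokeAI codebase) marks an array read-only, as memory-mapped GGUF tensor data typically is, and captures the exact warning quoted above:

```python
import warnings

import numpy as np
import torch

# Simulate the loader's situation: tensor data backed by read-only memory.
arr = np.ones(4, dtype=np.float32)
arr.flags.writeable = False

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    t = torch.from_numpy(arr)  # triggers the "not writable" UserWarning

# The warning fires even though the conversion itself succeeds;
# undefined behavior only occurs if the tensor is later written to.
```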

Patch to fix the issue

Replace all torch.from_numpy(tensor.data) calls with:

torch_tensor = torch.from_numpy(tensor.data.copy() if not tensor.data.flags.writeable else tensor.data)
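Equivalently, the guarded call can be factored into a small helper. This is a sketch only; the function name to_writable_tensor is illustrative and does not exist in the InvokeAI codebase:

```python
import numpy as np
import torch

def to_writable_tensor(data: np.ndarray) -> torch.Tensor:
    # Copy only when the array is read-only (e.g. backed by a read-only
    # mmap), so already-writable arrays still convert zero-copy.
    if not data.flags.writeable:
        data = data.copy()
    return torch.from_numpy(data)

# A read-only array is copied; the resulting tensor is safely writable
# and the original buffer is left untouched.
ro = np.arange(6, dtype=np.float32)
ro.flags.writeable = False
t = to_writable_tensor(ro)
t[0] = 42.0  # safe: backed by the writable copy
```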

This was tested and confirmed on Windows 11 with an RTX 4070 Ti Super and PyTorch 2.7.1 (the version listed in the dependencies above). After applying the patch, VRAM usage remains stable and the warning no longer appears.

What changes

Before the patch:
torch.from_numpy(tensor.data) was called directly, regardless of whether the underlying NumPy array was writable.
If tensor.data was not writable, PyTorch issued a warning and sometimes allocated duplicate or temporary GPU buffers, leading to excessive VRAM usage and instability.

After the patch:
The code checks if tensor.data is writable. If not, it creates a writable copy using tensor.data.copy(), and passes this to torch.from_numpy.
This ensures PyTorch always receives a writable array, preventing unnecessary buffer allocations, eliminating the warning, and stabilizing VRAM usage.
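A quick sanity check that the guard does not cost anything on the common path: for a writable array, torch.from_numpy remains zero-copy, so the tensor and the array share memory (sketch, independent of the loader code):

```python
import numpy as np
import torch

# Writable array: the guard takes no copy, so from_numpy shares memory.
w = np.zeros(3, dtype=np.float32)
t = torch.from_numpy(w)
t[0] = 1.0  # visible through the NumPy array as well, since memory is shared
```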

Why it matters

  • Prevents VRAM spikes and out-of-memory errors when loading GGUF models
  • Eliminates PyTorch undefined behavior warnings related to non-writable tensors
  • Stabilizes image generation and model inference on large VRAM GPUs

Discord username

No response

Metadata

Assignees: no one assigned
Labels: bug (Something isn't working)
Type: no type
Projects: no projects
Milestone: no milestone
Relationships: none yet
Development: no branches or pull requests