Error: the provided PTX was compiled with an unsupported toolchain

# Prerequisites

Before submitting your issue, please ensure the following:

- [x] I am running the latest version of PowerInfer. Development is rapid, and as of now, there are no tagged versions.
- [x] I have carefully read and followed the instructions in the [README.md](https://github.com/SJTU-IPADS/PowerInfer/blob/main/README.md).
- [x] I [searched using keywords relevant to my issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/filtering-and-searching-issues-and-pull-requests) to make sure that I am creating a new issue that is not already open (or closed).

# Expected Behavior

I expected to build the project and run the simple demo from the README without errors.

# Current Behavior

The build succeeded, but running the demo reports the following error:
```
offload_ffn_split: applying augmentation to model - please wait ...

CUDA error 222 at /home/test/test06/jdz/PowerInfer/ggml-cuda.cu:9635: the provided PTX was compiled with an unsupported toolchain.
current device: 0
```

# Environment and Context


* SDK version, e.g. for Linux:

```
Python 3.10.14
cmake version 3.30.1
g++ (conda-forge gcc 11.4.0-13) 11.4.0
```

# Failure Information (for bugs)

Following the README, I built the project with `cmake -S . -B build -DLLAMA_CUBLAS=ON` and `cmake --build build --config Release`, then ran:
```
./build/bin/main -m /home/test/test06/jdz/PLMs/ReluLLaMA-7B/llama-7b-relu.powerinfer.gguf -n 128 -t 8 -p "Once upon a time" --vram-budget 8
```

I got the following log output:

```
Log start



offload_ffn_split: applying augmentation to model - please wait ...

CUDA error 222 at /home/test/test06/jdz/PowerInfer/ggml-cuda.cu:9635: the provided PTX was compiled with an unsupported toolchain.
current device: 0
```
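CUDA error 222 is `cudaErrorUnsupportedPtxVersion`. It most commonly appears when the PTX embedded in the binary was generated by a toolkit newer than the installed driver can JIT-compile. The driver version is not listed in this report, so the following is only a sketch of the comparison one could make, assuming the versions are given in the integer encoding used by `cudaDriverGetVersion` and `cudaRuntimeGetVersion` (for example `12040` for CUDA 12.4). The function names are hypothetical helpers, not part of PowerInfer or CUDA.

```python
def decode_cuda_version(encoded: int) -> tuple[int, int]:
    """Split a CUDA version integer (1000 * major + 10 * minor) into (major, minor)."""
    return encoded // 1000, (encoded % 1000) // 10


def driver_can_jit(driver_version: int, toolkit_version: int) -> bool:
    """Rough check (assumption): PTX from a toolkit can be JIT-compiled only if
    the driver supports a CUDA version at least as new as that toolkit."""
    return decode_cuda_version(driver_version) >= decode_cuda_version(toolkit_version)


if __name__ == "__main__":
    toolkit = 12040  # nvcc 12.4, as reported by CMake in this issue
    driver = 12020   # hypothetical value; the real driver version is not in the report
    print(decode_cuda_version(toolkit), driver_can_jit(driver, toolkit))
```

If the check fails for the real driver version (the "CUDA Version" shown by `nvidia-smi`), updating the driver or building with an older toolkit would be the usual directions to try.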

# Steps to Reproduce

Follow the build and run steps in the README.

# Additional info

After running `cmake -S . -B build -DLLAMA_CUBLAS=ON` I got the following:
```
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /home/test/test06/miniconda3/envs/jdz/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /home/test/test06/miniconda3/envs/jdz/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Found CUDAToolkit: /home/test/test06/cuda-12.4/targets/x86_64-linux/include (found version "12.4.131")
-- cuBLAS found
-- The CUDA compiler identification is NVIDIA 12.4.131
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /home/test/test06/cuda-12.4/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Using CUDA architectures: 52;61;70
GNU ld (GNU Binutils) 2.40
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done (17.3s)
-- Generating done (4.5s)
-- Build files have been written to: /home/test/test06/jdz/PowerInfer/build
```
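The CMake output above shows the build targeting CUDA architectures `52;61;70`. If the GPU has a compute capability that none of these cover natively, the driver has to JIT-compile the embedded PTX, which is where an "unsupported toolchain" error can surface. The GPU model is not stated in this report, so the snippet below is only a sketch of that reasoning, assuming the usual binary-compatibility rule (native code for `X.y` runs on devices `X.z` with `z >= y`). `needs_ptx_jit` is a hypothetical helper, and the device values are illustrative.

```python
def needs_ptx_jit(device_cc: int, built_archs: str) -> bool:
    """Return True if no architecture in the build list yields native code
    for the device, so the driver would fall back to JIT-compiling PTX.

    device_cc   -- compute capability without the dot, e.g. 86 for 8.6
    built_archs -- semicolon-separated list as printed by CMake, e.g. "52;61;70"
    """
    dev_major, dev_minor = divmod(device_cc, 10)
    for arch in (int(a) for a in built_archs.split(";") if a.strip()):
        major, minor = divmod(arch, 10)
        # Native code is compatible within a major version, from older to newer minor.
        if major == dev_major and minor <= dev_minor:
            return False
    return True


if __name__ == "__main__":
    archs = "52;61;70"  # from the CMake log in this issue
    for cc in (61, 75, 86):  # hypothetical devices
        print(cc, needs_ptx_jit(cc, archs))
```

If the device falls into the JIT case, passing its architecture explicitly at configure time (for example through CMake's `CMAKE_CUDA_ARCHITECTURES` variable) may avoid the JIT path, assuming the project's build script honors that setting.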
