## CPU Build with BLAS
Building llama.cpp with BLAS support is highly recommended, as it has been shown to provide performance improvements. Make sure to have OpenBLAS installed in your environment.
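Before configuring, you can sanity-check that OpenBLAS is discoverable by the build system. This is a sketch; the package names in the comments are distro-specific assumptions.

```shell
# Check that OpenBLAS development files are discoverable before configuring.
# Package names are distro-specific assumptions: libopenblas-dev (Debian/Ubuntu),
# openblas-devel (RHEL/Fedora/openSUSE).
if pkg-config --exists openblas 2>/dev/null; then
  echo "OpenBLAS $(pkg-config --modversion openblas) found"
else
  echo "OpenBLAS not found; install it before building with BLAS support"
fi
```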
```bash
cmake -S . -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_BLAS=ON \
    -DGGML_BLAS_VENDOR=OpenBLAS

cmake --build build --config Release -j $(nproc)
```

## Getting GGUF Models

All models need to be converted to Big-Endian. You can achieve this in three cases:
1. **Use pre-converted models verified for use on IBM Z & LinuxONE (easiest)**
You can find popular models pre-converted and verified at [s390x Ready Models](https://huggingface.co/collections/taronaeo/s390x-ready-models-672765393af438d0ccb72a08).
These models have already been converted from `safetensors` to `GGUF Big-Endian`, and their respective tokenizers are verified to run correctly on IBM z15 and later systems.
2. **Convert safetensors model to GGUF Big-Endian directly (recommended)**
The model you are trying to convert must be in the `safetensors` file format (for example [IBM Granite 3.3 2B](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct)). Make sure you have downloaded the model repository for this case.
```bash
python3 convert_hf_to_gguf.py \
--outfile model-name-be.f16.gguf \
--outtype f16 \
model-directory/
```
3. **Convert existing GGUF Little-Endian model to Big-Endian**
The model you are trying to convert must be in the `gguf` file format (for example [IBM Granite 3.3 2B](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct-GGUF)). Make sure you have downloaded the model file for this case.
```bash
python3 gguf-py/gguf/scripts/gguf_convert_endian.py model-name.f16.gguf BIG
```
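If you are unsure which byte order a GGUF file is in, the version field in its header gives it away: GGUF stores a small uint32 version right after the `GGUF` magic, so a value like 50331648 (0x03000000) means the file's byte order does not match how it is being read. A minimal sketch, assuming the standard GGUF header layout:

```python
import struct

def gguf_byte_order(path: str) -> str:
    """Best-effort check of a GGUF file's byte order from its header."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError("not a GGUF file")
        # A uint32 version follows the magic; read it as little-endian.
        (version,) = struct.unpack("<I", f.read(4))
    # A real version is a small number (currently 3). Reading a big-endian
    # file as little-endian instead yields 0x03000000 == 50331648.
    return "little-endian" if version < 0x10000 else "big-endian"
```

A file whose header was written on a little-endian host reports `little-endian`, while the same header with swapped bytes reports `big-endian`.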
It is strongly recommended to disable SMT via the kernel boot parameters as it negatively affects performance.
IBM VXE/VXE2 SIMD acceleration depends on the BLAS implementation. It is strongly recommended to use BLAS.
## Frequently Asked Questions (FAQ)
1. I'm getting the following error message while trying to load a model: `gguf_init_from_file_impl: failed to load model: this GGUF file version 50331648 is extremely large, is there a mismatch between the host and model endianness?`
Answer: Please ensure that the model you have downloaded/converted is GGUFv3 Big-Endian. These models are usually denoted with the `-be` suffix, i.e., `granite-3.3-2b-instruct-be.F16.gguf`.
You may refer to the [Getting GGUF Models](#getting-gguf-models) section to manually convert a `safetensors` model to `GGUF` Big Endian.
2. I'm getting extremely poor performance when running inference on a model.
Answer: Please refer to the [Appendix B: SIMD Support Matrix](#appendix-b-simd-support-matrix) to check if your model quantization is supported by SIMD acceleration.
3. I'm building on IBM z17 and getting the following error message: `invalid switch -march=z17`
Answer: Please ensure that your GCC compiler is at least version 15.1.0 and that `binutils` is updated to the latest version. If this does not fix the problem, kindly open an issue.
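You can check the installed GCC against that minimum with a `sort -V` comparison. This is a sketch; it assumes GNU `sort`, and `gcc -dumpfullversion` is available on GCC 7 and later.

```shell
# Verify GCC is at least 15.1.0 (needed for -march=z17).
required="15.1.0"
current="$(gcc -dumpfullversion 2>/dev/null || gcc -dumpversion 2>/dev/null)"
# sort -V orders version strings numerically; if the required version sorts
# first (or equal), the installed compiler is new enough.
if [ "$(printf '%s\n' "$required" "$current" | sort -V | head -n1)" = "$required" ]; then
  echo "GCC $current supports -march=z17"
else
  echo "GCC $current is too old; upgrade to 15.1.0 or later"
fi
```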
## Getting Help on IBM Z & LinuxONE
1. **Bugs, Feature Requests**