<p>Select a GPU from the list. The model instance will attempt to deploy to this GPU if resources permit.</p>
<h3id="backend">Backend</h3>
<p>The inference backend. Currently, GPUStack supports two backends: llama-box and vLLM. GPUStack automatically selects the backend based on the model's configuration.</p>
<p>For more details, please refer to the <a href="../inference-backends/">Inference Backends</a> section.</p>
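<p>As a rough illustration of the automatic selection, consider the sketch below. It is not GPUStack's actual implementation, only a minimal heuristic assuming that GGUF-format files are served by llama-box while other formats (such as safetensors) are served by vLLM:</p>
<pre><code>def pick_backend(model_file: str) -> str:
    """Illustrative heuristic only, not GPUStack's actual code:
    GGUF files map to llama-box; other formats map to vLLM."""
    if model_file.endswith(".gguf"):
        return "llama-box"
    return "vllm"

print(pick_backend("llama-3-8b.Q4_K_M.gguf"))            # llama-box
print(pick_backend("model-00001-of-00002.safetensors"))  # vllm
</code></pre>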
<h3 id="backend-parameters">Backend Parameters</h3>
<p>Input the parameters you want to customize for the backend when running the model. Each parameter should be in the format <code>--parameter=value</code>, <code>--bool-parameter</code>, or as separate <code>--parameter</code> and <code>value</code> fields.
For example, use <code>--ctx-size=8192</code> for llama-box.</p>
<p>For the full list of supported parameters, please refer to the <a href="../inference-backends/">Inference Backends</a> section.</p>
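<p>Backend parameters can also be supplied programmatically. The sketch below is hypothetical: the endpoint path, the payload field names (including <code>backend_parameters</code>), and the authentication header are assumptions, so verify them against your GPUStack server's API documentation before use:</p>
<pre><code>import requests

# Hypothetical sketch: deploying a model with custom backend parameters
# via a GPUStack server's REST API. Endpoint path, payload fields, and
# auth header are assumptions -- check the real API schema.
payload = {
    "name": "llama-3-8b-instruct",
    # One string per parameter, in the --parameter=value form:
    "backend_parameters": ["--ctx-size=8192"],
}

resp = requests.post(
    "http://localhost/v1/models",   # assumed server URL and path
    json=payload,
    headers={"Authorization": "Bearer YOUR_API_KEY"},  # placeholder key
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
</code></pre>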
<h3id="allow-cpu-offloading">Allow CPU Offloading</h3>