8 changes: 3 additions & 5 deletions README.md
@@ -52,11 +52,11 @@ ______________________________________________________________________
- \[2024/09\] LMDeploy PyTorchEngine achieves 1.3x faster on Llama3-8B inference by introducing CUDA graph
- \[2024/08\] LMDeploy is integrated into [modelscope/swift](https://github.com/modelscope/swift) as the default accelerator for VLMs inference
- \[2024/07\] Support Llama3.1 8B, 70B and its TOOLS CALLING
- \[2024/07\] Support [InternVL2](docs/en/multi_modal/internvl.md) full-series models, [InternLM-XComposer2.5](docs/en/multi_modal/xcomposer2d5.md) and [function call](docs/en/llm/api_server_tools.md) of InternLM2.5
- \[2024/07\] Support [InternVL2](docs/en/multi_modal/internvl.md) full-series models and [function call](docs/en/llm/api_server_tools.md) of InternLM2.5
- \[2024/06\] PyTorch engine support DeepSeek-V2 and several VLMs, such as CogVLM2, Mini-InternVL, LlaVA-Next
- \[2024/05\] Balance vision model when deploying VLMs with multiple GPUs
- \[2024/05\] Support 4-bits weight-only quantization and inference on VLMs, such as InternVL v1.5, LLaVa, InternLMXComposer2
- \[2024/04\] Support Llama3 and more VLMs, such as InternVL v1.1, v1.2, MiniGemini, InternLMXComposer2.
- \[2024/05\] Support 4-bits weight-only quantization and inference on VLMs, such as InternVL v1.5 and LLaVa
- \[2024/04\] Support Llama3 and more VLMs, such as InternVL v1.1, v1.2 and MiniGemini.
- \[2024/04\] TurboMind adds online int8/int4 KV cache quantization and inference for all supported devices. Refer [here](docs/en/quantization/kv_quant.md) for detailed guide
- \[2024/04\] TurboMind latest upgrade boosts GQA, rocketing the [internlm2-20b](https://huggingface.co/internlm/internlm2-20b) model inference to 16+ RPS, about 1.8x faster than vLLM.
- \[2024/04\] Support Qwen1.5-MOE and dbrx.
@@ -171,8 +171,6 @@ LMDeploy is a toolkit for compressing, deploying, and serving LLM, developed by
<td>
<ul>
<li>LLaVA(1.5,1.6) (7B-34B)</li>
<li>InternLM-XComposer2 (7B, 4khd-7B)</li>
<li>InternLM-XComposer2.5 (7B)</li>
<li>Qwen-VL (7B)</li>
<li>Qwen2-VL (2B, 7B, 72B)</li>
<li>Qwen2.5-VL (3B, 7B, 72B)</li>
8 changes: 3 additions & 5 deletions README_ja.md
@@ -37,11 +37,11 @@ ______________________________________________________________________

- \[2024/08\] 🔥🔥 LMDeployは[modelscope/swift](https://github.com/modelscope/swift)に統合され、VLMs推論のデフォルトアクセラレータとなりました
- \[2024/07\] 🎉🎉 Llama3.1 8B、70Bおよびそのツールコールをサポート
- \[2024/07\] [InternVL2](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e)全シリーズモデル、[InternLM-XComposer2.5](docs/en/multi_modal/xcomposer2d5.md)およびInternLM2.5の[ファンクションコール](docs/en/llm/api_server_tools.md)をサポート
- \[2024/07\] [InternVL2](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e)全シリーズモデルおよびInternLM2.5の[ファンクションコール](docs/en/llm/api_server_tools.md)をサポート
- \[2024/06\] PyTorchエンジンはDeepSeek-V2およびいくつかのVLMs、例えばCogVLM2、Mini-InternVL、LlaVA-Nextをサポート
- \[2024/05\] 複数のGPUでVLMsをデプロイする際にビジョンモデルをバランスさせる
- \[2024/05\] InternVL v1.5、LLaVa、InternLMXComposer2などのVLMsで4ビットの重みのみの量子化と推論をサポート
- \[2024/04\] Llama3およびInternVL v1.1、v1.2、MiniGemini、InternLMXComposer2などのVLMモデルをサポート
- \[2024/05\] InternVL v1.5、LLaVaなどのVLMsで4ビットの重みのみの量子化と推論をサポート
- \[2024/04\] Llama3およびInternVL v1.1、v1.2、MiniGeminiなどのVLMモデルをサポート
- \[2024/04\] TurboMindはすべてのサポートされているデバイスでのオンラインint8/int4 KVキャッシュ量子化と推論を追加しました。詳細なガイドは[こちら](docs/en/quantization/kv_quant.md)を参照してください
- \[2024/04\] TurboMindの最新アップグレードによりGQAが強化され、[internlm2-20b](https://huggingface.co/internlm/internlm2-20b)モデルの推論が16+ RPSに達し、vLLMの約1.8倍の速さになりました
- \[2024/04\] Qwen1.5-MOEおよびdbrxをサポート
@@ -158,8 +158,6 @@ LMDeploy TurboMindエンジンは卓越した推論能力を持ち、さまざ
<td>
<ul>
<li>LLaVA(1.5,1.6) (7B-34B)</li>
<li>InternLM-XComposer2 (7B, 4khd-7B)</li>
<li>InternLM-XComposer2.5 (7B)</li>
<li>Qwen-VL (7B)</li>
<li>Qwen2-VL (2B, 7B, 72B)</li>
<li>Qwen2.5-VL (3B, 7B, 72B)</li>
8 changes: 3 additions & 5 deletions README_zh-CN.md
@@ -52,11 +52,11 @@ ______________________________________________________________________
- \[2024/09\] 通过引入 CUDA Graph,LMDeploy PyTorchEngine 在 Llama3-8B 推理上实现了 1.3 倍的加速
- \[2024/08\] LMDeploy现已集成至 [modelscope/swift](https://github.com/modelscope/swift),成为 VLMs 推理的默认加速引擎
- \[2024/07\] 支持 Llama3.1 8B 和 70B 模型,以及工具调用功能
- \[2024/07\] 支持 [InternVL2](docs/zh_cn/multi_modal/internvl.md) 全系列模型,[InternLM-XComposer2.5](docs/zh_cn/multi_modal/xcomposer2d5.md) 模型和 InternLM2.5 的 [function call 功能](docs/zh_cn/llm/api_server_tools.md)
- \[2024/07\] 支持 [InternVL2](docs/zh_cn/multi_modal/internvl.md) 全系列模型和 InternLM2.5 的 [function call 功能](docs/zh_cn/llm/api_server_tools.md)
- \[2024/06\] PyTorch engine 支持了 DeepSeek-V2 和若干 VLM 模型推理, 比如 CogVLM2,Mini-InternVL,LlaVA-Next
- \[2024/05\] 在多 GPU 上部署 VLM 模型时,支持把视觉部分的模型均分到多卡上
- \[2024/05\] 支持InternVL v1.5, LLaVa, InternLMXComposer2 等 VLMs 模型的 4bit 权重量化和推理
- \[2024/04\] 支持 Llama3 和 InternVL v1.1, v1.2,MiniGemini,InternLM-XComposer2 等 VLM 模型
- \[2024/05\] 支持 InternVL v1.5 和 LLaVa 等 VLMs 模型的 4bit 权重量化和推理
- \[2024/04\] 支持 Llama3 和 InternVL v1.1, v1.2,MiniGemini 等 VLM 模型
- \[2024/04\] TurboMind 支持 kv cache int4/int8 在线量化和推理,适用已支持的所有型号显卡。详情请参考[这里](docs/zh_cn/quantization/kv_quant.md)
- \[2024/04\] TurboMind 引擎升级,优化 GQA 推理。[internlm2-20b](https://huggingface.co/internlm/internlm2-20b) 推理速度达 16+ RPS,约是 vLLM 的 1.8 倍
- \[2024/04\] 支持 Qwen1.5-MOE 和 dbrx.
@@ -173,8 +173,6 @@ LMDeploy TurboMind 引擎拥有卓越的推理能力,在各种规模的模型
<td>
<ul>
<li>LLaVA(1.5,1.6) (7B-34B)</li>
<li>InternLM-XComposer2 (7B, 4khd-7B)</li>
<li>InternLM-XComposer2.5 (7B)</li>
<li>Qwen-VL (7B)</li>
<li>Qwen2-VL (2B, 7B, 72B)</li>
<li>Qwen2.5-VL (3B, 7B, 72B)</li>
4 changes: 0 additions & 4 deletions autotest/utils/get_run_config.py
@@ -37,10 +37,6 @@ def get_model_name(model):
        return 'internvl-internlm2'
    if ('internlm2') in model_name:
        return 'internlm2'
    if ('internlm-xcomposer2d5') in model_name:
        return 'internlm-xcomposer2d5'
    if ('internlm-xcomposer2') in model_name:
        return 'internlm-xcomposer2'
    if ('glm-4') in model_name:
        return 'glm4'
    if len(model_name.split('-')) > 2 and '-'.join(model_name.split('-')[0:2]) in model_names:
2 changes: 1 addition & 1 deletion docs/en/faq.md
@@ -94,7 +94,7 @@ lmdeploy serve api_server internlm/internlm2_5-7b-chat --cache-max-entry-count 0
### Api Server Fetch Timeout

The image URL fetch timeout for the API server can be configured via the environment variable `LMDEPLOY_FETCH_TIMEOUT`.
By default, requests may take up to 10 seconds before timing out. See [lmdeploy/vl/utils.py](https://github.com/InternLM/lmdeploy/blob/7b6876eafcb842633e0efe8baabe5906d7beeeea/lmdeploy/vl/utils.py#L31) for usage.
By default, requests may take up to 10 seconds before timing out. See [lmdeploy/multimodal/utils.py](https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/multimodal/utils.py) for usage.
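
As a rough illustration of how an environment-variable timeout like this is typically consumed, the sketch below reads the variable once and falls back to the documented 10-second default. It is not the code in `lmdeploy/multimodal/utils.py`: the helper name `get_fetch_timeout`, the `env` parameter, and the handling of invalid or non-positive values are all assumptions made for the example.

```python
import os

DEFAULT_FETCH_TIMEOUT = 10.0  # seconds; the default stated in the FAQ text


def get_fetch_timeout(env=None):
    """Hypothetical helper: read LMDEPLOY_FETCH_TIMEOUT, falling back to 10s."""
    env = os.environ if env is None else env
    raw = env.get('LMDEPLOY_FETCH_TIMEOUT')
    if raw is None:
        return DEFAULT_FETCH_TIMEOUT
    try:
        value = float(raw)
    except ValueError:
        # Assumption: an unparsable value falls back to the default.
        return DEFAULT_FETCH_TIMEOUT
    # Assumption: non-positive values are treated as invalid.
    return value if value > 0 else DEFAULT_FETCH_TIMEOUT
```

Under these assumptions, exporting `LMDEPLOY_FETCH_TIMEOUT=30` in the server's environment before launch would raise the limit to 30 seconds.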

## Quantization

2 changes: 1 addition & 1 deletion docs/en/get_started/ascend/get_started.md
@@ -50,7 +50,7 @@ Set `device_type="ascend"` in the `PytorchEngineConfig`:

```python
from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.vl import load_image
from lmdeploy.multimodal import load_image
pipe = pipeline('OpenGVLab/InternVL2-2B',
                backend_config=PytorchEngineConfig(tp=1, device_type='ascend'))
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
2 changes: 1 addition & 1 deletion docs/en/get_started/camb/get_started.md
@@ -43,7 +43,7 @@ Set `device_type="camb"` in the `PytorchEngineConfig`:

```python
from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.vl import load_image
from lmdeploy.multimodal import load_image
pipe = pipeline('OpenGVLab/InternVL2-2B',
                backend_config=PytorchEngineConfig(tp=1, device_type='camb'))
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
4 changes: 2 additions & 2 deletions docs/en/get_started/get_started.md
@@ -83,7 +83,7 @@ For example, you can utilize the following code snippet to perform the inference

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image
from lmdeploy.multimodal import load_image

pipe = pipeline('OpenGVLab/InternVL2-8B')

@@ -96,7 +96,7 @@ In VLM pipeline, the default image processing batch size is 1. This can be adjus

```python
from lmdeploy import pipeline, VisionConfig
from lmdeploy.vl import load_image
from lmdeploy.multimodal import load_image

pipe = pipeline('OpenGVLab/InternVL2-8B',
vision_config=VisionConfig(
2 changes: 1 addition & 1 deletion docs/en/get_started/maca/get_started.md
@@ -33,7 +33,7 @@ Set `device_type="maca"` in the `PytorchEngineConfig`:

```python
from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.vl import load_image
from lmdeploy.multimodal import load_image
pipe = pipeline('OpenGVLab/InternVL2-2B',
                backend_config=PytorchEngineConfig(tp=1, device_type='maca'))
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
18 changes: 9 additions & 9 deletions docs/en/inference/load_hf.md
@@ -6,18 +6,18 @@ Starting from v0.1.0, Turbomind adds the ability to pre-process the model parame

Currently, Turbomind supports loading two types of models:

1. A lmdeploy-quantized model hosted on huggingface.co, such as [llama2-70b-4bit](https://huggingface.co/lmdeploy/llama2-chat-70b-4bit), [internlm-chat-20b-4bit](https://huggingface.co/internlm/internlm-chat-20b-4bit), etc.
2. Other LM models on huggingface.co like Qwen/Qwen-7B-Chat
1. A lmdeploy-quantized model hosted on huggingface.co, such as [llama2-70b-4bit](https://huggingface.co/lmdeploy/llama2-chat-70b-4bit), etc.
2. Other LM models on huggingface.co like Qwen/Qwen2.5-7B-Instruct

## Usage

### 1) A lmdeploy-quantized model

For models quantized by `lmdeploy.lite` such as [llama2-70b-4bit](https://huggingface.co/lmdeploy/llama2-chat-70b-4bit), [internlm-chat-20b-4bit](https://huggingface.co/internlm/internlm-chat-20b-4bit), etc.
For models quantized by `lmdeploy.lite` such as [llama2-70b-4bit](https://huggingface.co/lmdeploy/llama2-chat-70b-4bit), etc.

```
repo_id=internlm/internlm-chat-20b-4bit
model_name=internlm-chat-20b
repo_id=lmdeploy/llama2-chat-70b-4bit
model_name=llama2-chat-70b
# or
# repo_id=/path/to/downloaded_model

@@ -30,13 +30,13 @@ lmdeploy serve api_server $repo_id --model-name $model_name --tp 1

### 2) Other LM models

For other LM models such as Qwen/Qwen-7B-Chat or baichuan-inc/Baichuan2-7B-Chat. LMDeploy supported models can be viewed through `lmdeploy list`.
For other LM models such as Qwen/Qwen2.5-7B-Instruct or internlm/internlm2-chat-7b. LMDeploy supported models can be viewed through `lmdeploy list`.

```
repo_id=Qwen/Qwen-7B-Chat
model_name=qwen-7b
repo_id=Qwen/Qwen2.5-7B-Instruct
model_name=qwen2.5-7b
# or
# repo_id=/path/to/Qwen-7B-Chat/local_path
# repo_id=/path/to/Qwen2.5-7B-Instruct/local_path

# Inference by TurboMind
lmdeploy chat $repo_id --model-name $model_name
2 changes: 1 addition & 1 deletion docs/en/llm/api_server.md
@@ -187,7 +187,7 @@ curl http://{server_ip}:{server_port}/v1/models
curl http://{server_ip}:{server_port}/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "internlm-chat-7b",
"model": "intern-s2-preview",
"messages": [{"role": "user", "content": "Hello! How are you?"}]
}'
```
6 changes: 3 additions & 3 deletions docs/en/llm/api_server_anthropic.md
@@ -29,7 +29,7 @@ curl http://{server_ip}:{server_port}/v1/messages \
-H "content-type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "internlm-chat-7b",
"model": "intern-s2-preview",
"max_tokens": 128,
"messages": [{"role": "user", "content": "Hello from Anthropic client"}]
}'
@@ -42,7 +42,7 @@ curl http://{server_ip}:{server_port}/v1/messages \
-H "content-type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "internlm-chat-7b",
"model": "intern-s2-preview",
"max_tokens": 128,
"messages": [{"role": "user", "content": "Find lmdeploy docs"}],
"tools": [{
@@ -78,7 +78,7 @@ curl http://{server_ip}:{server_port}/v1/messages/count_tokens \
-H "content-type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "internlm-chat-7b",
"model": "intern-s2-preview",
"system": "You are a helpful assistant.",
"messages": [{"role": "user", "content": "Count these tokens"}]
}'
2 changes: 1 addition & 1 deletion docs/en/multi_modal/cogvlm.md
@@ -26,7 +26,7 @@ The following sample code shows the basic usage of VLM pipeline. For more exampl

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image
from lmdeploy.multimodal import load_image


if __name__ == "__main__":
2 changes: 1 addition & 1 deletion docs/en/multi_modal/deepseek_vl2.md
@@ -30,7 +30,7 @@ To construct valid DeepSeek-VL2 prompts with image inputs, users should insert `

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image
from lmdeploy.multimodal import load_image


if __name__ == "__main__":
2 changes: 1 addition & 1 deletion docs/en/multi_modal/gemma3.md
@@ -18,7 +18,7 @@ The following sample code shows the basic usage of VLM pipeline. For more exampl

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image
from lmdeploy.multimodal import load_image


if __name__ == "__main__":
6 changes: 4 additions & 2 deletions docs/en/multi_modal/index.rst
@@ -1,14 +1,16 @@
Vision-Language Models
Multimodal Models
=================================

Use ``lmdeploy.multimodal`` for multimodal helper APIs such as media loading
and local-file encoding.
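
Code that has to run against releases from both before and after this rename could try the new package path first and fall back to the old one. The sketch below assumes that older releases still expose the helpers under `lmdeploy.vl`. The helper `import_first` is hypothetical and uses only the standard library.

```python
import importlib


def import_first(*module_names):
    """Return the first module in module_names that can be imported."""
    last_error = None
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # also covers ModuleNotFoundError
            last_error = exc
    raise ImportError(f'none of {module_names} could be imported') from last_error


# Hypothetical usage: prefer the new path, fall back to the pre-rename one.
# mm = import_first('lmdeploy.multimodal', 'lmdeploy.vl')
# image = mm.load_image(path_or_url)
```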

.. toctree::
:maxdepth: 2
:caption: Examples

deepseek_vl2.md
llava.md
internvl.md
xcomposer2d5.md
cogvlm.md
minicpmv.md
phi3.md
11 changes: 5 additions & 6 deletions docs/en/multi_modal/internvl.md
@@ -9,7 +9,6 @@ LMDeploy supports the following InternVL series of models, which are detailed in
| InternVL2 | 4B | PyTorch |
| InternVL2 | 1B-2B, 8B-76B | TurboMind, PyTorch |
| InternVL2.5/2.5-MPO/3 | 1B-78B | TurboMind, PyTorch |
| Mono-InternVL | 2B | PyTorch |

The next chapter demonstrates how to deploy an InternVL model using LMDeploy, with [InternVL2-8B](https://huggingface.co/OpenGVLab/InternVL2-8B) as an example.

@@ -43,7 +42,7 @@ The following sample code shows the basic usage of VLM pipeline. For detailed in

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image
from lmdeploy.multimodal import load_image

pipe = pipeline('OpenGVLab/InternVL2-8B')

@@ -61,7 +60,7 @@ More examples are listed below:

```python
from lmdeploy import pipeline, GenerationConfig
from lmdeploy.vl.constants import IMAGE_TOKEN
from lmdeploy.multimodal.constants import IMAGE_TOKEN

pipe = pipeline('OpenGVLab/InternVL2-8B', log_level='INFO')
messages = [
@@ -87,7 +86,7 @@ out = pipe(messages, gen_config=GenerationConfig(top_k=1))

```python
from lmdeploy import pipeline, GenerationConfig
from lmdeploy.vl.constants import IMAGE_TOKEN
from lmdeploy.multimodal.constants import IMAGE_TOKEN

pipe = pipeline('OpenGVLab/InternVL2-8B', log_level='INFO')
messages = [
@@ -115,8 +114,8 @@ out = pipe(messages, gen_config=GenerationConfig(top_k=1))
import numpy as np
from lmdeploy import pipeline, GenerationConfig
from decord import VideoReader, cpu
from lmdeploy.vl.constants import IMAGE_TOKEN
from lmdeploy.vl import encode_image_base64
from lmdeploy.multimodal.constants import IMAGE_TOKEN
from lmdeploy.multimodal import encode_image_base64
from PIL import Image
pipe = pipeline('OpenGVLab/InternVL2-8B', log_level='INFO')

2 changes: 1 addition & 1 deletion docs/en/multi_modal/llava.md
@@ -33,7 +33,7 @@ The following sample code shows the basic usage of VLM pipeline. For detailed in

```python
from lmdeploy import GenerationConfig, TurbomindEngineConfig, pipeline
from lmdeploy.vl import load_image
from lmdeploy.multimodal import load_image


pipe = pipeline("llava-hf/llava-interleave-qwen-7b-hf", backend_config=TurbomindEngineConfig(cache_max_entry_count=0.5),
4 changes: 2 additions & 2 deletions docs/en/multi_modal/minicpmv.md
@@ -19,7 +19,7 @@ The following sample code shows the basic usage of VLM pipeline. For detailed in

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image
from lmdeploy.multimodal import load_image

pipe = pipeline('openbmb/MiniCPM-V-2_6')

@@ -97,7 +97,7 @@ print(out.text)

```python
from lmdeploy import pipeline, GenerationConfig
from lmdeploy.vl import encode_image_base64
from lmdeploy.multimodal import encode_image_base64
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer
2 changes: 1 addition & 1 deletion docs/en/multi_modal/molmo.md
@@ -19,7 +19,7 @@ The following sample code shows the basic usage of VLM pipeline. For detailed in

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image
from lmdeploy.multimodal import load_image

pipe = pipeline('allenai/Molmo-7B-D-0924')

10 changes: 5 additions & 5 deletions docs/en/multi_modal/multimodal_inputs.md
@@ -398,7 +398,7 @@ In addition to HTTP URLs, lmdeploy accepts:
- **Local file paths** via `file://` scheme: `file:///absolute/path/to/file.jpg`
- **Base64-encoded data** via data URLs: `data:<mime>;base64,<encoded_data>`
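
To make the two URL forms above concrete, here is a minimal standard-library sketch that builds each from a local path. It is not the lmdeploy implementation: `to_data_url`, `to_file_url`, and the `application/octet-stream` fallback MIME type are hypothetical names and choices made only for illustration.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(path, default_mime='application/octet-stream'):
    """Encode a local file as a data:<mime>;base64,<encoded_data> URL."""
    path = Path(path)
    # Guess the MIME type from the file extension; may be None if unknown.
    mime, _ = mimetypes.guess_type(path.name)
    encoded = base64.b64encode(path.read_bytes()).decode('ascii')
    return f'data:{mime or default_mime};base64,{encoded}'


def to_file_url(path):
    """Build a file:// URL from a local path (resolved to an absolute path)."""
    return Path(path).resolve().as_uri()
```

Either string can then be placed wherever the request would otherwise carry an HTTP URL. The `file://` form keeps the request small but requires the server to have access to the same path.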

Use the helpers in `lmdeploy.vl.utils` to encode local files:
Use the helpers in `lmdeploy.multimodal.utils` to encode local files:

<details>
<summary>Local file path example</summary>
@@ -434,7 +434,7 @@ print(response.choices[0].message.content)

```python
from openai import OpenAI
from lmdeploy.vl.utils import encode_image_base64
from lmdeploy.multimodal.utils import encode_image_base64

client = OpenAI(api_key='EMPTY', base_url='http://localhost:23333/v1')
model_name = client.models.list().data[0].id
@@ -465,7 +465,7 @@ print(response.choices[0].message.content)

```python
from openai import OpenAI
from lmdeploy.vl.utils import encode_video_base64
from lmdeploy.multimodal.utils import encode_video_base64

client = OpenAI(api_key='EMPTY', base_url='http://localhost:23333/v1')
model_name = client.models.list().data[0].id
@@ -497,7 +497,7 @@ print(response.choices[0].message.content)

```python
from openai import OpenAI
from lmdeploy.vl.utils import encode_audio_base64
from lmdeploy.multimodal.utils import encode_audio_base64

client = OpenAI(api_key='EMPTY', base_url='http://localhost:23333/v1')
model_name = client.models.list().data[0].id
@@ -528,7 +528,7 @@ print(response.choices[0].message.content)

```python
from openai import OpenAI
from lmdeploy.vl.utils import encode_time_series_base64
from lmdeploy.multimodal.utils import encode_time_series_base64

client = OpenAI(api_key='EMPTY', base_url='http://localhost:23333/v1')
model_name = client.models.list().data[0].id
2 changes: 1 addition & 1 deletion docs/en/multi_modal/phi3.md
@@ -26,7 +26,7 @@ The following sample code shows the basic usage of VLM pipeline. For more exampl

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image
from lmdeploy.multimodal import load_image

pipe = pipeline('microsoft/Phi-3.5-vision-instruct')
