InternLM · CUHKSZzxy · Jun 22, 2026
diff --git a/README.md b/README.md
@@ -52,11 +52,11 @@ ______________________________________________________________________
 - \[2024/09\] LMDeploy PyTorchEngine achieves 1.3x faster on Llama3-8B inference by introducing CUDA graph
 - \[2024/08\] LMDeploy is integrated into [modelscope/swift](https://github.com/modelscope/swift) as the default accelerator for VLMs inference
 - \[2024/07\] Support Llama3.1 8B, 70B and its TOOLS CALLING
-- \[2024/07\] Support [InternVL2](docs/en/multi_modal/internvl.md) full-series models, [InternLM-XComposer2.5](docs/en/multi_modal/xcomposer2d5.md) and [function call](docs/en/llm/api_server_tools.md) of InternLM2.5
+- \[2024/07\] Support [InternVL2](docs/en/multi_modal/internvl.md) full-series models and [function call](docs/en/llm/api_server_tools.md) of InternLM2.5
 - \[2024/06\] PyTorch engine support DeepSeek-V2 and several VLMs, such as CogVLM2, Mini-InternVL, LlaVA-Next
 - \[2024/05\] Balance vision model when deploying VLMs with multiple GPUs
-- \[2024/05\] Support 4-bits weight-only quantization and inference on VLMs, such as InternVL v1.5, LLaVa, InternLMXComposer2
-- \[2024/04\] Support Llama3 and more VLMs, such as InternVL v1.1, v1.2, MiniGemini, InternLMXComposer2.
+- \[2024/05\] Support 4-bits weight-only quantization and inference on VLMs, such as InternVL v1.5 and LLaVa
+- \[2024/04\] Support Llama3 and more VLMs, such as InternVL v1.1, v1.2 and MiniGemini.
 - \[2024/04\] TurboMind adds online int8/int4 KV cache quantization and inference for all supported devices. Refer [here](docs/en/quantization/kv_quant.md) for detailed guide
 - \[2024/04\] TurboMind latest upgrade boosts GQA, rocketing the [internlm2-20b](https://huggingface.co/internlm/internlm2-20b) model inference to 16+ RPS, about 1.8x faster than vLLM.
 - \[2024/04\] Support Qwen1.5-MOE and dbrx.
@@ -171,8 +171,6 @@ LMDeploy is a toolkit for compressing, deploying, and serving LLM, developed by
 <td>
 <ul>
   <li>LLaVA(1.5,1.6) (7B-34B)</li>
-  <li>InternLM-XComposer2 (7B, 4khd-7B)</li>
-  <li>InternLM-XComposer2.5 (7B)</li>
   <li>Qwen-VL (7B)</li>
   <li>Qwen2-VL (2B, 7B, 72B)</li>
   <li>Qwen2.5-VL (3B, 7B, 72B)</li>

diff --git a/README_ja.md b/README_ja.md
@@ -37,11 +37,11 @@ ______________________________________________________________________
 
 - \[2024/08\] 🔥🔥 LMDeployは[modelscope/swift](https://github.com/modelscope/swift)に統合され、VLMs推論のデフォルトアクセラレータとなりました
 - \[2024/07\] 🎉🎉 Llama3.1 8B、70Bおよびそのツールコールをサポート
-- \[2024/07\] [InternVL2](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e)全シリーズモデル、[InternLM-XComposer2.5](docs/en/multi_modal/xcomposer2d5.md)およびInternLM2.5の[ファンクションコール](docs/en/llm/api_server_tools.md)をサポート
+- \[2024/07\] [InternVL2](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e)全シリーズモデルおよびInternLM2.5の[ファンクションコール](docs/en/llm/api_server_tools.md)をサポート
 - \[2024/06\] PyTorchエンジンはDeepSeek-V2およびいくつかのVLMs、例えばCogVLM2、Mini-InternVL、LlaVA-Nextをサポート
 - \[2024/05\] 複数のGPUでVLMsをデプロイする際にビジョンモデルをバランスさせる
-- \[2024/05\] InternVL v1.5、LLaVa、InternLMXComposer2などのVLMsで4ビットの重みのみの量子化と推論をサポート
-- \[2024/04\] Llama3およびInternVL v1.1、v1.2、MiniGemini、InternLMXComposer2などのVLMモデルをサポート
+- \[2024/05\] InternVL v1.5、LLaVaなどのVLMsで4ビットの重みのみの量子化と推論をサポート
+- \[2024/04\] Llama3およびInternVL v1.1、v1.2、MiniGeminiなどのVLMモデルをサポート
 - \[2024/04\] TurboMindはすべてのサポートされているデバイスでのオンラインint8/int4 KVキャッシュ量子化と推論を追加しました。詳細なガイドは[こちら](docs/en/quantization/kv_quant.md)を参照してください
 - \[2024/04\] TurboMindの最新アップグレードによりGQAが強化され、[internlm2-20b](https://huggingface.co/internlm/internlm2-20b)モデルの推論が16+ RPSに達し、vLLMの約1.8倍の速さになりました
 - \[2024/04\] Qwen1.5-MOEおよびdbrxをサポート
@@ -158,8 +158,6 @@ LMDeploy TurboMindエンジンは卓越した推論能力を持ち、さまざ
 <td>
 <ul>
   <li>LLaVA(1.5,1.6) (7B-34B)</li>
-  <li>InternLM-XComposer2 (7B, 4khd-7B)</li>
-  <li>InternLM-XComposer2.5 (7B)</li>
   <li>Qwen-VL (7B)</li>
   <li>Qwen2-VL (2B, 7B, 72B)</li>
   <li>Qwen2.5-VL (3B, 7B, 72B)</li>

diff --git a/README_zh-CN.md b/README_zh-CN.md
@@ -52,11 +52,11 @@ ______________________________________________________________________
 - \[2024/09\] 通过引入 CUDA Graph，LMDeploy PyTorchEngine 在 Llama3-8B 推理上实现了 1.3 倍的加速
 - \[2024/08\] LMDeploy现已集成至 [modelscope/swift](https://github.com/modelscope/swift)，成为 VLMs 推理的默认加速引擎
 - \[2024/07\] 支持 Llama3.1 8B 和 70B 模型，以及工具调用功能
-- \[2024/07\] 支持 [InternVL2](docs/zh_cn/multi_modal/internvl.md) 全系列模型，[InternLM-XComposer2.5](docs/zh_cn/multi_modal/xcomposer2d5.md) 模型和 InternLM2.5 的 [function call 功能](docs/zh_cn/llm/api_server_tools.md)
+- \[2024/07\] 支持 [InternVL2](docs/zh_cn/multi_modal/internvl.md) 全系列模型和 InternLM2.5 的 [function call 功能](docs/zh_cn/llm/api_server_tools.md)
 - \[2024/06\] PyTorch engine 支持了 DeepSeek-V2 和若干 VLM 模型推理, 比如 CogVLM2，Mini-InternVL，LlaVA-Next
 - \[2024/05\] 在多 GPU 上部署 VLM 模型时，支持把视觉部分的模型均分到多卡上
-- \[2024/05\] 支持InternVL v1.5, LLaVa, InternLMXComposer2 等 VLMs 模型的 4bit 权重量化和推理
-- \[2024/04\] 支持 Llama3 和 InternVL v1.1, v1.2，MiniGemini，InternLM-XComposer2 等 VLM 模型
+- \[2024/05\] 支持 InternVL v1.5 和 LLaVa 等 VLMs 模型的 4bit 权重量化和推理
+- \[2024/04\] 支持 Llama3 和 InternVL v1.1, v1.2，MiniGemini 等 VLM 模型
 - \[2024/04\] TurboMind 支持 kv cache int4/int8 在线量化和推理，适用已支持的所有型号显卡。详情请参考[这里](docs/zh_cn/quantization/kv_quant.md)
 - \[2024/04\] TurboMind 引擎升级，优化 GQA 推理。[internlm2-20b](https://huggingface.co/internlm/internlm2-20b) 推理速度达 16+ RPS，约是 vLLM 的 1.8 倍
 - \[2024/04\] 支持 Qwen1.5-MOE 和 dbrx.
@@ -173,8 +173,6 @@ LMDeploy TurboMind 引擎拥有卓越的推理能力，在各种规模的模型
 <td>
 <ul>
   <li>LLaVA(1.5,1.6) (7B-34B)</li>
-  <li>InternLM-XComposer2 (7B, 4khd-7B)</li>
-  <li>InternLM-XComposer2.5 (7B)</li>
   <li>Qwen-VL (7B)</li>
   <li>Qwen2-VL (2B, 7B, 72B)</li>
   <li>Qwen2.5-VL (3B, 7B, 72B)</li>

diff --git a/autotest/utils/get_run_config.py b/autotest/utils/get_run_config.py
@@ -37,10 +37,6 @@ def get_model_name(model):
         return 'internvl-internlm2'
     if ('internlm2') in model_name:
         return 'internlm2'
-    if ('internlm-xcomposer2d5') in model_name:
-        return 'internlm-xcomposer2d5'
-    if ('internlm-xcomposer2') in model_name:
-        return 'internlm-xcomposer2'
     if ('glm-4') in model_name:
         return 'glm4'
     if len(model_name.split('-')) > 2 and '-'.join(model_name.split('-')[0:2]) in model_names:

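Note on the hunk above: `get_model_name` resolves a name through a chain of substring checks, so a more specific key has to be tested before any key that is a prefix of it. The removed branches followed that order (`internlm-xcomposer2d5` before `internlm-xcomposer2`). The sketch below illustrates the ordering concern only; `resolve_name` and the two key lists are hypothetical and are not part of `autotest/utils/get_run_config.py`.

```python
def resolve_name(model_name, ordered_keys):
    """Return the first key found in model_name, or None if nothing matches."""
    lowered = model_name.lower()
    for key in ordered_keys:
        if key in lowered:
            return key
    return None


# Hypothetical key lists: the same two keys in both orders.
SPECIFIC_FIRST = ['internlm-xcomposer2d5', 'internlm-xcomposer2']
PREFIX_FIRST = list(reversed(SPECIFIC_FIRST))

name = 'internlm/internlm-xcomposer2d5-7b'
specific = resolve_name(name, SPECIFIC_FIRST)  # matches the longer key
shadowed = resolve_name(name, PREFIX_FIRST)    # shorter key wins by mistake
```

With the prefix checked first, the longer key can never be reached, which is worth keeping in mind for the remaining branches in the chain.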
diff --git a/docs/en/faq.md b/docs/en/faq.md
@@ -94,7 +94,7 @@ lmdeploy serve api_server internlm/internlm2_5-7b-chat --cache-max-entry-count 0
 ### Api Server Fetch Timeout
 
 The image URL fetch timeout for the API server can be configured via the environment variable `LMDEPLOY_FETCH_TIMEOUT`.
-By default, requests may take up to 10 seconds before timing out. See [lmdeploy/vl/utils.py](https://github.com/InternLM/lmdeploy/blob/7b6876eafcb842633e0efe8baabe5906d7beeeea/lmdeploy/vl/utils.py#L31) for usage.
+By default, requests may take up to 10 seconds before timing out. See [lmdeploy/multimodal/utils.py](https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/multimodal/utils.py) for usage.
 
 ## Quantization
 

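Note on the FAQ hunk above: the text says the fetch timeout comes from `LMDEPLOY_FETCH_TIMEOUT` and defaults to 10 seconds. A minimal sketch of that pattern follows, assuming the value is a number of seconds. The helper name `fetch_timeout` and the use of `float` are assumptions for illustration, not the code in `lmdeploy/multimodal/utils.py`.

```python
import os


def fetch_timeout(default=10.0):
    """Read the timeout in seconds from LMDEPLOY_FETCH_TIMEOUT.

    Falls back to `default` when the variable is unset or empty.
    Hypothetical helper, not lmdeploy's implementation.
    """
    raw = os.environ.get('LMDEPLOY_FETCH_TIMEOUT', '')
    return float(raw) if raw.strip() else default
```

In practice the variable would be exported in the shell before `lmdeploy serve api_server ...` is launched.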
diff --git a/docs/en/get_started/ascend/get_started.md b/docs/en/get_started/ascend/get_started.md
@@ -50,7 +50,7 @@ Set `device_type="ascend"` in the `PytorchEngineConfig`:
 
 ```python
 from lmdeploy import pipeline, PytorchEngineConfig
-from lmdeploy.vl import load_image
+from lmdeploy.multimodal import load_image
 pipe = pipeline('OpenGVLab/InternVL2-2B',
         backend_config=PytorchEngineConfig(tp=1, device_type='ascend'))
 image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')

diff --git a/docs/en/get_started/camb/get_started.md b/docs/en/get_started/camb/get_started.md
@@ -43,7 +43,7 @@ Set `device_type="camb"` in the `PytorchEngineConfig`:
 
 ```python
 from lmdeploy import pipeline, PytorchEngineConfig
-from lmdeploy.vl import load_image
+from lmdeploy.multimodal import load_image
 pipe = pipeline('OpenGVLab/InternVL2-2B',
         backend_config=PytorchEngineConfig(tp=1, device_type='camb'))
 image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')

diff --git a/docs/en/get_started/get_started.md b/docs/en/get_started/get_started.md
@@ -83,7 +83,7 @@ For example, you can utilize the following code snippet to perform the inference
 
 ```python
 from lmdeploy import pipeline
-from lmdeploy.vl import load_image
+from lmdeploy.multimodal import load_image
 
 pipe = pipeline('OpenGVLab/InternVL2-8B')
 
@@ -96,7 +96,7 @@ In VLM pipeline, the default image processing batch size is 1. This can be adjus
 
 ```python
 from lmdeploy import pipeline, VisionConfig
-from lmdeploy.vl import load_image
+from lmdeploy.multimodal import load_image
 
 pipe = pipeline('OpenGVLab/InternVL2-8B',
                 vision_config=VisionConfig(

diff --git a/docs/en/get_started/maca/get_started.md b/docs/en/get_started/maca/get_started.md
@@ -33,7 +33,7 @@ Set `device_type="maca"` in the `PytorchEngineConfig`:
 
 ```python
 from lmdeploy import pipeline, PytorchEngineConfig
-from lmdeploy.vl import load_image
+from lmdeploy.multimodal import load_image
 pipe = pipeline('OpenGVLab/InternVL2-2B',
         backend_config=PytorchEngineConfig(tp=1, device_type='maca'))
 image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')

diff --git a/docs/en/inference/load_hf.md b/docs/en/inference/load_hf.md
@@ -6,18 +6,18 @@ Starting from v0.1.0, Turbomind adds the ability to pre-process the model parame
 
 Currently, Turbomind support loading three types of model:
 
-1. A lmdeploy-quantized model hosted on huggingface.co, such as [llama2-70b-4bit](https://huggingface.co/lmdeploy/llama2-chat-70b-4bit), [internlm-chat-20b-4bit](https://huggingface.co/internlm/internlm-chat-20b-4bit), etc.
-2. Other LM models on huggingface.co like Qwen/Qwen-7B-Chat
+1. A lmdeploy-quantized model hosted on huggingface.co, such as [llama2-70b-4bit](https://huggingface.co/lmdeploy/llama2-chat-70b-4bit), etc.
+2. Other LM models on huggingface.co like Qwen/Qwen2.5-7B-Instruct
 
 ## Usage
 
 ### 1) A lmdeploy-quantized model
 
-For models quantized by `lmdeploy.lite` such as [llama2-70b-4bit](https://huggingface.co/lmdeploy/llama2-chat-70b-4bit), [internlm-chat-20b-4bit](https://huggingface.co/internlm/internlm-chat-20b-4bit), etc.
+For models quantized by `lmdeploy.lite` such as [llama2-70b-4bit](https://huggingface.co/lmdeploy/llama2-chat-70b-4bit), etc.
 
 ```
-repo_id=internlm/internlm-chat-20b-4bit
-model_name=internlm-chat-20b
+repo_id=lmdeploy/llama2-chat-70b-4bit
+model_name=llama2-chat-70b
 # or
 # repo_id=/path/to/downloaded_model
 
@@ -30,13 +30,13 @@ lmdeploy serve api_server $repo_id --model-name $model_name --tp 1
 
 ### 2) Other LM models
 
-For other LM models such as Qwen/Qwen-7B-Chat or baichuan-inc/Baichuan2-7B-Chat. LMDeploy supported models can be viewed through `lmdeploy list`.
+For other LM models such as Qwen/Qwen2.5-7B-Instruct or internlm/internlm2-chat-7b. LMDeploy supported models can be viewed through `lmdeploy list`.
 
 ```
-repo_id=Qwen/Qwen-7B-Chat
-model_name=qwen-7b
+repo_id=Qwen/Qwen2.5-7B-Instruct
+model_name=qwen2.5-7b
 # or
-# repo_id=/path/to/Qwen-7B-Chat/local_path
+# repo_id=/path/to/Qwen2.5-7B-Instruct/local_path
 
 # Inference by TurboMind
 lmdeploy chat $repo_id --model-name $model_name

diff --git a/docs/en/llm/api_server.md b/docs/en/llm/api_server.md
@@ -187,7 +187,7 @@ curl http://{server_ip}:{server_port}/v1/models
 curl http://{server_ip}:{server_port}/v1/chat/completions \
   -H "Content-Type: application/json" \
   -d '{
-    "model": "internlm-chat-7b",
+    "model": "intern-s2-preview",
     "messages": [{"role": "user", "content": "Hello! How are you?"}]
   }'
 ```

diff --git a/docs/en/llm/api_server_anthropic.md b/docs/en/llm/api_server_anthropic.md
@@ -29,7 +29,7 @@ curl http://{server_ip}:{server_port}/v1/messages \
   -H "content-type: application/json" \
   -H "anthropic-version: 2023-06-01" \
   -d '{
-    "model": "internlm-chat-7b",
+    "model": "intern-s2-preview",
     "max_tokens": 128,
     "messages": [{"role": "user", "content": "Hello from Anthropic client"}]
   }'
@@ -42,7 +42,7 @@ curl http://{server_ip}:{server_port}/v1/messages \
   -H "content-type: application/json" \
   -H "anthropic-version: 2023-06-01" \
   -d '{
-    "model": "internlm-chat-7b",
+    "model": "intern-s2-preview",
     "max_tokens": 128,
     "messages": [{"role": "user", "content": "Find lmdeploy docs"}],
     "tools": [{
@@ -78,7 +78,7 @@ curl http://{server_ip}:{server_port}/v1/messages/count_tokens \
   -H "content-type: application/json" \
   -H "anthropic-version: 2023-06-01" \
   -d '{
-    "model": "internlm-chat-7b",
+    "model": "intern-s2-preview",
     "system": "You are a helpful assistant.",
     "messages": [{"role": "user", "content": "Count these tokens"}]
   }'

diff --git a/docs/en/multi_modal/cogvlm.md b/docs/en/multi_modal/cogvlm.md
@@ -26,7 +26,7 @@ The following sample code shows the basic usage of VLM pipeline. For more exampl
 
 ```python
 from lmdeploy import pipeline
-from lmdeploy.vl import load_image
+from lmdeploy.multimodal import load_image
 
 
 if __name__ == "__main__":

diff --git a/docs/en/multi_modal/deepseek_vl2.md b/docs/en/multi_modal/deepseek_vl2.md
@@ -30,7 +30,7 @@ To construct valid DeepSeek-VL2 prompts with image inputs, users should insert `
 
 ```python
 from lmdeploy import pipeline
-from lmdeploy.vl import load_image
+from lmdeploy.multimodal import load_image
 
 
 if __name__ == "__main__":

diff --git a/docs/en/multi_modal/gemma3.md b/docs/en/multi_modal/gemma3.md
@@ -18,7 +18,7 @@ The following sample code shows the basic usage of VLM pipeline. For more exampl
 
 ```python
 from lmdeploy import pipeline
-from lmdeploy.vl import load_image
+from lmdeploy.multimodal import load_image
 
 
 if __name__ == "__main__":

diff --git a/docs/en/multi_modal/index.rst b/docs/en/multi_modal/index.rst
@@ -1,14 +1,16 @@
-Vision-Language Models
+Multimodal Models
 =================================
 
+Use ``lmdeploy.multimodal`` for multimodal helper APIs such as media loading
+and local-file encoding.
+
 .. toctree::
    :maxdepth: 2
    :caption: Examples
 
    deepseek_vl2.md
    llava.md
    internvl.md
-   xcomposer2d5.md
    cogvlm.md
    minicpmv.md
    phi3.md

diff --git a/docs/en/multi_modal/internvl.md b/docs/en/multi_modal/internvl.md
@@ -9,7 +9,6 @@ LMDeploy supports the following InternVL series of models, which are detailed in
 |       InternVL2       |      4B       |          PyTorch           |
 |       InternVL2       | 1B-2B, 8B-76B |     TurboMind, PyTorch     |
 | InternVL2.5/2.5-MPO/3 |    1B-78B     |     TurboMind, PyTorch     |
-|     Mono-InternVL     |      2B       |          PyTorch           |
 
 The next chapter demonstrates how to deploy an InternVL model using LMDeploy, with [InternVL2-8B](https://huggingface.co/OpenGVLab/InternVL2-8B) as an example.
 
@@ -43,7 +42,7 @@ The following sample code shows the basic usage of VLM pipeline. For detailed in
 
 ```python
 from lmdeploy import pipeline
-from lmdeploy.vl import load_image
+from lmdeploy.multimodal import load_image
 
 pipe = pipeline('OpenGVLab/InternVL2-8B')
 
@@ -61,7 +60,7 @@ More examples are listed below:
 
 ```python
 from lmdeploy import pipeline, GenerationConfig
-from lmdeploy.vl.constants import IMAGE_TOKEN
+from lmdeploy.multimodal.constants import IMAGE_TOKEN
 
 pipe = pipeline('OpenGVLab/InternVL2-8B', log_level='INFO')
 messages = [
@@ -87,7 +86,7 @@ out = pipe(messages, gen_config=GenerationConfig(top_k=1))
 
 ```python
 from lmdeploy import pipeline, GenerationConfig
-from lmdeploy.vl.constants import IMAGE_TOKEN
+from lmdeploy.multimodal.constants import IMAGE_TOKEN
 
 pipe = pipeline('OpenGVLab/InternVL2-8B', log_level='INFO')
 messages = [
@@ -115,8 +114,8 @@ out = pipe(messages, gen_config=GenerationConfig(top_k=1))
 import numpy as np
 from lmdeploy import pipeline, GenerationConfig
 from decord import VideoReader, cpu
-from lmdeploy.vl.constants import IMAGE_TOKEN
-from lmdeploy.vl import encode_image_base64
+from lmdeploy.multimodal.constants import IMAGE_TOKEN
+from lmdeploy.multimodal import encode_image_base64
 from PIL import Image
 pipe = pipeline('OpenGVLab/InternVL2-8B', log_level='INFO')
 

diff --git a/docs/en/multi_modal/llava.md b/docs/en/multi_modal/llava.md
@@ -33,7 +33,7 @@ The following sample code shows the basic usage of VLM pipeline. For detailed in
 
 ```python
 from lmdeploy import GenerationConfig, TurbomindEngineConfig, pipeline
-from lmdeploy.vl import load_image
+from lmdeploy.multimodal import load_image
 
 
 pipe = pipeline("llava-hf/llava-interleave-qwen-7b-hf", backend_config=TurbomindEngineConfig(cache_max_entry_count=0.5),

diff --git a/docs/en/multi_modal/minicpmv.md b/docs/en/multi_modal/minicpmv.md
@@ -19,7 +19,7 @@ The following sample code shows the basic usage of VLM pipeline. For detailed in
 
 ```python
 from lmdeploy import pipeline
-from lmdeploy.vl import load_image
+from lmdeploy.multimodal import load_image
 
 pipe = pipeline('openbmb/MiniCPM-V-2_6')
 
@@ -97,7 +97,7 @@ print(out.text)
 
 ```python
 from lmdeploy import pipeline, GenerationConfig
-from lmdeploy.vl import encode_image_base64
+from lmdeploy.multimodal import encode_image_base64
 import torch
 from PIL import Image
 from transformers import AutoModel, AutoTokenizer

diff --git a/docs/en/multi_modal/molmo.md b/docs/en/multi_modal/molmo.md
@@ -19,7 +19,7 @@ The following sample code shows the basic usage of VLM pipeline. For detailed in
 
 ```python
 from lmdeploy import pipeline
-from lmdeploy.vl import load_image
+from lmdeploy.multimodal import load_image
 
 pipe = pipeline('allenai/Molmo-7B-D-0924')
 

diff --git a/docs/en/multi_modal/multimodal_inputs.md b/docs/en/multi_modal/multimodal_inputs.md
@@ -398,7 +398,7 @@ In addition to HTTP URLs, lmdeploy accepts:
 - **Local file paths** via `file://` scheme: `file:///absolute/path/to/file.jpg`
 - **Base64-encoded data** via data URLs: `data:<mime>;base64,<encoded_data>`
 
-Use the helpers in `lmdeploy.vl.utils` to encode local files:
+Use the helpers in `lmdeploy.multimodal.utils` to encode local files:
 
 <details>
 <summary>Local file path example</summary>
@@ -434,7 +434,7 @@ print(response.choices[0].message.content)
 
 ```python
 from openai import OpenAI
-from lmdeploy.vl.utils import encode_image_base64
+from lmdeploy.multimodal.utils import encode_image_base64
 
 client = OpenAI(api_key='EMPTY', base_url='http://localhost:23333/v1')
 model_name = client.models.list().data[0].id
@@ -465,7 +465,7 @@ print(response.choices[0].message.content)
 
 ```python
 from openai import OpenAI
-from lmdeploy.vl.utils import encode_video_base64
+from lmdeploy.multimodal.utils import encode_video_base64
 
 client = OpenAI(api_key='EMPTY', base_url='http://localhost:23333/v1')
 model_name = client.models.list().data[0].id
@@ -497,7 +497,7 @@ print(response.choices[0].message.content)
 
 ```python
 from openai import OpenAI
-from lmdeploy.vl.utils import encode_audio_base64
+from lmdeploy.multimodal.utils import encode_audio_base64
 
 client = OpenAI(api_key='EMPTY', base_url='http://localhost:23333/v1')
 model_name = client.models.list().data[0].id
@@ -528,7 +528,7 @@ print(response.choices[0].message.content)
 
 ```python
 from openai import OpenAI
-from lmdeploy.vl.utils import encode_time_series_base64
+from lmdeploy.multimodal.utils import encode_time_series_base64
 
 client = OpenAI(api_key='EMPTY', base_url='http://localhost:23333/v1')
 model_name = client.models.list().data[0].id

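Note on the `multimodal_inputs.md` hunks above: the two non-HTTP input forms the doc describes, `file:///absolute/path` and `data:<mime>;base64,<encoded_data>`, can be produced with the standard library alone. The sketch below shows only the URL shapes. `to_file_url` and `to_data_url` are hypothetical names, and they do not reproduce the behavior of the `encode_*_base64` helpers in `lmdeploy.multimodal.utils`.

```python
import base64
import mimetypes
from pathlib import Path


def to_file_url(path):
    """Build a file:// URL from a local path (made absolute first)."""
    return Path(path).resolve().as_uri()


def to_data_url(data, filename):
    """Build a data URL; the MIME type is guessed from the file name."""
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or 'application/octet-stream'  # fallback for unknown types
    encoded = base64.b64encode(data).decode('ascii')
    return f'data:{mime};base64,{encoded}'
```

Either string can then be placed in the `url` field of an `image_url`-style message part, as the doc's own examples do.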
diff --git a/docs/en/multi_modal/phi3.md b/docs/en/multi_modal/phi3.md
@@ -26,7 +26,7 @@ The following sample code shows the basic usage of VLM pipeline. For more exampl
 
 ```python
 from lmdeploy import pipeline
-from lmdeploy.vl import load_image
+from lmdeploy.multimodal import load_image
 
 pipe = pipeline('microsoft/Phi-3.5-vision-instruct')