  -H 'Content-Type: application/json'
```

### Profile Microservices

To further analyze microservice performance, users can follow these instructions to profile the microservices.

#### 1. vLLM Backend Service

Users can follow the previous section to test the vLLM microservice or the ChatQnA MegaService.
By default, vLLM profiling is not enabled. Users can start and stop profiling with the following commands.
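
Note that vLLM only exposes its profiler endpoints when the server is launched with a Torch profiler output directory configured via the `VLLM_TORCH_PROFILER_DIR` environment variable; the `/mnt` trace folder referenced below suggests it is set to `/mnt` in this deployment, but that is an assumption. A quick way to check on a running container:

```bash
# Confirm the profiler output directory is configured inside the
# running container (assumes the container is named vllm-service).
docker exec vllm-service env | grep VLLM_TORCH_PROFILER_DIR
```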
445+
446+ ##### Start vLLM profiling
447+
448+ ```bash
449+ curl http://${host_ip}:9009/start_profile \
450+ -H "Content-Type: application/json" \
451+ -d ' {" model" : ${LLM_MODEL_ID} }'
452+ ```

If profiling has started correctly, users will see Docker logs like the following from `vllm-service`:

```bash
INFO api_server.py:361] Starting profiler...
INFO api_server.py:363] Profiler started.
INFO: x.x.x.x:35940 - "POST /start_profile HTTP/1.1" 200 OK
```

After vLLM profiling has started, users can ask questions and get responses from the vLLM microservice
or the ChatQnA MegaService, for example with the request sketched below.
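
As an illustration, the following sends a question to the ChatQnA MegaService while the profiler is running. This is a hedged sketch: it assumes the MegaService is exposed on port 8888 with the `/v1/chatqna` endpoint, as in the earlier testing sections; adjust the port and payload to match your deployment.

```bash
# Hypothetical load-generation request during profiling; the port and
# endpoint are assumptions based on the earlier MegaService test steps.
curl http://${host_ip}:8888/v1/chatqna \
  -H "Content-Type: application/json" \
  -d '{"messages": "What is the revenue of Nike in 2023?"}'
```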

##### Stop vLLM profiling

Using the following command, users can stop vLLM profiling and generate a `*.pt.trace.json.gz` file as the profiling result
under the `/mnt` folder in the `vllm-service` Docker instance.

```bash
# vLLM Service
curl http://${host_ip}:9009/stop_profile \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"${LLM_MODEL_ID}\"}"
```

If profiling has stopped correctly, users will see Docker logs like the following from `vllm-service`:

```bash
INFO api_server.py:368] Stopping profiler...
INFO api_server.py:370] Profiler stopped.
INFO: x.x.x.x:41614 - "POST /stop_profile HTTP/1.1" 200 OK
```

After vLLM profiling has stopped, users can use the command below to copy the `*.pt.trace.json.gz` file out of the `/mnt` folder.

```bash
docker cp vllm-service:/mnt/ .
```

##### Check profiling result

Open a web browser, go to `chrome://tracing` or `https://ui.perfetto.dev`, and load the `json.gz` file. You should be able
to see the vLLM profiling result, as in the diagram below.
*Figure: vLLM profiling result in the trace viewer.*

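If the viewer does not accept compressed files directly, the traces can be decompressed first. An optional sketch, assuming the traces were copied into `./mnt` as above (`chrome://tracing` and Perfetto generally load `.json.gz` directly, so this step may be unnecessary):

```bash
# Optional: decompress traces for viewers that require plain JSON.
gunzip -k mnt/*.pt.trace.json.gz
```
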
## 🚀 Launch the UI

### Launch with origin port