This example demonstrates how to launch a model server using `vec-inf` and run a downstream SLURM job that waits for the server to become ready before querying it.
This directory contains the following:
- `run_workflow.sh`: Launches the model server and submits the downstream job with a dependency, so it starts only after the server job begins running.
- `downstream_job.sbatch`: A SLURM job script that runs the downstream logic (e.g., prompting the model).
- `run_downstream.py`: A Python script that waits until the inference server is ready, then sends a request using the OpenAI-compatible API.
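As a rough sketch of the pattern `run_downstream.py` follows, the snippet below polls an OpenAI-compatible endpoint until it responds, then sends a chat completion request. The base URL, model name, and timeouts here are illustrative assumptions, not values taken from the repo; the real script may use the `openai` client library instead of raw HTTP.

```python
# Sketch (assumptions, not the repo's exact script): wait for the vec-inf
# server to come up, then query its OpenAI-compatible API over plain HTTP.
import json
import time
import urllib.error
import urllib.request


def wait_for_server(base_url: str, timeout: float = 600.0, interval: float = 10.0) -> bool:
    """Poll the /models endpoint until the server answers or we time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/models", timeout=5):
                return True  # any successful response means the server is ready
        except (urllib.error.URLError, OSError):
            time.sleep(interval)  # server not up yet; retry after a pause
    return False


def query_model(base_url: str, model: str, prompt: str) -> str:
    """Send one chat completion request and return the reply text."""
    payload = json.dumps({
        "model": model,  # must match the model launched via vec-inf
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Polling with a generous timeout matters here because the SLURM dependency only guarantees the server *job* has started, not that the model has finished loading.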
Before running this example, update the following in `downstream_job.sbatch`:

- `--job-name`, `--output`, and `--error` paths
- Virtual environment path in the `source` line
- SLURM resource configuration (e.g., partition, memory, GPU)
Also update the model name in `run_downstream.py` to match the model you're launching.
First, activate a virtual environment where `vec-inf` is installed. Then, from this directory, run:
```bash
bash run_workflow.sh
```