Is there any reason why both of these lines:
llmperf/token_benchmark_ray.py
Line 94 in f1d6bed
clients = construct_clients(llm_api=llm_api, num_clients=1)
llmperf/token_benchmark_ray.py
Line 148 in f1d6bed
clients = construct_clients(llm_api=llm_api, num_clients=1)
have num_clients hard-coded to 1? Couldn't they be changed to:
import multiprocessing
num_cores = multiprocessing.cpu_count() # Get total available CPU cores
clients = construct_clients(llm_api=llm_api, num_clients=num_cores)
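As a minimal, self-contained sketch of the change being proposed (the helper name pick_num_clients is my own, not part of llmperf), the client count could be derived from the core count with a safe fallback:

```python
import multiprocessing

def pick_num_clients(fallback: int = 1) -> int:
    # Use all available CPU cores; fall back to `fallback` (the current
    # hard-coded behavior) if the core count cannot be determined.
    try:
        return multiprocessing.cpu_count()
    except NotImplementedError:
        return fallback
```

The result would then be passed as num_clients to construct_clients in place of the literal 1.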
As I've seen in the docs, parallelism in Ray can be achieved in two ways:
clients = [OpenAIChatCompletionsClient.remote() for _ in range(8)] # multiple actors
# OR
@ray.remote(num_cpus=2) # Each actor uses 2 CPUs
class OpenAIChatCompletionsClient(LLMClient):
pass
And in this case I would prefer the first way... Am I missing something, or does the code as written not use all the available CPUs?
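To make the concern concrete, here is a toy, Ray-free sketch (the function and the round-robin policy are my own illustration, not llmperf code): with num_clients=1 every request lands on the same worker, while N clients spread the same requests across N workers.

```python
def assign_requests(num_requests: int, num_clients: int) -> list:
    # Round-robin each request to a client index, mimicking how a pool
    # of Ray actor handles would share the request load.
    return [i % num_clients for i in range(num_requests)]

# With one client, all 8 requests serialize on worker 0.
print(assign_requests(8, 1))  # [0, 0, 0, 0, 0, 0, 0, 0]
# With four clients, the same 8 requests spread across workers 0-3.
print(assign_requests(8, 4))  # [0, 1, 2, 3, 0, 1, 2, 3]
```

Assuming each Ray actor processes its queued calls serially (standard actor semantics), the single-client version would leave the other cores idle during the benchmark.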