Describe the bug
`EvaluationTracker.save()` fails at

```python
dataset = Dataset.from_list([asdict(detail) for detail in task_details])
```

with

```
Exception has occurred: ArrowInvalid (note: full exception trace is shown but execution is paused at: _run_module_as_main)
cannot mix list and non-list, non-null values
```

if the launched task uses metrics requiring both generative and logprob outputs (for example, because one metric type saves list values and the other saves scalars, mixing list and non-list values in the same column).
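A minimal sketch of what I believe is the underlying failure (my assumption about the mechanism, not code from lighteval): `datasets`/pyarrow cannot infer a column type when rows mix list and scalar values.

```python
from datasets import Dataset

# Hypothetical detail rows: one metric stored a list (logprob-style),
# another stored a scalar (generative-style), under the same key.
rows = [
    {"metrics": [0.1, 0.2]},  # list value
    {"metrics": 0.5},         # scalar value
]

# Raises pyarrow.lib.ArrowInvalid:
#   cannot mix list and non-list, non-null values
Dataset.from_list(rows)
```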
To Reproduce
Launch lighteval with the following VS Code debugpy configuration:
"name": "launch vllm",
"type": "debugpy",
"request": "launch",
"module": "lighteval",
"args": [
"vllm",
"model_name=Qwen/Qwen3-0.6B,max_num_batched_tokens=100000,max_model_length=38912,generation_parameters={temperature:0.6,top_p:0.95,top_k:20,min_p:0,presence_penalty:1,max_new_tokens:38912},system_prompt='/no_think'",
"lighteval|mmlu_redux_2:security_studies|0", //lighteval|mmlu_redux_2|0|0",
"--max-samples",
"10",
"--save-details",
],
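For reference, the same run without the debugger should reduce to this CLI invocation (a best-effort translation of the config above; the arguments are taken directly from its "args" list):

```bash
lighteval vllm \
  "model_name=Qwen/Qwen3-0.6B,max_num_batched_tokens=100000,max_model_length=38912,generation_parameters={temperature:0.6,top_p:0.95,top_k:20,min_p:0,presence_penalty:1,max_new_tokens:38912},system_prompt='/no_think'" \
  "lighteval|mmlu_redux_2:security_studies|0" \
  --max-samples 10 \
  --save-details
```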