
[BUG] Evaluation tracker won't save a task that relies on metrics of two different types #950

@clefourrier

Description


Describe the bug

EvaluationTracker.save() will fail at

```python
dataset = Dataset.from_list([asdict(detail) for detail in task_details])
```

with

```
Exception has occurred: ArrowInvalid
cannot mix list and non-list, non-null values
```

if the launched task uses metrics that require both generative and logprob outputs. The two metric types don't save their detail fields in the same shape (one stores a list where the other stores a non-list value, for example), so Arrow cannot infer a single column type.
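For context, the underlying Arrow failure can be reproduced in isolation. This is a minimal sketch, assuming the mixed-metric details yield a field that is a list in some rows and a scalar in others:

```python
from datasets import Dataset

# Assumption: mixing generative and logprob metrics produces details
# where the same field is a list in one row and a scalar in another.
rows = [
    {"logprobs": [0.1, 0.2]},  # logprob-style detail: list value
    {"logprobs": 0.3},         # generative-style detail: scalar value
]

# Raises pyarrow.lib.ArrowInvalid:
# "cannot mix list and non-list, non-null values"
Dataset.from_list(rows)
```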

To Reproduce

            "name": "launch vllm",
            "type": "debugpy",
            "request": "launch",
            "module": "lighteval",
            "args": [
                "vllm",
                "model_name=Qwen/Qwen3-0.6B,max_num_batched_tokens=100000,max_model_length=38912,generation_parameters={temperature:0.6,top_p:0.95,top_k:20,min_p:0,presence_penalty:1,max_new_tokens:38912},system_prompt='/no_think'",
                "lighteval|mmlu_redux_2:security_studies|0", //lighteval|mmlu_redux_2|0|0",
                "--max-samples",
                "10",
                "--save-details",
            ],
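As an interim workaround (this is a sketch, not lighteval's fix; `normalize_details` is a hypothetical helper), one could coerce scalar values into singleton lists for any field that is a list in at least one row, before calling `Dataset.from_list`:

```python
from datasets import Dataset

def normalize_details(rows: list[dict]) -> list[dict]:
    """Hypothetical helper: any field that is a list in at least one
    row becomes a list in every row, so Arrow infers one column type."""
    list_fields = {k for row in rows for k, v in row.items() if isinstance(v, list)}
    return [
        {k: ([v] if k in list_fields and not isinstance(v, list) else v)
         for k, v in row.items()}
        for row in rows
    ]

# Usage: the mixed rows from the example above now build a list column.
fixed = normalize_details([{"logprobs": [0.1, 0.2]}, {"logprobs": 0.3}])
dataset = Dataset.from_list(fixed)
```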
