
[BUG] PreTrainedTokenizerFast._batch_encode_plus() got multiple values for keyword argument 'truncation_strategy' #949

Description

@yujonglee

Describe the bug

While running the pipeline, I get the following output and error:

You cannot select the number of dataset splits for a generative evaluation at the moment. Automatically inferring.
Splits:   0%|          | 0/1 [00:00<?, ?it/s]
Adding requests: 100%|██████████| 10/10 [00:00<00:00, 1842.11it/s]
Processed prompts: 100%|██████████| 10/10 [00:13<00:00,  1.39s/it, est. speed input: 1451.31 toks/s, output: 396.93 toks/s]
Splits: 100%|██████████| 1/1 [00:13<00:00, 13.97s/it]
Creating parquet from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 135.89ba/s]
Generating train split: 10 examples [00:00, 1094.00 examples/s]
[rank0]:[W908 22:32:07.824350850 ProcessGroupNCCL.cpp:1479] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
Traceback (most recent call last):
  File "/pkg/modal/_runtime/container_io_manager.py", line 778, in handle_input_exception
    yield
  File "/pkg/modal/_container_entrypoint.py", line 243, in run_input_sync
    res = io_context.call_finalized_function()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pkg/modal/_runtime/container_io_manager.py", line 197, in call_finalized_function
    res = self.finalized_function.callable(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/main2.py", line 21, in main
    evaluate()
  File "/root/eval/main.py", line 50, in evaluate
    pipeline.evaluate()
  File "/usr/local/lib/python3.12/site-packages/lighteval/pipeline.py", line 317, in evaluate
    self._compute_metrics(outputs)
  File "/usr/local/lib/python3.12/site-packages/lighteval/pipeline.py", line 417, in _compute_metrics
    outputs = apply_metric(
              ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/lighteval/metrics/__init__.py", line 50, in apply_metric
    metric.compute_sample(
  File "/usr/local/lib/python3.12/site-packages/lighteval/metrics/utils/metric_utils.py", line 59, in compute_sample
    return {self.metric_name: sample_level_fn(**kwargs)}
                              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/lighteval/metrics/metrics_sample.py", line 752, in compute
    return self.summac.score_one(inp, prediction)["score"]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/lighteval/metrics/imports/summac.py", line 288, in score_one
    image = self.imager.build_image(original, generated)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/lighteval/metrics/imports/summac.py", line 218, in build_image
    batch_tokens = self.tokenizer.batch_encode_plus(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 3200, in batch_encode_plus
    return self._batch_encode_plus(
           ^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: transformers.tokenization_utils_fast.PreTrainedTokenizerFast._batch_encode_plus() got multiple values for keyword argument 'truncation_strategy'
Stopping app - uncaught exception raised in remote container: TypeError("transformers.tokenization_utils_fast.PreTrainedTokenizerFast._batch_encode_plus() got multiple values for keyword argument 'truncation_strategy'").
╭─ Error ────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ transformers.tokenization_utils_fast.PreTrainedTokenizerFast._batch_encode_plus() got multiple values for      │
│ keyword argument 'truncation_strategy'                                                                         │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
task: Failed to run task "modal": exit status 1
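For context: judging from the traceback, the call that fails is the batch_encode_plus() call in lighteval/metrics/imports/summac.py (build_image), which still passes the legacy truncation_strategy kwarg. As far as I can tell, newer transformers releases no longer strip that legacy kwarg, so it gets forwarded to _batch_encode_plus() alongside the truncation_strategy that batch_encode_plus() computes internally from the truncation argument, hence the "multiple values" TypeError. A minimal standalone sketch of that call pattern (the checkpoint and max_length are just examples, not the values summac.py uses):

from transformers import AutoTokenizer

# Any fast tokenizer shows the collision; this checkpoint is only illustrative.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=True)

# batch_encode_plus() derives a truncation_strategy from `truncation` and forwards it
# to _batch_encode_plus(); supplying the legacy `truncation_strategy` kwarg on top
# makes that forwarded call receive the argument twice.
tokenizer.batch_encode_plus(
    [("original document text", "generated summary text")],
    padding=True,
    truncation=True,
    max_length=512,
    return_tensors="pt",
    truncation_strategy="only_first",  # legacy kwarg -> TypeError on recent transformers
)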

To Reproduce

import sys
import os

sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.models.vllm.vllm_model import VLLMModelConfig
from lighteval.pipeline import ParallelismManager, Pipeline, PipelineParameters
from lighteval.utils.imports import is_accelerate_available

if is_accelerate_available():
    from datetime import timedelta
    from accelerate import Accelerator, InitProcessGroupKwargs

    accelerator = Accelerator(kwargs_handlers=[InitProcessGroupKwargs(timeout=timedelta(seconds=3000))])
else:
    accelerator = None


def evaluate():
    evaluation_tracker = EvaluationTracker(
        output_dir="./results",
        save_details=True,
        push_to_hub=True,
        hub_results_org="yujonglee",
    )

    pipeline_params = PipelineParameters(
        launcher_type=ParallelismManager.ACCELERATE,
        custom_tasks_directory="tasks",
        max_samples=10,
    )

    model_config = VLLMModelConfig(
        model_name="Qwen/Qwen3-0.6B",
        dtype="float16",
    )

    # https://huggingface.co/docs/lighteval/en/available-tasks
    tasks = ["helm|legal_summarization:billsum|0|0"]
    task = ",".join(tasks)

    pipeline = Pipeline(
        tasks=task,
        pipeline_parameters=pipeline_params,
        evaluation_tracker=evaluation_tracker,
        model_config=model_config,
    )

    pipeline.evaluate()
    pipeline.save_and_push_results()
    pipeline.show_results()


if __name__ == "__main__":
    evaluate()

Expected behavior

The pipeline runs without error.
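Possibly useful: with the current transformers API, the same truncation behaviour can be requested through the truncation argument alone, without the legacy truncation_strategy kwarg. A minimal, standalone sketch of the replacement pattern (example checkpoint and max_length again; I haven't verified this is the complete fix for summac.py):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=True)

# truncation="only_first" truncates only the first element of each text pair,
# which is what the legacy truncation_strategy="only_first" used to express.
batch_tokens = tokenizer.batch_encode_plus(
    [("original document text", "generated summary text")],
    padding=True,
    truncation="only_first",
    max_length=512,
    return_tensors="pt",
)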

Version info

git+https://github.com/huggingface/lighteval.git@7ed2636#egg=lighteval[vllm]
