OBenTO-LLM provides the tools to translate evaluation benchmarks from a source language to other languages using open-source Large Language Models (LLMs) and a standardized translation pipeline.
Translating evaluation benchmarks may seem like a trivial task. However, a blind translation can introduce biases or errors that skew the assessment of a model's performance in the target language.
Importantly, each dataset has nuances and linguistic features that can greatly affect translation quality, even with state-of-the-art LLMs such as GPT-3.5/4. For example, translating the ARC Challenge dataset requires a different approach than translating the Winogrande dataset; moreover, different instances within ARC Challenge may call for different translation strategies to obtain the best results.
In addition, for many existing translated benchmarks, the code used to generate the translations is not publicly available, which makes it hard to compare models' performance across languages.
OBenTO-LLM provides a standardized pipeline to translate evaluation benchmarks using Large Language Models (LLMs). The aim is to provide a pipeline that is:
- Free: based on open-source LLMs, so it is free to use if you have the resources to run the translation models.
- Tailored: designed to maximize translation quality across different datasets by accounting for their peculiarities.
- Reproducible: can be used to quickly regenerate the translations of the benchmarks.
- Transparent: designed to be transparent, so you can understand how the translations are generated.
- Extensible: can be easily extended to support new datasets and languages.
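The "Tailored" point above matters in practice: a multiple-choice item like ARC should be translated as a single unit (question plus choices) so terminology stays consistent across options, while a dataset like Winogrande needs its pronoun ambiguity preserved. A minimal sketch of what a dataset-specific prompt builder might look like (the function name and prompt wording are illustrative, not the library's actual code):

```python
# Hypothetical sketch: build one prompt that translates a question together
# with its answer choices, so the model keeps them mutually consistent.
# The function name and prompt wording are illustrative only.

def build_translation_prompt(
    question: str,
    choices: list[str],
    source_language: str = "English",
    target_language: str = "Italian",
) -> str:
    """Return a single translation prompt covering question and choices."""
    numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(choices))
    return (
        f"Translate the following multiple-choice question from "
        f"{source_language} to {target_language}. Keep the numbering and "
        f"do not answer the question.\n\n"
        f"Question: {question}\nChoices:\n{numbered}"
    )

prompt = build_translation_prompt(
    "Which gas do plants absorb during photosynthesis?",
    ["Oxygen", "Carbon dioxide", "Nitrogen", "Hydrogen"],
)
print(prompt)
```

Translating the whole instance in one prompt, rather than one field at a time, is one way a pipeline can keep terminology consistent between a question and its options.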
A note on quality: the quality of the translations depends on the LLMs used. The pipeline is designed to provide a good starting point for translating evaluation benchmarks, but, even though models like Tower-LLM are on par with GPT-3.5/4, they may not be perfect. Therefore, we always recommend checking the translations before using them in your experiments.
| Dataset | Original Dataset | IT Translation |
|---|---|---|
| ARC Challenge | allenai/ai2_arc | sapienzanlp/arc_italian |
| ARC Easy | allenai/ai2_arc | sapienzanlp/arc_italian |
| BoolQ | google/boolq | sapienzanlp/boolq_italian |
| GSM8K | gsm8k | sapienzanlp/gsm8k_italian |
| HellaSwag | Rowan/hellaswag | sapienzanlp/hellaswag_italian |
| MMLU | cais/mmlu | sapienzanlp/mmlu_italian |
| PIQA | piqa | sapienzanlp/piqa_italian |
| SciQ | allenai/sciq | sapienzanlp/sciq_italian |
| TruthfulQA | truthful_qa | sapienzanlp/truthful_qa_italian |
| Winogrande | winogrande | sapienzanlp/winogrande_italian |
| GPQA | Idavidrein/gpqa | sapienzanlp/gpqa_italian |
| MuSR | TAUR-Lab/MuSR | sapienzanlp/MUSR_italian |
| MATH | lighteval/MATH-Hard | sapienzanlp/MATH_hard_italian |
| BBH | SaylorTwift/bbh | sapienzanlp/BBH_italian |
Missing a dataset? Open an issue or submit a pull request! If you do not have the resources or hardware to translate the datasets, we can help you with that. Missing a language? We are currently writing a guide to help you translate the datasets in other languages. Stay tuned!
Currently, the pipeline supports the following LLMs:
Missing an LLM? OBenTO-LLM is designed to support most LLMs available in the Hugging Face model hub. We tested the pipeline with TowerInstruct models, but it should work with other models as well. If you encounter any issues, open an issue or submit a pull request!
It is recommended to set up a Conda environment to run the code. To create a new Conda environment, run the following command:

```bash
conda create --name llm-data-translation python=3.10
```

Remember to activate the Conda environment before running the code:

```bash
conda activate llm-data-translation
```

To install the required packages, run the following command:

```bash
pip install -r requirements.txt
```

The code is organized as follows:
- `src/translation/translate_<dataset>.py`: Contains the code to translate the dataset from a source language to other languages, e.g., Italian.
For example, to translate the allenai/ai2_arc dataset from English to Italian, run the following command:
```bash
python src/translation/translate_arc.py \
    --source_language English \
    --target_language Italian \
    --output_path data/translations/it/arc_challenge.train.json \
    --model_name Unbabel/TowerInstruct-7B-v0.2 \
    --device_map "cuda:0" \
    --dataset_name allenai/ai2_arc \
    --dataset_config ARC-Challenge \
    --split train \
    --batch_size 4 \
    --max_new_tokens 1024 \
    --beam_size 3 \
    --length_penalty 2.5 \
    --num_return_sequences 1 \
    --do_sample False \
    --early_stopping False
```

The translated dataset will be saved to the specified output path. For more details on the arguments, check the documentation of the script.
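As the note on quality above recommends, it is worth spot-checking the output before running evaluations. A minimal sketch, assuming the output is a JSON list of records; the field names below are illustrative guesses, so inspect your actual file for the real schema:

```python
import json
import tempfile
from pathlib import Path

# Illustrative sample mimicking a translated ARC instance. The schema
# (field names, answer format) is an assumption, not the scripts' actual output.
sample = [
    {
        "question": "Quale gas assorbono le piante durante la fotosintesi?",
        "choices": ["Ossigeno", "Anidride carbonica", "Azoto", "Idrogeno"],
        "answer": "B",
    }
]

path = Path(tempfile.mkdtemp()) / "arc_challenge.train.json"
path.write_text(json.dumps(sample, ensure_ascii=False), encoding="utf-8")

# Load the translated file and run basic sanity checks before evaluation.
records = json.loads(path.read_text(encoding="utf-8"))
for rec in records:
    assert rec["question"].strip(), "empty translated question"
    assert len(rec["choices"]) == 4, "unexpected number of choices"
print(f"Loaded {len(records)} translated instance(s) from {path.name}")
```

Simple checks like these (non-empty fields, expected number of choices, preserved answer keys) catch most truncated or malformed generations early.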
The following scripts are available to translate the datasets from English to Italian:
- `scripts/local/translation/translate_arc_challenge.sh`: Translates the ARC Challenge dataset.
- `scripts/local/translation/translate_arc_easy.sh`: Translates the ARC Easy dataset.
- `scripts/local/translation/translate_boolq.sh`: Translates the BoolQ dataset.
- `scripts/local/translation/translate_gsm8k.sh`: Translates the GSM8K dataset.
- `scripts/local/translation/translate_hellaswag.sh`: Translates the HellaSwag dataset.
- `scripts/local/translation/translate_mmlu.sh`: Translates the MMLU dataset.
- `scripts/local/translation/translate_piqa.sh`: Translates the PIQA dataset.
- `scripts/local/translation/translate_sciq.sh`: Translates the SciQ dataset.
- `scripts/local/translation/translate_truthfulqa.sh`: Translates the TruthfulQA dataset.
- `scripts/local/translation/translate_winogrande.sh`: Translates the Winogrande dataset.
- `scripts/local/translation/translate_gpqa.sh`: Translates the GPQA dataset.
- `scripts/local/translation/translate_musr.sh`: Translates the MuSR dataset.
- `scripts/local/translation/translate_math.sh`: Translates the MATH dataset.
- `scripts/local/translation/translate_bbh.sh`: Translates the BBH dataset.
For more details about the parameters, check the scripts.
If you use this library or part of it, consider citing us:
```bibtex
@inproceedings{moroni-etal-2024-towards,
    title = "Towards a More Comprehensive Evaluation for {I}talian {LLM}s",
    author = "Moroni, Luca and
      Conia, Simone and
      Martelli, Federico and
      Navigli, Roberto",
    editor = "Dell'Orletta, Felice and
      Lenci, Alessandro and
      Montemagni, Simonetta and
      Sprugnoli, Rachele",
    booktitle = "Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)",
    month = dec,
    year = "2024",
    address = "Pisa, Italy",
    publisher = "CEUR Workshop Proceedings",
    url = "https://aclanthology.org/2024.clicit-1.67/",
    pages = "584--599",
    ISBN = "979-12-210-7060-6",
}
```

This repository is licensed under the MIT License. See the LICENSE file for more information.
- Future AI Research for supporting this work.
- Hugging Face for building the transformers and datasets libraries.
- Unbabel for building Tower-LLM.
- The authors of the original datasets for making them available.
We would like to thank:
- Simone Conia for the core idea and the development of the OBenTO-LLM library;
- Pere-Lluís Huguet Cabot for his help with setting up the Tower-LLM model;
- Riccardo Orlando for his experience with multi-GPU inference.