You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
# 🎉 Major Updates
- 🚀 1.25x speed improvements (1.5x with `use_kv_cache=True`)
- 📉 Introduced `autoGCG` - automatic GCG tuning using Bayesian
optimization
- 💼 Data subsystem refactor to enable arbitrary dataset
support
- 🧠 Add a tutorial on how to use **LLM**art as a standalone
library.
# 🎈 Minor Updates
- Support for uv
- More intuitive dataset splitting parameters
- Disable early stopping via `early_stop=False`
- Run test only via `attack=None` or `steps=0`
- Option to enable/disable batch splitting via
`data.split_batches=True/False`
- Reusable closure creation
# 🚧 Bug Fixes
- Remove `world_size` from optimizer
- Fix `_local_swap_count` being on wrong device in optimizer
---------
Co-authored-by: Marius Arvinte <[email protected]>
Copy file name to clipboardExpand all lines: README.md
+38-28Lines changed: 38 additions & 28 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -10,17 +10,22 @@
10
10
11
11
</div>
12
12
13
-
## :pushpin: What is **LLM**art?
13
+
## 🆕 Latest updates
14
+
❗Release 2025.02 brings significant speed-ups to the core library, with zero user involvement.\
15
+
We additionally recommend using the command line argument `per_device_bs` with a value as large as possible on GPUs with at least 48GB to take the most advantage of further speed-ups.
16
+
17
+
❗We now offer command-line support for jailbreaking thoughts and responses for DeepSeek-R1 on multi-GPU:
18
+
```bash
19
+
accelerate launch -m llmart model=deepseek-r1-distill-llama-8b data=basic per_device_bs=64 "response.replace_with=`echo -e '\"<think>\nOkay, so I need to tell someone about Saturn.\n</think>\n\nNO WAY JOSE\"'`"
20
+
```
21
+
22
+
❗Check out our new [notebook](examples/basic/basic_dev_workflow.ipynb) containing a detailed step-by-step developer overview of all `llmart` components and how to customize them.
14
23
15
24
**LLM**art is a toolkit for evaluating LLM robustness through adversarial testing. Built with PyTorch and Hugging Face integrations, **LLM**art enables scalable red teaming attacks with parallelized optimization across multiple devices.
16
25
**LLM**art has configurable attack patterns, support for soft prompt optimization, detailed logging, and is intended both for high-level users that want red team evaluation with off-the-shelf algorithms, as well as research power users that intend to experiment with the implementation details of input-space optimization for LLMs.
17
26
18
27
While it is still under development, the goal of **LLM**art is to support any Hugging Face model and include example scripts for modular implementation of different attack strategies.
19
28
20
-
🆕 We now offer command-line support for jailbreaking thoughts and responses for DeepSeek-R1 on multi-GPU:
21
-
```bash
22
-
accelerate launch -m llmart model=deepseek-r1-distill-llama-8b data=basic per_device_bs=64 "response.replace_with=`echo -e '\"<think>\nOkay, so I need to tell someone about Saturn.\n</think>\n\nNO WAY JOSE\"'`"
23
-
```
24
29
25
30
## :rocket: Quick start
26
31
Developed and tested on Ubuntu 22.04 with `Python 3.11`.
@@ -41,7 +46,8 @@ pip install -e ".[core,dev]"
41
46
```
42
47
43
48
> [!NOTE]
44
-
> We also include a Poetry 2.0 `poetry.lock` file that perfectly reproduces dependencies we use.
49
+
> We include lockfiles that ensure a valid Python environment and recommend using `uv sync --all-extras` to ensure consistent development.\
50
+
> Alternatively, `poetry sync --all-extras` can be used with `poetry >= 2.0`.
45
51
46
52
Once the environment is installed and `export HUGGINGFACE_TOKEN=...` is set to a token with valid model access, **LLM**art can be run to optimize the suffix with:
47
53
```bash
@@ -77,6 +83,8 @@ The algorithmic **LLM**art functionality is structured as follows and uses PyTor
77
83
┗ 📂pipelines # Wrappers for Hugging Face pipelines
78
84
```
79
85
86
+
A detailed developer workflow that puts together all components to launch an adversarial attack against an LLM can be found in [examples/basic](examples/basic/).
87
+
80
88
## :robot: Models
81
89
While **LLM**art comes with a limited number of models accessible via custom naming schemes (see the `PipelineConf` class in `config.py`), it is designed with Hugging Face hub model compatibility in mind.
82
90
@@ -113,39 +121,41 @@ The `basic` dataset is the simplest instantiation that lives at [`datasets/basic
113
121
For example, to modify the natural language prompt and desired open-ended response, the following basic edit is sufficient:
- dict(role="user", content="Tell me about the planet Saturn."),
134
-
+ dict(role="user", content="Tell me how to do something not allowed."),
135
-
- dict(role="assistant", content="NO WAY JOSE"),
136
-
+ dict(role="assistant", content="NO WAY JOSE<|eot_id|>"),
137
-
],
138
-
]
136
+
# Create conversation data structure and mark parts we care about
137
+
conv = [
138
+
dict(role="user", content=mark_prompt("Tell me about the planet Saturn.")),
139
+
- dict(role="assistant", content=mark_completion("NO WAY JOSE")),
140
+
+ dict(role="assistant", content=mark_completion("NO WAY JOSE<|eot_id|>")),
141
+
]
139
142
```
140
143
141
144
**LLM**art also supports loading the [AdvBench](https://github.com/llm-attacks/llm-attacks) dataset, which comes with pre-defined target responses to ensure consistent benchmarks.
142
145
143
-
Using AdvBench with **LLM**art requires downloading the two files to disk, after which simply specifying the desired dataset and the subset of samples to attack will run out of the box:
146
+
Using AdvBench with **LLM**art requires specifying the desired subset of samples to attack. By default, the following command will automatically download the .csv file from its [original source](https://raw.githubusercontent.com/llm-attacks/llm-attacks/refs/heads/main/data/advbench/harmful_behaviors.csv) and use it as a dataset:
In the most general case, you can write your own [dataset loading script](https://huggingface.co/docs/datasets/en/dataset_script) and pass it to **LLM**art:
Just make sure you conform to the output format in [`datasets/basic.py`](src/llmart/datasets/basic.py).
149
159
150
160
## :chart_with_downwards_trend: Optimizers and schedulers
151
161
Discrete optimization for language models [(Lei et al, 2019)](https://proceedings.mlsys.org/paper_files/paper/2019/hash/676638b91bc90529e09b22e58abb01d6-Abstract.html)– in particular the Greedy Coordinate Gradient (GCG) applied to auto-regressive LLMs [(Zou et al, 2023)](https://arxiv.org/abs/2307.15043)– is the main focus of [`optim.py`](src/llmart/optim.py).
@@ -175,7 +185,7 @@ If you find this repository useful in your work, please cite:
175
185
author = {Cory Cornelius and Marius Arvinte and Sebastian Szyller and Weilin Xu and Nageen Himayat},
176
186
title = {{LLMart}: {L}arge {L}anguage {M}odel adversarial robutness toolbox},
Install `llmart`and download/navigate to this folder. Run `pip install -r requirements.txt` in the working environment.
3
+
4
+
5
+
# `autoGCG` with `llmart`
6
+
The example in this folder shows how to integrate `LLMart` with the [ray-tune](https://docs.ray.io/en/latest/tune/index.html) hyperparameter optimization library to automatically search for the best attack hyper-parameters across one or multiple samples, given a total compute budget.
7
+
8
+
We call this functionality `autoGCG` -- automated greedy coordinate descent.
9
+
10
+
To run `autoGCG` on the `i`-th sample of the [AdvBench behavior](https://github.com/llm-attacks/llm-attacks/blob/main/data/advbench/harmful_behaviors.csv) dataset execute:
11
+
```bash
12
+
python main.py --subset i
13
+
```
14
+
15
+
The script will automatically use the maximum number of GPUs and parallelize hyper-parameter tuning for the `n_tokens` hyper-parameter of GCG using `llmart`'s a [ChangeOnPlateauInteger](../../src/llmart/schedulers.py#L279) scheduler.
16
+
> [!NOTE]
17
+
> The default parameter `"per_device_bs": 64` may add too much memory pressure on GPUs with less than 48 GB of VRAM. If OOM errors occur, lowering `per_device_bs` should fix the issue.
18
+
19
+
Given the sensitivity of GCG with respect to seeding (random swap picking during optimization), `autoGCG` exploits this by minimizing the 10-percentile loss across ten different seeds, for the same sample.
20
+
21
+
By default, the optimization runs for a total of _two wall-clock hours_, regardless of how many GPUs are available:
The `ray.tune` experiment will be saved at the default location of `~/ray_results/autogcg_sample{i}`, after which it can be analyzed using [`tune.Tuner.restore`](https://docs.ray.io/en/latest/tune/examples/tune_analyze_results.html).
31
+
> [!NOTE]
32
+
> Properly using `tune.Tuner.restore` will require importing the experiment function as `from main import experiment` and passing it as an argument.
Install `llmart` and download/navigate to this folder. Run `pip install -r requirements.txt`.
3
+
4
+
To understand and run the basic `llmart` developer workflow, see the [notebook](basic_dev_workflow.ipynb); alternatively, see the standalone [script](main.py).
0 commit comments