Not all prompts are equally effective, and garak users want to be able to do fast runs, so why run every prompt every time? There could be a better way:
- Run the "known spiciest" prompts first
- Downsample formulations that generally don't work well
This requires tracking the performance of prompts and prompt components, which enables a shift-left in LLM security. Identifying which prompts work better or worse is a strong feedback channel, yielding intel that helps keep garak sharp and therefore current.
Sub-tasks include:
- **Determine per-prompt ASR**: track the attack success rate (ASR) of each prompt, perhaps mapped to a hash, during the calibration bag update in perf_stats
- **Prompt bucketing**: bucket prompts into low, medium, and high success rates (quantiles per probe-detector pair? absolute bounds? global quantiles, so that some probes get none or all of the low/high-success prompts, and we discover weaker probes faster?) - this will be a bag output
- **Prompt downsampling**: allow downsampling of formulations that generally don't work well, affording faster runs and better budgeting
- **Spicy first**: run the spiciest (high-ASR bucket) prompts first. This involves staging all prompts, then sorting, then minting; one proposal is to run "unbanded" prompts between the spicy and medium buckets, and to add new/unbucketed prompts to probes with a lower proportion of spicy prompts
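The sub-tasks above could be sketched roughly as follows. This is a minimal illustration, not garak's actual implementation: the names (`PromptStats`, `bucket`, `order_and_downsample`), the absolute-bound thresholds, and the downsampling rate are all hypothetical placeholders for whatever perf_stats ends up storing.

```python
import hashlib
import random
from collections import defaultdict

class PromptStats:
    """Track per-prompt attack success rate (ASR), keyed by prompt hash."""

    def __init__(self):
        self.hits = defaultdict(int)   # successful attacks per prompt hash
        self.tries = defaultdict(int)  # total attempts per prompt hash

    @staticmethod
    def key(prompt: str) -> str:
        # Hash the prompt so stats can be stored without the raw text
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def record(self, prompt: str, success: bool) -> None:
        k = self.key(prompt)
        self.tries[k] += 1
        self.hits[k] += int(success)

    def asr(self, prompt: str):
        k = self.key(prompt)
        return self.hits[k] / self.tries[k] if self.tries[k] else None

def bucket(asr, low=0.1, high=0.5) -> str:
    """Absolute-bounds bucketing; per probe-detector quantiles are the
    alternative discussed above. Thresholds here are arbitrary examples."""
    if asr is None:
        return "unbanded"  # new prompt, no calibration data yet
    if asr >= high:
        return "high"
    if asr >= low:
        return "medium"
    return "low"

def order_and_downsample(prompts, stats, keep_low=0.25, rng=random):
    """Spicy-first ordering: high-ASR bucket first, then unbanded prompts,
    then medium, then a downsampled fraction of the low bucket."""
    buckets = defaultdict(list)
    for p in prompts:
        buckets[bucket(stats.asr(p))].append(p)
    low = [p for p in buckets["low"] if rng.random() < keep_low]
    return buckets["high"] + buckets["unbanded"] + buckets["medium"] + low
```

With `keep_low` below 1.0, low-ASR formulations are probabilistically dropped, giving the faster runs and budget savings described above while still occasionally re-testing them in case a target regresses.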