
[BugFix] Fix gsm8k postprocess#2426

Open
Hibbert133 wants to merge 1 commit into open-compass:main from Hibbert133:fix_gsm8k_postprocess


Hibbert133 commented Mar 30, 2026

Motivation

This PR fixes an answer post-processing issue in the GSM8K evaluation of OpenCompass.

During evaluation, we found that many predictions marked as incorrect were in fact correct answers that had been misjudged because of a number-formatting issue in the post-processing step.

Among approximately 130 incorrect cases, around 50 share the same pattern: the model outputs the correct numeric answer, but the evaluation pipeline fails to extract it.

This is not an instruction-following problem but a post-processing bug.

In GSM8K-style prompts, monetary values frequently use commas as thousands separators, as in this example:

{
  "origin_prompt": "Question: Josh decides to try flipping a house. He buys a house for $80,000 and then puts in $50,000 in repairs. This increased the value of the house by 150%. How much profit did he make?\nLet's think step by step\nAnswer:",
  "prediction": "Josh buys the house for $80,000 and spends $50,000 on repairs, so his total investment is:\n\n$80,000 (purchase) + $50,000 (repairs) = $130,000 total invested.\n\nThe repairs increased the value of the house by 150%, meaning the house’s value increased by:\n\n150% of $80,000 = 1.5 × $80,000 = $120,000 increase in value.\n\nSo the new value of the house is:\n\n$80,000 (original value) + $120,000 (increase) = $200,000.\n\nJosh’s profit is the final value minus his total investment:\n\n$200,000 (final value) - $130,000 (total investment) = $70,000 profit.\n\nThe answer is $70,000.",
  "gold": "The cost of the house and repairs came out to 80,000+50,000=$<<80000+50000=130000>>130,000\nHe increased the value of the house by 80,000*1.5=<<80000*1.5=120000>>120,000\nSo the new value of the house is 120,000+80,000=$<<120000+80000=200000>>200,000\nSo he made a profit of 200,000-130,000=$<<200000-130000=70000>>70,000\n#### 70000"
}

Printed evaluation details:

"pred": ["000"],
"answer": ["70000"],
"correct": [false]
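The mismatch can be reproduced with a short sketch. The original function body is not shown here, so the sketch assumes the pre-fix logic returns the last match of a simple number regex such as `-?\d+\.\d+|-?\d+`, which is consistent with the logged `"pred": ["000"]`. The helper names `naive_last_number` and `gold_answer` are hypothetical, and the strings are trimmed excerpts of the record above.

```python
import re

# Assumed pre-fix behaviour (hypothetical reconstruction): pick the last
# number-like token. "70,000" is split at the comma into "70" and "000".
_NUMBER = re.compile(r'-?\d+\.\d+|-?\d+')


def naive_last_number(text: str) -> str:
    numbers = _NUMBER.findall(text)
    return numbers[-1] if numbers else 'NULL'


def gold_answer(text: str) -> str:
    # GSM8K references end with "#### <answer>"; any commas are dropped.
    return text.split('#### ')[-1].strip().replace(',', '')


# Trimmed excerpts of the prediction and gold fields shown above.
prediction = "... = $70,000 profit.\n\nThe answer is $70,000."
gold = "... So he made a profit of 200,000-130,000=$<<200000-130000=70000>>70,000\n#### 70000"

print(naive_last_number(prediction))  # 000
print(gold_answer(gold))              # 70000
```

Because the extracted prediction is `000` while the reference is `70000`, the comparison fails even though the model's final answer is correct.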

Modification

This PR updates the GSM8K post-processing logic in opencompass/opencompass/datasets/gsm8k.py.

Specifically, the gsm8k_postprocess function is modified to correctly handle numbers formatted with commas (e.g., 70,000).

The updated logic normalizes such formats by removing commas before performing numeric extraction, ensuring that the correct value is parsed.
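A minimal sketch of the normalization step follows. It is not the PR's actual diff: `gsm8k_postprocess_fixed` is a hypothetical name, and the `text.split('Question:')[0]` truncation plus the number regex are assumptions about the surrounding logic. Instead of deleting every comma, this variant removes only commas that act as thousands separators (a digit before, exactly three digits after), so that lists such as `1,2,3` are not merged into `123`.

```python
import re

# A comma preceded by a digit and followed by exactly three digits,
# i.e. a thousands separator as in "70,000" or "1,234,567".
_THOUSANDS_SEP = re.compile(r'(?<=\d),(?=\d{3}(?!\d))')
_NUMBER = re.compile(r'-?\d+\.\d+|-?\d+')


def gsm8k_postprocess_fixed(text: str) -> str:
    # Assumed: ignore any follow-up question the model may have generated.
    text = text.split('Question:')[0]
    # Normalize "70,000" -> "70000" before extracting numbers.
    text = _THOUSANDS_SEP.sub('', text)
    numbers = _NUMBER.findall(text)
    if not numbers:
        return 'NULL'
    # Keep the "last number is the answer" convention.
    return numbers[-1]


print(gsm8k_postprocess_fixed("The answer is $70,000."))  # 70000
```

Stripping all commas with `text.replace(',', '')`, as described above, is simpler and handles this GSM8K case equally well; the narrower pattern only matters if predictions can contain comma-separated lists of small numbers.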

Result

After applying this fix, GSM8K evaluation accuracy improves: 90.52 → 95.00

A significant portion of previously incorrect cases are now correctly evaluated.

Related Issue

Fixes #2343

Checklist

Before PR

  • Pre-commit or other linting tools are used to fix potential lint issues.
  • Bug fixes are fully covered by unit tests, including the case that triggers this issue.
  • The modification is covered by complete unit tests to ensure correctness.
  • Documentation has been updated accordingly (e.g., docstrings).

After PR

  • If the modification has potential influence on downstream or related projects, those projects should also be tested.
  • CLA has been signed and all committers have signed the CLA for this PR.

Hibbert133 changed the title from "[bugfix] Fix gsm8k postprocess" to "[BugFix] Fix gsm8k postprocess" on Apr 1, 2026.

Successfully merging this pull request may close these issues.

[Bug] Gsm8k evaluation - evaluation answer post-processing issue
