### Feature request
Add a `max_tool_iterations` config parameter (default 5 or 10) to limit the tool loop:
```python
# In _tool_call_loop
iteration = 0
while idxs_with_tool and iteration < self.args.max_tool_iterations:
    iteration += 1
    # ...
```
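As a self-contained sketch of the idea (the names `idxs_with_tool` and `max_tool_iterations` mirror the snippet above; the helper functions here are hypothetical and not the actual TRL internals), a bounded tool loop might look like:

```python
def run_tool_loop(get_tool_calls, execute_tool, max_tool_iterations=10):
    """Execute pending tool calls until the model stops requesting them
    or the iteration cap is reached, whichever comes first.

    get_tool_calls: returns the list of pending tool calls ([] when done).
    execute_tool: runs a single tool call and returns its result.
    """
    iteration = 0
    results = []
    # The cap guarantees termination even if the model never stops
    # emitting tool calls, bounding sequence growth.
    while (calls := get_tool_calls()) and iteration < max_tool_iterations:
        iteration += 1
        results.extend(execute_tool(c) for c in calls)
    return iteration, results


# A model that would request tools forever is cut off at the cap:
def forever():
    while True:
        yield ["search"]

gen = forever()
iters, results = run_tool_loop(lambda: next(gen), str.upper, max_tool_iterations=3)
# iters == 3, results == ["SEARCH", "SEARCH", "SEARCH"]

# A model that stops on its own exits early, before the cap:
queue = [["a"], ["b"], []]
iters2, results2 = run_tool_loop(lambda: queue.pop(0), str.upper, max_tool_iterations=10)
# iters2 == 2, results2 == ["A", "B"]
```

The key property is that the loop has two independent exit conditions: the model voluntarily stopping, and the hard cap.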
### Motivation
When using `tools=` with `GRPOTrainer`, the `_tool_call_loop` runs indefinitely until either:
- The model stops making tool calls
- The sequence exceeds `max_position_embeddings`
This can cause an OOM before the length check triggers. As training progresses and the model learns to use tools, sequences grow longer (tool calls + results + responses), which can eventually cause an OOM mid-training.
### Your contribution
Not sure.