### Feature request
Add a `max_tool_iterations` config parameter (default 5 or 10) to limit the tool loop:
```python
# In _tool_call_loop
iteration = 0
while idxs_with_tool and iteration < self.args.max_tool_iterations:
    iteration += 1
    # ...
```
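As a self-contained sketch of the idea (the names `idxs_with_tool` and `max_tool_iterations` mirror the snippet above; the helper functions here are hypothetical and not the actual TRL internals), a bounded tool loop might look like:

```python
def run_tool_loop(get_tool_calls, execute_tool, max_tool_iterations=10):
    """Execute pending tool calls until the model stops requesting them
    or the iteration cap is reached, whichever comes first.

    get_tool_calls: returns the list of pending tool calls ([] when done).
    execute_tool: runs a single tool call and returns its result.
    """
    iteration = 0
    results = []
    # The cap guarantees termination even if the model never stops
    # emitting tool calls, bounding sequence growth.
    while (calls := get_tool_calls()) and iteration < max_tool_iterations:
        iteration += 1
        results.extend(execute_tool(c) for c in calls)
    return iteration, results


# A model that would request tools forever is cut off at the cap:
def forever():
    while True:
        yield ["search"]

gen = forever()
iters, results = run_tool_loop(lambda: next(gen), str.upper, max_tool_iterations=3)
# iters == 3, results == ["SEARCH", "SEARCH", "SEARCH"]

# A model that stops on its own exits early, before the cap:
queue = [["a"], ["b"], []]
iters2, results2 = run_tool_loop(lambda: queue.pop(0), str.upper, max_tool_iterations=10)
# iters2 == 2, results2 == ["A", "B"]
```

The key property is that the loop has two independent exit conditions: the model voluntarily stopping, and the hard cap.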
### Motivation
When using `tools=` with `GRPOTrainer`, the `_tool_call_loop` runs indefinitely until either:
- The model stops making tool calls
- The sequence exceeds `max_position_embeddings`
This can cause an OOM before the length check triggers. As training progresses and the model learns to use tools, sequences grow longer (tool calls + results + responses), which can eventually cause an OOM mid-training.
### Your contribution
Not sure.