Fix tool calling for Mistral 3 #132
Conversation
The changes in this file were the initial changes I made. Without these, the Mistral3 models never even received the tools from the caller.
```swift
// Auto-detect tool call format from model type if not explicitly set
if mutableConfiguration.toolCallFormat == nil {
    mutableConfiguration.toolCallFormat = ToolCallFormat.infer(from: baseConfig.modelType)
}
```
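The inference itself is simple prefix matching on the model type. A minimal sketch of what detection like this could look like — the names here are illustrative, not the library's actual implementation:

```swift
// Illustrative sketch of prefix-based tool-call format inference, assuming
// the real ToolCallFormat.infer(from:) matches model types like "mistral3*".
// SketchToolCallFormat is a hypothetical stand-in, not the library's API.
enum SketchToolCallFormat {
    case mistral
    case unknown

    static func infer(from modelType: String?) -> SketchToolCallFormat {
        guard let modelType else { return .unknown }
        return modelType.hasPrefix("mistral3") ? .mistral : .unknown
    }
}
```

With this shape, a container whose config reports `model_type: "mistral3"` would pick up the Mistral format automatically while every other model type is left untouched.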
This seemed like an oversight. It should always have been present, as far as I could tell.
Pull request overview
This PR fixes Mistral 3 tool-calling end-to-end by (1) including tool schemas in the Mistral3 chat template application and (2) adding parsing/streaming support for Mistral’s [TOOL_CALLS]...[ARGS]... tool-call format (including buffering until generation end).
Changes:
- Add a new `.mistral` tool-call format and `MistralToolCallParser`, plus buffering/flush support to extract tool calls when EOS never appears in the text stream.
- Ensure Mistral3 VLM prompt generation passes `tools` (and `additionalContext`) into `applyChatTemplate`.
- Add/extend unit + integration tests for Mistral3 message conversion, format inference, parsing, and end-to-end generation.
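For context, here is a stripped-down sketch of what extracting one call from the `[TOOL_CALLS]...[ARGS]...` text format involves. The real `MistralToolCallParser` additionally handles streaming, buffering, and the optional `[CALL_ID]` tag; this non-streaming simplification and its type names are hypothetical:

```swift
import Foundation

// Simplified, non-streaming sketch of parsing the
// [TOOL_CALLS]<name>[ARGS]<json> format described above.
struct SimpleToolCall: Equatable {
    let name: String
    let arguments: String
}

func parseMistralToolCall(_ text: String) -> SimpleToolCall? {
    // Locate both markers; bail out (best effort) if either is missing.
    guard let calls = text.range(of: "[TOOL_CALLS]"),
        let args = text.range(of: "[ARGS]"),
        calls.upperBound <= args.lowerBound
    else { return nil }
    let name = String(text[calls.upperBound ..< args.lowerBound])
        .trimmingCharacters(in: .whitespacesAndNewlines)
    let arguments = String(text[args.upperBound...])
        .trimmingCharacters(in: .whitespacesAndNewlines)
    return SimpleToolCall(name: name, arguments: arguments)
}
```

For example, `parseMistralToolCall(#"[TOOL_CALLS]get_weather[ARGS]{"city": "Paris"}"#)` yields the function name and its raw JSON arguments.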
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| Tests/MLXLMTests/UserInputTests.swift | Adds unit tests validating Mistral3 message generation for text/image/tool roles. |
| Tests/MLXLMTests/ToolTests.swift | Extends format inference tests and adds parser/processor tests for Mistral tool calls + flushing behavior. |
| Tests/MLXLMIntegrationTests/ToolCallIntegrationTests.swift | Adds Mistral3 integration coverage for auto-detection and tool-call generation/parsing. |
| Libraries/MLXVLM/VLMModelFactory.swift | Auto-detects toolCallFormat for VLM containers based on model_type when unset. |
| Libraries/MLXVLM/Models/Mistral3.swift | Passes tools + additionalContext into chat template application so tools reach the model. |
| Libraries/MLXLMCommon/Tool/ToolCallProcessor.swift | Adds flush() to parse buffered tool calls at generation end (for formats where end-tag text isn’t emitted). |
| Libraries/MLXLMCommon/Tool/ToolCallFormat.swift | Introduces .mistral and wires it into parser creation + format inference for mistral3*. |
| Libraries/MLXLMCommon/Tool/Parsers/MistralToolCallParser.swift | Adds parser for Mistral [TOOL_CALLS] / [ARGS] (and optional [CALL_ID]) format. |
| Libraries/MLXLMCommon/Evaluate.swift | Adds a generation-end callback to flush buffered tool calls and emit them before the final .info. |
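The flush behavior in the last few rows can be pictured as follows. This is a hypothetical sketch, not the actual `ToolCallProcessor` code; the class and property names are invented for illustration:

```swift
// Hypothetical sketch of the generation-end flush: tool-call text that is
// still buffered when the stream ends (because an end tag was never emitted,
// as with Mistral's format) is handed off for parsing instead of being
// silently dropped.
final class ProcessorSketch {
    private(set) var buffer = ""
    private(set) var flushedToolCallText: [String] = []

    func append(_ token: String) { buffer += token }

    // Called once when generation ends, before the final info event.
    func flush() {
        if buffer.contains("[TOOL_CALLS]") {
            // The real processor parses the buffer into ToolCall values here.
            flushedToolCallText.append(buffer)
            buffer = ""
        }
    }
}
```

The point of the design is ordering: buffered calls are surfaced before the final `.info` so callers see every tool call the model produced.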
I've addressed the pull request comments. This should be ready for review.

```swift
    return
}

if let startTag = parser.startTag {
```
Do you think this should be something that a tool call parser opts into? Or should it only be engaged when the parser doesn't have an `endTag` set?
Actually I think this splitting probably belongs in the ToolCallParser, something like:
```swift
/// handle remaining tool call data at EOS
func parseEOS(_ toolCallBuffer: String) -> [ToolCall]
```

I am not sure if it should `throws` -- it doesn't right now, and the tool call parser seems "resilient" (best effort), so probably not.

I think we can't have an `endTag` in the string because the parser would have already handled that, so we could just use this code here as the default implementation, but it would leave the ability for parsers to override it.
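In protocol terms, the shape suggested above might look like this. The names are illustrative (`ToolCallSketch` stands in for the library's `ToolCall` type), not the actual API:

```swift
// Sketch of the suggested design: a protocol requirement with a default
// implementation, so only parsers without an end tag need to override it.
struct ToolCallSketch: Equatable { let name: String }

protocol ToolCallParserSketch {
    /// Handle remaining tool call data at EOS; best effort, non-throwing.
    func parseEOS(_ toolCallBuffer: String) -> [ToolCallSketch]
}

extension ToolCallParserSketch {
    // Default: formats with an end tag have already consumed their buffer
    // by EOS, so there is normally nothing left to parse here.
    func parseEOS(_ toolCallBuffer: String) -> [ToolCallSketch] { [] }
}
```

A parser like the Mistral one would override `parseEOS` to run its tool-call extraction over the leftover buffer, while end-tag-based parsers simply inherit the empty default.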
I pushed a change that I think matches your idea.
Libraries/MLXLMCommon/Evaluate.swift
Outdated
```swift
let pendingToolCalls = toolCallProcessor.toolCalls
toolCallProcessor.toolCalls.removeAll(keepingCapacity: true)
```
Why clear this? The processor is local to the eval loop, why not just iterate the toolCalls?
Libraries/MLXLMCommon/Evaluate.swift
Outdated
```swift
let pendingToolCalls = toolCallProcessor.toolCalls
toolCallProcessor.toolCalls.removeAll(keepingCapacity: true)

for toolCall in pendingToolCalls {
```
I wonder if the caller will actually handle multiple tool calls (one of the cases in the tests)? If using registered tools it will immediately call the tool and restart with the <tool>...</tool> output (I think). If the caller is iterating details then they may do the same.
It might not matter as the model may just emit a second tool call after it only has one result, or maybe we need to file an issue on this -- handle multiple sequential tool calls.
At least in our app's implementation, we definitely support multiple tool calls within the same turn. I'll paste our actual streaming response loop below so you can see how it looks in practice, but the key thing to note is that we immediately stream reasoning and text responses to the UI while collecting the tool calls. The tool calls are then executed, their results collected and re-applied to the conversation, and then we await the next response from the model. So, at least for us, we'd want to support multiple tool calls.
```swift
for try await chunk in llm {
    guard !Task.isCancelled else {
        throw CancellationError()
    }
    let chunkStart = r.text.endIndex
    switch chunk {
    case let .reasoning(text):
        r.text += text
        if let current = currentReasoningRange {
            currentReasoningRange = current.lowerBound ..< r.text.endIndex
        } else {
            currentReasoningRange = chunkStart ..< r.text.endIndex
        }
        try SmartDocumentNeedingLocalLLM.appendStreamingContent(
            chunk: text,
            for: needingLLM.id,
            in: db
        )
    case let .text(text):
        pushCurrentReasoningRange()
        r.text += text
        try SmartDocumentNeedingLocalLLM.appendStreamingContent(
            chunk: text,
            for: needingLLM.id,
            in: db
        )
    case let .toolCall(call):
        pushCurrentReasoningRange()
        r.toolCalls.append(call)
    }
}
```

OK that makes sense -- it wasn't a case I knew about, so it doesn't work that way in ChatSession internal tool handling. Added:
davidkoski left a comment
Just waiting for updates on the requested changes (nil `endTag`) and the reorder (and maybe rename) of the functions.
Sorry about the delay. I've pushed a commit that addresses your feedback. I just re-ran the tests, and the tool integration tests run reliably with |
One build failure: |
davidkoski left a comment
OK, everything looks good. Thank you!!
Proposed changes
The Mistral 3 text and vision models, used by Ministral 3 and Devstral 2, support tool calling, but mlx-swift-lm did not send tools to the model and did not properly process tool calls from the model. This pull request addresses both of those issues.
Prior to these changes
Tools never even made it to the model, so the Mistral3 models simply hallucinated a response.
After applying these changes
The Mistral3 models receive the tools, and the newly introduced `MistralToolCallParser` parses the tool calls correctly.

Checklist

Put an `x` in the boxes that apply.

- [x] `pre-commit run --all-files` to format my code / installed pre-commit prior to committing changes