Fix tool calling for Mistral 3 #132
Conversation
The changes in this file were the initial changes I made. Without these, the Mistral3 models never even received the tools from the caller.
```swift
// Auto-detect tool call format from model type if not explicitly set
if mutableConfiguration.toolCallFormat == nil {
    mutableConfiguration.toolCallFormat = ToolCallFormat.infer(from: baseConfig.modelType)
}
```
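The inference itself is simple prefix matching on the model type. A minimal sketch of what detection like this could look like — the names here are illustrative, not the library's actual implementation:

```swift
// Illustrative sketch of prefix-based tool-call format inference, assuming
// the real ToolCallFormat.infer(from:) matches model types like "mistral3*".
// SketchToolCallFormat is a hypothetical stand-in, not the library's API.
enum SketchToolCallFormat {
    case mistral
    case unknown

    static func infer(from modelType: String?) -> SketchToolCallFormat {
        guard let modelType else { return .unknown }
        return modelType.hasPrefix("mistral3") ? .mistral : .unknown
    }
}
```

With this shape, a container whose config reports `model_type: "mistral3"` would pick up the Mistral format automatically while every other model type is left untouched.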
This seemed like an oversight. It should always have been present, as far as I could tell.
Pull request overview
This PR fixes Mistral 3 tool-calling end-to-end by (1) including tool schemas in the Mistral3 chat template application and (2) adding parsing/streaming support for Mistral’s [TOOL_CALLS]...[ARGS]... tool-call format (including buffering until generation end).
Changes:
- Add a new `.mistral` tool-call format and `MistralToolCallParser`, plus buffering/flush support to extract tool calls when EOS never appears in the text stream.
- Ensure Mistral3 VLM prompt generation passes `tools` (and `additionalContext`) into `applyChatTemplate`.
- Add/extend unit + integration tests for Mistral3 message conversion, format inference, parsing, and end-to-end generation.
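For context, here is a stripped-down sketch of what extracting one call from the `[TOOL_CALLS]...[ARGS]...` text format involves. The real `MistralToolCallParser` additionally handles streaming, buffering, and the optional `[CALL_ID]` tag; this non-streaming simplification and its type names are hypothetical:

```swift
import Foundation

// Simplified, non-streaming sketch of parsing the
// [TOOL_CALLS]<name>[ARGS]<json> format described above.
struct SimpleToolCall: Equatable {
    let name: String
    let arguments: String
}

func parseMistralToolCall(_ text: String) -> SimpleToolCall? {
    // Locate both markers; bail out (best effort) if either is missing.
    guard let calls = text.range(of: "[TOOL_CALLS]"),
        let args = text.range(of: "[ARGS]"),
        calls.upperBound <= args.lowerBound
    else { return nil }
    let name = String(text[calls.upperBound ..< args.lowerBound])
        .trimmingCharacters(in: .whitespacesAndNewlines)
    let arguments = String(text[args.upperBound...])
        .trimmingCharacters(in: .whitespacesAndNewlines)
    return SimpleToolCall(name: name, arguments: arguments)
}
```

For example, `parseMistralToolCall(#"[TOOL_CALLS]get_weather[ARGS]{"city": "Paris"}"#)` yields the function name and its raw JSON arguments.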
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| Tests/MLXLMTests/UserInputTests.swift | Adds unit tests validating Mistral3 message generation for text/image/tool roles. |
| Tests/MLXLMTests/ToolTests.swift | Extends format inference tests and adds parser/processor tests for Mistral tool calls + flushing behavior. |
| Tests/MLXLMIntegrationTests/ToolCallIntegrationTests.swift | Adds Mistral3 integration coverage for auto-detection and tool-call generation/parsing. |
| Libraries/MLXVLM/VLMModelFactory.swift | Auto-detects toolCallFormat for VLM containers based on model_type when unset. |
| Libraries/MLXVLM/Models/Mistral3.swift | Passes tools + additionalContext into chat template application so tools reach the model. |
| Libraries/MLXLMCommon/Tool/ToolCallProcessor.swift | Adds flush() to parse buffered tool calls at generation end (for formats where end-tag text isn’t emitted). |
| Libraries/MLXLMCommon/Tool/ToolCallFormat.swift | Introduces .mistral and wires it into parser creation + format inference for mistral3*. |
| Libraries/MLXLMCommon/Tool/Parsers/MistralToolCallParser.swift | Adds parser for Mistral [TOOL_CALLS] / [ARGS] (and optional [CALL_ID]) format. |
| Libraries/MLXLMCommon/Evaluate.swift | Adds a generation-end callback to flush buffered tool calls and emit them before the final .info. |
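The flush behavior in the last few rows can be pictured as follows. This is a hypothetical sketch, not the actual `ToolCallProcessor` code; the class and property names are invented for illustration:

```swift
// Hypothetical sketch of the generation-end flush: tool-call text that is
// still buffered when the stream ends (because an end tag was never emitted,
// as with Mistral's format) is handed off for parsing instead of being
// silently dropped.
final class ProcessorSketch {
    private(set) var buffer = ""
    private(set) var flushedToolCallText: [String] = []

    func append(_ token: String) { buffer += token }

    // Called once when generation ends, before the final info event.
    func flush() {
        if buffer.contains("[TOOL_CALLS]") {
            // The real processor parses the buffer into ToolCall values here.
            flushedToolCallText.append(buffer)
            buffer = ""
        }
    }
}
```

The point of the design is ordering: buffered calls are surfaced before the final `.info` so callers see every tool call the model produced.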
I've addressed the pull request comments. This should be ready for review.

```swift
    return
}

if let startTag = parser.startTag {
```
Do you think this should be something that a tool call parser opts into? Or should it only be engaged when the parser doesn't have an `endTag` set?
Actually I think this splitting probably belongs in the ToolCallParser, something like:
```swift
/// handle remaining tool call data at EOS
func parseEOS(_ toolCallBuffer: String) -> [ToolCall]
```

I am not sure if it should `throws` -- it doesn't right now, and the tool call parser seems "resilient" (best effort), so probably not.

I think we can't have an `endTag` in the string because the parser would have already handled that, so we could just use this code here as the default implementation, but it would leave the ability for parsers to override it.
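In protocol terms, the shape suggested above might look like this. The names are illustrative (`ToolCallSketch` stands in for the library's `ToolCall` type), not the actual API:

```swift
// Sketch of the suggested design: a protocol requirement with a default
// implementation, so only parsers without an end tag need to override it.
struct ToolCallSketch: Equatable { let name: String }

protocol ToolCallParserSketch {
    /// Handle remaining tool call data at EOS; best effort, non-throwing.
    func parseEOS(_ toolCallBuffer: String) -> [ToolCallSketch]
}

extension ToolCallParserSketch {
    // Default: formats with an end tag have already consumed their buffer
    // by EOS, so there is normally nothing left to parse here.
    func parseEOS(_ toolCallBuffer: String) -> [ToolCallSketch] { [] }
}
```

A parser like the Mistral one would override `parseEOS` to run its tool-call extraction over the leftover buffer, while end-tag-based parsers simply inherit the empty default.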
I pushed a change that I think matches your idea.
Libraries/MLXLMCommon/Evaluate.swift
Outdated
```swift
let pendingToolCalls = toolCallProcessor.toolCalls
toolCallProcessor.toolCalls.removeAll(keepingCapacity: true)
```
Why clear this? The processor is local to the eval loop, why not just iterate the toolCalls?
Libraries/MLXLMCommon/Evaluate.swift
Outdated
```swift
let pendingToolCalls = toolCallProcessor.toolCalls
toolCallProcessor.toolCalls.removeAll(keepingCapacity: true)

for toolCall in pendingToolCalls {
```
I wonder if the caller will actually handle multiple tool calls (one of the cases in the tests)? If using registered tools it will immediately call the tool and restart with the <tool>...</tool> output (I think). If the caller is iterating details then they may do the same.
It might not matter as the model may just emit a second tool call after it only has one result, or maybe we need to file an issue on this -- handle multiple sequential tool calls.
At least in our app's implementation, we definitely support multiple tool calls within the same turn. I'll paste our actual streaming response loop below so you can see how it looks in practice, but the key thing to note is that we immediately stream reasoning and text responses to the UI while collecting the tool calls. The tool calls are then executed, their results collected and re-applied to the conversation, and then we await the next response from the model. So, at least for us, we'd want to support multiple tool calls.
```swift
for try await chunk in llm {
    guard !Task.isCancelled else {
        throw CancellationError()
    }
    let chunkStart = r.text.endIndex
    switch chunk {
    case let .reasoning(text):
        r.text += text
        if let current = currentReasoningRange {
            currentReasoningRange = current.lowerBound ..< r.text.endIndex
        } else {
            currentReasoningRange = chunkStart ..< r.text.endIndex
        }
        try SmartDocumentNeedingLocalLLM.appendStreamingContent(
            chunk: text,
            for: needingLLM.id,
            in: db
        )
    case let .text(text):
        pushCurrentReasoningRange()
        r.text += text
        try SmartDocumentNeedingLocalLLM.appendStreamingContent(
            chunk: text,
            for: needingLLM.id,
            in: db
        )
    case let .toolCall(call):
        pushCurrentReasoningRange()
        r.toolCalls.append(call)
    }
}
```

OK that makes sense -- it wasn't a case I knew about, so it doesn't work that way in ChatSession internal tool handling. Added:
davidkoski left a comment
Just waiting for updates on the requested changes (nil `endTag`) and the reorder (and maybe rename) of the functions.
Sorry about the delay. I've pushed a commit that addresses your feedback. I just re-ran the tests, and the tool integration tests run reliably with |
One build failure: |
davidkoski left a comment
OK, everything looks good. Thank you!!
Proposed changes
The Mistral 3 text and vision models, used by Ministral 3 and Devstral 2, support tool calling, but mlx-swift-lm did not send tools to the model and did not properly process tool calls from the model. This pull request addresses both of those issues.
Prior to these changes
Tools never even made it to the model, so the Mistral3 models simply hallucinated a response.
After applying these changes
The Mistral3 models receive the tools, and the newly introduced `MistralToolCallParser` parses the tool calls correctly.

Checklist

Put an `x` in the boxes that apply.

- [x] `pre-commit run --all-files` to format my code / installed pre-commit prior to committing changes