
Fix tool calling for Mistral 3 #132

Merged
davidkoski merged 9 commits into ml-explore:main from shareup:fix-tool-calling-for-mistral3vlm on Mar 10, 2026

Conversation

@atdrendel
Contributor

Proposed changes

The Mistral 3 text and vision models, used by Ministral 3 and Devstral 2, support tool calling, but mlx-swift-lm did not send tools to the model and did not properly process tool calls from the model. This pull request addresses both of those issues.

Prior to these changes

Tools never even made it to the model, so the Mistral3 models just hallucinated a response.

Mistral3 unsuccessful tool calling before changes

After applying these changes

The Mistral3 models receive the tool, and the newly-introduced MistralToolCallParser parses the tool calls correctly.

Mistral3 successful tool calling

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

Copilot AI review requested due to automatic review settings March 3, 2026 19:21
Contributor Author


The changes in this file were the initial changes I made. Without these, the Mistral3 models never even received the tools from the caller.

Comment on lines +332 to +335
// Auto-detect tool call format from model type if not explicitly set
if mutableConfiguration.toolCallFormat == nil {
mutableConfiguration.toolCallFormat = ToolCallFormat.infer(from: baseConfig.modelType)
}
Contributor Author

This seemed like an oversight. It should always have been present, as far as I could tell.

Copilot AI left a comment

Pull request overview

This PR fixes Mistral 3 tool-calling end-to-end by (1) including tool schemas in the Mistral3 chat template application and (2) adding parsing/streaming support for Mistral’s [TOOL_CALLS]...[ARGS]... tool-call format (including buffering until generation end).

Changes:

  • Add a new .mistral tool-call format and MistralToolCallParser, plus buffering/flush support to extract tool calls when EOS never appears in the text stream.
  • Ensure Mistral3 VLM prompt generation passes tools (and additionalContext) into applyChatTemplate.
  • Add/extend unit + integration tests for Mistral3 message conversion, format inference, parsing, and end-to-end generation.
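The [TOOL_CALLS]...[ARGS]... format described above can be illustrated with a minimal sketch. This is not the actual MistralToolCallParser from the PR: the helper name is hypothetical, and the sketch ignores the optional [CALL_ID] segment and streaming/buffering concerns.

```swift
import Foundation

/// Minimal sketch of extracting a tool name and JSON arguments from
/// Mistral's `[TOOL_CALLS]<name>[ARGS]<json>` output. Hypothetical helper,
/// not the real MistralToolCallParser (which also handles [CALL_ID],
/// streaming, and buffering until generation end).
func parseMistralToolCall(_ text: String) -> (name: String, arguments: [String: Any])? {
    guard let callsRange = text.range(of: "[TOOL_CALLS]"),
          let argsRange = text.range(of: "[ARGS]"),
          callsRange.upperBound <= argsRange.lowerBound
    else { return nil }

    // The tool name sits between the two tags; the arguments follow [ARGS].
    let name = String(text[callsRange.upperBound ..< argsRange.lowerBound])
        .trimmingCharacters(in: .whitespacesAndNewlines)
    let json = String(text[argsRange.upperBound...])
        .trimmingCharacters(in: .whitespacesAndNewlines)

    guard let data = json.data(using: .utf8),
          let arguments = try? JSONSerialization.jsonObject(with: data) as? [String: Any]
    else { return nil }

    return (name, arguments)
}
```

For example, `parseMistralToolCall("[TOOL_CALLS]get_weather[ARGS]{\"location\": \"Tokyo\"}")` would yield the name `get_weather` with a `location` argument, matching the integration-test output quoted later in this thread.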

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Summary per file:

  • Tests/MLXLMTests/UserInputTests.swift: Adds unit tests validating Mistral3 message generation for text/image/tool roles.
  • Tests/MLXLMTests/ToolTests.swift: Extends format inference tests and adds parser/processor tests for Mistral tool calls + flushing behavior.
  • Tests/MLXLMIntegrationTests/ToolCallIntegrationTests.swift: Adds Mistral3 integration coverage for auto-detection and tool-call generation/parsing.
  • Libraries/MLXVLM/VLMModelFactory.swift: Auto-detects toolCallFormat for VLM containers based on model_type when unset.
  • Libraries/MLXVLM/Models/Mistral3.swift: Passes tools + additionalContext into chat template application so tools reach the model.
  • Libraries/MLXLMCommon/Tool/ToolCallProcessor.swift: Adds flush() to parse buffered tool calls at generation end (for formats where end-tag text isn't emitted).
  • Libraries/MLXLMCommon/Tool/ToolCallFormat.swift: Introduces .mistral and wires it into parser creation + format inference for mistral3*.
  • Libraries/MLXLMCommon/Tool/Parsers/MistralToolCallParser.swift: Adds parser for Mistral [TOOL_CALLS] / [ARGS] (and optional [CALL_ID]) format.
  • Libraries/MLXLMCommon/Evaluate.swift: Adds a generation-end callback to flush buffered tool calls and emit them before the final .info.
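The flush-at-generation-end behavior described for ToolCallProcessor.swift and Evaluate.swift can be sketched as follows. Only flush() and toolCalls correspond to the PR's API; the function and the emit callback are simplified stand-ins for the actual eval loop.

```swift
// Simplified stand-in for the generation-end handling added in Evaluate.swift.
// Assumes a ToolCallProcessor with the flush() and toolCalls members this PR
// adds/uses; everything else here is illustrative.
func finishGeneration(processor: ToolCallProcessor, emit: (ToolCall) -> Void) {
    // For formats like Mistral's, the end tag may never appear in the text
    // stream, so parse whatever is still sitting in the buffer at EOS...
    processor.flush()

    // ...then emit any recovered tool calls before the final .info event.
    for toolCall in processor.toolCalls {
        emit(toolCall)
    }
}
```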


@atdrendel
Contributor Author

atdrendel commented Mar 3, 2026

I've addressed the pull request comments. This should be ready for review.

@davidkoski davidkoski mentioned this pull request Mar 3, 2026
return
}

if let startTag = parser.startTag {
Collaborator

Do you think this should be something that a tool call parser opts into? Or is only engaged when it doesn't have an endTag set?

Actually I think this splitting probably belongs in the ToolCallParser, something like:

/// handle remaining tool call data at EOS
func parseEOS(_ toolCallBuffer: String) -> [ToolCall]

I am not sure if it should throw -- it doesn't right now, and the tool call parser seems "resilient" (best effort), so probably not.

I think we can't have an endTag in the string because the parser would have already handled that, so we could just use this code here as the default implementation but it would leave the ability for tools to override it.
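Sketched out, that default implementation might look like this. The protocol shape and the parse(_:) signature here are assumptions made for illustration, not the merged API.

```swift
/// Sketch of the proposed parseEOS default implementation. The protocol
/// members shown (startTag, parse) are assumed for illustration; only the
/// parseEOS idea comes from the review discussion.
protocol ToolCallParser {
    var startTag: String? { get }
    func parse(_ text: String) -> [ToolCall]

    /// Handle remaining tool call data at EOS.
    func parseEOS(_ toolCallBuffer: String) -> [ToolCall]
}

extension ToolCallParser {
    /// Default: best effort, never throws, overridable by parsers that need
    /// different EOS handling.
    func parseEOS(_ toolCallBuffer: String) -> [ToolCall] {
        guard let startTag, let range = toolCallBuffer.range(of: startTag) else {
            return []
        }
        // No endTag can be in the buffer here -- the parser would already
        // have consumed a complete call -- so parse everything after startTag.
        return parse(String(toolCallBuffer[range.upperBound...]))
    }
}
```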

Contributor Author

I pushed a change that I think matches your idea.

Comment on lines +1354 to +1355
let pendingToolCalls = toolCallProcessor.toolCalls
toolCallProcessor.toolCalls.removeAll(keepingCapacity: true)
Collaborator

Why clear this? The processor is local to the eval loop, why not just iterate the toolCalls?

let pendingToolCalls = toolCallProcessor.toolCalls
toolCallProcessor.toolCalls.removeAll(keepingCapacity: true)

for toolCall in pendingToolCalls {
Collaborator

I wonder if the caller will actually handle multiple tool calls (one of the cases in the tests)? If using registered tools it will immediately call the tool and restart with the <tool>...</tool> output (I think). If the caller is iterating details then they may do the same.

It might not matter as the model may just emit a second tool call after it only has one result, or maybe we need to file an issue on this -- handle multiple sequential tool calls.

Contributor Author

At least in our app's implementation, we definitely support multiple tool calls within the same turn. I'll paste our actual streaming response loop below so you can see how it looks in practice. The key thing to note is that we immediately stream reasoning and text responses to the UI while collecting the tool calls. The tool calls are then executed, their results are collected and applied back to the conversation, and we await the next response from the model. So, at least for us, we'd want support for multiple tool calls.

for try await chunk in llm {
    guard !Task.isCancelled else {
        throw CancellationError()
    }

    let chunkStart = r.text.endIndex

    switch chunk {
    case let .reasoning(text):
        r.text += text
        if let current = currentReasoningRange {
            currentReasoningRange =
                current.lowerBound ..< r.text.endIndex
        } else {
            currentReasoningRange =
                chunkStart ..< r.text.endIndex
        }

        try SmartDocumentNeedingLocalLLM
            .appendStreamingContent(
                chunk: text,
                for: needingLLM.id,
                in: db
            )

    case let .text(text):
        pushCurrentReasoningRange()
        r.text += text
        try SmartDocumentNeedingLocalLLM
            .appendStreamingContent(
                chunk: text,
                for: needingLLM.id,
                in: db
            )

    case let .toolCall(call):
        pushCurrentReasoningRange()
        r.toolCalls.append(call)
    }
}

Collaborator

OK that makes sense -- it wasn't a case I knew about so it doesn't work that way in ChatSession internal tool handling. Added:

Collaborator

@davidkoski davidkoski left a comment

Just waiting for updates on requested changes (nil endTag) and reorder (and maybe rename) the functions.

@atdrendel
Contributor Author

Sorry about the delay. I've pushed a commit that addresses your feedback. I just re-ran the tests, and the tool integration tests run reliably with mlx-community/Ministral-3-3B-Instruct-2512-4bit now.

Test Case '-[MLXLMIntegrationTests.ToolCallIntegrationTests testMistral3EndToEndToolCallGeneration]' started.
Mistral3 Output: 
Mistral3 Tool Calls: [MLXLMCommon.ToolCall(function: MLXLMCommon.ToolCall.Function(name: "get_weather", arguments: ["location": MLXLMCommon.JSONValue.string("Tokyo")]))]

Test Case '-[MLXLMIntegrationTests.ToolCallIntegrationTests testMistral3MultipleToolCallGeneration]' started.
Mistral3 Output: I'll get the current weather for Tokyo and the current time in Tokyo.
Mistral3 Calls: [MLXLMCommon.ToolCall(function: MLXLMCommon.ToolCall.Function(name: "get_weather", arguments: ["location": MLXLMCommon.JSONValue.string("Tokyo"), "unit": MLXLMCommon.JSONValue.string("celsius")])), MLXLMCommon.ToolCall(function: MLXLMCommon.ToolCall.Function(name: "get_time", arguments: ["timezone": MLXLMCommon.JSONValue.string("Asia/Tokyo")]))]
Successfully parsed 2 tool calls from Mistral3

@davidkoski
Collaborator

One build failure:

MLXLMCommon/Tool/Parsers/MistralToolCallParser.swift:55:25: error: cannot find 'deserialize' in scope
        let arguments = deserialize(argsPart)

Collaborator

@davidkoski davidkoski left a comment

OK, everything looks good. Thank you!!

@davidkoski davidkoski merged commit b362c8a into ml-explore:main Mar 10, 2026
2 checks passed