Add Zamba2 by proazr · Pull Request #724 · ml-explore/mlx-lm

proazr · 2026-01-04T06:42:23Z

Add support for Zamba2 by Zyphra.

This is a very interesting model:

- It combines Mamba's SSM with periodic attention layers for improved performance at reduced compute.
- It uses a shared transformer block with per-layer adapters, reducing parameter count.
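To make the two points above concrete, here is a rough sketch of a hybrid layer schedule and of the parameter arithmetic behind sharing one block. Everything in it is an assumption for illustration: the function names, the sizes, the attention period, and the choice of low-rank (LoRA-style) adapters are hypothetical and are not taken from the Zamba2 configs or from the code in this PR.

```python
# Illustrative sketch only. All names, sizes, and the layer schedule are
# hypothetical; they are not read from Zamba2 configs or from zamba2.py.


def hybrid_schedule(num_layers: int, attn_every: int) -> list:
    """Label each layer as a Mamba (SSM) layer, with the shared attention
    block additionally applied at every `attn_every`-th layer."""
    return [
        "mamba+shared_attn" if (i + 1) % attn_every == 0 else "mamba"
        for i in range(num_layers)
    ]


def dense_params(d_model: int) -> int:
    """Weights in one d_model x d_model projection (biases ignored)."""
    return d_model * d_model


def adapter_params(d_model: int, rank: int) -> int:
    """Weights in one low-rank adapter pair: (d_model x rank) + (rank x d_model)."""
    return 2 * d_model * rank


def compare_param_counts(d_model, rank, num_sites, projections_per_block):
    """Return (independent, shared) parameter counts.

    independent: a separate full block at each of `num_sites` positions.
    shared: one full block reused everywhere, plus a low-rank adapter
            per projection at each position.
    """
    block = projections_per_block * dense_params(d_model)
    independent = num_sites * block
    shared = block + num_sites * projections_per_block * adapter_params(d_model, rank)
    return independent, shared


if __name__ == "__main__":
    print(hybrid_schedule(num_layers=6, attn_every=3))
    independent, shared = compare_param_counts(
        d_model=2048, rank=128, num_sites=6, projections_per_block=4
    )
    print(f"independent blocks: {independent:,} weights")
    print(f"shared + adapters:  {shared:,} weights")
```

With these made-up sizes the shared variant needs well under a third of the weights of six independent blocks, which is the kind of saving the second bullet refers to.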

The performance numbers (from tests below) highlight this:

| Model | Prompt (tok/s) | Generation (tok/s) | Peak memory |
| --- | --- | --- | --- |
| 1.2B | 255 | 77 | 5.1 GB |
| 2.7B | 235 | 66 | 5.5 GB |
| 2.7B-v2 | 186 | 66 | 5.5 GB |
| 7B | 125 | 26 | 15 GB |

Models

Tests

1.2B Model:

$ mlx_lm.generate --model Zyphra/Zamba2-1.2B-instruct \
    --prompt "Explain the difference between a list and a tuple in Python." \
    --max-tokens 150

==========
In Python, a list and a tuple are both used to store collections of items, but they have some key differences.

1. **Data Type**:
   - **List**: A list is an ordered collection of items. Lists are mutable, which means you can change the contents of a list.
   - **Tuple**: A tuple is an ordered collection of items. Tuples are immutable, which means you cannot change the contents of a tuple.

2. **Mutability**:
   - **List**: Lists are mutable, which means you can add, remove, or change elements in a list.
   - **Tuple**: Tuples are immutable, which means you cannot add, remove
==========
Prompt: 22 tokens, 254.744 tokens-per-sec
Generation: 150 tokens, 76.639 tokens-per-sec
Peak memory: 5.078 GB

2.7B Model:

$ mlx_lm.generate --model Zyphra/Zamba2-2.7B-instruct \
    --prompt "Explain the concept of recursion in programming with a simple example." \
    --max-tokens 150

==========
Sure, here's an example of how to use recursion in programming:

Let's say we want to write a program that calculates the factorial of a number using recursion.

First, we need to define a function that takes a number and returns the factorial of that number.

def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)

This function uses recursion to calculate the factorial of a number.

Now, we can call the function with a number, for example, `factorial(5)` will return `
==========
Prompt: 23 tokens, 234.986 tokens-per-sec
Generation: 150 tokens, 66.183 tokens-per-sec
Peak memory: 5.469 GB

2.7B-Instruct-v2 Model:

$ mlx_lm.generate --model Zyphra/Zamba2-2.7B-Instruct-v2 \
    --prompt "What are the key benefits of functional programming?" \
    --max-tokens 150

==========
Functional programming is a programming paradigm that emphasizes the use of pure functions, immutability, and immutable data structures. The key benefits of functional programming include:

1. **Easier to reason about code**: Functional programming encourages a focus on pure functions, which makes it easier to reason about code, making it easier to understand and predict the behavior of the code.

2. **Less error-prone**: Functional programming encourages a focus on pure functions, which makes it easier to reason about code, making it easier to understand and predict the behavior of the code.

3. **Improved modularity**: Functional programming encourages a focus on pure functions, which
==========
Prompt: 18 tokens, 186.204 tokens-per-sec
Generation: 150 tokens, 65.861 tokens-per-sec
Peak memory: 5.453 GB

7B Model:

$ mlx_lm.generate --model Zyphra/Zamba2-7B-Instruct \
    --prompt "Explain how neural networks learn, using a simple analogy that a beginner could understand." \
    --max-tokens 200

==========
Imagine a neural network is like a team of people trying to learn how to play a new game.

At first, the team members don't know how to play the game, so they start by making random moves. As they play, they receive feedback on their performance, which is like getting points or penalties.

The team members use this feedback to adjust their strategy, just like how a neural network uses backpropagation. They keep trying different strategies and learning from their mistakes, just like how a neural network learns from its mistakes.

As they play more, the team members get better at the game, just like how a neural network gets better at making predictions or classifications.

In short, a neural network learns by making mistakes, adjusting its strategy, and getting better over time, just like how a team of people learning to play a new game.
==========
Prompt: 28 tokens, 125.346 tokens-per-sec
Generation: 185 tokens, 26.334 tokens-per-sec
Peak memory: 15.038 GB
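The summary figures in the description are rounded from the three stats lines printed after each run. A small helper along these lines could extract them from captured output; the function names and the regular expression are my own and only assume the line format shown in the logs above.

```python
import re

# Matches the three stats lines printed at the end of each run above.
STATS_RE = re.compile(
    r"Prompt: (?P<prompt_tokens>\d+) tokens, (?P<prompt_tps>[\d.]+) tokens-per-sec\s+"
    r"Generation: (?P<gen_tokens>\d+) tokens, (?P<gen_tps>[\d.]+) tokens-per-sec\s+"
    r"Peak memory: (?P<peak_gb>[\d.]+) GB"
)


def parse_stats(output: str) -> dict:
    """Extract token counts, throughput, and peak memory from captured output."""
    m = STATS_RE.search(output)
    if m is None:
        raise ValueError("no stats block found in output")
    return {
        "prompt_tokens": int(m["prompt_tokens"]),
        "prompt_tps": float(m["prompt_tps"]),
        "gen_tokens": int(m["gen_tokens"]),
        "gen_tps": float(m["gen_tps"]),
        "peak_gb": float(m["peak_gb"]),
    }


def summary_row(stats: dict) -> tuple:
    """Round the way the summary table does: whole tok/s, one decimal for GB."""
    return (round(stats["prompt_tps"]), round(stats["gen_tps"]), round(stats["peak_gb"], 1))


if __name__ == "__main__":
    # Stats copied from the 1.2B run above.
    log = (
        "Prompt: 22 tokens, 254.744 tokens-per-sec\n"
        "Generation: 150 tokens, 76.639 tokens-per-sec\n"
        "Peak memory: 5.078 GB\n"
    )
    stats = parse_stats(log)
    print(summary_row(stats))  # (255, 77, 5.1)
    # Approximate wall-clock time spent generating, in seconds.
    print(stats["gen_tokens"] / stats["gen_tps"])
```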


proazr · 2026-01-09T02:07:35Z

Hi @awni,

Just checking in on this PR. I believe it aligns with the current contribution guidelines and should be ready for review, but I wanted to make sure it hasn’t been missed.

Happy to make any adjustments if needed.

Sam

proazr added 2 commits January 4, 2026 00:32

Add Zamba2

c780c51

Format zamba2.py with black

26e6c56

Run pre-commit formatting as per CONTRIBUTING.md guidelines.

proazr wants to merge 2 commits into ml-explore:main from proazr:add-zamba2-support
