
Add Zamba2 #724

Open
proazr wants to merge 2 commits into ml-explore:main from proazr:add-zamba2-support


proazr commented Jan 4, 2026

Add support for Zamba2 by Zyphra.

This is an interesting hybrid model:

  • It combines Mamba SSM layers with periodic attention layers for improved performance at reduced compute
  • It uses a shared transformer block with per-layer adapters, which reduces the parameter count
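
To make the two points above concrete, here is a rough sketch of how a hybrid layer schedule and a shared block with adapters might affect parameter count. Everything in it is an assumption for illustration: the layer count, the attention period, the hidden size, the adapter rank, and the idea that the adapters are low-rank are hypothetical, not taken from this PR or from the actual Zamba2 configuration. Only the attention projections are counted.

```python
# Illustrative only: all sizes below are hypothetical, not Zamba2's real config.

def layer_schedule(num_layers: int, attn_every: int) -> list[str]:
    """Mark every `attn_every`-th layer as hybrid (Mamba plus shared attention)."""
    return [
        "hybrid" if (i + 1) % attn_every == 0 else "mamba"
        for i in range(num_layers)
    ]


def attention_params(hidden: int) -> int:
    """Q, K, V and output projections of one attention block (biases ignored)."""
    return 4 * hidden * hidden


def adapter_params(hidden: int, rank: int) -> int:
    """One low-rank adapter pair: (hidden x rank) and (rank x hidden)."""
    return 2 * hidden * rank


schedule = layer_schedule(num_layers=12, attn_every=6)
num_hybrid = schedule.count("hybrid")

hidden, rank = 2048, 128  # assumed values
# Option A: every hybrid layer owns its own attention block.
separate = num_hybrid * attention_params(hidden)
# Option B: one shared attention block plus a small adapter per hybrid layer.
shared = attention_params(hidden) + num_hybrid * adapter_params(hidden, rank)

print(schedule)
print(f"separate attention blocks: {separate:,} params")
print(f"shared block + adapters:   {shared:,} params")
```

The gap between the two totals grows with the number of hybrid layers, since each extra layer adds only an adapter rather than a full block.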

The performance numbers (from the tests below) highlight this:

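| Model   | Prompt (tok/s) | Generation (tok/s) | Peak memory |
|---------|----------------|--------------------|-------------|
| 1.2B    | 255            | 77                 | 5.1 GB      |
| 2.7B    | 235            | 66                 | 5.5 GB      |
| 2.7B-v2 | 186            | 66                 | 5.5 GB      |
| 7B      | 125            | 26                 | 15 GB       |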
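
As a rough way to read these throughput figures, the sketch below converts tokens-per-second into an approximate end-to-end time for two of the runs. It uses the unrounded token counts and rates from the test logs in this PR; the helper name `wall_time_seconds` is made up for this example, and the estimate ignores model loading and any other overhead.

```python
def wall_time_seconds(prompt_tokens: int, prompt_tps: float,
                      gen_tokens: int, gen_tps: float) -> float:
    """Approximate time: prompt processing plus token generation."""
    return prompt_tokens / prompt_tps + gen_tokens / gen_tps


# (prompt tokens, prompt tok/s, generated tokens, generation tok/s)
# copied from the 1.2B and 7B test runs in this PR.
runs = {
    "1.2B": (22, 254.744, 150, 76.639),
    "7B": (28, 125.346, 185, 26.334),
}

estimates = {name: wall_time_seconds(*stats) for name, stats in runs.items()}
for name, seconds in estimates.items():
    print(f"{name}: ~{seconds:.2f} s")
```

Generation dominates in both cases because the prompts are short, so the generation rate is the number to watch for interactive use.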

Models

Tests

1.2B Model:

$ mlx_lm.generate --model Zyphra/Zamba2-1.2B-instruct \
    --prompt "Explain the difference between a list and a tuple in Python." \
    --max-tokens 150

==========
In Python, a list and a tuple are both used to store collections of items, but they have some key differences.

1. **Data Type**:
   - **List**: A list is an ordered collection of items. Lists are mutable, which means you can change the contents of a list.
   - **Tuple**: A tuple is an ordered collection of items. Tuples are immutable, which means you cannot change the contents of a tuple.

2. **Mutability**:
   - **List**: Lists are mutable, which means you can add, remove, or change elements in a list.
   - **Tuple**: Tuples are immutable, which means you cannot add, remove
==========
Prompt: 22 tokens, 254.744 tokens-per-sec
Generation: 150 tokens, 76.639 tokens-per-sec
Peak memory: 5.078 GB

2.7B Model:

$ mlx_lm.generate --model Zyphra/Zamba2-2.7B-instruct \
    --prompt "Explain the concept of recursion in programming with a simple example." \
    --max-tokens 150

==========
Sure, here's an example of how to use recursion in programming:

Let's say we want to write a program that calculates the factorial of a number using recursion.

First, we need to define a function that takes a number and returns the factorial of that number.

def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)

This function uses recursion to calculate the factorial of a number.

Now, we can call the function with a number, for example, `factorial(5)` will return `
==========
Prompt: 23 tokens, 234.986 tokens-per-sec
Generation: 150 tokens, 66.183 tokens-per-sec
Peak memory: 5.469 GB

2.7B-Instruct-v2 Model:

$ mlx_lm.generate --model Zyphra/Zamba2-2.7B-Instruct-v2 \
    --prompt "What are the key benefits of functional programming?" \
    --max-tokens 150

==========
Functional programming is a programming paradigm that emphasizes the use of pure functions, immutability, and immutable data structures. The key benefits of functional programming include:

1. **Easier to reason about code**: Functional programming encourages a focus on pure functions, which makes it easier to reason about code, making it easier to understand and predict the behavior of the code.

2. **Less error-prone**: Functional programming encourages a focus on pure functions, which makes it easier to reason about code, making it easier to understand and predict the behavior of the code.

3. **Improved modularity**: Functional programming encourages a focus on pure functions, which
==========
Prompt: 18 tokens, 186.204 tokens-per-sec
Generation: 150 tokens, 65.861 tokens-per-sec
Peak memory: 5.453 GB

7B Model:

$ mlx_lm.generate --model Zyphra/Zamba2-7B-Instruct \
    --prompt "Explain how neural networks learn, using a simple analogy that a beginner could understand." \
    --max-tokens 200

==========
Imagine a neural network is like a team of people trying to learn how to play a new game.

At first, the team members don't know how to play the game, so they start by making random moves. As they play, they receive feedback on their performance, which is like getting points or penalties.

The team members use this feedback to adjust their strategy, just like how a neural network uses backpropagation. They keep trying different strategies and learning from their mistakes, just like how a neural network learns from its mistakes.

As they play more, the team members get better at the game, just like how a neural network gets better at making predictions or classifications.

In short, a neural network learns by making mistakes, adjusting its strategy, and getting better over time, just like how a team of people learning to play a new game.
==========
Prompt: 28 tokens, 125.346 tokens-per-sec
Generation: 185 tokens, 26.334 tokens-per-sec
Peak memory: 15.038 GB

proazr added 2 commits January 4, 2026 00:32
Run pre-commit formatting as per CONTRIBUTING.md guidelines.

proazr commented Jan 9, 2026

Hi @awni,

Just checking in on this PR. I believe it aligns with the current contribution guidelines and should be ready for review, but I wanted to make sure it hasn’t been missed.

Happy to make any adjustments if needed.

Sam
