
Add EngGPT MoE model support #1199

Open
robertobissanti wants to merge 1 commit into ml-explore:main from robertobissanti:add-enggpt-moe

Conversation

@robertobissanti

Summary

This PR adds initial support for the enggpt_moe architecture used by engineering-group/EngGPT2-16B-A3B.

The model is a decoder-only MoE language model with the following configuration (see the config sketch after this list):

  • 24 decoder layers
  • hidden size 2880
  • 32 attention heads
  • 4 key/value heads
  • explicit head_dim = 128
  • Q/K RMSNorm inside attention
  • RoPE with rope_theta = 1000000.0
  • 64 experts per MoE layer
  • top-8 expert routing
  • SwiGLU experts
  • untied lm_head
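
For reference, a minimal sketch of the model-args dataclass such an architecture would need, mirroring the hyperparameters listed above. The field names follow common mlx-lm conventions and are assumptions, not necessarily the exact keys in the checkpoint's config.json.

```python
from dataclasses import dataclass


@dataclass
class ModelArgs:
    model_type: str = "enggpt_moe"
    num_hidden_layers: int = 24          # decoder layers
    hidden_size: int = 2880
    num_attention_heads: int = 32
    num_key_value_heads: int = 4
    head_dim: int = 128                  # explicit; not hidden_size // num_attention_heads
    rope_theta: float = 1000000.0
    num_local_experts: int = 64          # experts per MoE layer
    num_experts_per_tok: int = 8         # top-8 routing
    moe_intermediate_size: int = 0       # expert hidden size; actual value comes from config.json
    rms_norm_eps: float = 1e-6           # assumed default for the Q/K and layer RMSNorms
    tie_word_embeddings: bool = False    # untied lm_head
```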

The implementation is based on the existing Mixtral/SwitchGLU infrastructure, adapted to match the EngGPT MoE checkpoint structure and routing logic.
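
Roughly, the sparse MoE feed-forward block would look like the sketch below, assuming the same top-k routing pattern and SwitchGLU helper that mlx-lm's Mixtral-style models use. The module path, constructor arguments, and the moe_intermediate_size field are assumptions, not the final implementation.

```python
import mlx.core as mx
import mlx.nn as nn
from mlx_lm.models.switch_layers import SwitchGLU


class EngGPTMoESparseBlock(nn.Module):
    """Sparse MoE feed-forward block in the style of mlx-lm's Mixtral model."""

    def __init__(self, args):
        super().__init__()
        self.top_k = args.num_experts_per_tok  # 8
        # Router that scores all 64 experts for each token.
        self.gate = nn.Linear(args.hidden_size, args.num_local_experts, bias=False)
        # Batched SwiGLU experts; only the selected ones are evaluated per token.
        self.switch_mlp = SwitchGLU(
            args.hidden_size, args.moe_intermediate_size, args.num_local_experts
        )

    def __call__(self, x: mx.array) -> mx.array:
        logits = self.gate(x)  # (..., num_local_experts)
        # Indices of the top-k experts for each token.
        inds = mx.argpartition(-logits, kth=self.top_k - 1, axis=-1)[..., : self.top_k]
        # Softmax over just the selected experts' scores.
        scores = mx.softmax(mx.take_along_axis(logits, inds, axis=-1), axis=-1)
        # Weighted sum of the selected experts' outputs.
        return (self.switch_mlp(x, inds) * scores[..., None]).sum(axis=-2)
```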

Motivation

The EngGPT2-16B-A3B model currently cannot be loaded or converted with mlx-lm because its config.json declares a model_type that mlx-lm does not recognize:

"model_type": "enggpt_moe"

