This guide provides example terminal commands for using the config.yaml file with MLX-LM, based on the official MLX-LM LoRA documentation.
It is recommended to use a Python virtual environment for MLX-LM. To install `mlx-lm` in a `.venv`:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install mlx-lm
```

- If you want the latest development version, you can install directly from GitHub:

```bash
pip install git+https://github.com/ml-explore/mlx-lm.git
```
To prepare your dataset for LoRA fine-tuning with MLX-LM:

- Choose a format: supported formats include `completions`, `text`, and `chat`. See examples below.
- Create your data files:
  - For most use cases, create `train.jsonl` and `valid.jsonl` (and optionally `test.jsonl`).
  - Each line should be a valid JSON object in the chosen format.
- Place files in the `data/` directory:
  - Example: `data/train.jsonl`, `data/valid.jsonl`
- Reference the data path in your `config.yaml`:
  - Use the relative path to your data directory or dataset name under the `data:` key.
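The file-creation steps above can be scripted. This is a minimal sketch (the sample rows are illustrative, not part of any real dataset) that writes a tiny `completions`-format dataset into `data/`:

```python
import json
from pathlib import Path

# Illustrative sample rows in the "completions" format.
train_rows = [
    {"prompt": "What is the capital of France?", "completion": "Paris."},
    {"prompt": "What is 2 + 2?", "completion": "4."},
]
valid_rows = [
    {"prompt": "What is the capital of Italy?", "completion": "Rome."},
]

data_dir = Path("data")
data_dir.mkdir(exist_ok=True)

# Write one JSON object per line (JSONL).
for name, rows in [("train.jsonl", train_rows), ("valid.jsonl", valid_rows)]:
    with open(data_dir / name, "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
```

Using `json.dumps` guarantees each example lands on a single line, which the JSONL format requires.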
Each line in `train.jsonl` for the `completions` format:

```json
{"prompt": "What is the capital of France?", "completion": "Paris."}
```

Each line in `train.jsonl` for the `text` format:

```json
{"text": "This is an example for the model."}
```

- Ensure each example is on a single line (no line breaks within an example).
- Extra keys in each JSON object will be ignored.
- For more advanced formats (e.g., `chat`, tools), see the official documentation.
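To catch formatting problems before training, you can sanity-check a JSONL file with a short script. This is a sketch that only enforces the two rules above (one JSON object per non-empty line); it does not validate format-specific keys:

```python
import json

def check_jsonl(path):
    """Raise on the first malformed line; each line must be one JSON object."""
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            stripped = line.strip()
            if not stripped:
                raise ValueError(f"{path}:{lineno}: empty line")
            obj = json.loads(stripped)  # raises json.JSONDecodeError if invalid
            if not isinstance(obj, dict):
                raise ValueError(f"{path}:{lineno}: line is not a JSON object")
    return True
```

Usage: `check_jsonl("data/train.jsonl")`.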
To fine-tune a model using your config.yaml file, run:

```bash
mlx_lm.lora --config config.yaml
```

- The config file should specify the model, data, and other parameters. Command-line flags override config values.
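The override behavior can be pictured as a simple merge: values parsed from the config file are used unless the same option was given on the command line. This is a schematic sketch of that precedence rule, not MLX-LM's actual implementation:

```python
def merge_options(config, cli_args):
    """CLI values that were actually given (not None) win over config values."""
    merged = dict(config)
    for key, value in cli_args.items():
        if value is not None:
            merged[key] = value
    return merged

config = {"batch_size": 4, "iters": 1000}   # as parsed from config.yaml
cli = {"batch_size": 8, "iters": None}      # user passed only --batch-size 8
print(merge_options(config, cli))            # batch_size comes from the CLI
```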
After fine-tuning, you can generate text or chat with your model using the trained adapters, or by fusing the adapters into a new model.
To generate text using your fine-tuned adapters:

```bash
mlx_lm.generate --model <path_to_model> --adapter-path <path_to_adapters> --prompt "<your_model_prompt>"
```

- Example:

```bash
mlx_lm.generate --model mistralai/Mistral-7B-v0.1 --adapter-path adapters --prompt "Hello, world!"
```

- If you use a config file, you can specify the model and adapter path there:

```bash
mlx_lm.generate --config config.yaml --prompt "Hello, world!"
```
To interact with your model in chat mode using the adapters:

```bash
mlx_lm.chat --model <path_to_model> --adapter-path <path_to_adapters>
```

- Example:

```bash
mlx_lm.chat --model mistralai/Mistral-7B-v0.1 --adapter-path adapters
```

- You can also use a config file:

```bash
mlx_lm.chat --config config.yaml
```
You can fuse the adapters into a new standalone model for easier deployment:

```bash
mlx_lm.fuse --model <path_to_model> --adapter-path <path_to_adapters> --save-path fused_model/
```

- Example:

```bash
mlx_lm.fuse --model mistralai/Mistral-7B-v0.1 --adapter-path adapters --save-path fused_model/
```

- This will create a new model in the `fused_model/` directory.

To generate text with the fused model:

```bash
mlx_lm.generate --model fused_model/ --prompt "<your_model_prompt>"
```

To chat with the fused model:

```bash
mlx_lm.chat --model fused_model/
```

- Replace `<path_to_model>`, `<path_to_adapters>`, `config.yaml`, or `fused_model/` with your actual paths as needed.
- For more options, see the help for each command (e.g., `mlx_lm.generate --help`, `mlx_lm.chat --help`).
Below is an example config.yaml for MLX-LM LoRA fine-tuning:

```yaml
# The path to the local model directory or Hugging Face repo.
model: "mlx-community/Llama-3.2-1B-Instruct"

# Whether or not to train (boolean)
train: true

# The fine-tuning method: "lora", "dora", or "full".
fine_tune_type: lora

# Optimizer
optimizer: adamw
# optimizer_config:
#   adamw:
#     betas: [0.9, 0.98]
#     eps: 1e-6
#     weight_decay: 0.05
#     bias_correction: true

# Directory with {train, valid, test}.jsonl files or Hugging Face dataset name
data: "mlx-community/WikiSQL"

# The PRNG seed
seed: 0

# Number of layers to fine-tune
num_layers: 16

# Minibatch size.
batch_size: 4

# Iterations to train for.
iters: 1000

# Number of validation batches, -1 uses the entire validation set.
val_batches: 25

# Adam learning rate.
learning_rate: 1e-5

# Save/load path for the trained adapter weights.
adapter_path: "adapters"

# Save the model every N iterations.
save_every: 100

# Evaluate on the test set after training
test: false

# Number of test set batches, -1 uses the entire test set.
test_batches: 100

# Maximum sequence length.
max_seq_length: 2048

# Use gradient checkpointing to reduce memory use.
grad_checkpoint: false

# LoRA parameters can only be specified in a config file
lora_parameters:
  # The layer keys to apply LoRA to.
  # These will be applied for the last lora_layers
  keys: ["self_attn.q_proj", "self_attn.v_proj"]
  rank: 8
  scale: 20.0
  dropout: 0.0
```

- You can see all available options with:

```bash
mlx_lm.lora --help
```
- For more details on the config format, see the example YAML.
- Local datasets should be placed in `data/` as `train.jsonl`, `valid.jsonl`, and/or `test.jsonl`.
For more, see the MLX-LM LoRA documentation.