Wine Variety Prediction with LLMs

This project demonstrates how to use various Large Language Models (LLMs) to predict wine varieties based on wine reviews and characteristics. It's inspired by OpenAI's model distillation cookbook but focuses on comparing different models' performance without distillation.

The goal is to use each provider's native API where possible for this classification task. The main focus is on local models, run through Ollama (llama.cpp) and LM Studio (MLX).

Results first

I ran the tests on Italian wines first and then on French wines to see whether performance differed. Two findings stood out:

French wines are easier to predict than Italian wines!
Anthropic models, including Claude 3.5 Haiku, were the best-performing classifiers in these tests.

Here is the chart that compares Italian and French results:

Here is the detailed chart for Italian wines:

Here is the detailed chart for French wines:

Overview

The project uses a wine review dataset (Italian wines first, French wines later) to test how well different LLMs predict grape varieties from wine descriptions, regions, and other characteristics. It showcases Structured Outputs with various LLM providers and compares their performance.
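To make the Structured Outputs idea concrete, here is a minimal sketch of a schema that restricts the answer to a known set of varieties, plus a defensive parser for the reply. The label list, the `variety` field name, and both helper names are assumptions for illustration; they are not taken from the project's code.

```python
import json

# Hypothetical label set; the real labels would come from the dataset.
VARIETIES = ["Sangiovese", "Nebbiolo", "Glera"]


def build_schema(varieties):
    """Return a JSON Schema that restricts the answer to one known variety."""
    return {
        "type": "object",
        "properties": {"variety": {"type": "string", "enum": list(varieties)}},
        "required": ["variety"],
        "additionalProperties": False,
    }


def parse_prediction(raw_response, varieties):
    """Parse a model's JSON reply; return the variety, or None if it is invalid."""
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    variety = data.get("variety")
    return variety if variety in varieties else None
```

Providers that accept a JSON Schema can enforce the `enum` on their side; for those that cannot, a parser like `parse_prediction` lets invalid replies be counted as misses rather than crashing the run.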

Features

Wine variety prediction using multiple LLM providers:
- Ollama
- OpenAI
- Google Gemini (via google-genai SDK)
- LM Studio
- OpenRouter
- DeepSeek
- Anthropic
- MLX Omni Server
- MLX Batch Inference (local fine-tuned models)
Structured Output implementation for consistent responses (where possible)
Performance comparison between different models
Support for parallel processing with some providers
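As a rough illustration of the parallel processing and performance comparison listed above, the sketch below runs a prediction function over several reviews concurrently and scores the result. `predict_all`, `accuracy`, and the keyword-based `fake_predict` stand-in are hypothetical names; a real provider call would replace the stand-in.

```python
from concurrent.futures import ThreadPoolExecutor


def predict_all(predict, reviews, max_workers=4):
    """Run predict over reviews concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(predict, reviews))


def accuracy(predictions, labels):
    """Fraction of predictions that exactly match their label."""
    if not labels:
        return 0.0
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Stand-in for a real provider call (hypothetical keyword rule, no network).
def fake_predict(review):
    return "Nebbiolo" if "tar" in review else "Sangiovese"
```

Threads suit this workload because each request mostly waits on network or server I/O; a local server that handles one request at a time may see little benefit.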

Prerequisites

Python 3.12+
Jupyter Notebook
API keys for various providers (store in .env file):
- OPENAI_API_KEY
- GEMINI_API_KEY
- OPENROUTER_API_KEY
- DEEPSEEK_API_KEY
- ANTHROPIC_API_KEY
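
Before a long run it can help to check which keys are actually set. The sketch below assumes the .env file has already been loaded into the process environment (how the project loads it is not stated here); `missing_keys` is a hypothetical helper.

```python
import os

# Key names as listed in the prerequisites above.
REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "GEMINI_API_KEY",
    "OPENROUTER_API_KEY",
    "DEEPSEEK_API_KEY",
    "ANTHROPIC_API_KEY",
]


def missing_keys(env=None):
    """Return the required keys that are unset or empty in env."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

Calling `missing_keys()` with no arguments inspects the current environment; providers whose key is missing can then be skipped instead of failing mid-run.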

Recommendations

Customizing Model Selection

The default configuration in wine_all.py includes some large models that may not run on all systems. To adapt for your hardware:

Edit wine_all.py to use models suitable for your system:
- For Ollama: Use smaller models like "llama3.2"
- For LM Studio: Stick to 3B-7B models with 4-bit quantization
- Cloud models (OpenAI, Anthropic, etc.) don't have local hardware requirements
In individual provider files (e.g., ollama.py), adjust model selections similarly

Example model substitutions for lower-end hardware:

Replace "qwen2.5:72b-instruct" with "llama3.2"
Remove "Llama3.3"

In general, feel free to add any models available locally in Ollama or LM Studio.

Performance vs Resource Trade-offs

Smaller models (1B-3B) run faster but may have lower accuracy
Mid-size models (7B-14B) offer a good balance of accuracy and resource usage
The largest models (>30B) tend to give the best accuracy but require significant resources
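
A common rule of thumb for judging whether a model fits your hardware is that the weights alone need roughly (parameter count × bits per weight ÷ 8) bytes. The sketch below applies that estimate; it ignores the KV cache, activations, and runtime overhead, so treat the numbers as a lower bound. `weight_memory_gb` is a hypothetical helper, not part of the project.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Compare a few model sizes at 4-bit quantization.
for size in (3, 7, 72):
    print(f"{size}B at 4-bit: ~{weight_memory_gb(size, 4):.1f} GB")
```

Under this estimate a 7B model at 4-bit needs about 3.5 GB for weights, while a 72B model needs about 36 GB, which is why the larger defaults may not run on every system.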

Dataset

The project now pulls the spawn99/wine-reviews dataset directly from Hugging Face using the datasets library. The files are cached automatically according to your HF_HOME configuration (defaults to ~/.cache/huggingface if unset), so you don't need to manually download or manage CSV files.

If you already have local copies of the Kaggle CSVs, they can remain in place; the scripts will transparently use the Hugging Face dataset.
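
For orientation, loading and filtering the dataset might look like the sketch below. The `train` split name and the `country` and `variety` column names are assumptions based on the usual layout of wine review data, so check the dataset card before relying on them. Both function names are hypothetical.

```python
def load_wine_rows(split="train"):
    """Load the dataset from Hugging Face (needs the datasets package and network)."""
    # Imported lazily so the filter below can be used without the package.
    from datasets import load_dataset

    return load_dataset("spawn99/wine-reviews", split=split)


def filter_by_country(rows, country):
    """Keep rows whose 'country' field matches and that have a 'variety' label."""
    return [r for r in rows if r.get("country") == country and r.get("variety")]
```

A call such as `filter_by_country(load_wine_rows(), "Italy")` would then yield the labeled Italian subset, assuming those column names hold.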

Project Structure

wine_ollama.ipynb - Main Jupyter notebook with code and explanations
wine_all.py - Implementation using all providers

In the providers folder you will find the individual implementations for each provider.

anthropic.py - Implementation using Anthropic
deepseek.py - Implementation using DeepSeek
gemini_genai.py - Implementation using Google Gemini (google-genai SDK)
gemini_openai.py - Implementation using Google Gemini (OpenAI-compatible API)
lmstudio.py - Implementation using LM Studio
mlx_batch.py - Batch inference with MLX fine-tuned models
mlx_omni_server.py - Implementation using MLX Omni Server
mlx_server_unstructured.py - Unstructured inference via MLX Omni Server
ollama.py - Implementation using Ollama
openai.py - Implementation using OpenAI (Structured)
openai_batching.py - Implementation using OpenAI with batched requests
openai_unstructured.py - Implementation using OpenAI with unstructured processing
openrouter.py - Implementation using OpenRouter API
wine_mlx_batch.py - Wine-specific MLX batch inference helper

Usage

Clone the repository

Create a virtual environment (using uv):

uv venv .venv -p 3.13
source .venv/bin/activate
uv sync

Generate the dataset used for training:
```
python -m train.generate_data
```

Run MLX LoRA training with live validation monitoring. Choose a config for your target model:

# Qwen3 0.6B
python ./train/lora_training_monitor.py -c ./train/qwen_lora_config.yaml
# Gemma 4 e2B
python ./train/lora_training_monitor.py -c ./train/gemma_lora_config.yaml

See train/ for additional configs (Phi-4, Llama 3.3, Mistral, etc.).

(Optional) Run the Jupyter notebook or individual Python scripts for provider comparisons

Running Individual Providers

You can run individual provider modules directly using Python's module syntax:

# Run MLX batch inference provider
python -m providers.mlx_batch

# Run MLX Omni Server provider
python -m providers.mlx_omni_server

# Run Ollama provider
python -m providers.ollama

# Run OpenAI provider (Structured)
python -m providers.openai

# Run OpenAI Unstructured provider
python -m providers.openai_unstructured

# Run OpenAI Batching provider
python -m providers.openai_batching

# Run Anthropic provider
python -m providers.anthropic

# Run Gemini provider (google-genai SDK)
python -m providers.gemini_genai

# Run other providers similarly:
python -m providers.deepseek
python -m providers.lmstudio
python -m providers.openrouter

You can override the default MLX batch configuration by passing the model name, batch size, and adapter path explicitly:

python -m providers.mlx_batch -m Qwen/Qwen3-0.6B -b 100 --adapter ./adapters

Here -m selects a custom model (Qwen/Qwen3-0.6B), -b adjusts the batch size, and --adapter points to an alternate LoRA adapter directory. Swap in any compatible model identifier or adapter path to suit your setup.

To run all providers at once:

python wine_all.py

Available command-line options:

python wine_all.py --generate-chart                           # Generate chart from most recent results without running new tests
python wine_all.py --generate-chart --summary SUMMARY_FILE    # Generate chart from specific summary file (e.g., summary_20250105_095642.csv)
python wine_all.py --no-provider-csv                         # Run tests but don't save individual provider results to CSV files
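
For readers who want to add a flag of their own, this is one way the three options above could be declared with argparse. It is a sketch under assumptions: the defaults and help strings are illustrative, and the actual parser in wine_all.py may be organized differently.

```python
import argparse


def build_parser():
    """Declare the three documented flags; defaults here are assumptions."""
    parser = argparse.ArgumentParser(prog="wine_all.py")
    parser.add_argument("--generate-chart", action="store_true",
                        help="Build a chart from saved results without running tests")
    parser.add_argument("--summary", metavar="SUMMARY_FILE", default=None,
                        help="Summary CSV to chart (default: most recent)")
    parser.add_argument("--no-provider-csv", action="store_true",
                        help="Skip saving per-provider result CSVs")
    return parser
```

argparse converts the dashes to underscores, so the parsed values are read as `args.generate_chart`, `args.summary`, and `args.no_provider_csv`.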

Fine-tuning models with MLX

See LORA.md for instructions on how to fine-tune models using LoRA with MLX.

Autonomous Hyperparameter Search

Inspired by karpathy/autoresearch, this project includes an autonomous hyperparameter search that lets an AI agent iteratively tune LoRA training parameters overnight.

How it works

program.md - Agent instructions defining the search loop, hyperparameter ranges, and logging format. Point your coding agent here.
results/hp_search.jsonl - Experiment log. One JSON line per run with all params and accuracy.
train/lora_training_monitor.py - Emits a structured HPSEARCH_RESULT|accuracy=...|val_loss=... line at the end of each training run for easy parsing.

The agent modifies train/gemma_lora_config.yaml, runs a short training (300 iters), checks accuracy, logs the result, and repeats, sweeping one parameter at a time.
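
The parsing step in that loop can be sketched as follows. The `HPSEARCH_RESULT|key=value|...` shape comes from the description above; the assumption that every value is numeric, and the helper names `parse_result_line` and `to_jsonl_record`, are mine.

```python
import json

PREFIX = "HPSEARCH_RESULT|"


def parse_result_line(line):
    """Turn 'HPSEARCH_RESULT|accuracy=0.8|val_loss=1.2' into a dict of floats."""
    line = line.strip()
    if not line.startswith(PREFIX):
        return None
    fields = {}
    for part in line[len(PREFIX):].split("|"):
        key, sep, value = part.partition("=")
        if not sep:
            continue  # skip malformed segments without '='
        fields[key] = float(value)  # assumes numeric values
    return fields


def to_jsonl_record(params, metrics):
    """Merge hyperparameters and metrics into one JSON line for the log."""
    return json.dumps({**params, **metrics}, sort_keys=True)
```

Scanning the training output for the first line where `parse_result_line` returns a dict, then appending `to_jsonl_record(...)` to results/hp_search.jsonl, would reproduce the one-line-per-run log format.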

Quick start

# 1. Run a baseline to establish a starting accuracy
python ./train/lora_training_monitor.py -c ./train/gemma_lora_config.yaml

# 2. Point your coding agent (Claude, Codex, etc.) at the repo and say:
#    "Read program.md and start the hyperparameter search"

# 3. Check results at any time
cat results/hp_search.jsonl

See program.md for the full search space and strategy details.
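
Once a few runs are logged, picking the best one is a short script. This sketch assumes each JSON line carries an `accuracy` key, which the log description suggests but does not spell out; `best_run` is a hypothetical helper.

```python
import json


def best_run(jsonl_text, metric="accuracy"):
    """Return the logged run with the highest metric, or None if none qualify."""
    runs = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        if metric in record:
            runs.append(record)
    return max(runs, key=lambda r: r[metric], default=None)
```

To use it on the real log, read the file first, for example `best_run(Path("results/hp_search.jsonl").read_text())` with `Path` imported from `pathlib`.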

TODO

DONE

Try MLX Omni Server for Apple MLX tests
Fine-tune models with MLX with distillation (Phi-3.5-mini-instruct)
Test google-genai python package for Gemini
Add Gemma 4 LoRA training config
Add MLX batch inference provider

Contributing

Feel free to open issues or submit pull requests with improvements.

License

MIT License

Acknowledgments

Inspired by OpenAI's model distillation cookbook
Uses the spawn99/wine-reviews dataset from Hugging Face
