A real-time procedural Metal music generation system based on fine-tuned MusicVAE models, integrated into Unity.
Generative AI for MIDI Sequences is a thesis project focused on extending the capabilities of generative music models for stylistically complex genres — specifically Metal music — and integrating real-time procedural generation into a Unity-based interactive environment.
Despite significant advancements in AI-based music generation, general-purpose pre-trained models often struggle when applied to highly structured and stylistically demanding genres.
Preliminary experiments using pre-trained MusicVAE models produced convincing results in Classical and Jazz domains. However, when applied to Metal, significant limitations emerged:
- Lack of rhythmic consistency
- Weak harmonic coherence
- Absence of genre-specific features (such as complex drum patterns and fast tempo structures)
These issues were directly linked to the limited representation of Metal music within the original training datasets.
The primary objective of this thesis was to develop a system capable of generating dynamic and stylistically coherent Metal music in real time, integrated within a Unity game environment.
Key goals achieved:
- ✅ Creation of a specialized Metal MIDI dataset
- ✅ Fine-tuning MusicVAE models for genre-specific generation
- ✅ Real-time integration through OSC communication
- ✅ Development of a responsive procedural music system for interactive environments
This project demonstrates how targeted fine-tuning on domain-specific datasets significantly extends the expressive capabilities of generative models, allowing them to operate effectively in stylistically complex domains.
The final result is a functional prototype in which procedural generation dynamically responds to user interaction inside a game environment.
- Install Anaconda.
- Create a Python 3.10 virtual environment.
- Install the required dependencies (`tensorflow`, `magenta`, etc.).
⚠️ Note: Magenta and TensorFlow require careful dependency management. Python 3.10 is strictly required for compatibility.
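For reference, a typical setup sequence looks like the following (the environment name is a placeholder; installing `magenta` declares its own TensorFlow dependency, so a compatible build is usually pulled in automatically):

```bash
# Create and activate an isolated Python 3.10 environment (name is a placeholder).
conda create -n metal-vae python=3.10
conda activate metal-vae

# Install Magenta; it pins its own TensorFlow version,
# so manual version management is usually unnecessary.
pip install magenta
```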
- Download your Metal MIDI files.
- Place them inside your designated dataset directory.
- To separate instrumental tracks (Guitar, Bass, Drums), run: 👉 `Scripts/splitter.py`
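For illustration, a per-instrument split can be sketched with `pretty_midi`. The General MIDI program ranges and output layout below are assumptions for the sketch, not necessarily what `Scripts/splitter.py` does internally:

```python
# Illustrative sketch of per-instrument splitting (assumption: General MIDI
# program ranges identify guitar and bass tracks; Scripts/splitter.py may differ).
import pathlib
import pretty_midi

def split_midi(path: pathlib.Path, out_dir: pathlib.Path) -> None:
    midi = pretty_midi.PrettyMIDI(str(path))
    selectors = {
        "drums":  lambda inst: inst.is_drum,
        "guitar": lambda inst: not inst.is_drum and 24 <= inst.program <= 31,  # GM guitars
        "bass":   lambda inst: not inst.is_drum and 32 <= inst.program <= 39,  # GM basses
    }
    for name, keep in selectors.items():
        tracks = [inst for inst in midi.instruments if keep(inst)]
        if not tracks:
            continue  # this file has no matching instrument
        out = pretty_midi.PrettyMIDI()
        out.instruments.extend(tracks)
        (out_dir / name).mkdir(parents=True, exist_ok=True)
        out.write(str(out_dir / name / path.name))

for midi_file in pathlib.Path("dataset/raw").glob("*.mid"):
    split_midi(midi_file, pathlib.Path("dataset/split"))
```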
After splitting the tracks, convert each instrument dataset into the TFRecord format required by MusicVAE:
- 🎸 Guitar: `Scripts/convert_guitar_to_tf.py`
- 🎸 Bass: `Scripts/convert_bass_to_tf.py`
- 🥁 Drums: `Scripts/convert_drums_to_tf.py`
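Under the assumption that these scripts follow Magenta's standard pipeline, the core of the conversion is turning each MIDI file into a serialized `NoteSequence` proto inside a TFRecord file, roughly:

```python
# Sketch of MIDI -> TFRecord conversion (assumed to mirror what the
# convert_*_to_tf.py scripts do via Magenta's standard NoteSequence pipeline).
import pathlib
import note_seq
import tensorflow as tf

def midi_dir_to_tfrecord(midi_dir: str, output_path: str) -> None:
    with tf.io.TFRecordWriter(output_path) as writer:
        for midi_path in sorted(pathlib.Path(midi_dir).glob("*.mid")):
            try:
                sequence = note_seq.midi_file_to_note_sequence(str(midi_path))
            except note_seq.MIDIConversionError:
                continue  # skip corrupt or unreadable MIDI files
            writer.write(sequence.SerializeToString())

midi_dir_to_tfrecord("dataset/split/guitar", "dataset/guitar_notesequences.tfrecord")
```

Magenta also ships a `convert_dir_to_note_sequences` console script that performs this same step on a whole directory tree.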
The following pre-trained MusicVAE models were selected for fine-tuning based on the instrument:
| Instrument | Base Model | Training Script |
|---|---|---|
| 🎸 Guitar | `cat-mel_2bar_big` | `Scripts/train_guitar.py` |
| 🎸 Bass | `cat-mel_2bar_med_chords` | `Scripts/train_bass.py` |
| 🥁 Drums | `cat-drums_2bar_small` | `Scripts/train_drums.py` |
Configuration: Training parameters (`num_steps`, `batch_size`, `checkpoint_interval`) can be configured directly inside each script.
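Assuming the training scripts wrap Magenta's standard `music_vae_train` entry point, a typical invocation looks like this (paths and hyperparameter values are placeholders):

```bash
# Hedged sketch: fine-tune the drum model on the converted Metal dataset.
# run_dir, examples_path, and hparams values are placeholders to adapt.
music_vae_train \
  --config=cat-drums_2bar_small \
  --run_dir=checkpoints/drums \
  --mode=train \
  --examples_path=dataset/drums_notesequences.tfrecord \
  --hparams=batch_size=64,learning_rate=0.0005 \
  --num_steps=50000
```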
After training, you can generate new, original MIDI sequences from the fine-tuned checkpoints.
Customization: The generation process is highly flexible. By modifying the parameters within the generation script, you can adjust musical progressions, structural logic, and core generation data to fit your specific needs.
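As a minimal sketch (the checkpoint path, batch size, and temperature are placeholder values), sampling from a fine-tuned model with Magenta's `TrainedModel` API looks like:

```python
# Minimal sampling sketch using Magenta's TrainedModel API
# (checkpoint path and sampling values are placeholders).
import note_seq
from magenta.models.music_vae import configs
from magenta.models.music_vae.trained_model import TrainedModel

config = configs.CONFIG_MAP["cat-drums_2bar_small"]
model = TrainedModel(
    config,
    batch_size=4,
    checkpoint_dir_or_path="checkpoints/drums",  # fine-tuned checkpoint directory
)

# Draw 4 latent samples and decode them into 2-bar sequences; lower
# temperature gives more conservative output, higher gives more varied output.
samples = model.sample(n=4, length=32, temperature=0.9)
for i, sequence in enumerate(samples):
    note_seq.sequence_proto_to_midi_file(sequence, f"generated_drums_{i}.mid")
```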
The system uses the OSC (Open Sound Control) protocol to trigger audio samples dynamically based on AI-generated data.
- Install OSCJack: Download and import OSCJack into Unity following the official instructions.
- OSCManager: Create an empty GameObject named `OSCManager`.
- Player Objects: Create 5 child GameObjects inside `OSCManager`: `GuitarPlayer`, `BassPlayer`, `KickPlayer`, `SnarePlayer`, and `HiHatPlayer`.
- AudioSources: Add an `AudioSource` to each Player and assign a short (one-shot) audio sample corresponding to the instrument.
- MetalReceiver: Apply the script 👉 `Scripts/MetalReceiver.cs` to the `OSCManager` and link the Players in the Inspector.
- SoundTrackManager: Apply the script 👉 `Scripts/SoundTrackManager.cs` within Unity. This component manages the overall soundtrack flow and handles the card selection logic, allowing the music to react to player choices.
- Start the OSC Server: Open the Anaconda Prompt, activate your environment, and run the server script: 👉 `Scripts/playbackOSC.py`. This script acts as the server that generates the Metal music and sends the data to Unity via OSC (a minimal sketch of the sending side follows this list).
- Play in Unity: Press Play in the Unity Editor to listen to the generated output.
- Parameter Tweaking: You can modify generation parameters, musical progressions, and other settings within the code to customize the musical results as needed.
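For reference, the sending side can be as simple as the following sketch. The OSC addresses and port here are assumptions; they must match whatever `Scripts/MetalReceiver.cs` and OSCJack are configured to listen for:

```python
# Sketch of the Python -> Unity OSC bridge using the python-osc package.
# Addresses (/guitar, /kick, ...) and port 9000 are assumptions that must
# match the OSCJack receiver configuration inside Unity.
import time
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 9000)

def trigger(address: str, pitch: int, velocity: int) -> None:
    """Send one note event; Unity maps it to a one-shot AudioSource."""
    client.send_message(address, [pitch, velocity])

# Play a bar of example events at 180 BPM (eighth notes).
for pitch in (40, 40, 43, 40):
    trigger("/guitar", pitch, 110)
    trigger("/kick", 36, 127)
    time.sleep(60 / 180 / 2)
```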
⚠️ Disclaimer: This project is currently a research prototype and has not yet been released as a production-ready system.
It serves as a comprehensive proof-of-concept demonstrating:
- Genre-specialized generative AI
- Real-time procedural music systems
- Interactive AI-driven audio design
- 📈 Expansion of the Metal dataset for broader stylistic coverage.
- 🔗 Multi-instrument conditioning for tighter band cohesion.
- 🎭 Emotional modulation models to drive music based on game tension.
- 🏥 Validation in therapeutic environments (e.g., active music therapy).
