Real-time voice assistant powered by OpenAI's Realtime API for the M5Stack ATOM Echo (ESP32). Multiple implementation approaches with full hardware support.
Current State: Three working implementations with different trade-offs:
- ✅ ESP-IDF (PlatformIO) - Full hardware support, WebSocket ready, production candidate
- ✅ MicroPython - Networking works, audio blocked by I2S PDM limitation
- ✅ Arduino - Full hardware access, but WebSocket challenges for Realtime API
Recommended: ESP-IDF implementation for production use
- 🎤 PDM Microphone - SPM1423 working (ESP-IDF only)
- 🔊 I2S Speaker - NS4168 amplifier with audio playback
- 💡 RGB LED - SK6812 with multiple status indicators
- 🔘 Button - Press-to-talk and function control
- 🌐 WiFi - Stable connection with auto-reconnect
- 🔌 WebSocket - Client ready for OpenAI Realtime API
- 🎙️ Audio I/O - PDM input, I2S output with proper buffering
- 📡 Base64 - Audio encoding for API transmission
- 🔐 TLS/SSL - Secure connections
Device: M5Stack ATOM Echo (ESP32-PICO-D4)
Specifications:
- ESP32-PICO-D4: 240MHz Dual Core, 4MB Flash
- SPM1423 PDM Microphone (GPIO33=CLK, GPIO23=DATA)
- NS4168 I2S Speaker (GPIO19=BCK, GPIO33=WS, GPIO22=DATA)
- SK6812 RGB LED (GPIO27)
- Button (GPIO39)
Key Detail: GPIO33 is shared between mic clock and speaker WS - handled through careful I2S channel management.
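The pin assignments above can be written down as a small lookup table. The sketch below is plain host-side C, not code taken from this repository; the names `pin_role_t`, `atom_echo_pins`, and `count_pin_users` are hypothetical. It counts how many signals claim a given GPIO, which makes the GPIO33 overlap explicit.

```c
#include <assert.h>
#include <stddef.h>

/* One row per signal listed in the specifications above. */
typedef struct {
    const char *signal; /* human-readable signal name (illustrative) */
    int gpio;           /* GPIO number on the ESP32-PICO-D4 */
} pin_role_t;

static const pin_role_t atom_echo_pins[] = {
    {"MIC_PDM_CLK",  33},
    {"MIC_PDM_DATA", 23},
    {"SPK_I2S_BCK",  19},
    {"SPK_I2S_WS",   33},
    {"SPK_I2S_DATA", 22},
    {"LED_SK6812",   27},
    {"BUTTON",       39},
};

#define ATOM_ECHO_PIN_COUNT (sizeof(atom_echo_pins) / sizeof(atom_echo_pins[0]))

/* Returns how many signals in the table are mapped to the given GPIO. */
static size_t count_pin_users(int gpio)
{
    assert(gpio >= 0);
    size_t users = 0;
    for (size_t i = 0; i < ATOM_ECHO_PIN_COUNT; i++) {
        if (atom_echo_pins[i].gpio == gpio) {
            users++;
        }
    }
    return users;
}
```

Calling `count_pin_users(33)` reports two users (mic clock and speaker WS), while every other pin in the table has exactly one.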
```
m5stack-atom-echo-voice-assistant/
├── platformio-espidf/            # ⭐ ESP-IDF implementation (RECOMMENDED)
│   ├── src/main.c                # Complete working firmware
│   ├── platformio.ini            # PlatformIO configuration
│   ├── credentials.h             # WiFi/API credentials (gitignored)
│   ├── credentials.h.example     # Template for credentials
│   ├── DEPLOYMENT_STATUS.md      # Current implementation status
│   ├── PIN_CONFIGURATION.md      # Hardware wiring guide
│   └── components/               # ESP WebSocket client component
├── arduino/                      # Arduino C++ implementation
│   ├── atom_echo_voice/          # Arduino sketch
│   ├── config.h.example          # Configuration template
│   ├── SETUP_INSTRUCTIONS.md     # Arduino IDE setup guide
│   └── ARCHITECTURAL_BLOCKER.md  # WebSocket limitations
├── micropython/                  # MicroPython implementation
│   ├── main.py                   # Complete networking code
│   ├── README.md                 # MicroPython-specific docs
│   └── test_*.py                 # Diagnostic utilities
├── server/                       # Python backend server
│   ├── main.py                   # FastAPI server
│   ├── requirements.txt          # Python dependencies
│   └── .env.example              # Server configuration template
└── firmware/                     # Legacy/experimental builds
```
- M5Stack ATOM Echo connected via USB
- PlatformIO installed (VS Code extension recommended)
- WiFi network (2.4GHz)
- OpenAI API key
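```bash
# Clone the repository and enter the ESP-IDF project
git clone https://github.com/eric-rolph/m5stack-atom-echo-voice-assistant.git
cd m5stack-atom-echo-voice-assistant/platformio-espidf

# Create credentials.h from the template, then edit it with your WiFi and API details
cp credentials.h.example credentials.h

# Build and flash (replace COM9 with your device's serial port)
pio run --target upload --upload-port COM9

# Open the serial monitor
pio device monitor -b 115200
```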
| Color | Status |
|---|---|
| 🔵 Blue | Initializing / Startup |
| 🟡 Yellow | Connecting to WiFi |
| 🟢 Green | Connected and Ready |
| 🟣 Magenta | Button Pressed |
| 🔴 Red | Error |
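The status table above maps naturally onto an enum and a color lookup. The following is a minimal sketch under stated assumptions, not the firmware's actual code: the names `device_status_t`, `status_color`, and `rgb_to_sk6812` are hypothetical, the full-brightness values are chosen for illustration, and the G, R, B byte order is what SK6812 RGB parts commonly expect and should be checked against the LED driver in use.

```c
#include <assert.h>
#include <stdint.h>

/* Device states, one per row of the LED status table. */
typedef enum {
    STATUS_INIT,            /* blue: initializing / startup */
    STATUS_WIFI_CONNECTING, /* yellow: connecting to WiFi */
    STATUS_READY,           /* green: connected and ready */
    STATUS_BUTTON_PRESSED,  /* magenta: button pressed */
    STATUS_ERROR,           /* red: error */
    STATUS_COUNT
} device_status_t;

typedef struct {
    uint8_t r, g, b;
} rgb_t;

/* Returns the color for a status. Brightness values are illustrative. */
static rgb_t status_color(device_status_t status)
{
    static const rgb_t colors[STATUS_COUNT] = {
        [STATUS_INIT]            = {0,   0,   255},
        [STATUS_WIFI_CONNECTING] = {255, 255, 0},
        [STATUS_READY]           = {0,   255, 0},
        [STATUS_BUTTON_PRESSED]  = {255, 0,   255},
        [STATUS_ERROR]           = {255, 0,   0},
    };
    assert((int)status >= 0 && status < STATUS_COUNT);
    return colors[status];
}

/* Packs a color into the G, R, B wire order assumed for the SK6812. */
static void rgb_to_sk6812(rgb_t color, uint8_t out[3])
{
    out[0] = color.g;
    out[1] = color.r;
    out[2] = color.b;
}
```

Keeping the mapping in one table means a new state only needs a new enum entry and one color row.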
```c
#define WIFI_SSID "YourWiFiName"
#define WIFI_PASSWORD "YourWiFiPassword"
#define OPENAI_API_KEY "sk-your-api-key"
#define REALTIME_MODEL "gpt-4o-realtime-preview-2024-10-01"
```
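As a rough illustration of how these credentials might be used, the sketch below builds the Realtime API WebSocket URL and the request headers from the model name and API key. The helper names `build_realtime_url` and `build_realtime_headers` are hypothetical and not taken from `main.c`; the placeholder macros stand in for `credentials.h`. In ESP-IDF, strings like these would typically be passed to the WebSocket client configuration, but that wiring is not shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Placeholder values; in the firmware these would come from credentials.h. */
#define OPENAI_API_KEY "sk-your-api-key"
#define REALTIME_MODEL "gpt-4o-realtime-preview-2024-10-01"

/* Writes the WebSocket URL for the given model.
 * Returns 0 on success, -1 if the buffer is too small. */
static int build_realtime_url(char *out, size_t out_len, const char *model)
{
    assert(out != NULL && model != NULL);
    int n = snprintf(out, out_len,
                     "wss://api.openai.com/v1/realtime?model=%s", model);
    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}

/* Writes CRLF-separated HTTP headers for the WebSocket handshake.
 * Returns 0 on success, -1 if the buffer is too small. */
static int build_realtime_headers(char *out, size_t out_len, const char *api_key)
{
    assert(out != NULL && api_key != NULL);
    int n = snprintf(out, out_len,
                     "Authorization: Bearer %s\r\n"
                     "OpenAI-Beta: realtime=v1\r\n",
                     api_key);
    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```

Both helpers report truncation instead of silently sending a clipped URL or key, which is easy to miss when buffers are sized by hand.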
```bash
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...
AI_PROVIDER=openai  # or gemini
TTS_PROVIDER=openai
HOST=0.0.0.0
PORT=8000
```
| Feature | ESP-IDF | Arduino | MicroPython |
|---|---|---|---|
| PDM Mic | ✅ Full | ✅ Full | ❌ No PDM |
| I2S Speaker | ✅ Yes | ✅ Yes | ✅ Yes |
| WebSocket | ✅ Component | ✅ uwebsockets | |
| Memory | ✅ Efficient | ||
| TLS/SSL | ✅ Native | ✅ Native | ✅ Native |
| Development | Medium | Easy | Easy |
| Recommended | ✅ Yes | For prototypes | Learning only |
ESP-IDF:
- ✅ All hardware working
- 🔄 WebSocket Realtime API integration in progress

Arduino:
- ✅ Hardware fully functional
- ⚠️ WebSocket library limitations for Realtime API (see ARCHITECTURAL_BLOCKER.md)

MicroPython:
- ❌ PDM microphone unsupported (I2S driver limitation)
- ✅ All networking code functional
- Not recommended for this project
- Native PDM Support - I2S driver exposes the `I2S_MODE_PDM` flag
- WebSocket Component - Official esp_websocket_client with reconnection
- Memory Management - Direct control over FreeRTOS heap
- SSL/TLS - ESP-TLS with certificate bundle support
- Production Ready - Battle-tested framework for IoT devices
```
PDM Mic → I2S RX → 16-bit PCM → Base64 → WebSocket → OpenAI
OpenAI → WebSocket → Base64 Decode → PCM → I2S TX → Speaker
```
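The upstream half of this flow (PCM → Base64 → WebSocket) can be sketched as follows. This is a minimal host-side illustration, not the repository's implementation: `base64_encode` and `build_audio_append_event` are hypothetical helpers, and the input is assumed to be little-endian 16-bit PCM bytes as read from I2S. The JSON shape follows the Realtime API's `input_audio_buffer.append` client event, which carries the audio in a Base64 `audio` field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of Base64 characters (excluding the NUL) for n input bytes. */
static size_t base64_encoded_len(size_t n)
{
    return 4 * ((n + 2) / 3);
}

/* Standard Base64 with '=' padding. Returns the number of characters
 * written (excluding the NUL), or 0 if the output buffer is too small. */
static size_t base64_encode(const uint8_t *in, size_t n, char *out, size_t out_len)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    assert(out != NULL && (in != NULL || n == 0));
    if (out_len < base64_encoded_len(n) + 1) {
        return 0;
    }
    size_t o = 0;
    for (size_t i = 0; i < n; i += 3) {
        /* Pack up to three bytes into a 24-bit group. */
        uint32_t v = (uint32_t)in[i] << 16;
        if (i + 1 < n) v |= (uint32_t)in[i + 1] << 8;
        if (i + 2 < n) v |= in[i + 2];
        out[o++] = tbl[(v >> 18) & 0x3F];
        out[o++] = tbl[(v >> 12) & 0x3F];
        out[o++] = (i + 1 < n) ? tbl[(v >> 6) & 0x3F] : '=';
        out[o++] = (i + 2 < n) ? tbl[v & 0x3F] : '=';
    }
    out[o] = '\0';
    return o;
}

/* Wraps PCM bytes in an input_audio_buffer.append JSON event.
 * Returns the event length, or -1 if the output buffer is too small. */
static int build_audio_append_event(const uint8_t *pcm, size_t pcm_len,
                                    char *out, size_t out_len)
{
    static const char prefix[] =
        "{\"type\":\"input_audio_buffer.append\",\"audio\":\"";
    static const char suffix[] = "\"}";
    const size_t prefix_len = sizeof(prefix) - 1;
    const size_t suffix_len = sizeof(suffix) - 1;
    const size_t b64_len = base64_encoded_len(pcm_len);

    if (out_len < prefix_len + b64_len + suffix_len + 1) {
        return -1;
    }
    memcpy(out, prefix, prefix_len);
    base64_encode(pcm, pcm_len, out + prefix_len, out_len - prefix_len);
    /* Copy the suffix together with its terminating NUL. */
    memcpy(out + prefix_len + b64_len, suffix, suffix_len + 1);
    return (int)(prefix_len + b64_len + suffix_len);
}
```

Base64 expands the payload by a factor of 4/3, so the event buffer needs to be sized for roughly a third more than the raw PCM chunk plus the fixed JSON wrapper.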
GPIO33 serves dual purpose:
- Microphone: PDM Clock (I2S0 RX)
- Speaker: Word Select/LRC (I2S1 TX)
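This works because the ESP32 has two I2S peripherals (I2S0 and I2S1) with independent pin configurations. GPIO33 can carry only one of the two signals at any moment, so capture and playback take turns on the pin.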
- DEPLOYMENT_STATUS.md - Current implementation details
- PIN_CONFIGURATION.md - Complete wiring guide
- ARCHITECTURAL_BLOCKER.md - Arduino WebSocket challenges
- SETUP_INSTRUCTIONS.md - Arduino IDE setup
- micropython/README.md - MicroPython implementation notes
- PDM microphone working
- I2S speaker working
- LED control
- Button input
- WiFi connection
- WebSocket client
- Base64 encoding
- OpenAI Realtime API handshake
- Bidirectional audio streaming
- Session management
- Voice activity detection
- Wake word support
- OTA updates
- Battery optimization
Contributions welcome! Areas of interest:
- OpenAI Realtime API integration
- Audio processing optimization
- Power management
- Documentation improvements
MIT License - see LICENSE for details
- M5Stack - Excellent hardware and examples
- Espressif - ESP-IDF framework and I2S drivers
- OpenAI - Realtime API
- Community - Forum posts and GitHub issues that solved PDM mysteries
- Issues: GitHub Issues
- Hardware Docs: M5Stack ATOM Echo
- API Docs: OpenAI Realtime API
Status: Active development - ESP-IDF implementation recommended for production use