Convert text to speech using Microsoft Azure Neural Text-to-Speech (TTS) and a simple Gradio web interface.
- Neural TTS with dynamic voice selection across 140+ Azure-supported languages
- Adjustable speaking style, rate, and pitch
- Input via textbox or upload of a `.txt` file
- Output as a `.wav` file, played directly in the browser
- Dropdowns auto-populate with a default language, voice, and style
- Human-readable language names in the UI (e.g., "English (UK)" instead of `en-GB`), falling back to the code if no name is defined
- 🔧 Manage display names and visible languages directly in the UI – no need to edit files or restart the app
- Modular architecture – ready for expansion
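For orientation, this is roughly how such a Gradio front end can be wired together. The component layout and the `synthesize` callback below are hypothetical placeholders, not the project's actual `app.py`.

```python
# Minimal sketch of a Gradio layout like the one described above.
# The `synthesize` callback and component names are hypothetical placeholders.
import gradio as gr

def synthesize(text, txt_file, language, voice, style, rate, pitch):
    # In the real app this would call the Azure TTS backend and
    # return the path to the generated .wav file.
    raise NotImplementedError

with gr.Blocks(title="SpeakItAI") as demo:
    text = gr.Textbox(label="Text to speak", lines=6)
    txt_file = gr.File(label="...or upload a .txt file", file_types=[".txt"])
    language = gr.Dropdown(label="Language", choices=["English (UK)", "Russian"])
    voice = gr.Dropdown(label="Voice")
    style = gr.Dropdown(label="Style", value="default")
    rate = gr.Slider(-50, 50, value=0, label="Rate (%)")
    pitch = gr.Slider(-50, 50, value=0, label="Pitch (%)")
    audio = gr.Audio(label="Result", type="filepath")
    gr.Button("Speak").click(
        synthesize,
        inputs=[text, txt_file, language, voice, style, rate, pitch],
        outputs=audio,
    )

# demo.launch()
```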
Microsoft Azure offers 500,000 characters per month free for Neural Text-to-Speech on the F0 (free) pricing tier.
- ⚡ Billing is per character
- ✅ Free quota resets monthly
- 🧪 No credit card required to start
📖 More info: Azure Speech Pricing
Azure Neural TTS supports 140+ languages and dialects, with many realistic male and female voices, including:
- 🇬🇧 British English
- 🇺🇸 American English
- 🇫🇷 French
- 🇩🇪 German
- 🇷🇺 Russian
- 🇨🇳 Chinese
- 🇪🇸 Spanish
- 🇮🇳 Hindi
- 🌐 And more
📖 Full voice list and styles:
👉 Azure Language & Voice Support
```bash
git clone https://github.com/loglux/SpeakItAI.git
cd SpeakItAI
```
Before running this app, you need an active Azure Speech resource.
- Go to the Azure Portal
- Create a Speech resource (Free F0 tier available)
- Copy the Key and Region from the resource's "Keys and Endpoint" section
- You will paste them into a `.env` file as shown below:
```bash
cp .env.example .env
```
Then fill in your Azure credentials:
```env
AZURE_KEY=your_azure_key
AZURE_REGION=your_azure_region
```

💡 Example regions: `ukwest`, `eastus`, `westeurope`, etc.
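To verify the credentials are picked up, a minimal sketch using `python-dotenv` and the Azure Speech SDK might look like this; it mirrors the `.env` keys above but is not the app's actual startup code.

```python
# Minimal credential check, assuming python-dotenv and
# azure-cognitiveservices-speech are installed.
import os

from dotenv import load_dotenv
import azure.cognitiveservices.speech as speechsdk

load_dotenv()  # reads AZURE_KEY and AZURE_REGION from .env

speech_config = speechsdk.SpeechConfig(
    subscription=os.environ["AZURE_KEY"],
    region=os.environ["AZURE_REGION"],
)
print("Speech config created for region:", os.environ["AZURE_REGION"])
```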
Using a virtual environment (recommended):

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
To fetch the latest voices and update `config.json`, run:

```bash
python tts/azure/update_config.py
```
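For illustration, the voice catalogue can be pulled with the Speech SDK's `get_voices_async()` call, roughly as below. How the real `update_config.py` structures `config.json` is an implementation detail of the project, so the locale-keyed layout here is only an assumption.

```python
# Rough sketch of fetching the Azure voice catalogue; the real
# update_config.py may structure config.json differently.
import json
import os

from dotenv import load_dotenv
import azure.cognitiveservices.speech as speechsdk

load_dotenv()
speech_config = speechsdk.SpeechConfig(
    subscription=os.environ["AZURE_KEY"], region=os.environ["AZURE_REGION"]
)
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)

result = synthesizer.get_voices_async().get()
voices_by_locale = {}
for v in result.voices:
    # Recent SDK versions expose style_list on VoiceInfo; it is empty
    # for voices without speaking styles.
    voices_by_locale.setdefault(v.locale, []).append(
        {"name": v.short_name, "gender": v.gender.name, "styles": list(v.style_list)}
    )

with open("config.json", "w", encoding="utf-8") as f:
    json.dump(voices_by_locale, f, ensure_ascii=False, indent=2)
```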
Then launch the app:

```bash
python app.py
```
- If both the textbox and a file are provided, the file takes priority.
- Only `.txt` files are accepted for upload.
- The output is saved and played as a `.wav` file.
- If a voice does not support styles, "default" will be used automatically (see the sketch below).
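To show how voice, style, rate, and pitch combine on the Azure side, here is a hedged sketch of SSML-based synthesis; the voice (`en-GB-SoniaNeural`), style (`cheerful`), and prosody values are examples only, and the real `core.py` may build its request differently.

```python
# Illustrative SSML synthesis; core.py may build its request differently.
import os

from dotenv import load_dotenv
import azure.cognitiveservices.speech as speechsdk

load_dotenv()
os.makedirs("audio_outputs", exist_ok=True)

speech_config = speechsdk.SpeechConfig(
    subscription=os.environ["AZURE_KEY"], region=os.environ["AZURE_REGION"]
)
audio_config = speechsdk.audio.AudioOutputConfig(filename="audio_outputs/sample.wav")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

# "cheerful" is an example style; when a voice has no styles ("default" in the UI),
# the <mstts:express-as> wrapper is simply left out.
style = "cheerful"
ssml = f"""
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-GB">
  <voice name="en-GB-SoniaNeural">
    <mstts:express-as style="{style}">
      <prosody rate="+10%" pitch="+0%">Hello from SpeakItAI!</prosody>
    </mstts:express-as>
  </voice>
</speak>
"""

result = synthesizer.speak_ssml_async(ssml).get()
if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Saved audio_outputs/sample.wav")
```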
```
SpeakItAI/
├── app.py
├── .env.example
├── requirements.txt
├── README.md
├── audio_outputs/           # automatically created at runtime
│
├── screenshots/
│   └── interface.png        # UI preview
│
└── tts/
    ├── __init__.py
    ├── base.py              # optional: abstract provider interface
    └── azure/
        ├── __init__.py
        ├── core.py                # AzureTTS implementation
        ├── config.py              # language label loading and utilities
        ├── config.json            # auto-generated voice config from Azure
        ├── language_labels.json   # editable language name mappings (UI dropdowns)
        └── update_config.py       # fetch and build config.json
```
- Run `tts/azure/update_config.py` to fetch the latest Azure voice data.
- This generates a new `tts/azure/config.json` with all supported languages, voices, genders, and available styles.
- The app reads from `config.json` at runtime to populate the voice and style options (see the sketch below).
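As a rough idea of the runtime side, assuming `config.json` maps locale codes to voice entries with `name`, `gender`, and `styles` fields (the actual schema used by `tts/azure/config.py` may differ), populating the dropdowns could look like this:

```python
# Hypothetical reader for config.json; the actual schema may differ
# from the locale -> voices mapping assumed here.
import json

with open("tts/azure/config.json", encoding="utf-8") as f:
    config = json.load(f)

def voices_for(locale: str) -> list[str]:
    """Voice names available for a locale such as 'en-GB'."""
    return [v["name"] for v in config.get(locale, [])]

def styles_for(locale: str, voice_name: str) -> list[str]:
    """Styles for a voice, falling back to 'default' when none are defined."""
    for v in config.get(locale, []):
        if v["name"] == voice_name:
            return v.get("styles") or ["default"]
    return ["default"]
```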
- The list of display names for languages is stored in a separate file: `tts/azure/language_labels.json`.
- You can add, rename, or delete languages using the “Edit Languages” tab in the UI – no need to edit files manually or restart the app.
- This list controls which languages appear in the dropdown and how they are named (e.g., `"English (UK)"` instead of `"en-GB"`).
```json
{
  "en-GB": "English (UK)",
  "ru-RU": "Russian",
  "ar-KW": "Arabic (Kuwait)"
}
```
- If no display names are defined, the app will display all available languages found in `config.json`.
- These will be shown using their locale codes only (e.g., `"fr-FR"`, `"hi-IN"`, `"de-DE"`), without readable labels.
💡 Use the “Edit Languages” tab to selectively re-add friendly names for only the languages you want to appear.
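The fallback described above boils down to a dictionary lookup with the locale code as the default; a minimal sketch (the helper name is hypothetical):

```python
# Hypothetical helper illustrating the label fallback described above.
import json

def display_name(locale_code: str,
                 labels_path: str = "tts/azure/language_labels.json") -> str:
    """Return the friendly name for a locale, or the raw code if none is defined."""
    with open(labels_path, encoding="utf-8") as f:
        labels = json.load(f)
    return labels.get(locale_code, locale_code)

# display_name("en-GB") -> "English (UK)"
# display_name("fr-FR") -> "fr-FR" (no label defined in the example above)
```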
The codebase is modular and ready for extension:
- Add new languages or accents in `tts/config.py`
- Replace the interface with FastAPI or Flask without touching `core.py`
- Support alternative providers like ElevenLabs, Bark, or Google Cloud TTS later (see the interface sketch below)
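As an illustration of what such a provider abstraction could look like (the actual `tts/base.py` may define a different contract), a minimal sketch:

```python
# Sketch of a provider interface; tts/base.py may define a different contract.
from abc import ABC, abstractmethod

class TTSProvider(ABC):
    """Common surface that an Azure, ElevenLabs, or Google Cloud backend could implement."""

    @abstractmethod
    def list_voices(self, language: str) -> list[str]:
        """Return the voice names available for a language code such as 'en-GB'."""

    @abstractmethod
    def synthesize(self, text: str, voice: str, style: str = "default",
                   rate: str = "+0%", pitch: str = "+0%") -> str:
        """Generate speech and return the path to the resulting .wav file."""
```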
This project is licensed under the MIT License.