Text to Speech API Documentation
Complete reference for the multi-provider TTS API
Authentication
Pass your API key as a bearer token in the Authorization header:
Authorization: Bearer sk_live_your_key_here
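A minimal sketch of building that header in Python. The helper name `auth_headers` is hypothetical and only illustrates the `Bearer` scheme shown above.

```python
def auth_headers(api_key: str) -> dict[str, str]:
    """Return the Authorization header for a given API key.

    Hypothetical helper: the documentation only specifies the header
    format, not any client library.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {"Authorization": f"Bearer {api_key}"}


if __name__ == "__main__":
    # The key below is the placeholder from the documentation, not a real key.
    print(auth_headers("sk_live_your_key_here"))
```

The returned dictionary can be passed as the `headers` argument of most Python HTTP clients.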
POST /v1/tts/synthesize
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Text to convert (max 5,000 chars, or 10,000 for SSML) |
| provider | string | No | local, google, openai (default: local) |
| voice | string | No | Voice ID. Use /v1/tts/voices to list available voices |
| language | string | No | BCP-47 language code, e.g. en-US, ar-XA (default: en-US) |
| format | string | No | mp3, wav, ogg, aac (default: mp3) |
| speed | float | No | Speaking rate 0.25–4.0 (default: 1.0) |
| pitch | float | No | Pitch adjustment -20 to +20 (default: 0, Google only) |
| ssml | boolean | No | Treat text as SSML (Google only) |
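The parameters above could be assembled into a request along the following lines. This is a sketch under stated assumptions: the host in `BASE_URL` is a placeholder, and sending the parameters as a JSON body is a guess, since the documentation specifies neither the host nor the body encoding. The function only builds the request and checks the documented limits; it does not send anything.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # assumption: placeholder host, not from the docs

ALLOWED_PROVIDERS = {"local", "google", "openai"}
ALLOWED_FORMATS = {"mp3", "wav", "ogg", "aac"}


def build_synthesize_request(api_key, text, *, provider="local", voice=None,
                             language="en-US", audio_format="mp3",
                             speed=1.0, pitch=0.0, ssml=False):
    """Validate the documented limits and build (but do not send) the request."""
    limit = 10_000 if ssml else 5_000  # documented text length limits
    if len(text) > limit:
        raise ValueError(f"text exceeds {limit} characters")
    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"unknown format: {audio_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    if not -20 <= pitch <= 20:
        raise ValueError("pitch must be between -20 and +20")

    payload = {"text": text, "provider": provider, "language": language,
               "format": audio_format, "speed": speed}
    if voice is not None:
        payload["voice"] = voice
    # pitch and ssml are documented as Google-only, so they are sent
    # only when they differ from their defaults.
    if pitch != 0:
        payload["pitch"] = pitch
    if ssml:
        payload["ssml"] = True

    return urllib.request.Request(
        f"{BASE_URL}/v1/tts/synthesize",
        data=json.dumps(payload).encode("utf-8"),  # assumption: JSON body
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

To send the request, pass the result to `urllib.request.urlopen`. The shape of the response (raw audio bytes or a JSON envelope) is not documented here, so handling it is left out.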
GET /v1/tts/voices
List available voices for a provider.
GET /v1/tts/voices?provider=openai
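A sketch of building the voices request in Python. The host in `BASE_URL` is a placeholder, and this sketch assumes the endpoint expects the same bearer token as the rest of the API, which the documentation does not state for this endpoint. The function name is hypothetical.

```python
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # assumption: placeholder host, not from the docs


def build_voices_request(api_key, provider=None):
    """Build (but do not send) a GET request for the voice list."""
    # Add the provider filter only when one is given.
    query = f"?{urlencode({'provider': provider})}" if provider else ""
    return urllib.request.Request(
        f"{BASE_URL}/v1/tts/voices{query}",
        headers={"Authorization": f"Bearer {api_key}"},  # assumption: auth required
        method="GET",
    )
```

For example, `build_voices_request(key, "openai")` targets `/v1/tts/voices?provider=openai`, matching the request line above.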
Providers
local
Uses macOS say or Linux espeak-ng. Free, no API key needed. Best for development.
google
Google Cloud Text-to-Speech. Neural/WaveNet voices. Requires GOOGLE_TTS_API_KEY. Supports SSML and pitch.
openai
OpenAI TTS (tts-1-hd). Six neural voices. Requires OPENAI_TTS_API_KEY. Highest audio quality of the three providers.
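The key requirements above can be checked before choosing a provider. The sketch below assumes the keys are read from environment variables of the process running the service. The helper names are hypothetical.

```python
import os
from typing import Mapping, Optional

# Environment variable required by each provider, per the descriptions above.
REQUIRED_ENV = {
    "local": None,  # no API key needed
    "google": "GOOGLE_TTS_API_KEY",
    "openai": "OPENAI_TTS_API_KEY",
}


def missing_key(provider: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the unset variable for a provider, or None if ready."""
    if provider not in REQUIRED_ENV:
        raise ValueError(f"unknown provider: {provider}")
    name = REQUIRED_ENV[provider]
    if name is None or env.get(name):
        return None
    return name


def usable_providers(env: Mapping[str, str] = os.environ) -> list[str]:
    """List the providers whose key requirement is satisfied."""
    return [p for p in REQUIRED_ENV if missing_key(p, env) is None]
```

Passing the environment as a mapping keeps the check easy to test without touching the real process environment.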