Text to Speech API Documentation

Complete reference for the multi-provider TTS API

Authentication

Send your API key as a bearer token in the Authorization header:

Authorization: Bearer sk_live_your_key_here
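
As a minimal sketch, the header above could be built in Python as follows. The helper name auth_headers is hypothetical and not part of the API; only the "Authorization: Bearer <key>" form comes from this reference.

```python
def auth_headers(api_key: str) -> dict[str, str]:
    """Return the Authorization header in the bearer form documented above."""
    if not api_key:
        # Refuse to build a header that would contain an empty token.
        raise ValueError("API key must not be empty")
    return {"Authorization": f"Bearer {api_key}"}
```

The returned dictionary can be passed as the headers argument of whichever HTTP client you use.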

Parameter	Type	Required	Description
text	string	Yes	Text to convert (max 5,000 chars, or 10,000 for SSML)
provider	string	No	local, google, openai (default: local)
voice	string	No	Voice ID. Call GET /v1/tts/voices to list the available voices
language	string	No	BCP-47 language code, e.g. en-US, ar-XA (default: en-US)
format	string	No	mp3, wav, ogg, aac (default: mp3)
speed	float	No	Speaking rate 0.25–4.0 (default: 1.0)
pitch	float	No	Pitch adjustment, -20 to +20 (default: 0; Google provider only)
ssml	boolean	No	Treat text as SSML (Google provider only)
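
The limits in the table can be checked on the client before a request is sent. The sketch below is one possible way to do that, not the service's own validation. The function name build_tts_payload is hypothetical, the use of a plain dictionary as the request body is an assumption, and rejecting pitch or ssml for non-Google providers is a client-side choice (the server may simply ignore them).

```python
PROVIDERS = {"local", "google", "openai"}
FORMATS = {"mp3", "wav", "ogg", "aac"}


def build_tts_payload(text, provider="local", voice=None, language="en-US",
                      audio_format="mp3", speed=1.0, pitch=0.0, ssml=False):
    """Validate arguments against the documented limits and return a dict."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    if audio_format not in FORMATS:
        raise ValueError(f"unknown format: {audio_format!r}")
    # Documented limit: 5,000 characters, or 10,000 for SSML input.
    max_chars = 10_000 if ssml else 5_000
    if not text or len(text) > max_chars:
        raise ValueError(f"text must be 1 to {max_chars} characters")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    if not -20 <= pitch <= 20:
        raise ValueError("pitch must be between -20 and +20")
    # Assumption: treat Google-only options as an error for other providers.
    if provider != "google" and (ssml or pitch != 0):
        raise ValueError("pitch and ssml are documented for google only")

    payload = {"text": text, "provider": provider, "language": language,
               "format": audio_format, "speed": speed}
    if voice is not None:
        payload["voice"] = voice
    if provider == "google":
        payload["pitch"] = pitch
        payload["ssml"] = ssml
    return payload
```

For example, build_tts_payload("Hello") returns a dictionary with the documented defaults (provider local, language en-US, format mp3, speed 1.0), while a speed of 5.0 raises ValueError.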

List available voices for a provider.

GET /v1/tts/voices?provider=openai
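
A request for this endpoint could be prepared with the Python standard library as sketched below. The base URL https://api.example.com is a placeholder and the function name voices_request is hypothetical. The response format is not described in this reference, so the sketch stops at building the request.

```python
from urllib.parse import urlencode
from urllib.request import Request


def voices_request(base_url: str, api_key: str, provider: str) -> Request:
    """Build (but do not send) a GET request for /v1/tts/voices."""
    query = urlencode({"provider": provider})
    url = f"{base_url.rstrip('/')}/v1/tts/voices?{query}"
    headers = {"Authorization": f"Bearer {api_key}"}
    return Request(url, headers=headers, method="GET")


# Placeholder host and key; substitute your own deployment's values.
req = voices_request("https://api.example.com", "sk_live_your_key_here", "openai")
```

Passing req to urllib.request.urlopen would send it; how to parse the body depends on the response format the service returns.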

local: Uses the macOS say command or espeak-ng on Linux. Free, no API key needed. Best for development.

google: Google Cloud Text-to-Speech with Neural/WaveNet voices. Requires GOOGLE_TTS_API_KEY. Supports SSML and pitch.

openai: OpenAI TTS (tts-1-hd) with 6 neural voices. Requires OPENAI_TTS_API_KEY. Best quality.
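
The key requirements above suggest a simple way to work out which providers a deployment can use. The sketch below assumes the two keys are supplied as environment variables with the names given in this reference. The function name available_providers is hypothetical, and the server's actual configuration mechanism may differ.

```python
import os


def available_providers(env=None) -> list[str]:
    """List providers usable with the given environment mapping."""
    env = os.environ if env is None else env
    providers = ["local"]  # local needs no API key
    if env.get("GOOGLE_TTS_API_KEY"):
        providers.append("google")
    if env.get("OPENAI_TTS_API_KEY"):
        providers.append("openai")
    return providers
```

With neither key set, only local is returned, which matches its role as the default provider.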