Pocket TTS

Real-time local speech synthesis & voice cloning

Server v2.5.4 · pocket-tts v2.1.0

API Documentation

OpenAI-compatible TTS API. Use any OpenAI TTS client by pointing it to this server.

GET /health

Health check endpoint for container orchestration and monitoring.

Response:
{
  "status": "healthy",
  "model_loaded": true,
  "device": "cpu",
  "sample_rate": 24000
}
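
A startup script can poll this endpoint before sending synthesis requests. The following is a minimal sketch, not part of the server: it assumes the server listens on localhost:49112 (taken from the curl example in this document) and that model_loaded is false while the model is still loading. The function names are hypothetical.

```python
import json
import time
import urllib.request

# Assumption: same host and port as the curl example in this document.
HEALTH_URL = "http://localhost:49112/health"


def fetch_health(url: str = HEALTH_URL) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def is_ready(health: dict) -> bool:
    """True once the server reports healthy and the model is loaded."""
    return health.get("status") == "healthy" and health.get("model_loaded") is True


def wait_until_ready(fetch=fetch_health, attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll until ready or until attempts run out."""
    for _ in range(attempts):
        try:
            if is_ready(fetch()):
                return True
        except OSError:
            pass  # server is not accepting connections yet
        time.sleep(delay)
    return False
```

Calling wait_until_ready() with the defaults polls once per second for up to 30 attempts; the fetch parameter exists so the polling logic can be exercised without a running server.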

GET /v1/voices

List all available voices (built-in and custom).

Response:
{
  "object": "list",
  "data": [
    {"id": "alba", "name": "Alba", "object": "voice"},
    {"id": "marius", "name": "Marius", "object": "voice"}
  ]
}
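
To check that a voice exists before requesting speech, a client can reduce this response to a list of IDs. This is a sketch under the assumption that the server listens on localhost:49112 (from the curl example in this document); voice_ids and fetch_voice_ids are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: same host and port as the curl example in this document.
VOICES_URL = "http://localhost:49112/v1/voices"


def voice_ids(payload: dict) -> list[str]:
    """Return the "id" of every entry in "data" whose object type is "voice"."""
    return [v["id"] for v in payload.get("data", []) if v.get("object") == "voice"]


def fetch_voice_ids(url: str = VOICES_URL) -> list[str]:
    """GET the voice list from the server and return the voice IDs."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return voice_ids(json.load(resp))
```

For the sample response shown above, voice_ids returns ["alba", "marius"].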

POST /v1/audio/speech

Generate speech audio from text. OpenAI-compatible endpoint.

Request Body (JSON):
Parameter        Type     Required  Default  Description
input            string   yes       -        The text to generate speech from (max 4096 chars)
voice            string   no        alba     Voice ID, filename, or URL. See /v1/voices
model            string   no        -        Ignored (accepted for OpenAI compatibility)
response_format  string   no        mp3      Audio format: mp3, opus, aac, flac, wav, pcm
stream           boolean  no        true     Enable streaming response for real-time playback
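
The limits listed above can be checked on the client before a request is sent. The sketch below builds the JSON body using the documented defaults, length limit, and format list; build_speech_request is a hypothetical helper, and the server may validate differently.

```python
ALLOWED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096


def build_speech_request(
    text: str,
    voice: str = "alba",
    response_format: str = "mp3",
    stream: bool = True,
) -> dict:
    """Build the JSON body for POST /v1/audio/speech, rejecting invalid values early."""
    if not text:
        raise ValueError("input is required")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    return {
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "stream": stream,
    }
```

The model field is omitted because the server ignores it.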
Example Request (curl):
curl -X POST http://localhost:49112/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello world!",
    "voice": "alba",
    "response_format": "mp3"
  }' \
  --output speech.mp3
Example (Python with OpenAI client):
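from openai import OpenAI

# Point the official OpenAI client at the local Pocket TTS server.
client = OpenAI(
    base_url="http://localhost:49112/v1",
    api_key="not-needed",  # the client requires a value
)

# Stream the audio to disk as it is generated.
with client.audio.speech.with_streaming_response.create(
    model="tts-1",  # ignored by the server
    voice="alba",
    input="Hello world!",
) as response:
    response.stream_to_file("output.mp3")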
Response:

Audio file in the requested format (binary stream).

The Content-Type header matches the requested format, e.g. audio/mpeg, audio/wav, audio/opus.

Error Response:
{
  "error": "Missing required field: input"
}
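
A client without the OpenAI library can write the audio to disk and surface the "error" field when a request fails. This sketch uses only the Python standard library. It assumes the server listens on localhost:49112 (from the curl example) and that error bodies arrive with a non-2xx HTTP status, which this document does not state; SpeechError, error_message, and synthesize are hypothetical names.

```python
import json
import urllib.error
import urllib.request

# Assumption: same host and port as the curl example in this document.
SPEECH_URL = "http://localhost:49112/v1/audio/speech"


class SpeechError(RuntimeError):
    """Raised when the server answers with an error instead of audio."""


def error_message(body: bytes) -> str:
    """Extract the "error" field from a JSON error body, else return the raw text."""
    try:
        data = json.loads(body)
    except ValueError:
        return body.decode("utf-8", errors="replace")
    if isinstance(data, dict) and "error" in data:
        return str(data["error"])
    return body.decode("utf-8", errors="replace")


def synthesize(payload: dict, out_path: str, url: str = SPEECH_URL) -> None:
    """POST the payload and copy the audio response to out_path in chunks."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp, open(out_path, "wb") as out:
            while chunk := resp.read(8192):
                out.write(chunk)
    except urllib.error.HTTPError as exc:
        # Assumption: failed requests carry the JSON error body shown above.
        raise SpeechError(error_message(exc.read())) from exc
```

For example, synthesize({"input": "Hello world!", "voice": "alba"}, "speech.mp3") writes the audio to speech.mp3, or raises SpeechError with the server's message if the request is rejected.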

Built-in Voices

These voices work without authentication:

alba marius javert jean fantine cosette eponine azelma

Custom voice cloning requires a HuggingFace token.