Pocket TTS

Real-time local speech synthesis & voice cloning

Server v2.5.4 · pocket-tts v2.1.0

Language

Int8 quantization (lower memory, ~30% faster on CPU)

Server started with a custom model path. Restart without --model-path to enable language switching.

Note: language changes apply to the current session only. When the server restarts, it reverts to its startup configuration (the POCKET_TTS_LANGUAGE environment variable or the --language CLI flag).

API Documentation

OpenAI-compatible TTS API. Any OpenAI TTS client works once its base URL points at this server's `/v1` path.

GET /health

Health check endpoint for container orchestration and monitoring.

Response:

{
  "status": "healthy",
  "model_loaded": true,
  "device": "cpu",
  "sample_rate": 24000
}
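A readiness probe only needs the `status` and `model_loaded` fields shown above. The sketch below uses only the standard library; the function names are hypothetical, and the port 49112 is assumed from the examples on this page.

```python
import json
import urllib.request

# Assumption: the server listens on the port used in the examples on this page.
BASE_URL = "http://localhost:49112"


def is_ready(health: dict) -> bool:
    """Return True when /health reports a healthy server with the model loaded."""
    return health.get("status") == "healthy" and health.get("model_loaded") is True


def fetch_health(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """GET /health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)
```

Calling `is_ready(fetch_health())` inside a `try`/`except OSError` covers both an unreachable server and one that is still loading its model.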

GET /v1/voices

List all available voices (built-in and custom).

Response:

{
  "object": "list",
  "data": [
    {"id": "alba", "name": "Alba", "object": "voice"},
    {"id": "marius", "name": "Marius", "object": "voice"}
  ]
}
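To populate a voice picker or validate a `voice` argument, a client can reduce the list object above to its IDs. This is a minimal sketch with hypothetical helper names, assuming the response shape shown above and the default port from the examples.

```python
import json
import urllib.request


def voice_ids(listing: dict) -> list[str]:
    """Extract voice IDs from a /v1/voices response body."""
    return [v["id"] for v in listing.get("data", []) if v.get("object") == "voice"]


def fetch_voices(base_url: str = "http://localhost:49112", timeout: float = 5.0) -> dict:
    """GET /v1/voices and decode the JSON list object."""
    with urllib.request.urlopen(f"{base_url}/v1/voices", timeout=timeout) as resp:
        return json.load(resp)
```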

POST /v1/audio/speech

Generate speech audio from text. OpenAI-compatible endpoint.

Request Body (JSON):

Parameter	Type	Required	Default	Description
`input`	string	✓	-	The text to generate speech from (max 4096 chars)
`voice`	string		`alba`	Voice ID, filename, or URL. See `/v1/voices` for available IDs
`model`	string		-	Ignored (for OpenAI compatibility)
`response_format`	string		`mp3`	Audio format: `mp3`, `opus`, `aac`, `flac`, `wav`, `pcm`
`stream`	boolean		`true`	Enable streaming response for real-time playback
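The table's limits can be checked client-side before a request is sent. The sketch below is a hypothetical helper, not part of the server: it applies the documented defaults, rejects inputs over 4096 characters or formats outside the listed six, and omits `model` because the server ignores it. Rejecting an empty `input` is an extra assumption; the server's own validation remains authoritative.

```python
import json

# Limits and defaults copied from the parameter table above.
MAX_INPUT_CHARS = 4096
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_payload(text: str, voice: str = "alba",
                         response_format: str = "mp3", stream: bool = True) -> bytes:
    """Validate arguments client-side and return the JSON request body."""
    if not text:
        raise ValueError("input must be a non-empty string")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    body = {
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "stream": stream,
    }
    return json.dumps(body).encode("utf-8")
```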

Example Request (curl):

curl -X POST http://localhost:49112/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello world!",
    "voice": "alba",
    "response_format": "mp3"
  }' \
  --output speech.mp3

Example (Python with OpenAI client):

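from openai import OpenAI

# Point the official client at the local server; no real API key is needed.
client = OpenAI(
    base_url="http://localhost:49112/v1",
    api_key="not-needed"
)

# `model` is ignored by the server but required by the client.
# with_streaming_response writes audio as it arrives; calling
# stream_to_file() on the plain create() result is deprecated.
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alba",
    input="Hello world!",
    response_format="mp3",
) as response:
    response.stream_to_file("output.mp3")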

Response:

Audio file in the requested format (binary stream).

Content-Type: audio/mpeg, audio/wav, audio/opus, etc.
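Because the body is a binary stream, a client can write it to disk in fixed-size pieces instead of buffering the whole file. This is a standard-library sketch with hypothetical function names; the endpoint path comes from this page and the port is assumed from the examples.

```python
import urllib.request


def make_speech_request(payload: bytes,
                        base_url: str = "http://localhost:49112") -> urllib.request.Request:
    """Build a POST request for /v1/audio/speech with a JSON body."""
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def copy_audio(src, dst, chunk_size: int = 16 * 1024) -> int:
    """Copy a binary stream to dst chunk by chunk; return total bytes written."""
    total = 0
    while chunk := src.read(chunk_size):
        dst.write(chunk)
        total += len(chunk)
    return total


def save_speech(payload: bytes, path: str) -> int:
    """Send the request and write the audio to disk without buffering it all."""
    with urllib.request.urlopen(make_speech_request(payload)) as resp, open(path, "wb") as out:
        return copy_audio(resp, out)
```

For example, `save_speech(b'{"input": "Hello world!", "voice": "alba"}', "speech.mp3")` mirrors the curl request above.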

Error Response:

{
  "error": "Missing required field: input"
}
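With `urllib`, a non-2xx response raises `HTTPError`, whose body carries the JSON error object above. A minimal sketch, with hypothetical helper names, that surfaces the `error` string and falls back to the raw body when it is not the documented JSON shape:

```python
import json
import urllib.error


def error_message(body: bytes) -> str:
    """Extract the "error" field from an error response, falling back to raw text."""
    try:
        data = json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return body.decode("utf-8", errors="replace")
    if isinstance(data, dict) and "error" in data:
        return str(data["error"])
    return body.decode("utf-8", errors="replace")


def describe_http_error(err: urllib.error.HTTPError) -> str:
    """Format an HTTPError raised by urllib using the server's error body."""
    return f"HTTP {err.code}: {error_message(err.read())}"
```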

Built-in Voices

These voices work without authentication:

alba marius javert jean fantine cosette eponine azelma

Custom voice cloning requires a Hugging Face access token.
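A client can warn before sending a request that may need a token. This sketch assumes that any voice outside the eight built-ins listed above triggers cloning; that rule and the helper name are hypothetical, since custom voices already registered on the server may behave differently.

```python
# The eight built-in voices listed above; these need no authentication.
BUILTIN_VOICES = frozenset(
    {"alba", "marius", "javert", "jean", "fantine", "cosette", "eponine", "azelma"}
)


def may_need_hf_token(voice: str) -> bool:
    """Conservative guess: any non-built-in voice may require a Hugging Face token."""
    return voice not in BUILTIN_VOICES
```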