Voice

Stream voice responses from your clone, or synthesize arbitrary text in its voice, as raw PCM audio.

Stream Voice Response

Send a text message to your clone and receive the response as a real-time audio stream.

Endpoint: POST /v3/voice/stream

Prerequisites:

The clone must have a voice configured.
A conversation must already exist (create one with POST /v3/conversation first).

Request body:

| Field | Type | Required | Description |
|---|---|---|---|
| conversation_id | string | Yes | UUID of an existing conversation |
| message | string | Yes | User message (1–10,000 characters) |
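To catch mistakes before a request is sent, the two documented constraints (a UUID `conversation_id` and a `message` of 1–10,000 characters) can be checked client-side. The bash sketch below is illustrative only: `validate_stream_request` is a hypothetical helper, and it counts characters with bash's `${#var}`, which depends on the current locale and may not match how the server counts characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the API): checks the documented constraints
# for POST /v3/voice/stream before sending a request.
# Returns 0 if both fields look valid, 1 otherwise.
validate_stream_request() {
  local conversation_id=$1 message=$2
  local uuid_re='^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'

  if [[ ! $conversation_id =~ $uuid_re ]]; then
    echo "conversation_id is not a UUID: $conversation_id" >&2
    return 1
  fi

  # ${#message} counts characters in the current locale (bytes under LC_ALL=C),
  # which may differ from how the server counts them.
  local len=${#message}
  if (( len < 1 || len > 10000 )); then
    echo "message must be 1-10,000 characters (got $len)" >&2
    return 1
  fi
  return 0
}

# Usage:
#   validate_stream_request "$CONVERSATION_ID" "$MESSAGE" && curl -X POST ...
```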

Example request:

curl -X POST "https://api.delphi.ai/v3/voice/stream" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "message": "What are your thoughts on AI safety?"
  }' \
  --output response.pcm

Response:

Binary stream of raw PCM audio data (application/octet-stream).

Response headers:

| Header | Value | Description |
|---|---|---|
| X-Audio-Format | pcm_24000_16_mono | Audio format identifier |
| X-Audio-Sample-Rate | 24000 | Sample rate in Hz |
| X-Audio-Bits-Per-Sample | 16 | Bit depth |
| X-Audio-Channels | 1 | Mono audio |
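Because the body is raw PCM with no container header, these headers are what describe its format, and a client can read them instead of hard-coding it. The sketch below assumes the headers were saved with curl's `-D` (`--dump-header`) option, for example by adding `-D headers.txt` to the example request. `header_value` and `pcm_ffmpeg_args` are hypothetical helpers, and only 16-bit samples are mapped (to ffmpeg's `s16le`, matching the conversion command on this page).

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the API). Input: a header file written by
# `curl -D headers.txt ...`.

# Prints the value of header $2 from file $1. Names are matched
# case-insensitively and CRs are stripped; if the header appears more than
# once (e.g. after a redirect), the last value wins.
header_value() {
  tr -d '\r' < "$1" | awk -v name="$2" '
    { i = index($0, ":") }
    i > 0 && tolower(substr($0, 1, i - 1)) == tolower(name) {
      v = substr($0, i + 1)
      sub(/^[ \t]+/, "", v)
      found = v
    }
    END { if (found != "") print found }'
}

# Prints ffmpeg input options matching the X-Audio-* headers.
pcm_ffmpeg_args() {
  local rate bits channels
  rate=$(header_value "$1" X-Audio-Sample-Rate)
  bits=$(header_value "$1" X-Audio-Bits-Per-Sample)
  channels=$(header_value "$1" X-Audio-Channels)

  # Only 16-bit is handled; it maps to signed little-endian ("s16le").
  if [[ $bits != 16 || -z $rate || -z $channels ]]; then
    echo "missing or unsupported X-Audio-* headers" >&2
    return 1
  fi
  printf '%s\n' "-f s16le -ar $rate -ac $channels"
}

# Usage (word splitting of the unquoted $(...) is intentional):
#   curl -X POST ... -D headers.txt --output response.pcm
#   ffmpeg $(pcm_ffmpeg_args headers.txt) -i response.pcm response.wav
```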

Playing the audio:

# Convert PCM to WAV using ffmpeg
ffmpeg -f s16le -ar 24000 -ac 1 -i response.pcm response.wav
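If ffmpeg isn't available, the raw data can be made playable by prepending a standard 44-byte WAV (RIFF) header. A minimal bash sketch, assuming the 24 kHz, 16-bit signed little-endian, mono format given in the response headers; `pcm_to_wav`, `le16`, and `le32` are hypothetical helpers, not part of the API.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the API): wraps raw PCM in a 44-byte WAV
# (RIFF) header, assuming 24 kHz, 16-bit signed little-endian, mono.
# Usage: pcm_to_wav response.pcm response.wav

# Write an integer as little-endian bytes. The inner printf produces "\xNN"
# escape text; the outer printf turns it into raw bytes. This keeps NUL bytes
# out of the $(...) substitution, which would otherwise drop them.
le16() { printf "$(printf '\\x%02x\\x%02x' $(( $1 & 255 )) $(( ($1 >> 8) & 255 )))"; }
le32() { le16 $(( $1 & 65535 )); le16 $(( ($1 >> 16) & 65535 )); }

pcm_to_wav() {
  local in=$1 out=$2
  local rate=24000 bits=16 channels=1
  local block_align=$(( channels * bits / 8 ))   # bytes per sample frame: 2
  local byte_rate=$(( rate * block_align ))      # bytes per second: 48,000
  local data_size

  [[ -r $in ]] || { echo "cannot read $in" >&2; return 1; }
  data_size=$(( $(wc -c < "$in") ))

  {
    printf 'RIFF'; le32 $(( 36 + data_size )); printf 'WAVE'
    printf 'fmt '; le32 16               # size of the PCM fmt chunk
    le16 1                               # audio format 1 = integer PCM
    le16 "$channels"; le32 "$rate"; le32 "$byte_rate"
    le16 "$block_align"; le16 "$bits"
    printf 'data'; le32 "$data_size"
    cat "$in"                            # the PCM samples, unchanged
  } > "$out"
}
```

Unlike ffmpeg, this only rewraps the bytes; it does no resampling or validation of the audio itself.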

Synthesize Voice

Convert raw text to audio using your clone's configured voice. Unlike Stream Voice Response, this does not generate a clone response; it speaks the exact text you provide.

Endpoint: POST /v3/voice/synthesize

Prerequisites:

The clone must have a voice configured.

Request body

| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Text to synthesize (1–10,000 characters) |
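The example requests on this page embed the text in a single-quoted JSON string, which breaks as soon as the text contains an apostrophe, a double quote, a backslash, or a newline. One way around this is to JSON-escape the text before building the body. A minimal bash sketch; `json_string` is a hypothetical helper, and it escapes only backslash, double quote, newline, carriage return, and tab (other control characters are not handled). Where `jq` is installed, `jq -n --arg text "$TEXT" '{text: $text}'` builds the same body with full escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the API): prints its argument as a quoted
# JSON string. Escapes only \ " newline, CR and tab; other control characters,
# which JSON also requires to be escaped, are left as-is.
# The body runs in a subshell ( ... ) so turning off bash 5.2's
# patsub_replacement option (which changes how & and \ behave in
# ${var//pattern/replacement}) does not leak into the caller.
json_string() (
  shopt -u patsub_replacement 2>/dev/null
  s=$1
  s=${s//\\/\\\\}     # backslashes first, so later escapes are not doubled
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\r'/\\r}
  s=${s//$'\t'/\\t}
  printf '"%s"' "$s"
)

# Usage:
#   TEXT="It's a \"quoted\" line."
#   curl -X POST "https://api.delphi.ai/v3/voice/synthesize" \
#     -H "x-api-key: YOUR_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d "{\"text\": $(json_string "$TEXT")}"
```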

Query parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| stream | boolean | No | false | If true, stream raw PCM bytes instead of returning JSON |

Example request (batch)

curl -X POST "https://api.delphi.ai/v3/voice/synthesize" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test of the synthesis endpoint."
  }'

Example response

{
  "audio": "8/8AABcAHAAhAC8AKQAmADkALAA2AEgANwArACI..."
}

The audio field is base64-encoded raw PCM data (24 kHz, 16-bit signed, mono, little-endian).
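At 24 kHz with 16-bit (2-byte) mono samples, one second of audio is 24,000 × 2 × 1 = 48,000 bytes, so a clip's duration follows directly from its decoded size (for example, 96,000 bytes is 2 seconds). A minimal bash sketch; `pcm_duration_ms` is a hypothetical helper that hard-codes this format.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the API): prints the duration in
# milliseconds of a raw PCM file, assuming 24 kHz, 16-bit (2-byte) mono.
pcm_duration_ms() {
  local bytes_per_second=$(( 24000 * 2 * 1 ))   # 48,000 bytes per second
  local size
  size=$(( $(wc -c < "$1") ))
  echo $(( size * 1000 / bytes_per_second ))    # integer ms, rounded down
}
```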

Decoding the audio

echo "<base64_audio>" | base64 -d > output.pcm
ffmpeg -f s16le -ar 24000 -ac 1 -i output.pcm output.wav
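For longer clips, copying the base64 string by hand is impractical. Assuming the JSON response was saved to a file (for example with curl's `-o response.json`), the `audio` field can be extracted and decoded in one step. This bash sketch uses `sed` instead of a JSON parser, so it assumes the base64 value sits on a single line, as in the example response; with `jq` installed, `jq -r .audio response.json | base64 -d > output.pcm` does the same more robustly. `extract_audio` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the API): decodes the "audio" field of a
# saved /v3/voice/synthesize JSON response into a raw PCM file. Uses sed, so
# it assumes the base64 value is on one line; base64 contains no double
# quotes, so [^"]* captures the whole value.
extract_audio() {
  local json=$1 out=$2 size
  sed -n 's/.*"audio"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$json" |
    base64 -d > "$out"

  # 16-bit mono PCM should contain a whole number of 2-byte samples.
  size=$(( $(wc -c < "$out") ))
  if (( size == 0 || size % 2 != 0 )); then
    echo "unexpected decoded size: $size bytes" >&2
    return 1
  fi
}

# Usage:
#   curl -X POST ... -o response.json
#   extract_audio response.json output.pcm
#   ffmpeg -f s16le -ar 24000 -ac 1 -i output.pcm output.wav
```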

Example request (streaming)

curl -X POST "https://api.delphi.ai/v3/voice/synthesize?stream=true" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test of the streaming synthesis endpoint."
  }' \
  --output synthesized.pcm

Response

Binary stream of raw PCM audio data (application/octet-stream).

Response headers

| Header | Value | Description |
|---|---|---|
| X-Audio-Format | pcm_24000_16_mono | Audio format identifier |
| X-Audio-Sample-Rate | 24000 | Sample rate in Hz |
| X-Audio-Bits-Per-Sample | 16 | Bit depth |
| X-Audio-Channels | 1 | Mono audio |

Playing the audio

ffmpeg -f s16le -ar 24000 -ac 1 -i synthesized.pcm synthesized.wav
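Each sample is 2 bytes, so a complete response always has an even length; a download that is cut off may not. An odd trailing byte is usually harmless in a single file, but if several responses are concatenated (for example, to join clips), it shifts every later sample by one byte and turns the rest of the audio into noise. A minimal bash sketch that drops any partial trailing sample while joining files; `join_pcm` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the API): concatenates raw 16-bit PCM files
# into one, dropping a trailing odd byte (a partial sample) from each input so
# that samples from later files stay aligned.
# Usage: join_pcm combined.pcm part1.pcm part2.pcm ...
join_pcm() {
  local out=$1 f size
  shift
  : > "$out"                                   # start with an empty output file
  for f in "$@"; do
    size=$(( $(wc -c < "$f") ))
    head -c $(( size - size % 2 )) "$f" >> "$out"   # whole samples only
  done
}
```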
