Voice

Stream voice responses from your clone as raw PCM audio.

Stream Voice Response

Send a text message to your clone and receive the response as a real-time audio stream.

Endpoint: POST /v3/voice/stream

Prerequisites:

  • The clone must have a voice configured.

  • A conversation must already exist (create one with POST /v3/conversation first).

Request body:

Field            Type    Required  Description
conversation_id  string  Yes       UUID of an existing conversation
message          string  Yes       User message (1–10,000 characters)

Example request:

curl -X POST "https://api.delphi.ai/v3/voice/stream" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "message": "What are your thoughts on AI safety?"
  }' \
  --output response.pcm

Response:

Binary stream of raw PCM audio data (application/octet-stream).

Response headers:

Header                   Value              Description
X-Audio-Format           pcm_24000_16_mono  Audio format identifier
X-Audio-Sample-Rate      24000              Sample rate in Hz
X-Audio-Bits-Per-Sample  16                 Bit depth
X-Audio-Channels         1                  Mono audio
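The headers describe the sample layout completely, so a client can derive the byte rate and clip duration from them instead of hard-coding values. The following is a minimal sketch, not part of the API: `AudioFormat`, `format_from_headers`, and `duration_seconds` are illustrative names, and the headers are assumed to arrive as a plain name-to-value mapping.

```python
from typing import Mapping, NamedTuple


class AudioFormat(NamedTuple):
    sample_rate: int
    bits_per_sample: int
    channels: int

    @property
    def bytes_per_second(self) -> int:
        # 24000 Hz * 2 bytes * 1 channel = 48000 bytes for one second of audio
        return self.sample_rate * (self.bits_per_sample // 8) * self.channels


def format_from_headers(headers: Mapping[str, str]) -> AudioFormat:
    """Read the X-Audio-* headers, matching header names case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return AudioFormat(
        sample_rate=int(lowered["x-audio-sample-rate"]),
        bits_per_sample=int(lowered["x-audio-bits-per-sample"]),
        channels=int(lowered["x-audio-channels"]),
    )


def duration_seconds(byte_count: int, fmt: AudioFormat) -> float:
    """Length of a raw PCM payload in seconds."""
    return byte_count / fmt.bytes_per_second
```

With the documented values, one second of audio occupies 48,000 bytes, so a 96,000-byte response lasts two seconds.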

Playing the audio:
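Raw PCM has no container header, so most players will not open `response.pcm` directly. One option is to wrap the samples in a WAV container first. The sketch below uses Python's standard `wave` module; `pcm_to_wav` is an illustrative name, and it assumes the samples are signed little-endian, which is what WAV expects for 16-bit audio.

```python
import io
import wave

SAMPLE_RATE = 24000  # from X-Audio-Sample-Rate
SAMPLE_WIDTH = 2     # 16 bits per sample = 2 bytes
CHANNELS = 1         # mono


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)  # assumption: samples are already little-endian
    return buffer.getvalue()
```

To convert the file saved by the curl example, read `response.pcm` in binary mode, pass its contents to `pcm_to_wav`, and write the result to a `.wav` file that any standard audio player can open.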

Synthesize Voice

Convert raw text to audio using your clone's configured voice. Unlike the voice stream endpoint, this endpoint does not generate a clone response; it speaks the exact text you provide.

Endpoint: POST /v3/voice/synthesize

Prerequisites:

  • The clone must have a voice configured.

Request body

Parameter  Type    Required  Description
text       string  Yes       Text to synthesize (1–10,000 characters)

Query parameters

Parameter  Type     Required  Default  Description
stream     boolean  No        false    Stream raw PCM bytes instead of JSON

Example request (batch)
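A batch request might be assembled as in the sketch below. It assumes this endpoint uses the same host and `x-api-key` header as the voice stream example. `build_synthesize_request` is an illustrative helper, not part of any SDK, and it only builds the request so the payload can be inspected before sending.

```python
import json
import urllib.request

BASE_URL = "https://api.delphi.ai"  # assumption: same host as /v3/voice/stream


def build_synthesize_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a batch POST /v3/voice/synthesize request."""
    if not 1 <= len(text) <= 10_000:
        raise ValueError("text must be 1 to 10,000 characters")
    return urllib.request.Request(
        f"{BASE_URL}/v3/voice/synthesize",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Sending it requires a real key and network access:
# with urllib.request.urlopen(build_synthesize_request("Hello!", "YOUR_API_KEY")) as resp:
#     body = resp.read().decode("utf-8")
```

Leaving out the `stream` query parameter keeps the default of `false`, so the response body is JSON.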

Example response

The audio field is base64-encoded raw PCM data (24 kHz, 16-bit signed, mono, little-endian).

Decoding the audio
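The batch response carries the audio as base64 text, so it has to be decoded before it can be played or saved. The sketch below assumes `audio` is a top-level key of the JSON body, which may not match the actual response shape. `decode_audio_field` and `pcm_samples` are illustrative names.

```python
import base64
import json
import struct
from typing import Tuple


def decode_audio_field(response_body: str) -> bytes:
    """Return raw PCM bytes from the base64 `audio` field of a batch response."""
    payload = json.loads(response_body)
    # Assumption: `audio` sits at the top level of the JSON object.
    pcm = base64.b64decode(payload["audio"], validate=True)
    if len(pcm) % 2:
        raise ValueError("16-bit PCM should contain an even number of bytes")
    return pcm


def pcm_samples(pcm: bytes) -> Tuple[int, ...]:
    """Unpack 16-bit signed little-endian samples into Python integers."""
    return struct.unpack(f"<{len(pcm) // 2}h", pcm)
```

The decoded bytes use the same layout as the streamed output (24 kHz, 16-bit signed, mono, little-endian), so they can be written to a `.pcm` file or wrapped in a WAV container in the same way.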

Example request (streaming)
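A streaming request could look like the sketch below. It assumes the boolean query parameter is written as `stream=true` and that the endpoint uses the same host and `x-api-key` header as the voice stream example. `copy_stream` and `synthesize_to_file` are illustrative helpers.

```python
import json
import urllib.request
from typing import BinaryIO

# Assumption: the boolean query parameter is serialized as "true".
SYNTHESIZE_URL = "https://api.delphi.ai/v3/voice/synthesize?stream=true"


def copy_stream(source: BinaryIO, sink: BinaryIO, chunk_size: int = 4096) -> int:
    """Copy bytes from source to sink in fixed-size reads; return the byte count."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        sink.write(chunk)
        total += len(chunk)
    return total


def synthesize_to_file(text: str, api_key: str, path: str) -> int:
    """POST the text with stream=true and save the raw PCM response to `path`."""
    request = urllib.request.Request(
        SYNTHESIZE_URL,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        return copy_stream(response, out)
```

Reading in chunks keeps memory use flat regardless of clip length, and the same loop can feed an audio device instead of a file.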

Response

Binary stream of raw PCM audio data (application/octet-stream).

Response headers

Header                   Value              Description
X-Audio-Format           pcm_24000_16_mono  Audio format identifier
X-Audio-Sample-Rate      24000              Sample rate in Hz
X-Audio-Bits-Per-Sample  16                 Bit depth
X-Audio-Channels         1                  Mono audio

Playing the audio
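When audio is played while it is still downloading, network reads are not guaranteed to end on a 16-bit sample boundary, and audio output libraries generally expect whole samples. The sketch below re-chunks a byte stream so that every block contains complete samples. `whole_frames` is an illustrative name, and the choice of playback library is left open.

```python
from typing import Iterable, Iterator

BYTES_PER_FRAME = 2  # 16-bit mono: one sample is two bytes


def whole_frames(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Re-chunk a byte stream so every yielded block ends on a sample boundary."""
    leftover = b""
    for chunk in chunks:
        data = leftover + chunk
        usable = len(data) - (len(data) % BYTES_PER_FRAME)
        if usable:
            yield data[:usable]
        # Carry an odd trailing byte over to the next chunk.
        leftover = data[usable:]
    # A final odd byte cannot form a sample, so it is dropped.
```

Each yielded block can be handed straight to an audio output stream configured for 24 kHz, 16-bit, mono playback.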
