# Voice

### Stream Voice Response

Send a text message to your clone and receive the response as a real-time audio stream.

**Endpoint:** `POST /v3/voice/stream`

**Prerequisites:**

* The clone must have a voice configured.
* A conversation must already exist (create one with `POST /v3/conversation` first).

**Request body:**

| Field             | Type   | Required | Description                          |
| ----------------- | ------ | -------- | ------------------------------------ |
| `conversation_id` | string | Yes      | UUID of an existing conversation     |
| `message`         | string | Yes      | User message (1–10,000 characters)   |
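
The length limit can be checked on the client before a request is sent. The following is a minimal sketch: the function name `valid_message` is hypothetical, and it assumes the limit is counted the way the shell's `${#var}` counts characters under the current locale, which may differ from how the API counts non-ASCII text.

```bash
# Hypothetical helper: succeed only if the message is 1 to 10,000 characters long.
valid_message() {
  local message=$1
  local length=${#message}   # character count under the current locale
  [ "$length" -ge 1 ] && [ "$length" -le 10000 ]
}
```

For example, `valid_message "$text" || echo "message must be 1-10,000 characters" >&2` reports a problem without making a network call.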

**Example request:**

```bash
curl -X POST "https://api.delphi.ai/v3/voice/stream" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "message": "What are your thoughts on AI safety?"
  }' \
  --output response.pcm
```
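
By default, curl writes an HTTP error body to the output file just as it would write audio. A hedged variant of the request above uses curl's `--fail` option so that an HTTP error yields a non-zero exit status instead, and `--dump-header` to save the response headers next to the audio. The function name `stream_voice`, the `DELPHI_API_KEY` environment variable, and the `.headers` file suffix are illustrative choices, not part of the API.

```bash
# Hypothetical wrapper: POST a prepared JSON body and save the PCM stream.
#   $1 = file containing the JSON request body
#   $2 = output file for the raw PCM audio
stream_voice() {
  local body_file=$1 out_file=$2
  curl --fail --silent --show-error --no-buffer \
    -X POST "https://api.delphi.ai/v3/voice/stream" \
    -H "x-api-key: ${DELPHI_API_KEY:?set DELPHI_API_KEY}" \
    -H "Content-Type: application/json" \
    --data @"$body_file" \
    --dump-header "${out_file}.headers" \
    --output "$out_file"
}
```

Usage: save the request body to `body.json`, then run `stream_voice body.json response.pcm || echo "request failed" >&2`.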

**Response:**

A binary stream of raw PCM audio data (`application/octet-stream`): 24 kHz, 16-bit signed, mono, little-endian, with no container header.

**Response headers:**

| Header                    | Value               | Description             |
| ------------------------- | ------------------- | ----------------------- |
| `X-Audio-Format`          | `pcm_24000_16_mono` | Audio format identifier |
| `X-Audio-Sample-Rate`     | `24000`             | Sample rate in Hz       |
| `X-Audio-Bits-Per-Sample` | `16`                | Bit depth               |
| `X-Audio-Channels`        | `1`                 | Mono audio              |
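
These values fix the data rate: 24,000 samples per second × 2 bytes per sample × 1 channel = 48,000 bytes per second. As a sketch, the duration of a saved stream can be estimated from its size alone; the function name `pcm_duration_ms` is hypothetical.

```bash
# Hypothetical helper: print the duration of a raw PCM file in milliseconds,
# assuming the format from the headers above (24000 Hz, 16-bit, mono).
pcm_duration_ms() {
  local sample_rate=24000
  local bytes_per_sample=2   # 16 bits
  local channels=1
  local bytes_per_second=$(( sample_rate * bytes_per_sample * channels ))
  local size_bytes
  size_bytes=$(( $(wc -c < "$1") ))
  echo $(( size_bytes * 1000 / bytes_per_second ))
}
```

A 96,000-byte file, for instance, is reported as `2000`.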

**Playing the audio:**

```bash
# Convert PCM to WAV using ffmpeg
ffmpeg -f s16le -ar 24000 -ac 1 -i response.pcm response.wav
```
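
Where ffmpeg is not available, the same conversion can be done by prepending a 44-byte canonical WAV header, since the PCM samples themselves are left unchanged. The sketch below assumes the 24 kHz, 16-bit, mono format given in the response headers; `le_bytes` and `pcm_to_wav` are hypothetical helper names.

```bash
# Print an unsigned integer ($1) as $2 little-endian bytes.
le_bytes() {
  local value=$1 count=$2 i=0
  while [ "$i" -lt "$count" ]; do
    # Format the byte as a three-digit octal escape and let printf emit it.
    printf "\\$(printf '%03o' $(( (value >> (8 * i)) & 255 )))"
    i=$(( i + 1 ))
  done
}

# Wrap a raw PCM file ($1) in a WAV container ($2).
pcm_to_wav() {
  local pcm=$1 wav=$2
  local rate=24000 bits=16 channels=1
  local data_size
  data_size=$(( $(wc -c < "$pcm") ))
  local block_align=$(( channels * bits / 8 ))   # bytes per sample frame
  local byte_rate=$(( rate * block_align ))      # bytes per second
  {
    printf 'RIFF'; le_bytes $(( 36 + data_size )) 4   # size of the rest of the file
    printf 'WAVE'
    printf 'fmt '; le_bytes 16 4                      # fmt chunk length
    le_bytes 1 2                                      # format tag 1 = PCM
    le_bytes "$channels" 2
    le_bytes "$rate" 4
    le_bytes "$byte_rate" 4
    le_bytes "$block_align" 2
    le_bytes "$bits" 2
    printf 'data'; le_bytes "$data_size" 4
    cat "$pcm"
  } > "$wav"
}
```

Usage: `pcm_to_wav response.pcm response.wav`. The output is always 44 bytes larger than the input.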

### Synthesize Voice

Convert raw text to audio using your clone's configured voice. Unlike Stream Voice Response, this endpoint does not generate a clone response; it speaks the exact text you provide.

**Endpoint:** `POST /v3/voice/synthesize`

**Prerequisites:**

* The clone must have a voice configured.

#### Request body

| Parameter | Type   | Required | Description                              |
| --------- | ------ | -------- | ---------------------------------------- |
| `text`    | string | Yes      | Text to synthesize (1–10,000 characters) |

#### Query parameters

| Parameter | Type    | Required | Default | Description                          |
| --------- | ------- | -------- | ------- | ------------------------------------ |
| `stream`  | boolean | No       | `false` | Stream raw PCM bytes instead of JSON |

#### Example request (batch)

```bash
curl -X POST "https://api.delphi.ai/v3/voice/synthesize" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test of the synthesis endpoint."
  }'
```

#### Example response

```json
{
  "audio": "8/8AABcAHAAhAC8AKQAmADkALAA2AEgANwArACI..."
}
```

The `audio` field is base64-encoded raw PCM data (24 kHz, 16-bit signed, mono, little-endian).

#### Decoding the audio

```bash
# Replace <base64_audio> with the value of the `audio` field
echo "<base64_audio>" | base64 -d > output.pcm
# Convert PCM to WAV using ffmpeg
ffmpeg -f s16le -ar 24000 -ac 1 -i output.pcm output.wav
```
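
To avoid pasting the base64 string by hand, the `audio` field can be pulled out of a saved response and decoded in one step. This sketch assumes the JSON response was saved to a file and that `audio` is a plain base64 string as in the example response. The helper name `decode_audio_json` is hypothetical, and the `sed` expression is a simple stand-in for a real JSON parser such as `jq`.

```bash
# Hypothetical helper: extract the "audio" value from a JSON file ($1)
# and write the decoded raw PCM bytes to $2.
decode_audio_json() {
  local json_file=$1 pcm_file=$2
  # Print only the quoted value that follows "audio": and pipe it to base64.
  sed -n 's/.*"audio"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$json_file" \
    | base64 -d > "$pcm_file"
}
```

Usage: redirect the batch request's output to `response.json`, then run `decode_audio_json response.json output.pcm`.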

#### Example request (streaming)

```bash
curl -X POST "https://api.delphi.ai/v3/voice/synthesize?stream=true" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test of the streaming synthesis endpoint."
  }' \
  --output synthesized.pcm
```

#### Response

With `stream=true`, the response is a binary stream of raw PCM audio data (`application/octet-stream`) instead of JSON.

#### Response headers

| Header                    | Value               | Description             |
| ------------------------- | ------------------- | ----------------------- |
| `X-Audio-Format`          | `pcm_24000_16_mono` | Audio format identifier |
| `X-Audio-Sample-Rate`     | `24000`             | Sample rate in Hz       |
| `X-Audio-Bits-Per-Sample` | `16`                | Bit depth               |
| `X-Audio-Channels`        | `1`                 | Mono audio              |

#### Playing the audio

```bash
# Convert PCM to WAV using ffmpeg
ffmpeg -f s16le -ar 24000 -ac 1 -i synthesized.pcm synthesized.wav
```
