POST /api/clone/generate_response

Body

conversation_id
uuid

The id of the conversation that the clone's generated response should belong to. The clone uses the history of the selected conversation when generating a response, so make sure this id is correct. To check the previous messages in a conversation, use the Get Conversation History endpoint.

user_message
string

The text of the message that you are sending to the clone to get a response.

image_url
string
default: "None"

A publicly accessible URL of an image, if you want to use the clone's vision capabilities.

voice_input_stream
string
default: "None"

Your audio, sent as a single blob of base64-encoded bytes. The audio MUST be a single-channel (mono) .wav file with a 44.1 kHz sample rate and pcm_s16le encoding. Review the code example below for the exact format required when sending and receiving audio.

response_type
string
default: "text"

Format of the response you want to receive. Valid values are text or voice; defaults to text. If set to voice, the clone responds with base64-encoded audio.

stream
boolean
default: "false"

Determines whether the clone's response is returned in the default response format or streamed back to the requester token by token as Server-Sent Events. If stream is true and response_type is voice, the clone responds with a stream of base64-encoded audio snippets. See the audio code example below for more info.
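Before sending voice_input_stream, it can help to confirm locally that the file really is mono, 44.1 kHz, 16-bit PCM. The sketch below uses only Python's standard `wave` and `base64` modules. `encode_voice_input` and `make_silent_wav` are hypothetical helper names, and the sketch assumes, as the audio example in this section does, that the entire .wav file (header included) is what gets base64-encoded.

```python
import base64
import io
import wave

# Input format documented for voice_input_stream
REQUIRED_CHANNELS = 1
REQUIRED_RATE_HZ = 44100
REQUIRED_SAMPLE_WIDTH = 2  # bytes per sample: 16-bit PCM, i.e. pcm_s16le in a WAV file


def encode_voice_input(wav_bytes: bytes) -> str:
    """Validate a WAV file against the documented format, then base64-encode it.

    Hypothetical helper; raising ValueError is an illustrative choice.
    """
    # wave.open rejects data that is not a readable PCM WAV file
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getnchannels() != REQUIRED_CHANNELS:
            raise ValueError(f"expected mono audio, got {wav.getnchannels()} channels")
        if wav.getframerate() != REQUIRED_RATE_HZ:
            raise ValueError(f"expected 44100 Hz, got {wav.getframerate()} Hz")
        if wav.getsampwidth() != REQUIRED_SAMPLE_WIDTH:
            raise ValueError(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
    # Send the whole file, header included, as one base64 string
    return base64.b64encode(wav_bytes).decode("ascii")


def make_silent_wav(seconds: float = 0.1, rate: int = REQUIRED_RATE_HZ, channels: int = 1) -> bytes:
    """Build a short silent 16-bit WAV in memory, handy for trying out the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(seconds * rate))
    return buf.getvalue()
```

For example, `encode_voice_input(make_silent_wav())` returns a base64 string, while `encode_voice_input(make_silent_wav(rate=22050))` raises ValueError. Failing locally is cheaper than discovering the format problem from the server's reply.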

⚠️ Prerequisite ⚠️

To generate responses, the clone must have a purpose, a description, and at least one document, all added on the clone editing page.

If your clone does not have all three yet, this endpoint returns a 403 response asking you to add them before continuing.
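A minimal sketch of assembling the request body from the parameters above and surfacing the 403 as a clear error. `build_request_body`, `check_status`, and `ClonePrerequisiteError` are hypothetical names. Leaving unset optional fields out of the body assumes the server treats a missing field the same as its documented default of None, and treating every 403 as a missing-prerequisite error is likewise an assumption.

```python
from typing import Optional

VALID_RESPONSE_TYPES = {"text", "voice"}


class ClonePrerequisiteError(RuntimeError):
    """Hypothetical error for the 403 returned when the clone lacks a purpose,
    description, or document."""


def build_request_body(
    conversation_id: str,
    user_message: Optional[str] = None,
    image_url: Optional[str] = None,
    voice_input_stream: Optional[str] = None,
    response_type: str = "text",
    stream: bool = False,
) -> dict:
    """Assemble a generate_response body, leaving out optional fields that are unset."""
    if response_type not in VALID_RESPONSE_TYPES:
        raise ValueError(f"response_type must be 'text' or 'voice', got {response_type!r}")
    body = {"conversation_id": conversation_id, "response_type": response_type, "stream": stream}
    optional = {
        "user_message": user_message,
        "image_url": image_url,
        "voice_input_stream": voice_input_stream,
    }
    # Assumption: omitting a field is equivalent to sending its default
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


def check_status(status_code: int, body_text: str) -> None:
    """Raise a descriptive error for a 403 and a generic one for other failures."""
    if status_code == 403:
        raise ClonePrerequisiteError(
            "Add a purpose, description, and at least one document on the clone "
            f"editing page, then retry. Server said: {body_text}"
        )
    if status_code != 200:
        raise RuntimeError(f"generate_response failed with HTTP {status_code}: {body_text}")
```

With requests, you would pass `build_request_body(...)` as `json=` to `requests.post` and then call `check_status(response.status_code, response.text)` before reading the body.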

Default Response

clone_response
object
id
uuid

The id of the clone’s message.

conversation_id
uuid

The id of the conversation that the generated clone response belongs to.

text
string

The text of the generated clone response. This field will be called transcript if response_type == voice.

full_audio
string

This field will only exist if response_type == voice and stream == false. It includes the complete base64 encoded audio response from the clone.

created_at
datetime

UTC datetime string of when the clone’s response was generated.

citations
array

An array of citations used in the clone’s response. Each citation has a url, title, text, and type.

imageUrl
string

A publicly accessible image url (if any) that the clone may send when appropriate for its response.

affiliate
dict

Affiliate product (if any) related to the user’s query.
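Reading a default (non-streamed) response mostly comes down to picking the right field: text for text responses, transcript for voice responses, and full_audio only when response_type == voice and stream == false. This sketch assumes the JSON body has the top-level clone_response object described above, which is how the audio example in this section reads full_audio. The helper names are hypothetical.

```python
import base64
from typing import List, Optional


def reply_text(payload: dict) -> str:
    """Return the clone's reply text from a default (non-streamed) response."""
    clone = payload["clone_response"]
    # "text" for response_type == text, "transcript" for response_type == voice
    return clone.get("text") or clone.get("transcript") or ""


def reply_audio(payload: dict) -> Optional[bytes]:
    """Decode full_audio, present only for voice responses with stream == false."""
    encoded = payload["clone_response"].get("full_audio")
    return base64.b64decode(encoded) if encoded else None


def citation_titles(payload: dict) -> List[Optional[str]]:
    """Collect the title of each citation (each also carries url, text, and type)."""
    return [c.get("title") for c in payload["clone_response"].get("citations") or []]
```

Using `.get` rather than indexing keeps the helpers working when optional fields such as imageUrl, affiliate, or citations are missing.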

Stream Response

The stream response sends the clone's response back to the user as a stream of server-sent events, emitting tokens as they become available.

The format of the events is similar to the Default Response, except:

  • Each event has one extra field: current_token if response_type == text, or audio_chunk if response_type == voice.
  • If response_type is text, current_token holds the next piece of generated text. If response_type is voice, audio_chunk holds a chunk of base64-encoded audio.
  • The clone_response field is null while the response is still generating. The last event in the stream includes the full details.
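Each server-sent event arrives as a line beginning with `data:` followed by a JSON object. A small parser keeps the processing loop short; this sketch assumes, as the Python examples in this section do, that every event fits on a single `data:` line. `parse_sse_line` and `is_final` are hypothetical helpers.

```python
import json
from typing import Optional


def parse_sse_line(raw: bytes) -> Optional[dict]:
    """Decode one line from response.iter_lines() into an event dict.

    Returns None for blank separator lines and for SSE lines other than
    "data:" (for example "event:" lines or ": comment" keep-alives).
    """
    line = raw.decode("utf-8").strip()
    if not line.startswith("data:"):
        return None
    # Accept both "data: {...}" and "data:{...}"
    return json.loads(line[len("data:"):].strip())


def is_final(event: dict) -> bool:
    """The last event has current_token (text) or audio_chunk (voice) set to "[DONE]"."""
    return "[DONE]" in (event.get("current_token"), event.get("audio_chunk"))
```

Slicing off `len("data:")` and then stripping avoids depending on whether the server puts a space after the colon.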

See the Example Stream Response in the sidebar for an example.

current_token
string

The next generated token in the clone’s response string

The clone has finished generating when this is [DONE], which marks the last event in the stream.

clone_response
null or object

During the stream, before the final event, this field is null or not present.

In the last event of the stream (when current_token is [DONE]), this field contains the same information as the Default Response above.
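Putting the stream fields together: tokens are concatenated until the [DONE] event, whose clone_response carries the full Default Response details. This sketch takes already-parsed event dicts (for instance, decoded from each `data:` line), so it can be tried without a network connection. `collect_text_stream` is a hypothetical name, and the fallback for a stream that ends without [DONE] is an illustrative choice.

```python
from typing import Iterable, List, Optional, Tuple


def collect_text_stream(events: Iterable[dict]) -> Tuple[str, Optional[dict]]:
    """Join streamed tokens and return (text, final clone_response).

    clone_response is ignored on every event except the [DONE] one,
    since it is null or absent while the reply is still generating.
    """
    tokens: List[str] = []
    for event in events:
        token = event.get("current_token")
        if token == "[DONE]":
            return "".join(tokens), event.get("clone_response")
        if token:
            tokens.append(token)
    # The stream ended without a [DONE] event: return what arrived, no final object
    return "".join(tokens), None
```

Comparing the joined tokens with the final clone_response text is a simple way to confirm that no events were dropped.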

Example Python code to accept a stream of server-sent events

import os
import json
import requests


# The base url for the API (fill in your actual base url here)
base_url = "https://api.delphi.ai"

# The endpoint path
endpoint = "/api/clone/generate_response"

# Combine base url and endpoint
url = base_url + endpoint

# Your parameters (replace with actual parameters)
params = {
    'conversation_id': '<my_convo_id>',
    'user_message': 'Hello my clone friend',
    'slug': 'my-test-clone',
    'stream': True
}

# Your API key
api_key = os.getenv("DELPHI_API_KEY")

# Headers
headers = {
    'x-api-key': api_key
}

# Making a POST request
response = requests.post(url, json=params, headers=headers, stream=True)
response_text = ""

# Process the stream of server-sent events
if response.status_code == 200:
    print("Stream started. Processing tokens...")
    for line in response.iter_lines():
        if not line:
            continue  # skip the blank lines that separate events
        decoded_line = line.decode("utf-8")
        if decoded_line.startswith("data:"):
            # Strip the "data:" prefix whether or not a space follows it
            event = json.loads(decoded_line[len("data:"):].strip())
            current_token = event.get("current_token")
            if current_token == "[DONE]":
                # The final event also carries the complete clone_response object
                print(decoded_line)
                print(f"Stream finished. Final response: {response_text}")
                break
            elif current_token:
                print(f"Current token: '{current_token}'")
                response_text += current_token
else:
    print(f"Error: {response.status_code}")
    print(response.text)

Example Python code to send and receive AUDIO as a stream of server-sent events

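import os
import json
import base64
import wave

import numpy as np
import requests
import sounddevice as sd
from dotenv import load_dotenv
from pydub import AudioSegment

# Load environment variables (such as your API key) from a local .env file
load_dotenv()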

url = "https://api.delphi.ai/api/clone/generate_response"
input_file_path = "sample_sentence.mp3"
output_file_path = "sample_sentence.wav"

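# Convert the MP3 into the required input format: mono, 44.1 kHz, 16-bit PCM WAV
audio = AudioSegment.from_mp3(input_file_path)
audio = audio.set_frame_rate(44100)
audio = audio.set_channels(1)
audio = audio.set_sample_width(2)  # 2 bytes per sample = 16-bit
audio.export(output_file_path, format="wav", codec="pcm_s16le")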

with open(output_file_path, "rb") as audio_file:
    base64_encoded_data = base64.b64encode(audio_file.read()).decode('utf-8')

data = {
    "conversation_id": "<conversation_id>",
    "stream": True,
    "response_type": "voice",
    "voice_input_stream": base64_encoded_data,
}

headers = {
    "Content-Type": "application/json",
    "x-api-key": os.getenv("DELPHI_API_KEY")
}

# stream = True, response_type == voice
def play_audio(response):
    """Stream and play audio data directly from the server's response using sounddevice."""
    decoded_audio = bytearray()  # the complete reply, in case you also want to save it
    leftover = b""  # a trailing odd byte, held so each write contains whole int16 samples
    try:
        with sd.OutputStream(samplerate=44100, channels=1, dtype="int16") as player:
            print("Stream started. Processing tokens...")
            for line in response.iter_lines():
                if not line:
                    continue
                decoded_line = line.decode("utf-8")
                if not decoded_line.startswith("data:"):
                    continue
                # Strip the "data:" prefix whether or not a space follows it
                event = json.loads(decoded_line[len("data:"):].strip())
                audio_chunk = event.get("audio_chunk")
                if audio_chunk == "[DONE]":
                    print("Stream finished.")
                    break
                if audio_chunk:
                    # Restore any base64 padding missing from the chunk
                    audio_chunk += "=" * (-len(audio_chunk) % 4)
                    chunk_bytes = base64.b64decode(audio_chunk)
                    decoded_audio.extend(chunk_bytes)
                    buffered = leftover + chunk_bytes
                    usable = len(buffered) - len(buffered) % 2
                    leftover = buffered[usable:]
                    if usable:
                        samples = np.frombuffer(buffered[:usable], dtype=np.int16)
                        player.write(samples)
    except Exception as e:
        print(f"Error during streaming: {e}")
    return decoded_audio

def get_and_stream_audio():
    """stream = True, response_type == voice: play the reply as it arrives."""
    with requests.post(url, headers=headers, data=json.dumps(data), stream=True) as response:
        if response.status_code == 200:
            play_audio(response)
        else:
            print(f"Error: {response.status_code}")
            print(response.text)


def get_and_save_full_audio(output_path="output_audio.wav"):
    """stream = False, response_type == voice: receive the whole reply, then save it."""
    batch_data = {**data, "stream": False}
    print("Batch request started. Waiting...")
    response = requests.post(url, headers=headers, data=json.dumps(batch_data))
    if response.status_code != 200:
        print(f"Error: {response.status_code}")
        print(response.text)
        return

    clone_response = response.json()["clone_response"]
    # For voice responses the text field is named transcript
    print("Transcript:", clone_response.get("transcript"))
    decoded_audio = base64.b64decode(clone_response["full_audio"])

    # Wrap the raw 16-bit PCM samples in a WAV container
    n_channels = 1        # Mono audio
    sample_width = 2      # Number of bytes per sample (2 for 16-bit audio)
    frame_rate = 44100    # Samples per second (e.g., 44100 for CD quality)
    with wave.open(output_path, "wb") as wav_file:
        wav_file.setnchannels(n_channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(frame_rate)
        wav_file.writeframes(decoded_audio)

    print(f"Audio file saved at: {output_path}")


if __name__ == "__main__":
    get_and_stream_audio()
    # For a single, non-streamed voice reply, call this instead:
    # get_and_save_full_audio()