Generate Response
Generate a response from a clone to a user message
Body
`conversation_id`: The id of the conversation that you want the clone-generated response to be part of. Clone responses use the history of the selected conversation while generating a response, so make sure this id is correct. Use the Get Conversation History endpoint if you would like to see the previous messages in a given conversation in order to validate it.
`user_message`: The text of the message that you are sending to the clone to get a response.
A publicly accessible link to an image, if you want to use the clone's vision capabilities.
`voice_input_stream`: Your audio, sent as a single blob of base64 encoded bytes. Audio MUST be a single-channel .wav file with a 44.1kHz sample rate and pcm_s16le encoding. Make sure to review the code example below for the specific format required when sending and receiving audio.
`response_type`: The format of the response you want to receive. Valid inputs are `text` or `voice`; defaults to `text`. If set to `voice`, the clone will respond with base64 encoded audio.
`stream`: Determines whether the response uses the default response format or is streamed back to the requester token by token as Server-Sent Events. If streaming is turned on and `response_type` is set to `voice`, the clone will respond with a stream of base64 encoded audio snippets. See the audio code example below for more info.
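Before sending audio, you can sanity-check that it meets the `voice_input_stream` requirements using only Python's standard library. This sketch generates one second of silence as a stand-in for a real recording (in practice you would convert your own file, as in the examples below), verifies the format, and base64 encodes it:

```python
import wave
import base64

# Generate one second of silence as a stand-in for a real recording
# (in practice, convert your own audio as shown in the examples below)
with wave.open("sample_sentence.wav", "wb") as w:
    w.setnchannels(1)       # single channel (mono)
    w.setsampwidth(2)       # 2 bytes per sample = pcm_s16le (16-bit PCM)
    w.setframerate(44100)   # 44.1kHz sample rate
    w.writeframes(b"\x00\x00" * 44100)

# Verify the file matches the endpoint's requirements before encoding
with wave.open("sample_sentence.wav", "rb") as w:
    assert w.getnchannels() == 1, "audio must be single-channel"
    assert w.getframerate() == 44100, "audio must be 44.1kHz"
    assert w.getsampwidth() == 2, "audio must be 16-bit PCM"

# Base64 encode the raw file bytes for the voice_input_stream field
with open("sample_sentence.wav", "rb") as f:
    voice_input_stream = base64.b64encode(f.read()).decode("utf-8")
```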
⚠️ Prerequisite ⚠️
In order for the clone to generate responses, it must have a purpose, a description, and at least one document, all added on the clone editing page.
If your clone does not yet have these, you will receive a 403 response from this endpoint warning you to add them before continuing.
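A minimal sketch of a default (non-streaming) request that surfaces this 403 explicitly; the conversation id and slug are placeholders taken from the streaming example below:

```python
import os
import requests

url = "https://api.delphi.ai/api/clone/generate_response"

# Placeholder parameters -- replace with your own values
params = {
    "conversation_id": "<my_convo_id>",
    "user_message": "Hello my clone friend",
    "slug": "my-test-clone",
    # stream and response_type are omitted, so this returns the
    # Default Response format as text
}

def generate_response(params):
    """POST to the endpoint and surface the missing-prerequisites 403."""
    response = requests.post(
        url,
        json=params,
        headers={"x-api-key": os.getenv("DELPHI_API_KEY")},
    )
    if response.status_code == 403:
        # The clone is missing a purpose, description, or document
        raise RuntimeError(f"Clone not ready: {response.text}")
    response.raise_for_status()
    return response.json()
```

Calling `generate_response(params)` returns the parsed Default Response JSON once the clone has its purpose, description, and document in place.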
Default Response
- The id of the clone's message.
- The id of the conversation that the generated clone response belongs to.
- The text of the generated clone response. This field will be called `transcript` if `response_type == voice`.
- `full_audio`: this field will only exist if `response_type == voice` and `stream == false`. It contains the complete base64 encoded audio response from the clone.
- The UTC datetime string of when the clone's response was generated.
- An array of citations used in the clone's response (url, title, text, and type).
- A publicly accessible image url (if any) that the clone may send when appropriate for its response.
- The affiliate product (if any) related to the user's query.
Stream Response
The stream response will send the clone response back to the user in a stream of server-sent events as tokens in the response become available.
The format of the events is similar to the Default Response except:
- There is one extra field in the stream response, called `current_token` if `response_type == text`, or `audio_chunk` if `response_type == voice`.
- If `response_type` is set to `text`, `current_token` will be text. If `response_type` is set to `voice`, `audio_chunk` will be a chunk of base64 encoded audio.
- The `clone_response` field is `null` while the response is still generating. The last event in the stream will include the full response details.
View the Example Stream Response in the sidebar to see an example.
The next generated token in the clone's response string. The clone has finished generating when this is `[DONE]`, indicating the last event in the stream.
During the stream of events before the final event, this field will not be present. In the last event in the stream, when `current_token` is `[DONE]`, it will contain the same information as the Default Response above.
Example Python code to accept a stream of server-sent events

import os
import json

import requests

# The base url for the API (fill in your actual base url here)
base_url = "https://api.delphi.ai"

# The endpoint path
endpoint = "/api/clone/generate_response"

# Combine base url and endpoint
url = base_url + endpoint

# Your parameters (replace with actual parameters)
params = {
    'conversation_id': '<my_convo_id>',
    'user_message': 'Hello my clone friend',
    'slug': 'my-test-clone',
    'stream': True
}

# Your API key
api_key = os.getenv("DELPHI_API_KEY")

# Headers
headers = {
    'x-api-key': api_key
}

# Making a POST request
response = requests.post(url, json=params, headers=headers, stream=True)

response_text = ""

# Process the stream of server-sent events
if response.status_code == 200:
    print("Stream started. Processing tokens...")
    for line in response.iter_lines():
        if line:
            decoded_line = line.decode("utf-8")
            if decoded_line.startswith("data:"):
                # Strip the "data:" prefix and any leading whitespace
                event_data = decoded_line[5:].lstrip()
                event = json.loads(event_data)
                current_token = event.get("current_token")
                if current_token == "[DONE]":
                    print(decoded_line)
                    print(f"Stream finished. Final response: {response_text}")
                    break
                else:
                    print(f"Current token: '{current_token}'")
                    response_text += current_token
else:
    print(f"Error: {response.status_code}")
    print(response.text)
Example Python code to send and receive AUDIO as a stream of server-sent events

import os
import json
import base64
import wave

import requests
import numpy as np
import sounddevice as sd
from pydub import AudioSegment
from dotenv import load_dotenv

load_dotenv()

url = "https://api.delphi.ai/api/clone/generate_response"

# Convert the input audio to the required format: a single-channel
# 44.1kHz .wav file with pcm_s16le encoding
input_file_path = "sample_sentence.mp3"
output_file_path = "sample_sentence.wav"
audio = AudioSegment.from_mp3(input_file_path)
audio = audio.set_frame_rate(44100)
audio = audio.set_channels(1)
audio.export(output_file_path, format="wav", codec="pcm_s16le")

# Base64 encode the converted audio for the voice_input_stream field
with open(output_file_path, "rb") as audio_file:
    base64_encoded_data = base64.b64encode(audio_file.read()).decode('utf-8')

data = {
    "conversation_id": "<conversation_id>",
    "stream": True,
    "response_type": "voice",
    "voice_input_stream": base64_encoded_data,
}

headers = {
    "Content-Type": "application/json",
    "x-api-key": os.getenv("DELPHI_API_KEY")
}
# stream = True, response_type == voice
def play_audio(response):
    """Stream and play audio data directly from the server's response using sounddevice."""
    decoded_audio = bytearray()
    try:
        with sd.OutputStream(samplerate=44100, channels=1, dtype='int16') as stream:
            print("Stream started. Processing tokens...")
            for line in response.iter_lines():
                if line:
                    decoded_line = line.decode("utf-8")
                    if decoded_line.startswith("data:"):
                        event_data = decoded_line[5:].lstrip()
                        event = json.loads(event_data)
                        audio_chunk = event.get("audio_chunk")
                        if audio_chunk == "[DONE]":
                            print("Stream finished.")
                            break
                        elif audio_chunk:
                            # Restore any base64 padding stripped during chunking
                            padding_needed = 4 - len(audio_chunk) % 4
                            audio_chunk += '=' * (padding_needed if padding_needed != 4 else 0)
                            chunk_bytes = base64.b64decode(audio_chunk)
                            decoded_audio.extend(chunk_bytes)
                            # Interpret the bytes as 16-bit PCM samples and play them
                            samples = np.frombuffer(chunk_bytes, dtype=np.int16)
                            stream.write(samples)
    except Exception as e:
        print(f"Error during streaming: {e}")

def get_and_stream_audio():
    with requests.post(url, headers=headers, data=json.dumps(data), stream=True) as response:
        if response.status_code == 200:
            play_audio(response)
        else:
            print(f"Error: {response.status_code}")

if __name__ == "__main__":
    get_and_stream_audio()
# stream = False, response_type == voice
print("Batch request started. Waiting...")
batch_data = {**data, "stream": False}
response = requests.post(url, headers=headers, json=batch_data)

response_json = response.json()
print("Response:", response_json)

# Decode the complete base64 audio returned in the full_audio field
decoded_audio = base64.b64decode(response_json["clone_response"]["full_audio"])

# Define the output file path where the audio will be saved
output_file_path = 'output_audio.wav'
n_channels = 1        # Mono audio
sample_width = 2      # Number of bytes per sample (2 for 16-bit audio)
frame_rate = 44100    # Samples per second (e.g., 44100 for CD quality)

# Open a new WAV file in write mode
with wave.open(output_file_path, 'wb') as wav_file:
    # Set the parameters
    wav_file.setnchannels(n_channels)
    wav_file.setsampwidth(sample_width)
    wav_file.setframerate(frame_rate)
    # Write the audio data
    wav_file.writeframes(decoded_audio)

print(f"Audio file saved at: {output_file_path}")