Overview
Give your Delphi a voice, so it can speak and sound just like you.
What it is
Voice lets users have natural, real‑time conversations with your Delphi in any modern browser—no downloads required. A clear ~30 second recording (or upload) powers your Delphi’s speaking style so calls feel personal and human.
Where: Studio → Identity → Voice
Why it matters
Feels human — People open up faster when they can just talk.
Boosts engagement — Voice calls often run longer than text chats, deepening trust and insight.
Works everywhere — Included on every plan; runs in modern browsers.
Respects privacy — You control when voice is on/off and exactly which audio sample powers your Delphi.
Quick Start (≈ 5 minutes)
Go to Identity → Voice.
Click ➕ (top‑right) → choose Start Recording or Upload File (WAV/MP3, ~30 seconds).
Record tips: quiet room, steady pace, single speaker.
Click Stop (▢) → Save → wait for processing (a few minutes).
Open Settings (next to ➕) to adjust Stability, Similarity, Speed.
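If you plan to upload a file rather than record in the browser, it can help to check the clip's length before uploading. The sketch below is only an illustration: it uses Python's standard wave module, works for WAV files only, and the function name check_wav_sample and the 10-second tolerance around the ~30 second target are assumptions, not Delphi requirements.

```python
import wave


def check_wav_sample(path, target_seconds=30.0, tolerance=10.0):
    """Return (duration_seconds, channels, ok) for a WAV file.

    target_seconds and tolerance are illustrative assumptions; the
    guide only asks for a clear clip of roughly 30 seconds.
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()      # total number of audio frames
        rate = wav.getframerate()      # frames per second
        channels = wav.getnchannels()  # 1 = mono, 2 = stereo
    duration = frames / float(rate)
    ok = abs(duration - target_seconds) <= tolerance
    return duration, channels, ok
```

Calling check_wav_sample("my_voice.wav") (a hypothetical file name) returns the clip length in seconds, the channel count, and whether the length falls inside the assumed window. It does not judge noise or the number of speakers, so the recording tips above still apply.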
Best practices
Give it 30 clean seconds. Same mic, distance, and room tone throughout.
Warm, everyday language. Read a short bio + answer a favorite FAQ.
Avoid overlap. No background speakers, TV, or music.
Re‑record after big changes. New mic/space → new sample.
Privacy & control
On/Off anytime: Use Access Groups to set call limits. Set the limit to 0 to disable voice, or to any number above 0 to turn it on.
Own your sample: You decide which clip powers your Delphi and can replace it whenever you like.
Attribution: Calls reflect your sample; other voices in uploads are stored for context but not credited to you.
Sensitive audio: For confidential calls, redact first and upload a PDF transcript instead of raw audio.
FAQs
Why can't I upload audio or video files as training data to my mind? You can’t upload audio or video files as training data until you provide a voice recording. The recording teaches your Delphi which voice is yours, so it can separate your words from everyone else’s and improve general transcription.
Can I upload voice samples in different languages? Does Delphi support different accents for each language? No, you cannot upload voice samples in different languages. All uploaded samples must be in the same language. Mixing languages (or separate accents per language) confuses the model and degrades quality across all languages.
Why do I hear background hiss or static? Background hiss or static in your Delphi calls usually means the original sample was poor: the room was noisy or the microphone was low quality. Re-record in a quieter space, drop Similarity below 60%, and try your computer's or phone's built-in microphone to see whether the results improve.
Why does the voice sound flat or robotic?
Your voice likely sounds flat or robotic for one of two reasons:
Too many mixed samples. Multiple clips with different mics or background noise can create robotic-sounding output. Stick to one clean 2-minute take.
Stability set too high. Lower Stability by 10 to 20% and test again in a live call.
Why is my accent washed out? Your accent is likely washed out because you're still on the Default model. Switch to For Accents 1; if it’s still weak, try For Accents 2. Keep adjusting the settings as you test each model.
Please note that for some accents, even with these additional models, it may be necessary to purchase Pro Voice. We hope our built-in voice offering gets you where you need to be, but there is only so much the instant model can do.
How do I get it to pronounce names correctly? To get your Delphi to pronounce names correctly, use custom pronunciations. Add the word spelled phonetically, the way you want it to be said.
Will this voice also identify me when I upload audio files? Yes, this voice sample is also what will be used to identify your voice when you upload audio or video files. In other words, the same voice sample trains Delphi to separate your voice from others in any audio you upload.
Is this the voice people hear during video calls? Yes, this is the voice that people hear during video calls as well. Voice and video calls both use the same settings and same sample, so they'll sound the same.
Is this the voice people hear when they click "read aloud" in chat conversations? Yes, this is the voice that people will hear when they click "read aloud" in chat conversations.
How do I turn off read aloud? You can't. Read aloud is always available and cannot be disabled.
Why does my voice sound different on a live call versus the Playground or Read Aloud? Playground clips are “experimental,” rendered offline and a bit slower, so they can use looser settings that change each time. Calls must stream in real time, so Delphi adds extra stability for speed and consistency.
To hear the truest result, tweak settings and then start a live call. Use Playground only for creative reads, ads, or long scripts.
Why do the Playground and Read Aloud generate a slightly different take every time? Both run on generative AI text-to-speech, so the output varies even when the settings are the same. Each time you click Generate or Read Aloud, Delphi samples tiny shifts in pitch, timing, and energy, like rolling fresh dice inside the same rules. That touch of randomness keeps the voice from sounding canned, but it also means no two clips are identical.
More broadly, this is how generative AI tends to behave across platforms. It's the same reason ChatGPT usually won't give you an identical answer when you ask the same question twice.
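The "fresh dice inside the same rules" idea can be pictured with a toy example. This is not how Delphi's text-to-speech works internally; the function sample_take, the base pitch of 120 Hz, and the 3% jitter are all made-up values chosen only to show that identical settings can still yield slightly different results on each run.

```python
import random


def sample_take(base_pitch_hz=120.0, base_rate=1.0, jitter=0.03, rng=None):
    """Return one hypothetical 'take': the same settings with small random shifts.

    All numbers here are illustrative assumptions, not Delphi parameters.
    """
    rng = rng or random.Random()
    return {
        # Each value moves by at most +/- jitter (3% by default).
        "pitch_hz": base_pitch_hz * (1 + rng.uniform(-jitter, jitter)),
        "rate": base_rate * (1 + rng.uniform(-jitter, jitter)),
    }


if __name__ == "__main__":
    # Two takes with identical settings: close to each other, but not identical.
    print(sample_take())
    print(sample_take())
```

The settings define the range the values can fall in, and the random draw picks a point inside that range, which is why two clips sound alike without matching exactly.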
Troubleshooting
Processing taking a while? Larger files take longer; keep clips to ~30 seconds and use a supported format (WAV/MP3).
Audio sounds off? Re‑record in a quieter room; reduce echo; speak closer and steadier.
Pace too slow/fast? Tweak Speed in ⚙️ Settings.
Doesn’t sound like me? Adjust Similarity/Stability and provide a cleaner sample.