Overview

Give your Delphi a voice, so it can speak and sound just like you.

What it is

Voice lets users have natural, real‑time conversations with your Delphi in any modern browser, with no downloads required. A clear recording or upload of about 30 seconds powers your Delphi’s speaking style so calls feel personal and human.

Where: Studio → Identity → Voice


Quick Start

  1. Go to Identity → Voice.

  2. Click the button at the top right, then choose Start Recording or Upload File (WAV or MP3, ~30 seconds).

  3. Recording tips: quiet room, steady pace, single speaker.

  4. Click Stop (▢) → Save → wait for processing (a few minutes).


Best practices

  • Give it 30 clean seconds. Same mic, distance, and room tone throughout.

  • Warm, everyday language. Read a short bio + answer a favorite FAQ.

  • Avoid overlap. No background speakers, TV, or music.

  • Re‑record after big changes. New mic/space → new sample.


FAQs

  • Why can't I upload audio or video files as training data to my mind? You can’t upload audio or video files as training data until you provide a voice recording. The recording teaches your Delphi which voice is yours so it can sort your words from everyone else’s, and it improves general transcription quality.

  • Can I upload voice samples in different languages? Does Delphi support different accents for each language? No, you cannot upload voice samples in different languages. All uploaded samples must be in the same language. Mixing languages (or separate accents per language) confuses the model and degrades quality across all languages.

  • How do I get it to pronounce names correctly? Use custom pronunciations. Add the word spelled phonetically, the way you want it to be said.

  • Will this voice also identify me when I upload audio files? Yes. The same voice sample trains Delphi to separate your voice from others in any audio or video files you upload.

  • How do I turn off Read Aloud? You can't. Read Aloud is always available and cannot be disabled.

  • Why does my voice sound different on a live call versus the Playground or Read Aloud? Playground clips are “experimental,” rendered offline and a bit slower, so they can use looser settings that change each time. Calls must stream in real time, so Delphi adds extra stability for speed and consistency.

    To hear the truest result, tweak settings and then start a live call. Use Playground only for creative reads, ads, or long scripts.

  • Why do the Playground and Read Aloud generate a slightly different take every time? Both run on generative AI text-to-speech technology, so results vary even when the settings are the same. Each time you click Generate or Read Aloud, Delphi samples tiny shifts in pitch, timing, and energy, like rolling fresh dice inside the same rules. That touch of randomness keeps the voice from sounding canned, but it also means no two clips are identical.

    More broadly, this is how generative AI tools tend to behave. It's the same reason ChatGPT usually won't give you an identical answer when you ask one question twice.


Troubleshooting

  • Processing taking a while? Larger files take longer; keep the sample to ~30 seconds and use WAV or MP3.

  • Audio sounds off? Re‑record in a quieter room; reduce echo; speak closer to the mic and at a steadier pace.

  • Pace too slow/fast? Tweak Speed in ⚙️ Settings.

  • Doesn’t sound like me? Adjust Similarity/Stability and provide a cleaner sample.
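
If you want to check a file's length before uploading, you can do so locally. The following is a minimal Python sketch, not a Delphi tool or API: it uses only Python's standard `wave` module, works for WAV files only (not MP3), and the helper names and the 10‑second tolerance are assumptions chosen for illustration.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        frame_count = wav_file.getnframes()
        frame_rate = wav_file.getframerate()
    return frame_count / float(frame_rate)


def is_near_target_length(seconds, target=30.0, tolerance=10.0):
    """Check whether a duration is close to the ~30 second guideline.

    The tolerance is an illustrative assumption, not a documented limit.
    """
    return abs(seconds - target) <= tolerance
```

For example, calling `wav_duration_seconds("my_sample.wav")` on your own recording (the file name here is hypothetical) and passing the result to `is_near_target_length` tells you whether the sample is in the suggested range.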


Pre‑launch checklist
