Voice

Capture, fine-tune, and activate voice so people can talk with your Delphi out loud.

Overview

Voice lets users speak with your Delphi in real time, right in the browser. One clear 2-minute recording can take them from typed chats into natural conversation. People open up faster when they can talk, so you capture richer insights while delivering the personal touch that fuels Delphi’s mission of democratizing mentorship.

Why it matters

Feels human — Users express ideas more freely when they can just talk.
Boosts engagement — Calls last longer than text chats, deepening trust.
Works everywhere — Voice calling is included in every plan and runs in any modern browser—no downloads needed.
Respects privacy — You control when voice is on or off and decide exactly what audio sample powers your Delphi.

▶️ Quick Start Guide

Navigate to the Voice page in Studio

Click “Start Recording” or “Upload File” (2-minute WAV/MP3).

Speak for 2 minutes in a quiet room, keeping a steady tone.

Press Stop (▢) then Save.

Wait for processing; it can take a few minutes to upload fully.

Tweak settings (Stability, Similarity, Speed) under the gear (⚙️).

Toggle Voice On in the upper-right. A blue switch confirms callers can now reach you.

Total setup time: about 5 minutes.
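If you plan to upload a file instead of recording in Studio, you can check its length first. The sketch below is only an illustration using Python's standard `wave` module; it is not part of Delphi. The 15-second tolerance and the helper names are assumptions made for this example, and it handles WAV only (the standard library cannot read MP3).

```python
import io
import wave


def wav_duration_seconds(source) -> float:
    """Return the length of a WAV file (path or file-like object) in seconds."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_about_two_minutes(seconds: float, tolerance: float = 15.0) -> bool:
    # `tolerance` is a hypothetical margin chosen for this sketch,
    # not a documented Delphi limit.
    return abs(seconds - 120.0) <= tolerance


def make_silent_wav(seconds: float, rate: int = 8000) -> io.BytesIO:
    """Build an in-memory mono 16-bit silent WAV, used here only as demo input."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


demo = make_silent_wav(120)
length = wav_duration_seconds(demo)
print(f"{length:.1f} s, about two minutes: {is_about_two_minutes(length)}")
```

To check a real file, pass its path instead of the demo buffer, for example `wav_duration_seconds("my_sample.wav")` (a placeholder filename).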

Full Feature Guides

Voice Recording

Your voice recording captures the 2-minute sample that powers every call.

Learn more

Voice Settings

Voice Settings give you full control over how your Delphi sounds.

Learn more

Voice Playground

The Voice Playground lets you audition scripts—podcasts, ads, book passages—without changing your live settings until you decide.

Learn more

Pro Voice Upgrade

Give your Delphi a studio-grade presence with a richer, more lifelike sound (included on Scaler and above).

Learn more

❓FAQs/Troubleshooting

Why can't I upload audio or video files as training data to my mind?

You can’t upload audio or video files as training data until you provide a voice recording. The recording teaches your Delphi which voice is yours so it can sort your words from everyone else’s, and it also improves general transcription.

Can I upload voice samples in different languages? Does Delphi support different accents for each language?

No, you cannot upload voice samples in different languages. All uploaded samples must be in the same language. Mixing languages (or separate accents per language) confuses the model and degrades quality across all languages.

Why do I hear background hiss or static?

You hear background hiss or static in your Delphi calls because the sample you originally uploaded was poor quality: the room or mic was noisy, or the microphone was not good enough.

Re-record in a quieter space, drop Similarity below 60%, and try your computer's or phone's built-in microphone to see whether the results improve.

See here for more information.

Why does the voice sound flat or robotic?

Your voice most likely sounds flat or robotic for one of two reasons:

Too many mixed samples. Multiple clips with different mics or background noises can create robotic-sounding outputs. Stick to one clean 2-minute take.
Stability set too high. Lower Stability by 10-20% and test again in a live call. See here for more information.

Why is my accent washed out?

Your accent is most likely washed out because you're still on the Default model. Switch to For Accents 1; if it’s still weak, try For Accents 2. Keep adjusting the settings while you test different models. See here for more information.

Please note that for some accents, even with these additional models, it might be necessary to purchase Pro Voice. We hope the built-in voice offering gets you where you need to be, but there is only so much the instant model can do.

How do I get it to pronounce names correctly?

To get your Delphi to pronounce names correctly, use custom pronunciations. Add the word as you want it to be said phonetically. See here for more information.

Will this voice also identify me when I upload audio files?

Yes, this voice sample is also what will be used to identify your voice when you upload audio or video files. In other words, the same voice sample trains Delphi to separate your voice from others in any audio you upload.

Is this the voice people hear during video calls?

Yes, this is the voice that people hear during video calls as well. Voice and video calls both use the same settings and same sample, so they'll sound the same.

Is this the voice people hear when they click "read aloud" in chat conversations?

Yes, this is the voice that people will hear when they click "read aloud" in chat conversations.

How do I turn off read aloud?

You cannot turn off read aloud. It is always available.

Why does my voice sound different on a live call versus the Playground or Read Aloud?

Playground clips are “experimental,” rendered offline and a bit slower, so they can use looser settings that change each time. Calls must stream in real time, so Delphi adds extra stability for speed and consistency.

To hear the truest result, tweak settings and then start a live call. Use Playground only for creative reads, ads, or long scripts.

Why do the Playground and Read Aloud functions generate a slightly different take every time?

The reason the Playground and Read Aloud functions generate a slightly different take every time, even if the settings are the same, is that both run on generative AI text-to-speech technology. Each time you click Generate or Read Aloud, Delphi samples tiny shifts in pitch, timing, and energy—like rolling fresh dice inside the same rules. That touch of randomness keeps the voice from sounding canned, but it also means no two clips are identical.

More broadly, this is how generative AI works across platforms. It's the same reason ChatGPT won't give you an identical answer when you ask the same question twice.
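The "fresh dice inside the same rules" idea can be sketched in a few lines. This is a toy model, not Delphi's text-to-speech engine: the `sample_take` function, the pitch value, and the 2% jitter are all hypothetical, and the seeded example does not imply that Delphi exposes a seed setting. It only shows that identical settings plus a random draw give slightly different results that stay inside the same bounds.

```python
import random


def sample_take(base_pitch_hz: float, jitter: float, rng: random.Random) -> float:
    """Draw one 'take': the base pitch nudged by a small random offset."""
    return base_pitch_hz * (1.0 + rng.uniform(-jitter, jitter))


# Hypothetical settings, held constant across every take.
settings = {"base_pitch_hz": 180.0, "jitter": 0.02}

# Two takes with fresh randomness: same rules, usually different results.
take_a = sample_take(**settings, rng=random.Random())
take_b = sample_take(**settings, rng=random.Random())
print(f"take A: {take_a:.2f} Hz, take B: {take_b:.2f} Hz")

# Fixing the random seed removes the variation, which shows that the
# difference between takes comes from the draw, not from the settings.
seeded_1 = sample_take(**settings, rng=random.Random(42))
seeded_2 = sample_take(**settings, rng=random.Random(42))
print(f"seeded takes match: {seeded_1 == seeded_2}")
```

Both unseeded takes land within 2% of the base pitch, which is the "same rules" part; the exact value inside that range is the dice roll.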

