AI Voice Design: Create a Custom Voice from a Single Text Prompt
Describe a voice in plain words and Fish Audio's Voice Design generates it in about 15 seconds. Create custom AI character voices — free during launch.
You need a voice that doesn't exist yet. Maybe it's a sarcastic robot sidekick for your game, a warm narrator for your documentary, or a late-night radio host for your podcast intro. Browsing voice libraries gets you the same hundred voices everyone else is using, and voice cloning requires a real person to record samples first.
Voice Design solves this differently. Now live on Fish Audio, it lets you create a completely original, custom AI voice by describing it in plain text — age, gender, accent, tone, pacing, mood — and turns that description into a usable voice model in about 15 seconds. No recordings, no voice actors, no library-diving.
During launch, voice generation with Voice Design is completely free (normally 2,000 credits per generation).
What Is AI Voice Design?
AI voice design is the process of creating a custom, original synthetic voice from a written description instead of an audio sample. You type a prompt describing how the voice should sound — for example, "a warm, slightly raspy middle-aged narrator with a soft American accent" — and the AI generates a brand-new voice matching that description, ready to use for text to speech.
This makes voice design fundamentally different from voice cloning, which replicates an existing person's voice from recordings. With voice design, the voice you create has never existed before — no one else is using it, anywhere.
How to Create Your Own AI Voice with Voice Design (Step by Step)
Wondering how to make an AI voice from nothing but a description? Here's the full workflow, start to finish. Head to the Create Voice page and select Voice Design.
Step 1: Describe the voice you want
In the description box, write out the voice you're imagining. The more specific, the better. Cover these dimensions:
- Age & gender — "a woman in her late 30s"
- Accent — "soft American accent," "light British lilt"
- Tone & texture — "husky," "bright," "slightly raspy"
- Pacing — "relaxed and unhurried," "quick and energetic"
- Mood & context — "like they're speaking to a single listener in a quiet room"
Not sure where to start? Pick one of the built-in starter prompts, such as a warm late-night radio host, a documentary narrator, or a children's storyteller, and edit from there.
You can also add optional preview text (the script your samples will speak), or leave it blank and let the system write an in-context sample for you. When you're ready, hit Generate Samples. Generation normally costs 2,000 credits, but it's free during launch.
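If you draft many voices, it can help to keep the five dimensions above as separate fields and join them into one description before pasting it into the description box. The sketch below is only a local text helper. `VoiceBrief` and `build_description` are hypothetical names invented for this example, not part of any Fish Audio API.

```python
from dataclasses import dataclass, fields


@dataclass
class VoiceBrief:
    """Hypothetical container for the five dimensions listed in Step 1."""
    age_gender: str  # e.g. "a woman in her late 30s"
    accent: str      # e.g. "a soft American accent"
    texture: str     # e.g. "a husky, slightly raspy voice"
    pacing: str      # e.g. "relaxed and unhurried"
    mood: str        # e.g. "like she's speaking to a single listener in a quiet room"


def _cap(text: str) -> str:
    """Uppercase only the first character, leaving words like 'American' intact."""
    return text[:1].upper() + text[1:]


def build_description(brief: VoiceBrief) -> str:
    """Join the five dimensions into a two-sentence plain-text description."""
    for f in fields(brief):
        if not getattr(brief, f.name).strip():
            raise ValueError(f"missing dimension: {f.name}")
    first = f"{_cap(brief.age_gender)} with {brief.accent} and {brief.texture}."
    second = f"{_cap(brief.pacing)} pacing, {brief.mood}."
    return f"{first} {second}"


brief = VoiceBrief(
    age_gender="a woman in her late 30s",
    accent="a soft American accent",
    texture="a husky, slightly raspy voice",
    pacing="relaxed and unhurried",
    mood="like she's speaking to a single listener in a quiet room",
)
print(build_description(brief))
```

Keeping the dimensions separate makes it easy to change one of them, say the accent, and regenerate without rewriting the whole prompt.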
Step 2: Compare two generated voice samples and pick one
Voice Design generates two distinct voice samples from your prompt. Play both, compare, and select the one that fits. Not quite right? Tweak your description and hit Re-generate Samples — iterating costs nothing during the launch period, so refine until it sounds exactly like the voice in your head.
Step 3: Save it as your own voice model
Hit Continue and turn your chosen sample into a reusable voice model:
- Name and cover — give your voice an identity
- Tags — gender, age, voice style (warm, smooth, deep, breathy...)
- Use cases — conversational, narration, character voice, social media, educational, advertisement, or entertainment
Then choose who can use it:
- Public — listed on the discovery page for everyone to find and use
- Unlisted — hidden from discovery, shareable via direct link
- Private — visible only to you
Confirm that the voice doesn't impersonate a real, identifiable person, click Create Voice, and you're done. Your custom AI voice now lives in your library, ready for any text-to-speech project — and with S2's word-level inline tags, you can direct exactly how it delivers every line.
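For teams that plan a cast of voices before opening the editor, the save form can be modeled as a small record. This is a planning sketch, assuming field names of our own choosing. `VoiceModelDraft`, `Visibility`, and the `PRIVATE` default are hypothetical and do not describe Fish Audio's actual data model or API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Visibility(Enum):
    PUBLIC = "public"      # listed on the discovery page
    UNLISTED = "unlisted"  # hidden from discovery, shareable via direct link
    PRIVATE = "private"    # visible only to you


# Use cases named in Step 3.
USE_CASES = {
    "conversational", "narration", "character voice", "social media",
    "educational", "advertisement", "entertainment",
}


@dataclass
class VoiceModelDraft:
    """Hypothetical checklist mirroring the fields of the save form."""
    name: str
    tags: List[str] = field(default_factory=list)
    use_cases: List[str] = field(default_factory=list)
    visibility: Visibility = Visibility.PRIVATE  # assumed default, chosen for this sketch
    no_real_person: bool = False                 # the impersonation confirmation

    def problems(self) -> List[str]:
        """Return a list of things to fix before clicking Create Voice."""
        issues = []
        if not self.name.strip():
            issues.append("name is empty")
        unknown = [u for u in self.use_cases if u not in USE_CASES]
        if unknown:
            issues.append(f"unknown use cases: {unknown}")
        if not self.no_real_person:
            issues.append("impersonation confirmation not given")
        return issues


draft = VoiceModelDraft(
    name="Night Shift",
    tags=["warm", "husky"],
    use_cases=["narration"],
    visibility=Visibility.UNLISTED,
    no_real_person=True,
)
print(draft.problems())
```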
Start with a starter prompt → — generation is free during launch.

How to Write Better Voice Design Prompts
The quality of your voice depends on the quality of your description. Here's what separates a generic result from a perfect one.
Take this starter prompt:
"A warm, intimate late-night radio host in their late 30s with a soft, husky voice. Relaxed, unhurried pacing with occasional gentle chuckles, like they're speaking to a single listener in a quiet room."
Notice what it does:
- Anchors a persona ("late-night radio host") — a role the model can instantly characterize, more powerful than listing ten adjectives
- Stacks concrete vocal qualities ("soft, husky") — texture words beat vague ones like "nice" or "good"
- Specifies delivery ("relaxed, unhurried pacing with occasional gentle chuckles") — pacing and quirks bring a voice to life
- Sets the scene ("speaking to a single listener in a quiet room") — context shapes intimacy and energy better than any single adjective
Weak prompt: "A female voice, pleasant and clear."
Strong prompt: "A cheerful tour guide in her 20s with a bright Australian accent, fast playful pacing, always sounding mid-smile."
One persona, three or four sensory details, one scene. That's the formula.
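The formula can be turned into a rough self-check before you spend a generation. The sketch below is a heuristic only. The vague-word list, the scene cues, and the 12-word threshold are illustrative assumptions made for this example, and `review_prompt` is a hypothetical helper, not a Fish Audio feature.

```python
import re
from typing import List

# Illustrative assumptions, not an official vocabulary.
VAGUE_WORDS = {"nice", "good", "pleasant", "clear"}
SCENE_CUES = ("like ", "as if ", "sounding ")
MIN_WORDS = 12  # arbitrary threshold for "enough detail"


def review_prompt(prompt: str) -> List[str]:
    """Return heuristic notes on a voice description; an empty list means no flags."""
    lowered = prompt.lower()
    words = re.findall(r"[a-z']+", lowered)
    notes = []
    vague = sorted(set(words) & VAGUE_WORDS)
    if vague:
        notes.append("vague words: " + ", ".join(vague))
    if len(words) < MIN_WORDS:
        notes.append("short: add texture, pacing, or accent details")
    if not any(cue in lowered for cue in SCENE_CUES):
        notes.append("no scene or delivery cue found")
    return notes


weak = "A female voice, pleasant and clear."
strong = ("A cheerful tour guide in her 20s with a bright Australian accent, "
          "fast playful pacing, always sounding mid-smile.")
print(review_prompt(weak))
print(review_prompt(strong))
```

With these settings the weak prompt is flagged on all three checks and the strong prompt on none. A clean result does not guarantee a good voice, so treat the notes as a nudge and judge the generated samples by ear.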
A Character Voice Generator Built for Original Characters
If you create characters — for games, animations, audiobooks, audio dramas, or virtual companions — Voice Design works as a character voice generator with one decisive advantage: every voice is original.
Library voices are shared by thousands of users; your villain shouldn't sound like someone else's meditation app. Cloning a real person's voice for a fictional character raises consent and licensing questions. A designed voice sidesteps both — a voice built for your character, with no real-person likeness behind it.
A few prompt directions to spark ideas — from grounded to fully fantastical:
- "An ancient, gravelly dragon with a slow, rumbling delivery and theatrical menace"
- "A hyperactive male teenage inventor, fast talker, voice cracks slightly when excited"
- "A serene elderly librarian with a whisper-soft tone and deliberate pauses"
- "A hard-boiled detective in his 50s, low gravelly monotone, world-weary, dry delivery"
- "A bubbly cooking-show host with a thick Italian accent, loud, expressive, always on the edge of laughter"
- "A glitchy ship AI: flat, precise, slightly too calm, with clipped robotic cadence"
Generate, compare two samples, refine, save — a full original cast in an afternoon. Then put them in a scene together with multispeaker text to speech, or browse AI character voices others have built for inspiration.
Voice Design vs. Voice Cloning: Which Should You Use?
Fish Audio now offers three ways to create a voice, and they serve different jobs:
| | Voice Design | Instant Voice Clone | Professional Voice Clone |
|---|---|---|---|
| Input | A text description | ~10s of audio | Studio-quality recordings |
| Time | ~15 seconds | ~1 minute | 1–2 hours |
| Best for | Original characters & brand-new voices | Quickly replicating an existing recording | Verified, studio-grade clone of a real person |
| Voice exists already? | No — created from scratch | Yes | Yes — with ownership verification |
The rule of thumb: if the voice doesn't exist yet, design it. If it does, clone it.
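Written as a decision function, the rule of thumb and the table look roughly like this. It is a sketch of the article's guidance, not product logic. `choose_method` and its `needs_verified_studio_clone` flag are hypothetical names introduced here to separate the two cloning options.

```python
from enum import Enum


class Method(Enum):
    VOICE_DESIGN = "Voice Design"
    INSTANT_CLONE = "Instant Voice Clone"
    PROFESSIONAL_CLONE = "Professional Voice Clone"


def choose_method(voice_exists: bool, needs_verified_studio_clone: bool = False) -> Method:
    """Pick a creation method following the table above.

    voice_exists: True if you are replicating a voice that already exists.
    needs_verified_studio_clone: True if you need a studio-grade clone of a
        real person with ownership verification (hypothetical flag).
    """
    if not voice_exists:
        # Nothing to replicate, so describe the voice in text instead.
        return Method.VOICE_DESIGN
    if needs_verified_studio_clone:
        return Method.PROFESSIONAL_CLONE
    return Method.INSTANT_CLONE


print(choose_method(voice_exists=False).value)
print(choose_method(voice_exists=True).value)
print(choose_method(voice_exists=True, needs_verified_studio_clone=True).value)
```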
Original by design
There's a quieter benefit to designed voices worth naming: they don't borrow from anyone. Every Voice Design output is generated from a description, not from a person's recordings — and every voice created on Fish Audio must pass a confirmation that it doesn't impersonate a real, identifiable person. It's a workflow designed to keep your project clear of consent and likeness concerns.
And when the voice you need does belong to a real person — yours, or a voice actor's — we believe the answer isn't to blur that line, but to make ownership explicit. Voice actors around the world are pushing for exactly this: consent and fair compensation for how their voices are used in the AI era. That's the idea behind our new Professional Voice Clone: a verified, studio-quality clone of a real person's voice, built on real-time ownership verification, with optional commercial release and revenue share for the voice owner. It's the start of a cleaner deal between voice owners and the people who want to use their voices — more on that in our upcoming deep dive.
Design Your First Voice in 15 Seconds
The right voice used to mean auditioning actors, digging through libraries, or settling for "close enough." Now it means writing one good sentence.
Design your first voice → Free during launch.
Sabrina is part of Fish Audio's support and marketing team, helping users get the most out of AI voice products while turning launches, updates, and customer insights into clear, practical content.
