March 10, 2026 · GUIDE

How to Use Inline Tags in Fish Audio S2

Fish Audio S2 supports inline tags: short natural-language cues, placed in square brackets anywhere in your text, that control how speech is delivered. This guide covers the supported tags, how to use them, and tips for getting the best results.


Basic Syntax

Place a tag in square brackets immediately before the word or phrase it should affect:

The door was open. [whispering] I didn't want to go inside.

Tags can be placed at any position in the text, and you can use multiple tags in a single generation.
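Since tags are plain bracketed text, you can inspect a script before sending it for generation. The sketch below is a local text helper only, not part of any Fish Audio SDK; the names `TAG_PATTERN`, `find_tags`, and `strip_tags` are hypothetical, and it assumes tags never contain nested brackets.

```python
import re

# Assumption: a tag is "[", then any run of non-bracket characters, then "]".
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")

def find_tags(text: str) -> list[tuple[int, str]]:
    """Return (character offset, tag text) for every bracketed tag in the text."""
    return [(m.start(), m.group(1).strip()) for m in TAG_PATTERN.finditer(text)]

def strip_tags(text: str) -> str:
    """Remove all tags and collapse the whitespace they leave behind."""
    return re.sub(r"\s+", " ", TAG_PATTERN.sub(" ", text)).strip()

script = "The door was open. [whispering] I didn't want to go inside."
print(find_tags(script))   # [(19, 'whispering')]
print(strip_tags(script))  # The door was open. I didn't want to go inside.
```

Stripping tags this way is handy when you want to show the spoken text as a caption without the delivery cues.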


S2 accepts free-form natural-language tags, so you're not limited to a fixed list. That said, the tags below are well-tested and produce consistently strong results. Use them as starting points, or write your own descriptions (e.g. [speaking slowly, almost hesitant]) for more specific control.

Breathing & Vocal Reactions

| Tag | Description |
| --- | --- |
| [clears throat] | Throat-clearing sound before speaking |
| [inhalation] / [inhale] | Audible breath in |
| [exhale] | Audible breath out |
| [sigh] | Expressive sigh |
| [panting] | Heavy, rapid breathing |
| [breathing] | General audible breathing |
| [gasp] | Sharp, sudden intake of breath |

Vocal Sounds

| Tag | Description |
| --- | --- |
| [groan] | Low sound of discomfort or exasperation |
| [moaning] | Extended vocal sound of pain or displeasure |
| [sobbing] | Crying with convulsive breaths |
| [crying] | Audible tears in voice |
| [laughing] | Full laughter |
| [chuckling] | Soft, quiet laughter |
| [giggle] | Light, high-pitched laughter |

Pacing

| Tag | Description |
| --- | --- |
| [pause] | Brief silence |
| [short pause] | Shorter beat |
| [long pause] | Extended silence for dramatic effect |

Voice Style

| Tag | Description |
| --- | --- |
| [whispering] / [whispering voice] | Hushed, breathy delivery |
| [soft voice] | Quiet and gentle |
| [low voice] | Deeper, lower-pitched register |
| [loud voice] | Raised volume |
| [shouting] | Full-volume yelling |

Emotion

| Tag | Description |
| --- | --- |
| [excited] | High energy, upbeat |
| [angry] | Harsh, forceful tone |
| [sad] | Heavy, downcast delivery |

Other

| Tag | Description |
| --- | --- |
| [emphasis] | Stress on the following word or phrase |
| [rustling sound] | Background rustling noise |
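If you write a lot of scripts, it can help to know which of your tags come from the well-tested list and which are your own free-form descriptions. The following is a minimal sketch: `WELL_TESTED_TAGS` is copied from the tables above, `classify_tags` is a hypothetical helper, and lowercasing is used only for the lookup (this guide does not say how S2 treats letter case).

```python
import re

# Tags listed in this guide's tables, without the surrounding brackets.
WELL_TESTED_TAGS = {
    "clears throat", "inhalation", "inhale", "exhale", "sigh", "panting",
    "breathing", "gasp",
    "groan", "moaning", "sobbing", "crying", "laughing", "chuckling", "giggle",
    "pause", "short pause", "long pause",
    "whispering", "whispering voice", "soft voice", "low voice", "loud voice",
    "shouting",
    "excited", "angry", "sad",
    "emphasis", "rustling sound",
}

def classify_tags(text: str) -> tuple[list[str], list[str]]:
    """Split the tags found in text into (well-tested, free-form) lists."""
    known, free_form = [], []
    for tag in re.findall(r"\[([^\[\]]+)\]", text):
        # Normalise case and spacing for the lookup only (an assumption).
        normalized = " ".join(tag.lower().split())
        if normalized in WELL_TESTED_TAGS:
            known.append(normalized)
        else:
            free_form.append(normalized)
    return known, free_form

known, free_form = classify_tags(
    "[sigh] I tried. [speaking slowly, almost hesitant] Maybe tomorrow. [long pause]"
)
print(known)      # ['sigh', 'long pause']
print(free_form)  # ['speaking slowly, almost hesitant']
```

A free-form tag is not an error; the split simply tells you which cues may need an extra listening pass.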

Placement

Tags affect what comes after them. Place the tag right before the point where you want the shift to happen.

Good — tag at the transition point:

I thought everything was fine. [whispering] Then I heard the noise.

Less effective — tag too early:

[whispering] I thought everything was fine. Then I heard the noise.

In this case the entire passage will be whispered, including the first sentence.


Combining Tags

You can chain multiple tags across a passage to create shifts in delivery:

[soft voice] I wasn't sure what to say. [long pause] [loud voice] But then it hit me.

Vocal reaction tags can be placed between sentences for natural-sounding transitions:

That was the third time this week. [sigh] I really need to fix that.
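When a script has many delivery shifts, it can be easier to keep tags and text separate and join them at the end. Below is a small sketch of that idea; `build_script` is a hypothetical helper and not a Fish Audio API, and it just produces the same tagged string you would otherwise type by hand.

```python
def build_script(segments: list[tuple[list[str], str]]) -> str:
    """Join (tags, text) pairs into one tagged string, tags first."""
    pieces = []
    for tags, text in segments:
        pieces.extend(f"[{tag}]" for tag in tags)
        if text:
            pieces.append(text)
    return " ".join(pieces)

script = build_script([
    (["soft voice"], "I wasn't sure what to say."),
    (["long pause", "loud voice"], "But then it hit me."),
])
print(script)
# [soft voice] I wasn't sure what to say. [long pause] [loud voice] But then it hit me.
```

Keeping segments as data makes it simple to swap one tag for another while you iterate on a passage.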

Multi-Speaker Dialogue

The S2 model supports multi-speaker, multi-turn generation with per-speaker inline tag control. This capability is not yet available in the Fish Audio playground or API, but it is coming soon.


Tips

Start simple. A single well-placed [whispering] or [sigh] can transform a passage. You don't need a tag on every sentence.

Use pauses for pacing. [pause] and [long pause] are among the most useful tags for making speech feel natural, especially before emotional shifts.

Let reactions carry emotion. Instead of relying on emotion tags alone, try combining them with reactions, as in: [sigh] [sad] I just don't know anymore. Here the sigh grounds the emotion physically.

Test and iterate. Different voices may respond to tags with varying intensity. If a tag feels too subtle, try reinforcing it with context in the surrounding text.
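One rough way to act on the "start simple" tip is to measure how densely a script is tagged. The sketch below uses a hypothetical helper, `tags_per_sentence`, with a naive sentence split on ".", "!" and "?". What counts as too many tags is a matter of listening, so no threshold is suggested here.

```python
import re

TAG = re.compile(r"\[[^\[\]]+\]")

def tags_per_sentence(text: str) -> float:
    """Average number of inline tags per sentence (naive sentence split)."""
    tag_count = len(TAG.findall(text))
    plain = TAG.sub(" ", text)
    # Naive assumption: sentences end at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", plain) if s.strip()]
    return tag_count / max(len(sentences), 1)

print(tags_per_sentence("That was the third time this week. [sigh] I really need to fix that."))  # 0.5
print(tags_per_sentence("[sigh] [sad] I just don't know anymore."))  # 2.0
```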


Kyle Cui

Kyle is a Founding Engineer at Fish Audio and UC Berkeley Computer Scientist and Physicist. He builds scalable voice systems and grew Fish into the #1 global AI text-to-speech platform. Outside of startups, he has climbed 1345 trees so far around the Bay Area. Find his irresistibly clouty thoughts on X at @kile_sway.
