Question 1

What is Fish Audio S2 Pro?

Accepted Answer

Fish Audio S2 Pro is a leading text-to-speech model with fine-grained inline control of prosody and emotion. Trained on more than 10 million hours of audio data across 80+ languages, it combines reinforcement learning alignment with a Dual-Autoregressive (Dual-AR) architecture: a 4B-parameter Slow AR for semantic prediction and a 400M-parameter Fast AR for acoustic detail. The release includes model weights, fine-tuning code, and an SGLang-based streaming inference engine.

Question 2

How does fine-grained inline control work?

Accepted Answer

S2 Pro enables localized control over speech generation by embedding natural-language instructions directly in the text using [tag] syntax. Rather than relying on a fixed set of predefined tags, S2 Pro accepts free-form textual descriptions, such as [whisper in small voice], [professional broadcast tone], or [pitch up], allowing open-ended expression control at the word level. Over 15,000 unique tags are supported, including [pause], [emphasis], [laughing], [excited], [whisper], [singing], and many more.
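To make the [tag] syntax concrete, here is a minimal Python sketch that splits a tagged prompt into tag and text segments. It is not part of S2 Pro or its API: the helper name `split_inline_tags` is hypothetical, and the grammar it assumes (square brackets, free-form content, no nesting) is inferred only from the examples above.

```python
import re

# Assumption: a tag is any non-empty, non-nested run of text inside [ ].
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")

def split_inline_tags(text):
    """Split text into ("tag", ...) and ("text", ...) segments, in order."""
    segments = []
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        # Plain text between the previous tag and this one, if any.
        chunk = text[pos:match.start()].strip()
        if chunk:
            segments.append(("text", chunk))
        segments.append(("tag", match.group(1).strip()))
        pos = match.end()
    # Trailing text after the last tag.
    tail = text[pos:].strip()
    if tail:
        segments.append(("text", tail))
    return segments

example = "[whisper in small voice] I have a secret. [pause] [excited] We won!"
segments = split_inline_tags(example)
for kind, value in segments:
    print(f"{kind:>4}: {value}")
```

A tag's position in the text determines where it takes effect, so a sketch like this is mainly useful for inspecting or validating prompts before they are sent for synthesis.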

Question 3

What is the streaming performance of S2 Pro?

Accepted Answer

On a single NVIDIA H200 GPU, S2 Pro achieves a Real-Time Factor (RTF) of 0.195, a time-to-first-audio of about 100 ms, and a throughput of 3,000+ acoustic tokens per second while maintaining an RTF below 0.5. The SGLang-based inference engine inherits LLM-native serving optimizations, including continuous batching, paged KV cache, CUDA graph replay, and RadixAttention-based prefix caching.
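As a rough guide to reading these figures, the sketch below uses the common definition of RTF as generation time divided by the duration of the audio produced, so values below 1.0 mean faster than real time. The function names are hypothetical, and the source does not describe its measurement setup, so treat this as illustrative arithmetic rather than a benchmark.

```python
def real_time_factor(generation_seconds, audio_seconds):
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds

def estimated_generation_seconds(audio_seconds, rtf):
    """Invert the definition: how long a clip takes to generate at a given RTF."""
    return audio_seconds * rtf

# At the quoted RTF of 0.195, a 10-second clip takes roughly 1.95 s to generate.
clip_seconds = 10.0
estimate = estimated_generation_seconds(clip_seconds, rtf=0.195)
print(f"Estimated generation time: {estimate:.2f} s")
print(f"Faster than real time: {real_time_factor(estimate, clip_seconds) < 1.0}")
```

With streaming, playback can start after the time-to-first-audio (about 100 ms here) rather than after the whole clip is generated; an RTF below 1.0 is what keeps generation ahead of playback.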

Question 4

How many languages does S2 Pro support?

Accepted Answer

S2 Pro supports 80+ languages. Tier 1 languages (highest quality) are Japanese, English, and Chinese. Tier 2 languages are Korean, Spanish, Portuguese, Arabic, Russian, French, and German. Many additional languages are supported, including Swedish, Italian, Turkish, Dutch, Hindi, Thai, and Vietnamese.

Question 5

What is the license for S2 Pro?

Accepted Answer

S2 Pro is licensed under the Fish Audio Research License. Research and non-commercial use is permitted free of charge. Commercial use requires a separate license from Fish Audio; contact business@fish.audio for details.

Fish Audio S2

Incredibly realistic speech generation

What sets S2 apart

Ultra-low latency

Open-domain control & multi-speaker

Fully open source

Build with the Fish Audio S2 API
