Question 1

What is Fish Audio S2 Pro?

Accepted Answer

Fish Audio S2 Pro is a leading text-to-speech model with fine-grained inline control of prosody and emotion. Trained on more than 10M hours of audio data across 80+ languages, it combines reinforcement-learning alignment with a Dual-Autoregressive (Dual-AR) architecture: a 4B-parameter Slow AR model for semantic prediction and a 400M-parameter Fast AR model for acoustic detail. The release includes model weights, fine-tuning code, and an SGLang-based streaming inference engine.
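The Dual-AR split can be illustrated with a toy decoding loop. This is a minimal sketch under an assumption not stated above: for each audio frame, the Slow AR emits one semantic token conditioned on the previous frames, and the Fast AR then emits the remaining acoustic codebook tokens for that same frame. The `slow_step` and `fast_step` callables and the `num_codebooks` parameter are hypothetical placeholders, not the S2 Pro API, and real models condition on text and hidden states rather than raw token ids.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-ins for the two models (not the real S2 Pro interfaces).
# slow_step: all previously generated frames -> next semantic token.
SlowStep = Callable[[Sequence[List[int]]], int]
# fast_step: (semantic token of this frame, tokens emitted so far in this frame) -> next acoustic token.
FastStep = Callable[[int, Sequence[int]], int]


def dual_ar_generate(
    slow_step: SlowStep,
    fast_step: FastStep,
    num_frames: int,
    num_codebooks: int,
) -> List[List[int]]:
    """Toy Dual-AR loop: one outer (slow) step per frame, an inner (fast) loop per codebook."""
    if num_codebooks < 1:
        raise ValueError("num_codebooks must be at least 1")
    frames: List[List[int]] = []
    for _ in range(num_frames):
        # Slow AR: predict this frame's semantic token from all earlier frames.
        semantic = slow_step(frames)
        frame = [semantic]
        # Fast AR: fill in the remaining acoustic codebooks for this frame only.
        for _ in range(num_codebooks - 1):
            frame.append(fast_step(semantic, frame))
        frames.append(frame)
    return frames
```

In this layout the larger Slow AR runs once per frame, while the smaller Fast AR handles the inner per-codebook loop.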

Question 2

How does fine-grained inline control work?

Accepted Answer

S2 Pro enables localized control over speech generation by embedding natural-language instructions directly in the text using [tag] syntax. Rather than relying on a fixed set of predefined tags, S2 Pro accepts free-form textual descriptions, such as [whisper in small voice], [professional broadcast tone], or [pitch up], allowing open-ended expression control at the word level. More than 15,000 unique tags are supported, including [pause], [emphasis], [laughing], [excited], [whisper], and [singing].
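To make the [tag] syntax concrete, here is a small illustrative helper that splits an input string into ordered text and tag segments, for example to inspect or validate tags on the client side before sending text for synthesis. `split_inline_tags` is a hypothetical name; the tags are simply part of the raw text the model receives, and this code makes no claim about how S2 Pro's tokenizer processes them.

```python
import re
from typing import List, Tuple

# Matches one bracketed instruction such as [pause] or [whisper in small voice].
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def split_inline_tags(text: str) -> List[Tuple[str, str]]:
    """Split text into ("text", ...) and ("tag", ...) segments, preserving order.

    Whitespace around each segment is stripped and empty text segments are dropped.
    """
    segments: List[Tuple[str, str]] = []
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        before = text[pos:match.start()].strip()
        if before:
            segments.append(("text", before))
        # group(1) is the free-form instruction inside the brackets.
        segments.append(("tag", match.group(1).strip()))
        pos = match.end()
    rest = text[pos:].strip()
    if rest:
        segments.append(("text", rest))
    return segments


segments = split_inline_tags(
    "Hello [pause] I have a secret. [whisper in small voice] Don't tell anyone."
)
```

Because tags are free-form, the helper accepts any bracketed phrase rather than checking it against a fixed vocabulary.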

Question 3

What is the streaming performance of S2 Pro?

Accepted Answer

On a single NVIDIA H200 GPU, S2 Pro achieves a Real-Time Factor (RTF) of 0.195 and a time-to-first-audio of about 100 ms, and it sustains a throughput of more than 3,000 acoustic tokens per second while keeping RTF below 0.5. Because the inference engine is built on SGLang, it inherits LLM-native serving optimizations, including continuous batching, paged KV cache, CUDA graph replay, and RadixAttention-based prefix caching.
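RTF is the ratio of wall-clock synthesis time to the duration of the generated audio, so values below 1 mean faster than real time. The two helpers below (hypothetical names, not part of the S2 Pro code) apply that definition to the figures above.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock synthesis time / duration of the generated audio (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(rtf: float, audio_seconds: float) -> float:
    """Invert the definition: seconds needed to synthesize audio_seconds of speech at a given RTF."""
    return rtf * audio_seconds
```

At the reported RTF of 0.195, `synthesis_time(0.195, 10.0)` gives roughly 1.95 seconds for 10 seconds of speech, about 5x faster than real time.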

Question 4

How many languages does S2 Pro support?

Accepted Answer

S2 Pro supports 80+ languages. Tier 1 languages (highest quality) include Japanese, English, and Chinese. Tier 2 languages include Korean, Spanish, Portuguese, Arabic, Russian, French, and German. Many additional languages are also supported, including Swedish, Italian, Turkish, Dutch, Hindi, Thai, and Vietnamese.

Question 5

What is the license for S2 Pro?

Accepted Answer

S2 Pro is licensed under the Fish Audio Research License. Research and non-commercial use are permitted free of charge. Commercial use requires a separate license from Fish Audio; contact business@fish.audio for details.

Fish Audio S2

Generate incredibly realistic voices

What makes S2 different

Ultra-low latency

Open-domain & multi-speaker control

Fully open source

Build with the Fish Audio S2 API
