Question 1

What is Fish Audio S2 Pro?

Accepted Answer

Fish Audio S2 Pro is a leading text-to-speech model with fine-grained inline control of prosody and emotion. Trained on more than 10 million hours of audio data across 80+ languages, it combines reinforcement learning alignment with a Dual-Autoregressive (Dual-AR) architecture: a 4B-parameter Slow AR for semantic prediction and a 400M-parameter Fast AR for acoustic detail. The release includes model weights, fine-tuning code, and an SGLang-based streaming inference engine.

Question 2

How does fine-grained inline control work?

Accepted Answer

S2 Pro enables localized control over speech generation by embedding natural-language instructions directly within the text using [tag] syntax. Rather than relying on a fixed set of predefined tags, S2 Pro accepts free-form textual descriptions — such as [whisper in small voice], [professional broadcast tone], or [pitch up] — allowing open-ended expression control at the word level. Over 15,000 unique tags are supported, including [pause], [emphasis], [laughing], [excited], [whisper], [singing], and many more.
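The [tag] syntax described above can be sketched in a few lines of Python. This is only an illustration of how tagged text might be composed and inspected before it is sent to the model. The helper names (tag, extract_tags, strip_tags) are hypothetical and are not part of any Fish Audio SDK, and no API call is made.

```python
import re

# Matches any bracketed instruction such as [pause] or [whisper in small voice].
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def tag(instruction: str) -> str:
    """Wrap a free-form instruction in [tag] syntax (hypothetical helper)."""
    return f"[{instruction.strip()}]"


def extract_tags(text: str) -> list[str]:
    """Return the inline instructions found in the text, in order of appearance."""
    return [m.group(1) for m in TAG_PATTERN.finditer(text)]


def strip_tags(text: str) -> str:
    """Return the plain spoken text with all inline instructions removed."""
    without_tags = TAG_PATTERN.sub("", text)
    # Collapse the whitespace left behind where tags were removed.
    return re.sub(r"\s+", " ", without_tags).strip()


# Tags sit directly in front of the words they are meant to affect.
script = (
    f"{tag('professional broadcast tone')} Good evening. "
    f"{tag('pause')} Tonight's top story {tag('excited')} is a big one!"
)

print(script)
print(extract_tags(script))
print(strip_tags(script))
```

Because the instructions are free-form text rather than a closed vocabulary, the sketch does not validate tag contents. It only checks that each instruction is enclosed in square brackets.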

Question 3

What is the streaming performance of S2 Pro?

Accepted Answer

On a single NVIDIA H200 GPU, S2 Pro achieves a Real-Time Factor (RTF) of 0.195 and a time-to-first-audio of ~100ms. Its throughput exceeds 3,000 acoustic tokens per second while RTF stays below 0.5. The SGLang-based inference engine inherits LLM-native serving optimizations, including continuous batching, paged KV cache, CUDA graph replay, and RadixAttention-based prefix caching.
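To make the RTF figure concrete, here is a small sketch that assumes the common definition of Real-Time Factor: synthesis time divided by the duration of the audio produced, so values below 1.0 mean faster than real time. The function names are illustrative, and the 10-second clip is an invented example rather than a published benchmark.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of generated audio (assumed definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(audio_seconds: float, rtf: float) -> float:
    """Estimated generation time for a clip of the given length at a given RTF."""
    return audio_seconds * rtf


# At the quoted RTF of 0.195, a hypothetical 10-second clip would take
# roughly 10 * 0.195 seconds to generate.
estimated = synthesis_time(10.0, 0.195)
print(f"Estimated synthesis time: {estimated:.2f} s")
print(f"Faster than real time: {real_time_factor(estimated, 10.0) < 1.0}")
```

Note that RTF describes sustained generation speed, while time-to-first-audio measures how long a listener waits before playback can begin. The two figures are reported separately for that reason.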

Question 4

How many languages does S2 Pro support?

Accepted Answer

S2 Pro supports 80+ languages. Tier 1 languages (highest quality) include Japanese, English, and Chinese. Tier 2 languages include Korean, Spanish, Portuguese, Arabic, Russian, French, and German. Many additional languages are also supported, including Swedish, Italian, Turkish, Dutch, Hindi, Thai, and Vietnamese.

Question 5

What is the license for S2 Pro?

Accepted Answer

S2 Pro is licensed under the Fish Audio Research License. Research and non-commercial use are permitted free of charge. Commercial use requires a separate license from Fish Audio; contact business@fish.audio for details.

Fish Audio S2

Generate incredibly realistic speech

What makes S2 different

Ultra-low latency

Open-domain and multi-speaker control

Fully open source

Build with the Fish Audio S2 API
