A Complete Guide to Text-to-Speech on Mac: Settings, Usage, and Disabling Methods

Feb 28, 2026

You finished a 2,000-word podcast script in Pages, hit the Speak Selection shortcut, and heard a voice that sounded like it was recorded inside a microwave in 2009. You dug into System Settings, found six different menus that mention "speech" or "spoken content," changed three things, and somehow made it worse. Now your Mac announces every notification out loud, and you can't figure out how to turn it off.

macOS has had text-to-speech built in since the early 2000s. Apple has improved it significantly over the past few years, but the settings are scattered across multiple panels, the behavior changes between macOS versions, and the gap between what the built-in voices can do and what content creators actually need remains wide. The good news: once you know where everything lives, setup takes about 5 minutes. And when you outgrow the built-in options, the upgrade path is cleaner than most people expect.

macOS Has 3 Separate TTS Systems. Most People Only Find 1.

This is the part that causes the most confusion. Apple doesn't have a single "text-to-speech" toggle. It has three distinct systems that overlap, each controlled from a different place:

| System | What It Does | Where to Find It | Primary Use |
|---|---|---|---|
| Spoken Content | Reads selected text or entire screen aloud | System Settings > Accessibility > Spoken Content | Reading articles, proofreading, and accessibility |
| VoiceOver | Full-screen reader for visually impaired users | System Settings > Accessibility > VoiceOver | Navigation, accessibility |
| Siri Voice | Powers Siri responses and dictation feedback | System Settings > Siri | Virtual assistant responses |

Most people searching "text to speech on Mac" want Spoken Content. That's the feature that reads selected text in any app using a keyboard shortcut. VoiceOver is a full accessibility tool that narrates everything on screen, including buttons, menus, and window titles. Turning on VoiceOver when you just want text read aloud is like calling a fire truck to light a candle.

Setting Up Spoken Content: The 5-Minute Setup

For macOS Ventura (13) and later

  1. Open System Settings (click the Apple menu > System Settings)
  2. Click Accessibility in the sidebar
  3. Click Spoken Content
  4. Toggle on Speak Selection
  5. Choose your preferred voice by clicking the dropdown next to "System Voice"
  6. Adjust the speaking rate slider to your preference
  7. Optionally toggle on Speak item under the pointer if you want hover-to-read functionality

For macOS Monterey (12) and earlier

Apple renamed System Preferences to System Settings in Ventura, so the path is slightly different on older versions:

  1. Open System Preferences (not System Settings)
  2. Click Accessibility
  3. Click Spoken Content in the left sidebar
  4. Check Speak Selection
  5. Click the System Voice dropdown to pick a voice
  6. Adjust the speaking rate

The keyboard shortcut

Once Spoken Content is enabled, select any text in any application and press Option + Esc to hear it read aloud. You can customize this shortcut:

  1. In the Spoken Content settings, click Options next to Speak Selection
  2. Set your preferred key combination
  3. Enable or disable the on-screen controller (a small floating panel with play/pause/skip controls)

That on-screen controller is worth enabling. It lets you pause, resume, skip forward, and adjust speed without going back to System Settings every time.

Choosing the Right Voice (Apple Has More Than You Think)

Most Mac users have only heard "Samantha" or the default Siri voice. Apple actually offers dozens of voices across multiple languages, and the quality difference between the basic voices and the premium downloads is significant.

How to download premium voices

  1. Go to System Settings > Accessibility > Spoken Content
  2. Click the System Voice dropdown
  3. Click Manage Voices
  4. Browse by language. Premium voices are marked with a download icon.
  5. Click the download arrow next to any voice. Files range from 150 MB to 900 MB, depending on the quality tier.

Voice quality tiers

Apple categorizes its voices into several quality levels:

  • Compact voices: Small file size, robotic quality. Fine for quick system announcements. Not usable for listening to anything longer than a paragraph.
  • Standard voices: Mid-tier quality. Decent for proofreading short documents. You'll notice unnatural rhythm in longer passages.
  • Premium/Enhanced voices: The largest downloads, but noticeably more natural. These use neural network synthesis and sound closer to a real person. "Zoe (Premium)," "Evan (Premium)," and several others fall into this category.

Even the premium voices, though, have a ceiling. They sound good for 2 to 3 minutes. Past that, the prosody flattens, emotional variation disappears, and the voice settles into a monotone rhythm that's hard to listen to for extended periods. That's not a bug. It's a limitation of the on-device model size Apple can practically ship.

Using Text-to-Speech Across Mac Apps

Once Spoken Content is active, the Option + Esc shortcut works in nearly every Mac application. Here's how it behaves in the most common ones:

Pages and TextEdit: Select text, press the shortcut. Works reliably. The voice reads the selected passage and stops.

Safari and Chrome: Select text on any webpage and press the shortcut. Useful for listening to articles while doing something else. Safari also has a separate Reader Mode that strips page formatting before reading, which sometimes improves pacing.

Preview (PDFs): Select text in a PDF and press the shortcut. Quality depends on whether the PDF has selectable text. Scanned documents without OCR won't work.

Mail: Select an email body, press the shortcut. Handy for long emails you'd rather listen to than read.

Terminal: Yes, you can also trigger TTS from the command line. Run say "Your text here" and macOS reads it aloud using the system voice. For longer text: say -f /path/to/textfile.txt. You can even export to audio: say -f script.txt -o output.aiff. That last command is the closest macOS gets to a built-in audio export feature.
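If you have a folder of scripts rather than a single file, the same -f and -o flags can be wrapped in a loop. The sketch below is one way to do that, assuming plain-text files ending in .txt. export_dir_to_aiff is a made-up helper name, not something macOS ships. It's written for sh or bash, so save it in a script file rather than pasting it into an interactive zsh prompt.

```shell
# Convert every .txt file in a directory to an AIFF file with the same base name.
# export_dir_to_aiff is a hypothetical helper name, not a macOS command.
export_dir_to_aiff() {
  dir="${1:-.}"
  for txt in "$dir"/*.txt; do
    [ -e "$txt" ] || continue        # skip when the glob matches nothing
    out="${txt%.txt}.aiff"           # script.txt -> script.aiff
    say -f "$txt" -o "$out" || return 1
    echo "wrote $out"
  done
}
```

Calling export_dir_to_aiff with a folder path renders each text file next to its source, using whatever system voice is currently selected.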

The Terminal trick most people don't know

The say command accepts a -v flag to specify any installed voice:

say -v "Zoe (Premium)" "This is a test of the premium voice."

say -v "?"

That second command lists every voice installed on your system. It's the fastest way to audition voices without clicking through System Settings.
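Voice names can contain spaces and parentheses, so scripting against that list takes a little care. Here's a rough sketch that pulls out just the names and plays one sentence in each voice for a given locale. It assumes every line of say -v "?" output has the shape name, locale code, then a # followed by a sample sentence. If your macOS version prints something different, the sed pattern will need adjusting. voice_name_filter, list_voice_names, and audition_voices are hypothetical helper names.

```shell
# Strip the locale code and sample sentence, leaving only the voice name.
# Assumed line shape:  Name   xx_YY   # sample sentence
voice_name_filter() {
  sed -E 's/[[:space:]]+[A-Za-z]{2,3}[_-][A-Za-z0-9_-]+[[:space:]]+#.*$//'
}

# Print every installed voice name, one per line.
list_voice_names() {
  say -v '?' | voice_name_filter
}

# Speak one sentence in each installed voice whose locale starts with a prefix.
audition_voices() {
  prefix="${1:-en_}"
  sample="${2:-This is a voice audition.}"
  say -v '?' |
    grep -E "[[:space:]]${prefix}[A-Za-z0-9_-]*[[:space:]]+#" |
    voice_name_filter |
    while IFS= read -r voice; do
      echo "Voice: $voice"
      # Redirect stdin so say cannot consume the list of voice names.
      say -v "$voice" "$sample" </dev/null
    done
}
```

Running audition_voices with no arguments walks through the English voices. Pass a different prefix such as fr_ to audition another language.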

How to Disable Text-to-Speech (When It Won't Stop Talking)

This section exists because a surprising number of Mac users accidentally enable VoiceOver or Spoken Content and can't figure out how to silence it. If your Mac is currently narrating everything on screen, here's the fastest fix:

If VoiceOver is running (Mac is narrating every click and button)

Press Cmd + F5 immediately. This toggles VoiceOver off. On a Mac or Magic Keyboard with Touch ID, you can also hold Cmd and quickly press the Touch ID button three times.

If Speak Selection won't stop mid-read

Press Option + Esc again to stop the current reading. If that doesn't work, click anywhere outside the selected text.

If your Mac speaks notifications or alerts

  1. Go to System Settings > Accessibility > Spoken Content
  2. Toggle off Speak announcements
  3. While you're there, check that Speak item under the pointer is also off if you don't want hover-to-read

Full disable checklist

To completely silence all TTS on your Mac:

  • Spoken Content: System Settings > Accessibility > Spoken Content > Toggle off everything
  • VoiceOver: System Settings > Accessibility > VoiceOver > Toggle off (or press Cmd + F5)
  • Siri voice feedback: System Settings > Siri (labeled Siri & Spotlight or Apple Intelligence & Siri on some macOS versions) > Voice Feedback > Off
  • Audio alerts: System Settings > Sound > Uncheck "Play sound on startup" and adjust alert volume

After running through that list, your Mac will stay silent unless you explicitly trigger speech again.
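If you'd rather confirm from Terminal that nothing is still talking, a process check is one option. This sketch assumes VoiceOver shows up as a process named VoiceOver and that speech started from Terminal runs as say. It doesn't read the Spoken Content toggles, which live in System Settings. tts_status is a hypothetical helper name.

```shell
# Report whether speech-related processes are currently running.
# Assumption: VoiceOver runs as "VoiceOver"; Terminal speech runs as "say".
tts_status() {
  for proc in VoiceOver say; do
    if pgrep -x "$proc" >/dev/null 2>&1; then
      echo "$proc: running"
    else
      echo "$proc: not running"
    fi
  done
}
```

If say turns out to be running from an earlier Terminal command, killall say stops it.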

Where macOS TTS Hits Its Ceiling (and What to Do Next)

Apple's built-in voices are good enough for two things: quick proofreading of short documents and accessibility. For anything beyond that, you'll run into hard limitations:

  • No voice customization: You can't adjust emotion, emphasis, or pacing beyond a single speed slider. The voice reads a joke and a tragedy with the same inflection.
  • Limited voice selection: Even with all premium voices downloaded, you're choosing from maybe 15-20 English options. If you need a specific tone, accent, or personality for content production, the library is too small.
  • No voice cloning: There's no way to create a voice that sounds like you or matches a specific brand voice.
  • Audio export is primitive: The say command writes AIFF by default and can produce WAV or AAC if you pass the right flags, but there's no built-in MP3 encoder and no way to get podcast-ready audio with proper normalization.
  • Multilingual quality drops fast: Apple's premium voices are strong in English. Switch to Thai, Arabic, or Portuguese, and you're back to robotic quality.
  • No long-form consistency: The prosody drifts after 2-3 minutes, making extended listening fatiguing. A 20-minute script will sound noticeably worse in minute 18 than in minute 1.

These limitations don't matter if you're using TTS to catch typos in an email. They matter a lot if you're producing a YouTube video, narrating a course, or converting written content into audio that an audience will actually listen to.
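For what it's worth, say can get a little further than AIFF when you pass it a data format. The sketch below picks flags from the output file's extension: plain AIFF, 16-bit 44.1 kHz WAV, or AAC in an .m4a container. --data-format is a documented say option, but treat the exact format strings as assumptions to verify against man say on your macOS version, since support can vary by voice. render_script is a hypothetical helper name. MP3 isn't covered, and nothing here normalizes loudness.

```shell
# Render a text file to audio, choosing say flags from the output extension.
# render_script is a hypothetical helper; the format strings are assumptions
# to check against `man say` on your system.
render_script() {
  src="$1"
  out="$2"
  case "$out" in
    *.aiff) say -f "$src" -o "$out" ;;
    *.wav)  say -f "$src" -o "$out" --data-format=LEI16@44100 ;;  # 16-bit PCM, 44.1 kHz
    *.m4a)  say -f "$src" -o "$out" --data-format=aac ;;          # AAC in an MPEG-4 container
    *)      echo "unsupported extension: $out" >&2; return 2 ;;
  esac
}
```

An MP3 still requires a third-party encoder, which is where the conversion chains start.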

From Mac Proofreading to Professional Audio Production

When your needs outgrow the built-in voices, the workflow shift is straightforward: keep writing on your Mac, but generate audio through a dedicated AI TTS platform.

Fish Audio fills the exact gaps macOS leaves open. Here's what changes when you switch:

2,000,000+ voices instead of 20. Fish Audio's voice library is categorized by language, accent, tone, and use case. Need a warm, conversational American English voice for a tutorial? Filter for it. Need a crisp Japanese narrator for a localized product video? It's there. The selection is roughly 100,000x larger than what Apple ships.

Prosody that holds across long scripts. Fish Audio's model architecture handles emotional variation and pacing across extended content. A 15-minute voiceover maintains its character from start to finish, without the monotone drift that kicks in after 2-3 minutes with macOS voices. Questions sound like questions. Emphasis lands where it should.

15-second voice cloning. Want every piece of audio to sound like you? Upload a 15-second sample, and Fish Audio creates a clone that carries your vocal identity across any text you generate. Apple offers nothing comparable.

13+ languages without quality collapse. Fish Audio maintains native-level pronunciation across its full language set. A voice that sounds natural in English stays natural in Spanish, Mandarin, Japanese, and Arabic. No sudden quality cliff when you switch languages.

Production-ready audio files. Generate and download MP3 or WAV files ready for YouTube, podcast hosting, course platforms, or any other distribution channel. No Terminal workarounds, no AIFF-to-MP3 conversion chains.

The Mac creator's workflow

  1. Write your script in Pages, Google Docs, or any Mac text editor
  2. Quick proofread using macOS Spoken Content (Option + Esc) to catch awkward phrasing
  3. Copy the finished text and paste it into fish.audio/text-to-speech
  4. Choose a voice from the library (or use your cloned voice)
  5. Adjust emotion and pacing to match your content
  6. Generate and download the audio file
  7. Drop into your project: Final Cut Pro, Logic Pro, GarageBand, your podcast editor, whatever you use

That workflow keeps macOS TTS in its sweet spot (free, instant proofreading) and uses Fish Audio for the part that actually needs to sound professional.
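Steps 2 and 3 can be prepped from Terminal. This sketch prints a word count with a rough listening time, then copies the script to the clipboard using pbcopy, which ships with macOS. The default pace of 150 words per minute is an assumed narration speed, not a measured one, and script_stats and copy_script are hypothetical helper names.

```shell
# Print the word count and a rough listening time for a script file.
# The 150 wpm default is an assumed narration pace; pass another as $2.
script_stats() {
  words=$(wc -w < "$1" | tr -d '[:space:]')
  wpm="${2:-150}"
  awk -v w="$words" -v r="$wpm" \
    'BEGIN { printf "%d words, about %.1f minutes at %d wpm\n", w, w / r, r }'
}

# Copy the script text to the clipboard, ready to paste into a browser.
copy_script() {
  pbcopy < "$1" && echo "copied $1 to clipboard"
}
```

At that assumed pace, the 2,000-word script from the intro works out to about 13.3 minutes of audio.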

What it costs

Fish Audio offers a free tier generous enough to test with real scripts. Paid plans start at $11 per month for 250,000 credits, which covers up to 200 minutes (~3h 20m) of S1 generation or up to 400 minutes (~6h 40m) of v1.5 or v1.6 generation. For perspective, macOS TTS is free, but its audio export stops at what the say command can write. A human voice actor for 15 hours of recorded content would cost $3,000 to $15,000. The full pricing breakdown is on Fish Audio's pricing page.
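To put those plan numbers on a per-minute basis, here's the arithmetic as a small awk call wrapped in a shell function. It treats the "up to" minute caps as exact, which is an assumption; real usage may yield fewer minutes. plan_math is a hypothetical helper name.

```shell
# Per-minute math for the entry plan, using the figures quoted above.
# Assumption: the "up to" minute caps are treated as exact.
plan_math() {
  awk 'BEGIN {
    price = 11; credits = 250000       # dollars per month, credits per month
    s1_min = 200; v15_min = 400        # minutes of S1 and v1.5/v1.6 generation
    printf "S1: %d credits/min, $%.4f per minute\n", credits / s1_min, price / s1_min
    printf "v1.5/v1.6: %d credits/min, $%.4f per minute\n", credits / v15_min, price / v15_min
    printf "Voice actor (15 h): $%d to $%d per hour\n", 3000 / 15, 15000 / 15
  }'
}
```

Under those assumptions, S1 comes to 1,250 credits and $0.055 per generated minute, v1.5 or v1.6 comes to 625 credits and $0.0275 per minute, and the voice actor range works out to $200 to $1,000 per recorded hour.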

Conclusion

macOS has a capable text-to-speech (TTS) system hiding behind scattered settings panels. Once you know that Spoken Content is the feature you actually want, that Option + Esc is the shortcut, and that premium voice downloads exist, the built-in Mac text-to-speech setup handles quick proofreading and casual listening well. And if VoiceOver accidentally starts narrating your entire screen, Cmd + F5 is your panic button.

But the built-in voices were designed for accessibility and system feedback, not content production. The moment you need audio that an audience will listen to for more than 2 minutes, voices that match your brand, or multilingual output that doesn't sound like a translation engine, you've outgrown what Apple ships. Write on your Mac, proofread with Spoken Content, and produce with Fish Audio. The writing tool you already have, paired with a text-to-speech engine built for the audio your audience actually hears.

Kyle Cui

Kyle is a Founding Engineer at Fish Audio and UC Berkeley Computer Scientist and Physicist. He builds scalable voice systems and grew Fish into the #1 global AI text-to-speech platform. Outside of startups, he has climbed 1345 trees so far around the Bay Area. Find his irresistibly clouty thoughts on X at @kile_sway.
