A Complete Guide to Text-to-Speech on Mac: Settings, Usage, and Disabling Methods
Feb 28, 2026
You finished a 2,000-word podcast script in Pages, hit the Speak Selection shortcut, and heard a voice that sounded like it was recorded inside a microwave in 2009. You dug into System Settings, found six different menus that mention "speech" or "spoken content," changed three things, and somehow made it worse. Now your Mac announces every notification out loud, and you can't figure out how to turn it off.
macOS has had text-to-speech built in since the early 2000s. Apple has improved it significantly over the past few years, but the settings are scattered across multiple panels, the behavior changes between macOS versions, and the gap between what the built-in voices can do and what content creators actually need remains wide. The good news: once you know where everything lives, setup takes about 5 minutes. And when you outgrow the built-in options, the upgrade path is cleaner than most people expect.
macOS Has 3 Separate TTS Systems. Most People Only Find 1.
This is the part that causes the most confusion. Apple doesn't have a single "text-to-speech" toggle. It has three distinct systems that overlap, each controlled from a different place:
| System | What It Does | Where to Find It | Primary Use |
|---|---|---|---|
| Spoken Content | Reads selected text or entire screen aloud | System Settings > Accessibility > Spoken Content | Reading articles, proofreading, and accessibility |
| VoiceOver | Full-screen reader for visually impaired users | System Settings > Accessibility > VoiceOver | Navigation, accessibility |
| Siri Voice | Powers Siri responses and dictation feedback | System Settings > Siri (the pane is named "Siri & Spotlight" or "Apple Intelligence & Siri" on some macOS versions) | Virtual assistant responses |
Most people searching "text to speech on Mac" want Spoken Content. That's the feature that reads selected text in any app using a keyboard shortcut. VoiceOver is a full accessibility tool that narrates everything on screen, including buttons, menus, and window titles. Turning on VoiceOver when you just want text read aloud is like calling a fire truck to light a candle.
Setting Up Spoken Content: The 5-Minute Setup
For macOS Ventura (13) and later
- Open System Settings (click the Apple menu > System Settings)
- Click Accessibility in the sidebar
- Click Spoken Content
- Toggle on Speak Selection
- Choose your preferred voice by clicking the dropdown next to "System Voice."
- Adjust the speaking rate slider to your preference
- Optionally toggle on Speak item under the pointer if you want hover-to-read functionality
For macOS Monterey (12) and earlier
Older versions use System Preferences instead of System Settings (the redesign arrived with Ventura), so the path is slightly different:
- Open System Preferences (click the Apple menu > System Preferences)
- Click Accessibility
- Click Spoken Content in the left sidebar
- Check Speak Selection
- Click the System Voice dropdown to pick a voice
- Adjust the speaking rate
The keyboard shortcut
Once Spoken Content is enabled, select any text in any application and press Option + Esc to hear it read aloud. You can customize this shortcut:
- In the Spoken Content settings, click the info (i) button next to Speak Selection (labeled Options on older macOS versions)
- Set your preferred key combination
- Enable or disable the on-screen controller (a small floating panel with play/pause/skip controls)
That on-screen controller is worth enabling. It lets you pause, resume, skip forward, and adjust speed without going back to System Settings every time.
Choosing the Right Voice (Apple Has More Than You Think)
Most Mac users have only heard "Samantha" or the default Siri voice. Apple actually offers dozens of voices across multiple languages, and the quality difference between the basic voices and the premium downloads is significant.
How to download premium voices
- Go to System Settings > Accessibility > Spoken Content
- Click the System Voice dropdown
- Click Manage Voices
- Browse by language. Premium voices are marked with a download icon.
- Click the download arrow next to any voice. Files range from 150 MB to 900 MB, depending on the quality tier.
Voice quality tiers
Apple categorizes its voices into several quality levels:
- Compact voices: Small file size, robotic quality. Fine for quick system announcements. Not usable for listening to anything longer than a paragraph.
- Standard voices: Mid-tier quality. Decent for proofreading short documents. You'll notice unnatural rhythm in longer passages.
- Premium/Enhanced voices: The largest downloads, but noticeably more natural. These use neural network synthesis and sound closer to a real person. "Zoe (Premium)," "Evan (Premium)," and several others fall into this category.
Even the premium voices, though, have a ceiling. They sound good for 2 to 3 minutes. Past that, the prosody flattens, emotional variation disappears, and the voice settles into a monotone rhythm that's hard to listen to for extended periods. That's not a bug. It's a limitation of the on-device model size Apple can practically ship.
Using Text-to-Speech Across Mac Apps
Once Spoken Content is active, the Option + Esc shortcut works in nearly every Mac application. Here's how it behaves in the most common ones:
Pages and TextEdit: Select text, press the shortcut. Works reliably. The voice reads the selected passage and stops.
Safari and Chrome: Select text on any webpage and press the shortcut. Useful for listening to articles while doing something else. Safari also has a separate Reader Mode that strips page formatting before reading, which sometimes improves pacing.
Preview (PDFs): Select text in a PDF and press the shortcut. Quality depends on whether the PDF has selectable text. Scanned documents without OCR won't work.
Mail: Select an email body, press the shortcut. Handy for long emails you'd rather listen to than read.
Terminal: Yes, you can also trigger TTS from the command line. Run say "Your text here" and macOS reads it aloud using the system voice. For longer text: say -f /path/to/textfile.txt. You can even export to audio: say -f script.txt -o output.aiff. That last command is the closest macOS gets to a built-in audio export feature.
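If you want a rough idea of how long an export will run before you start it, a small wrapper can help. The sketch below is a minimal example, not an Apple-provided tool: estimate_seconds and export_speech are hypothetical helper names, the 180 words-per-minute default is an assumption, and the estimate treats the rate passed to say's -r flag as exact even though real speech includes pauses.

```shell
#!/bin/sh
# Estimate how long a reading will take, in whole seconds (rounded up).
# Usage: estimate_seconds WORD_COUNT WORDS_PER_MINUTE
estimate_seconds() {
  echo $(( ($1 * 60 + $2 - 1) / $2 ))
}

# Export a text file to AIFF at a chosen speaking rate (macOS only).
# Usage: export_speech INPUT.txt OUTPUT.aiff [WORDS_PER_MINUTE]
export_speech() {
  rate="${3:-180}"   # assumed default; pick whatever rate sounds right to you
  if ! command -v say >/dev/null 2>&1; then
    echo "say not found: this only works on macOS" >&2
    return 1
  fi
  words=$(wc -w < "$1" | tr -d ' ')
  echo "About $(estimate_seconds "$words" "$rate") seconds of audio" >&2
  # -r sets the rate in words per minute, -f reads from a file, -o writes audio
  say -r "$rate" -f "$1" -o "$2"
}
```

On a Mac, export_speech script.txt output.aiff 180 prints the estimate and then writes the AIFF file. For a 2,000-word script at 180 words per minute, the estimate comes out to 667 seconds, or roughly 11 minutes.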
The Terminal trick most people don't know
The say command accepts a -v flag to specify any installed voice:
say -v "Zoe (Premium)" "This is a test of the premium voice."
say -v "?"
That second command lists every voice installed on your system. It's the fastest way to audition voices without clicking through System Settings.
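The full list gets long once a few languages are installed. Here is one way to narrow it to a single locale, assuming the output keeps its usual shape: voice name, locale code, then a # followed by a sample sentence. filter_voices is a hypothetical helper name, and the column layout is an observation about current macOS versions rather than a documented format, so treat this as a sketch.

```shell
#!/bin/sh
# Print only the voice names for one locale.
# Reads `say -v '?'`-style lines on stdin.
# Usage: say -v '?' | filter_voices en_US
filter_voices() {
  awk -v loc="$1" '{
    sub(/[ \t]*#.*/, "")      # drop the sample sentence after the #
    if ($NF == loc) {         # last remaining column is the locale code
      $NF = ""                # remove the locale, keep the name
      sub(/[ \t]+$/, "")      # trim the trailing space left behind
      print
    }
  }'
}

# On a Mac, audition each US English voice with the same sentence:
#   say -v '?' | filter_voices en_US | while IFS= read -r v; do
#     say -v "$v" "This is $v." < /dev/null
#   done
```

Because voice names such as "Zoe (Premium)" contain spaces, the helper treats everything before the locale column as the name instead of splitting on the first space.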
How to Disable Text-to-Speech (When It Won't Stop Talking)
This section exists because a surprising number of Mac users accidentally enable VoiceOver or Spoken Content and can't figure out how to silence it. If your Mac is currently narrating everything on screen, here's the fastest fix:
If VoiceOver is running (Mac is narrating every click and button)
Press Cmd + F5 immediately. This toggles VoiceOver off. On keyboards where the function keys default to media controls, you may need Fn + Cmd + F5. On Macs with Touch ID, you can also hold Cmd and quickly press the Touch ID button three times.
If Speak Selection won't stop mid-read
Press Option + Esc again to stop the current reading. If that doesn't work, click anywhere outside the selected text.
If your Mac speaks notifications or alerts
- Go to System Settings > Accessibility > Spoken Content
- Toggle off Speak announcements
- While you're there, check that Speak item under the pointer is also off if you don't want hover-to-read
Full disable checklist
To completely silence all TTS on your Mac:
- Spoken Content: System Settings > Accessibility > Spoken Content > Toggle off everything
- VoiceOver: System Settings > Accessibility > VoiceOver > Toggle off (or press Cmd + F5)
- Siri voice feedback: System Settings > Siri > Voice Feedback > Off
- Audio alerts: System Settings > Sound > Uncheck "Play sound on startup" and adjust alert volume
After running through that list, your Mac will stay silent unless you explicitly trigger speech again.
Where macOS TTS Hits Its Ceiling (and What to Do Next)
Apple's built-in voices are good enough for two things: quick proofreading of short documents and accessibility. For anything beyond that, you'll run into hard limitations:
- No voice customization: You can't adjust emotion, emphasis, or pacing beyond a single speed slider. The voice reads a joke and a tragedy with the same inflection.
- Limited voice selection: Even with all premium voices downloaded, you're choosing from maybe 15-20 English options. If you need a specific tone, accent, or personality for content production, the library is too small.
- No voice cloning: There's no way to create a voice that sounds like you or matches a specific brand voice.
- Audio export is primitive: The say command writes AIFF by default. It can produce a few other formats, such as WAV or M4A, with extra flags, but macOS has no built-in MP3 encoder and no way to generate podcast-ready audio with proper normalization.
- Multilingual quality drops fast: Apple's premium voices are strong in English. Switch to Thai, Arabic, or Portuguese, and you're back to robotic quality.
- No long-form consistency: The prosody drifts after 2-3 minutes, making extended listening fatiguing. A 20-minute script will sound noticeably worse in minute 18 than in minute 1.
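The export limitation has a partial workaround that never leaves Terminal. macOS includes an afconvert command-line tool that can re-encode audio files, and the sketch below uses it to turn the AIFF output of say into WAV or M4A. afconvert_flags and convert_audio are hypothetical helper names, and the format choices (16-bit PCM for WAV, AAC for M4A) are assumptions you may want to change. MP3 is left out on purpose, since it generally needs a third-party encoder, and nothing here handles loudness normalization.

```shell
#!/bin/sh
# Print the afconvert format flags for a target file, based on its extension.
afconvert_flags() {
  case "$1" in
    *.wav) echo "-f WAVE -d LEI16" ;;  # 16-bit little-endian PCM in a WAV file
    *.m4a) echo "-f m4af -d aac" ;;    # AAC in an MPEG-4 audio file
    *) echo "unsupported target: $1" >&2; return 1 ;;
  esac
}

# Convert an audio file (for example, AIFF from `say -o`) to WAV or M4A.
# Usage: convert_audio INPUT.aiff OUTPUT.wav
convert_audio() {
  flags=$(afconvert_flags "$2") || return 1
  if ! command -v afconvert >/dev/null 2>&1; then
    echo "afconvert not found: this only works on macOS" >&2
    return 1
  fi
  # Word splitting on $flags is intended. Run this with sh or bash;
  # zsh does not split unquoted variables by default.
  afconvert $flags "$1" "$2"
}
```

A typical sequence on a Mac would be say -f script.txt -o output.aiff followed by convert_audio output.aiff output.m4a.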
These limitations don't matter if you're using TTS to catch typos in an email. They matter a lot if you're producing a YouTube video, narrating a course, or converting written content into audio that an audience will actually listen to.
From Mac Proofreading to Professional Audio Production
When your needs outgrow the built-in voices, the workflow shift is straightforward: keep writing on your Mac, but generate audio through a dedicated AI TTS platform.
Fish Audio fills the exact gaps macOS leaves open. Here's what changes when you switch:
2,000,000+ voices instead of 20. Fish Audio's voice library is categorized by language, accent, tone, and use case. Need a warm, conversational American English voice for a tutorial? Filter for it. Need a crisp Japanese narrator for a localized product video? It's there. The selection is roughly 100,000x larger than what Apple ships.
Prosody that holds across long scripts. Fish Audio's model architecture handles emotional variation and pacing across extended content. A 15-minute voiceover maintains its character from start to finish, without the monotone drift that kicks in after 2-3 minutes with macOS voices. Questions sound like questions. Emphasis lands where it should.
15-second voice cloning. Want every piece of audio to sound like you? Upload a 15-second sample, and Fish Audio creates a clone that carries your vocal identity across any text you generate. Apple offers nothing comparable.
13+ languages without quality collapse. Fish Audio maintains native-level pronunciation across its full language set. A voice that sounds natural in English stays natural in Spanish, Mandarin, Japanese, and Arabic. No sudden quality cliff when you switch languages.
Production-ready audio files. Generate and download MP3 or WAV files ready for YouTube, podcast hosting, course platforms, or any other distribution channel. No Terminal workarounds, no AIFF-to-MP3 conversion chains.
The Mac creator's workflow
- Write your script in Pages, Google Docs, or any Mac text editor
- Quick proofread using macOS Spoken Content (Option + Esc) to catch awkward phrasing
- Copy the finished text and paste it into fish.audio/text-to-speech
- Choose a voice from the library (or use your cloned voice)
- Adjust emotion and pacing to match your content
- Generate and download the audio file
- Drop into your project: Final Cut Pro, Logic Pro, GarageBand, your podcast editor, whatever you use
That workflow keeps macOS TTS in its sweet spot (free, instant proofreading) and uses Fish Audio for the part that actually needs to sound professional.
What it costs
Fish Audio offers a free tier generous enough to test with real scripts. Paid plans start at $11 per month for 250,000 credits, which covers up to 200 minutes (~3h 20m) of S1 generation or up to 400 minutes (~6h 40m) of v1.5 or v1.6 generation. For perspective, macOS TTS is free, but its audio export is limited to what the say command can write from Terminal. A human voice actor for 15 hours of recorded content would cost $3,000 to $15,000. The full pricing breakdown is on Fish Audio's pricing page.
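To put those numbers on the same scale, it helps to reduce each one to a cost per minute of finished audio. The arithmetic below uses only the figures quoted above and assumes the full monthly allowance is used; per_minute is a hypothetical helper name, and 15 hours is taken as 900 minutes.

```shell
#!/bin/sh
# Cost per minute of audio, to four decimal places.
# Usage: per_minute TOTAL_DOLLARS MINUTES
per_minute() {
  awk -v cost="$1" -v mins="$2" 'BEGIN { printf "%.4f\n", cost / mins }'
}

per_minute 11 200      # entry plan, S1 generation          -> 0.0550
per_minute 11 400      # entry plan, v1.5 or v1.6           -> 0.0275
per_minute 3000 900    # voice actor, low end for 15 hours  -> 3.3333
per_minute 15000 900   # voice actor, high end for 15 hours -> 16.6667
```

On those assumptions, S1 generation works out to about 5.5 cents per minute, against roughly $3.33 to $16.67 per minute for a voice actor.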

Conclusion
macOS has a capable text-to-speech (TTS) system hiding behind scattered settings panels. Once you know that Spoken Content is the feature you actually want, that Option + Esc is the shortcut, and that premium voice downloads exist, the built-in Mac text-to-speech setup handles quick proofreading and casual listening well. And if VoiceOver accidentally starts narrating your entire screen, Cmd + F5 is your panic button.
But the built-in voices were designed for accessibility and system feedback, not content production. The moment you need audio that an audience will listen to for more than 2 minutes, voices that match your brand, or multilingual output that doesn't sound like a translation engine, you've outgrown what Apple ships. Write on your Mac, proofread with Spoken Content, and produce with Fish Audio. That pairs the writing tool you already have with a text-to-speech engine built for the audio your audience actually hears.
