
How to Make an AI Companion with Pipecat

Nov 21, 2025

James · Tutorial

AI companion apps hit about 220 million global downloads on the Apple App Store and Google Play Store in 2025, with downloads rising 88% year-over-year. With new AI companions launching every day, alongside regular controversy and discussion over their use, the booming space is hard to ignore. Whether your users are looking for companionship, a friend, someone to talk to, or someone to practice speaking with, AI companions are forming a new sector of frontier technology that combines many of the cutting-edge tools available today. Generative video, generative text, and generative speech are all mixing to create the opportunity to build a companion who feels real and present.

AI Companion’s Voice

One of the most important aspects of an AI companion is its voice. As the distilled essence of the companion’s personality, character, and identity, the voice is vital in conveying who they are. The highest quality audio is necessary to create the best possible experience for the user, and the voice also needs capabilities like real-time streaming for live chats or calls, emotional steerability, and customizability.

Pipecat

For developers creating real-time AI companions that chat over live voice calls, Pipecat is a great option to get started. Pipecat offers a developer platform and SDKs for building live streaming voice chats, backed by its parent company Daily’s rooms product. Pipecat powers the infrastructure of streaming information to and from the AI companion and wires together the building blocks of speech-to-text, LLM, and text-to-speech. It uses Daily rooms as the environment that the user and AI companion dial into. Pipecat also offers integrations with many text-to-speech voice providers, including Fish Audio; using Fish Audio’s highly expressive voices is as easy as swapping in the Fish Audio client.

How to Get Started with Pipecat

For Python, Pipecat’s FishTTSService provides real-time text-to-speech synthesis through Fish Audio’s WebSocket-based streaming API.

Make sure to install the required dependency with pip install "pipecat-ai[fish]", then set up your Fish Audio account.

First sign in to Fish Audio; then you can either use the default voice, clone your own, or choose one from the library. Fish Audio’s voice cloning is the top AI voice cloner, capturing full emotional expressiveness and likeness. It requires at least 10 seconds of audio of the voice you are cloning, so to get started even faster you can also pick one generated by the community on the Discovery page. Once you have your voice, get your API key from the API console, set it as the environment variable FISH_API_KEY, and you’re ready to integrate Fish Audio into Pipecat!
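The setup above boils down to two commands. The key value below is a placeholder you replace with your own key from the Fish Audio API console:

```shell
# Install Pipecat with the Fish Audio extra
pip install "pipecat-ai[fish]"

# Make your Fish Audio API key available to Pipecat
# (replace the placeholder with the key from your API console)
export FISH_API_KEY="your-fish-audio-api-key"
```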

Text-to-Speech Service

Once you have Fish Audio ready, create the TTS service and place it in your Pipecat pipeline. It must be positioned where it can receive text frames and emit audio frames: typically after the LLM and before the transport output. Read more in Pipecat’s official documentation here.

And that’s it! Once you have your TTS service ingesting LLM text chunks or direct speech requests and outputting audio frames, your AI companion is ready to speak to the user in your Fish Audio voice. You can play around with different voices, experiment with system prompting the LLM to produce emotion tags that Fish Audio supports, and even try putting together multiple AI companions to produce complex dialogue.
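The emotion-tag idea can be sketched in plain Python: a system prompt that asks the LLM to annotate replies with parenthesized emotion tags, plus a small helper that strips them for text-only display such as captions. The tag names here are illustrative examples, not a definitive list; check Fish Audio's documentation for the tags your voice model actually supports.

```python
import re

# Illustrative emotion tags -- confirm the supported set in Fish Audio's docs.
EMOTION_TAGS = ["happy", "sad", "excited", "whispering"]

SYSTEM_PROMPT = (
    "You are a warm, attentive AI companion. "
    "When it fits the mood, begin a sentence with one of these emotion "
    f"tags so the voice engine can act on it: {', '.join(f'({t})' for t in EMOTION_TAGS)}."
)

# Matches any of the tags above, plus trailing whitespace.
TAG_PATTERN = re.compile(r"\((?:" + "|".join(EMOTION_TAGS) + r")\)\s*")

def strip_emotion_tags(text: str) -> str:
    """Remove emotion tags for transcripts or on-screen captions."""
    return TAG_PATTERN.sub("", text)

print(strip_emotion_tags("(excited) I found the perfect song for you!"))
# -> I found the perfect song for you!
```

The LLM sees the tags as ordinary text to emit, the TTS engine interprets them for delivery, and the helper keeps any visible transcript clean.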
