Back to Browse

Pipecat Flows - open source Voice AI agent builder

6.7K views
Jan 17, 2025
1:50

Maslow's heirarchy of voice AI . (*) .. ... ..... ....... (*) You are here. Network transport turn detection, interruption handling, natural voices, tool use. James demonstrates how to create a Voice AI agent that can reliably perform complex tasks (using Gemini 2.0 and Pipecat Flows). Demo repo with the code in the video above is here: https://github.com/pipecat-ai/pipecat-flows/blob/main/examples/static/movie_explorer_gemini.py The low-level building blocks you need for a good voice AI agent are: 1. Network transport - a reliable, low-latency connection between the user and the agent. 2. Turn detection - determining when the user is finished talking (and expects the agent to respond). 3. Interruption handling - gracefully stopping the agent's current response when the user interrupts. All three of these are complicated. Like everything we're all working on in AI in 2025, implementations will continue to improve. But today there are very good open source implementations of all three, and voice AI agents are being deployed at scale for use cases like customer support. 4. Natural voices. We have these, too, today from companies like @cartesia, @elevenlabs, @google, @openai, and @rime. (I put that list in alphabetical order, to try to be fair! All these teams are great, and the models all have slightly different strengths/weaknesses.) I expect more progress in voice in 2025. Especially for languages other than English, in the open source domain, as part of speech-to-speech models, and for on-device use cases. But natural voices are here. 5. Tool use. Tool use is the newest building block for voice AI to hit the 80/20 "good enough to build really good stuff" tipping point. It turns out that tools are a big, big unlock. Voice AI agents being built today leverage tool use very, very heavily. - interacting with back-end systems - interacting with telephony stacks - RAG (and context manipulation in general) - search/grounding - guardrails and safety - script following That last one is really important. It turns out that you can increase the reliability of a voice AI agent a lot by breaking a complex workflow down into smaller pieces. But after you split the workflow up into smaller stages, how do you transition between the stages? Well, the best LLMs today are quite good at figuring out how to call a function to perform an action. The transitions between the states are function calls! When you transition between states, you usually do one or all of: - change the system instructions - summarize or restructure the context - modify the tools list Pipecat Flows provides abstractions for all of this. You can define a flow and use it directly in a @pipecat_ai pipeline. And there's a GUI editor for building flows! Pipecat Flows is open source, easy to extend and customize, and completely open source. Clone it, play with it, and submit any PRs that would make it better for your use case.

Download

0 formats

No download links available.

Pipecat Flows - open source Voice AI agent builder | NatokHD