
Voice AI programming with Gemini² and Cursor

300 views
Jan 21, 2025
2:54

Adrian, co-founder of Canonical AI, built a Gemini voice + vision AI agent that writes software indirectly, collaborating with a human and with Gemini running inside @cursor_ai.

The setup here is:

1. Prompt the "designer" agent with a project plan. The designer is a voice- and vision-enabled AI agent: Gemini 2.0 running in a @pipecat_ai multimodal pipeline.
2. The designer can see your screen, so it always has full, up-to-date context.
3. For each step, the designer prompts the coding agent in Cursor, then pauses for input/feedback from the human.

Like a lot of the most mind-blowing things people are building these days, this feels to me both like a glimpse of the future and a hack! Adrian uses the macOS accessibility APIs to glue things together, and manually accepts the changes.

I borrowed the "designer" terminology from Adrian's colleague at Canonical AI, Tom Shapland. Tom talks about this pattern as a "designer AI" collaborating with an "engineer AI". (Follow Tom for lots of great pointers to voice AI resources.)

https://voice.canonical.chat/
https://x.com/tom_shapland

I also think of this as a nice example of a multi-agent system, with each agent specialized in terms of both task and UI. The voice+vision agent is realtime, multimodal, conversational scaffolding wrapped around Gemini 2.0. Similarly, the Gemini coding agent leverages all of the Cursor framework and capabilities. It's clear that a lot of new AI applications that ship in 2025 will be built as multi-agent systems.
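To make the designer/engineer hand-off concrete, here is a minimal Python sketch of the three-step loop described above. It is not Adrian's code. Every name in it (`run_designer_loop`, `StepRecord`, `make_prompt`, `send_to_engineer`, `ask_human`) is hypothetical, and the callbacks are stand-ins for the real pieces: the Gemini 2.0 designer in the Pipecat pipeline, the glue that delivers a prompt to Cursor, and the human's spoken feedback. The "stop" keyword is also an assumption, added so the loop has a way to end early.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepRecord:
    step: str       # plan step the designer was working on
    prompt: str     # prompt the designer sent to the coding agent
    feedback: str   # what the human said after reviewing the result


def run_designer_loop(
    plan: List[str],
    make_prompt: Callable[[str, List[StepRecord]], str],
    send_to_engineer: Callable[[str], None],
    ask_human: Callable[[str], str],
) -> List[StepRecord]:
    """Walk the plan one step at a time, pausing for the human after each step."""
    history: List[StepRecord] = []
    for step in plan:
        # The designer turns the plan step (plus what happened so far) into a prompt.
        prompt = make_prompt(step, history)
        # Hypothetical hook: in the demo this would deliver the prompt to Cursor.
        send_to_engineer(prompt)
        # Pause for input/feedback from the human before moving on.
        feedback = ask_human(step)
        if feedback.strip().lower() == "stop":
            break  # assumed convention: the human can end the session early
        history.append(StepRecord(step, prompt, feedback))
    return history


# Stub wiring for illustration only; nothing here touches Gemini, Pipecat, or Cursor.
sent_prompts: List[str] = []
scripted_feedback = iter(["looks good", "stop"])

demo_history = run_designer_loop(
    plan=["scaffold the app", "add a login form", "write tests"],
    make_prompt=lambda step, history: f"Step {len(history) + 1}: {step}",
    send_to_engineer=sent_prompts.append,
    ask_human=lambda step: next(scripted_feedback),
)
```

Passing the three roles in as callbacks keeps each agent specialized: the loop only sequences the work, so the designer model, the delivery mechanism, and the feedback channel can each be swapped without touching the others.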

