The anatomy of a voice assistant
In this episode, we welcome Shawn Wen, Co-founder and CTO at PolyAI. Shawn gives an in-depth overview of the AI tech stack needed to build high-quality voice assistants. Inspired by Andreessen Horowitz’s recent publication on AI voice agents, the discussion covers the key components of this complex system, including speech recognition, voice activity detection, the application of generative AI models, and the integration of these technologies into practical applications.
Shawn also explores the challenges of managing latency, how the expected input affects the choice of speech recognition model, and the future of end-to-end AI systems. Join us as we unravel the complexities behind creating and optimizing effective voice AI solutions!
"The LLM can make a lot of judgments by itself, but you're not going to let it make the entire business decision for you because that's way too risky, right?"
"You take into account as well the fact that people don't know how to speak to voice technologies. They've been trained to speak to them in keywords or in this really stilted and awkward way."