Paired Error Analysis With AI Agents
Join the AI Evals September 2026 cohort: https://maven.com/parlance-labs/evals?promoCode=yt-2026

Isaac Flath is a builder, educator, and ML practitioner. He's currently at SpecStory (the tool he demos in this video), where he's been building for about six weeks; this is its first public showing. In this session, we build an annotation app from scratch on a dataset neither of us had seen before and find real product failures in a travel chatbot in under 20 minutes, just from looking at the data.

Timestamps:
00:00 Introduction: Isaac unveils SpecStory
00:36 How the tool works: video call + shared docs + coding agent in one place
01:38 What they're building today: a live annotation app for evals
04:23 Building an annotation app from scratch on an unknown dataset
07:10 Adding visualizations to understand the data shape
08:55 Visualizing how users move through the chatbot
11:40 Reading the transition data: where does this product actually break down?
16:22 Adding annotation controls and live notes
19:54 What error analysis looks like when done right
24:22 Where multiplayer coding is most valuable (and who it's really for)

Connect with Hamel:
Website ► https://hamel.dev
LinkedIn ► https://www.linkedin.com/in/hamelhusain/
Twitter/X ► https://x.com/hamelhusain
Instagram ► https://www.instagram.com/hamelsmu/
TikTok ► https://www.tiktok.com/@hamel_husain

Connect with Isaac Flath:
LinkedIn ► https://www.linkedin.com/in/isaacflath
Twitter/X ► https://x.com/isaac_flath
YouTube ► https://www.youtube.com/@IsaacFlath