Evaluating AI in Production: A Practical Guide
Evaluation is a continuous process in the AI development lifecycle rather than a one-time task. This session provides a technical guide to assessing and improving AI application performance using structured experiments. It details the core components of an evaluation, including the task, the test data, and the scorers (a minimal sketch of these components follows the session details below), and explores specialized evaluation approaches for AI agents, multi-agent coordination, and multimodal data. It also covers the roles of engineers and product managers in defining success criteria, and the use of remote evaluations to test changes in live environments.

Key Takeaways:
• Implementing the evaluation lifecycle from MVP to production
• Level-based evaluation for agents: end-to-end, step-level, and trajectory efficiency
• Measuring team performance in multi-agent systems via routing and hand-off quality
• Using simulated users for realistic assessment of multi-turn conversations

Watch Session: https://luma.com/AgenticAIObservability
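To make the three components concrete, the sketch below shows one way a task, test data, and scorers might fit together in a structured experiment. It is a minimal, framework-agnostic illustration under assumptions of our own: the names (Case, run_task, exact_match, contains_expected, evaluate) are hypothetical placeholders, not an API from the session.

```python
# Hypothetical sketch: an evaluation = a task + test data + scorers.
# All names here are illustrative assumptions, not a specific framework's API.

from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    input: str      # what the application under test receives
    expected: str   # reference answer the scorers compare against


def run_task(case_input: str) -> str:
    """The task: invokes the AI application under test. Stubbed out here."""
    return "Paris" if "capital of France" in case_input else "unknown"


def exact_match(output: str, expected: str) -> float:
    """A strict scorer: 1.0 if the output matches the reference exactly."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def contains_expected(output: str, expected: str) -> float:
    """A looser scorer: 1.0 if the reference appears anywhere in the output."""
    return 1.0 if expected.lower() in output.lower() else 0.0


def evaluate(cases: list[Case], scorers: list[Callable[[str, str], float]]) -> dict:
    """Runs the task over the test data and averages each scorer across cases."""
    totals = {scorer.__name__: 0.0 for scorer in scorers}
    for case in cases:
        output = run_task(case.input)
        for scorer in scorers:
            totals[scorer.__name__] += scorer(output, case.expected)
    return {name: total / len(cases) for name, total in totals.items()}


if __name__ == "__main__":
    test_data = [
        Case(input="What is the capital of France?", expected="Paris"),
        Case(input="What is the capital of Spain?", expected="Madrid"),
    ]
    print(evaluate(test_data, scorers=[exact_match, contains_expected]))
```

Keeping the task, the dataset, and the scorers as separate pieces is what lets the same harness be rerun as the application changes, which is the sense in which evaluation is continuous rather than one-time.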