Why AI Agents Shouldn't Replace Your Fraud Models
Varant Zanoyan, Co-founder & CEO of Zipline AI and original author of Chronon (the open-source feature platform built at Airbnb that now powers Stripe's charge path, OpenAI's Sora 2 personalization, and Netflix content ranking), explains why AI agents should NOT make high-stakes decisions directly, and what to do instead.

This talk introduces "agentic experimentation": a pattern where agents iterate on production ML systems (creating features, training new model versions, deploying to dev) while a human reviews and ships, without ever touching live infrastructure. Varant breaks down the three challenges that kill most agent-on-prod-ML projects (infrastructure sprawl, safety, and reproducibility) and shows how branch-based isolation + semantic hashing + compute reuse make it actually work.

Topics covered:
- Why fraud detection, search ranking, and underwriting CAN'T tolerate full agentic decisioning
- The difference between agents replacing models vs. agents improving models
- How Chronon went from Airbnb payments fraud to powering Stripe, OpenAI Sora 2, Netflix, Uber, and Roku
- Branch-based resource isolation: keeping agent experiments off production compute
- Partial aggregate caching and compute reuse so agents don't blow up your infra bill
- Semantic hashing for reproducible agent-generated pipelines
- Data isolation without losing cross-team feature sharing
- Resource limits as the real organizational guardrail when running 2,000+ experiments
- Why agent-written SQL across Spark, Flink, Kafka, and Airflow is unreviewable
- The handoff: what an agent should produce so a human can actually ship it to prod

For ML engineers, data platform teams, and anyone building agentic systems on top of business-critical pipelines.
Links and Resources:
- Zipline AI: https://zipline.ai/
- Chronon (open source): https://github.com/airbnb/chronon
- Chronon docs: https://chronon.ai/
- Varant Zanoyan on LinkedIn: https://www.linkedin.com/in/vzanoyan/
- Zipline AI $7M seed announcement: https://www.businesswire.com/news/home/20250819568349/en/
- MLOps Community: https://mlops.community/

Timestamps (approximate, adjust on upload):
00:00 Intro: building agents for high-stakes systems
01:20 Chronon origin story at Airbnb payments fraud
03:30 From fraud to search ranking, trust and safety, customer support
05:15 Stripe partnership and going fully open source
06:00 OpenAI Sora 2, Netflix, and the high-stakes use case pattern
07:30 Why full agentic decisioning breaks high-stakes systems
08:45 Agentic experimentation: agents that improve, not replace, models
10:30 What "production ready" actually means for agent output
12:00 Challenge 1: infrastructure sprawl across Spark, Flink, Kubernetes, Airflow
14:00 Chronon's semantic API and infrastructure automation
15:30 Challenge 2: safety and branch-based resource isolation
17:30 Compute reuse via partial aggregate caching
19:30 Shared feature repository and the economics of agent collaboration
21:00 Challenge 3: reproducibility and semantic hashing
22:30 Summary: data foundation for high-stakes agentic workflows
23:30 Q&A: data isolation, the agent layer above Chronon, and scaling to 2,000 experiments

#AgenticAI #FeatureStore #AIAgents