STATE-Bench - Memory-agnostic Benchmark
STATE-Bench (Stateful Task Agent Evaluation Benchmark): an open-source, memory-agnostic benchmark

STATE-Bench is a new open-source benchmark designed to measure whether memory actually improves AI agent performance on realistic, stateful enterprise tasks. Instead of testing simple recall, it evaluates how agents handle procedural workflows, reliability across repeated runs, efficiency, and user experience in domains like customer support, travel, and shopping.

In this episode, we’ll explore why traditional memory benchmarks fall short, how STATE-Bench closes that gap, and what it means to “bring your own memory” to a benchmark built for production readiness.

✅ Chapters:
00:00 What's project STATE Bench
03:45 Why this benchmark is different
13:06 How it works
18:57 What's Next and How to Contribute
20:58 Final statements

✅ Resources:
GitHub Repo: https://github.com/microsoft/STATE-Bench
Using Microsoft Agent Framework with Foundry managed memory: https://youtu.be/DZn9bNDEs4U?si=IV2itRlRjMXPYQl8
Short link for this video: https://aka.ms/memory-benchmark

📌 Let's connect:
Jorge Arteiro | https://www.linkedin.com/in/jorgearteiro
Lewis Liu | https://www.linkedin.com/in/lewisxl/
Pablo Castro | https://www.linkedin.com/in/pabloc/
Nishant Yadav | https://www.linkedin.com/in/nisyad/

Subscribe to the Open at Microsoft: https://aka.ms/OpenAtMicrosoft
Open at Microsoft Playlist: https://aka.ms/OpenAtMicrosoftPlaylist

📝 Submit Your OSS Project for Open at Microsoft: https://aka.ms/OpenAtMsCFP

New episode on Tuesdays!