Science of Misalignment

Name: Science of Misalignment
Uploaded: Nov 22, 2025
Duration: 2975 s

Channel: Neel Nanda (12.6K subscribers)

1.7K views

If a future model were to be dangerously misaligned, could we tell?

If this kind of research sounds interesting to you, apply to do research with me in MATS! Applications are due 23 Dec: tinyurl.com/neel-mats-app

Chapters:
00:00:00 The Problem with Viral Demos
00:06:49 Hunting for "Eval Awareness"
00:17:00 Debunking the Shutdown Demo
00:24:00 Why Do Models Blackmail
00:31:33 A New Tool: The Resilience Score
00:32:30 The Science of Misalignment
00:35:45 How to Convince Skeptics?
00:47:00 The Future of AI Psychology