How Will Mech Interp Help Make AGI Safe?

Name: How Will Mech Interp Help Make AGI Safe?
Uploaded: Nov 15, 2025
Duration: 2939 s

Neel Nanda (12.6K subscribers)

This is a talk I gave to my MATS 9.0 trainee scholars about my theories of change for how mechanistic interpretability can help make AGI safe, and how this impacts what research should be done.

Notes: https://docs.google.com/document/d/1dKAjGPdKdyemy5rZUI96nYwNDonKfXM6H7p58FF5rcE/edit?usp=sharing

00:00 Why Interpretability? The North Star
05:13 How Interpretability Helps Make Aligned AGI
11:50 What Does 'AI Alignment' Mean?
20:32 Spotting Real Misalignment
28:35 What Happens After We Build AGI?
33:40 What Makes Basic Science Useful?
41:06 Precision vs. Completeness