How Reasoning Models Break Mechanistic Interpretability Techniques

3.5K views
Nov 24, 2025
42:21

A talk I gave to my MATS 9.0 training program about reasoning model interpretability. If this kind of research sounds interesting to you, apply to do research with me in MATS! Applications are due 23 Dec: tinyurl.com/neel-mats-app

0:00:00 The Curse of Reasoning Models
0:05:11 Thought Anchors
0:13:00 Probing Faithfulness and Bias
0:16:00 What 'Thinking' Models Actually Learn
0:22:01 Open Research Questions
0:26:00 Circuit Analysis Challenges
0:30:00 Alignment and Control Applications
0:35:14 Future Directions and Q&A