
Attribution Graphs for Dummies - 1. What are Attribution Graphs?

7.8K views
Aug 5, 2025
49:09

Part 2: https://youtu.be/hdi1a9MjwDs

An introduction to attribution graphs from Anthropic's Circuit Tracing and Model Biology papers, featuring Jack Lindsey (Anthropic), Emmanuel Ameisen (Anthropic), Tom McGrath (Goodfire AI), and Neel Nanda (Google DeepMind).

0:00 Introduction
2:18 Attribution Graph Orientation
19:10 Analyzing an Attribution Graph from Scratch
40:25 Reflection: What have we Learned?

Explore Attribution Graphs: https://neuronpedia.org/graph
Blog Post: https://www.neuronpedia.org/graph/info
circuit-tracer GitHub: https://github.com/safety-research/circuit-tracer

Original Papers by Anthropic:
- Circuit Tracing: https://transformer-circuits.pub/2025/attribution-graphs/methods.html
- Biology of an LLM: https://transformer-circuits.pub/2025/attribution-graphs/biology.html

Learn More and Get Involved:
- Exploring Gemma Scope: A beginner-friendly guided demo for AI interpretability - https://neuronpedia.org/gemma-scope
- MATS: A paid fellowship for doing real, supervised mechanistic interpretability research with no prior experience required - https://www.matsprogram.org
- ARENA: A free, guided course on Alignment Research where you can work independently or in-person - https://www.arena.education
- SPAR: Part-time, remote fellowship to do 3-month research projects, all experience levels accepted - https://sparai.org

