Attention Approximates Sparse Distributed Memory

Oct 20, 2021
27:06

Trenton Bricken, Harvard University

Abstract: While Attention has come to be an important mechanism in deep learning, it emerged out of a heuristic process of trial and error, providing limited intuition for why it works so well. Here, we show that Transformer Attention closely approximates Sparse Distributed Memory (SDM), a biologically plausible associative memory model, under certain data conditions. We confirm that these conditions are satisfied in pre-trained GPT2 Transformer models. We discuss the implications of the Attention-SDM map and provide new computational and biological interpretations of Attention.
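For reference, the mechanism the abstract relates to SDM is standard softmax attention. The sketch below is a minimal NumPy illustration (not code from the talk); the SDM correspondence in parentheses follows the abstract's framing, where keys play the role of stored addresses and values of stored patterns, and the softmax temperature beta is an assumed free parameter (the usual Transformer choice is 1/sqrt(d)).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, beta=None):
    # Softmax attention: softmax(beta * Q K^T) V.
    # beta = 1/sqrt(d) recovers scaled dot-product attention.
    d = Q.shape[-1]
    if beta is None:
        beta = 1.0 / np.sqrt(d)
    return softmax(beta * (Q @ K.T)) @ V

rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 8))   # queries (SDM: read addresses)
K = rng.standard_normal((5, 8))   # keys   (SDM: stored addresses)
V = rng.standard_normal((5, 8))   # values (SDM: stored patterns)
out = attention(Q, K, V)
print(out.shape)  # (2, 8)
```

The paper's claim is that SDM's read operation, whose weights fall off roughly exponentially with distance between the query and stored addresses, approximates this exponential (softmax) weighting over key similarities.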

