Attention Approximates Sparse Distributed Memory

Oct 20, 2021
27:06

Trenton Bricken, Harvard University

Abstract: While Attention has come to be an important mechanism in deep learning, it emerged out of a heuristic process of trial and error, providing limited intuition for why it works so well. Here, we show that Transformer Attention closely approximates Sparse Distributed Memory (SDM), a biologically plausible associative memory model, under certain data conditions. We confirm that these conditions are satisfied in pre-trained GPT2 Transformer models. We discuss the implications of the Attention-SDM map and provide new computational and biological interpretations of Attention.
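For reference, the mechanism the abstract relates to SDM is standard softmax attention. The sketch below is a minimal NumPy illustration (not code from the talk); the SDM correspondence in parentheses follows the abstract's framing, where keys play the role of stored addresses and values of stored patterns, and the softmax temperature beta is an assumed free parameter (the usual Transformer choice is 1/sqrt(d)).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, beta=None):
    # Softmax attention: softmax(beta * Q K^T) V.
    # beta = 1/sqrt(d) recovers scaled dot-product attention.
    d = Q.shape[-1]
    if beta is None:
        beta = 1.0 / np.sqrt(d)
    return softmax(beta * (Q @ K.T)) @ V

rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 8))   # queries (SDM: read addresses)
K = rng.standard_normal((5, 8))   # keys   (SDM: stored addresses)
V = rng.standard_normal((5, 8))   # values (SDM: stored patterns)
out = attention(Q, K, V)
print(out.shape)  # (2, 8)
```

The paper's claim is that SDM's read operation, whose weights fall off roughly exponentially with distance between the query and stored addresses, approximates this exponential (softmax) weighting over key similarities.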

