This video explores the paper "Tensor Product Attention Is All You Need" (https://arxiv.org/html/2501.06425v7). I compare traditional multi-head attention with the tensor product attention architecture, using a worked example.