Deep Math Ep. 1- Why Transformers Use Sinusoidal Positional Encoding?

Name: Deep Math Ep. 1- Why Transformers Use Sinusoidal Positional Encoding?
Uploaded: May 3, 2026
Duration: 687 s

numbmath (5 subscribers)



Why do transformers use sinusoidal positional encoding? In this video, we derive sinusoidal positional encoding from first principles.

We start with the core problem: self-attention is blind to token order. Then we build the mathematical requirements for a useful positional encoding: boundedness, injectivity, and relative-position awareness. From there, we connect dot products, characteristic functions, complex exponentials, Euler’s formula, and geometric frequency scales to arrive at the original transformer positional encoding formula.

Topics covered:
- Why self-attention needs positional information
- Why appending raw position indices fails
- Bounded and injective positional encodings
- Relative position through inner products
- Characteristic functions and complex exponentials
- Why sine and cosine appear
- Why transformer positional encoding uses multiple frequencies
- Why the base 10000 appears in the formula

Keywords: why use sine and cosine in transformers, sinusoidal positional encoding explained, transformer positional encoding, why transformers need positional encoding, sine cosine positional encoding
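
As a rough illustration of where the derivation ends up, the sketch below implements the encoding from the original transformer paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), in plain Python. It then checks two of the requirements named in the description: boundedness and relative-position awareness through inner products. The names `sinusoidal_encoding` and `dot`, and the choice `d_model = 16`, are illustrative assumptions and are not taken from the video.

```python
import math


def sinusoidal_encoding(pos, d_model, base=10000.0):
    """Return the d_model-dimensional sinusoidal encoding of position pos."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    vec = []
    for i in range(d_model // 2):
        # Geometric frequency scale: omega_i = base ** (-2i / d_model)
        omega = base ** (-2.0 * i / d_model)
        vec.append(math.sin(pos * omega))  # dimension 2i
        vec.append(math.cos(pos * omega))  # dimension 2i + 1
    return vec


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


d_model = 16  # illustrative size, chosen only for this sketch

# Boundedness: every component stays in [-1, 1], however large the position.
assert all(-1.0 <= x <= 1.0 for x in sinusoidal_encoding(10_000, d_model))

# Relative-position awareness: the inner product depends only on the offset k,
# because sin(a)sin(b) + cos(a)cos(b) = cos(a - b).
k = 5
d1 = dot(sinusoidal_encoding(3, d_model), sinusoidal_encoding(3 + k, d_model))
d2 = dot(sinusoidal_encoding(40, d_model), sinusoidal_encoding(40 + k, d_model))
assert abs(d1 - d2) < 1e-9
```

Each sine/cosine pair contributes cos(k · omega_i) to the inner product, so the result is a sum of cosines of the offset k alone. This is the property that lets attention scores pick up relative position from dot products.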
