Deep Math Ep. 2: Rotary Position Encoding (RoPE), Derived from First Principles

Uploaded: May 7, 2026
Duration: 1506 s

numbmath4 subscribers


Every formula derived, not memorised. In this episode we derive Rotary Position Encoding (RoPE) line by line, starting from the constraint that attention scores should depend only on relative position and arriving at f(q, m) = q · e^{imθ}.

📐 What's covered:
0:00 Introduction & roadmap
1:03 Recap: why replace sinusoidal PE?
3:01 Prerequisites: complex numbers & polar form
5:34 The goal: define f(q, m) so ⟨f(q,m), f(k,n)⟩ = g(q,k,m−n)
7:06 Reduce to ℂ: complex inner product form
8:21 Polar decomposition: split into magnitude & phase
10:18 Solve the magnitude equation → R_f = ‖q‖
11:47 Solve the phase equation → α(m) = mθ
14:29 Real form: 2D rotation matrix & block-diagonal extension to ℝ^d
17:44 Long-range decay & frequency choice θ_i = 10000^{−2i/d}
19:18 Production implementation: element-wise ⊗ trick
21:07 Linear attention: why RoPE works where others can't

📄 Based on: "RoFormer: Enhanced Transformer with Rotary Position Embedding" (Su et al., 2021)

#deepmatharu #RoPE #RotaryPositionEncoding #Transformers #MachineLearning #Math
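The result f(q, m) = q · e^{imθ} can be checked numerically. The following is a minimal sketch in plain Python, not code from the episode: the helper names (rope_frequencies, rope, score), the sample vectors, and the zero-indexed convention i = 0 … d/2 − 1 for θ_i are assumptions made for illustration. A d-dimensional real vector is treated as d/2 complex numbers, each rotated at its own frequency.

```python
import cmath

def rope_frequencies(d, base=10000.0):
    """theta_i = base^(-2i/d) for i = 0 .. d/2 - 1 (zero-indexed; an assumption)."""
    return [base ** (-2.0 * i / d) for i in range(d // 2)]

def rope(x, pos, thetas):
    """Apply f(x, pos): multiply each complex coordinate x_i by e^{i * pos * theta_i}."""
    return [xi * cmath.exp(1j * pos * th) for xi, th in zip(x, thetas)]

def score(q, k):
    """Real part of the complex inner product sum_i q_i * conj(k_i)."""
    return sum(qi * ki.conjugate() for qi, ki in zip(q, k)).real

d = 8  # real dimension, i.e. 4 complex coordinates
thetas = rope_frequencies(d)

# Arbitrary illustrative query and key vectors (hypothetical values).
q = [complex(0.3, -1.2), complex(1.0, 0.5), complex(-0.7, 0.2), complex(0.1, 0.9)]
k = [complex(-0.4, 0.8), complex(0.6, -0.3), complex(1.1, 0.0), complex(-0.2, -0.5)]

# Two position pairs with the same offset m - n = 3.
s_near = score(rope(q, 5, thetas), rope(k, 2, thetas))
s_far = score(rope(q, 103, thetas), rope(k, 100, thetas))

# A pair with a different offset, for contrast.
s_other = score(rope(q, 5, thetas), rope(k, 4, thetas))

print(abs(s_near - s_far) < 1e-9)  # same offset, same score (up to rounding)
print(abs(abs(rope(q, 7, thetas)[0]) - abs(q[0])) < 1e-12)  # magnitude unchanged
```

Shifting both positions by the same amount leaves the score unchanged because the phases combine as e^{i(m−n)θ_i}, and the rotation preserves each coordinate's magnitude, in line with R_f = ‖q‖.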
