Every formula derived, not memorised.
In this episode we derive Rotary Position Encoding (RoPE) line by line, starting from the constraint that attention scores should depend only on relative position and arriving at f(q, m) = q · e^{imθ}, where each 2D feature pair of q is treated as a complex number.
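As a quick sanity check of that end result, here is a minimal sketch (not taken from the video) that treats a single 2D feature pair as a complex number and confirms that the score depends only on the offset m − n. The helper names `rope` and `score`, the frequency 0.3, and the sample values of q and k are illustrative assumptions.

```python
import cmath

def rope(x: complex, pos: int, theta: float) -> complex:
    """Rotate a 2D feature (written as a complex number) by the angle pos * theta."""
    return x * cmath.exp(1j * pos * theta)

def score(q: complex, k: complex) -> float:
    """Real inner product of two 2D vectors in complex form: Re[q * conj(k)]."""
    return (q * k.conjugate()).real

theta = 0.3                      # assumed frequency, chosen only for illustration
q, k = 1.0 + 2.0j, -0.5 + 0.7j   # assumed sample query/key features

# Two different absolute position pairs that share the same offset m - n = 3.
s1 = score(rope(q, 5, theta), rope(k, 2, theta))
s2 = score(rope(q, 103, theta), rope(k, 100, theta))

# Closed form that depends only on the offset: Re[q * conj(k) * e^{i(m-n)theta}].
s_closed = (q * k.conjugate() * cmath.exp(1j * 3 * theta)).real

print(s1, s2, s_closed)  # the three values should agree up to rounding
```

The absolute positions cancel because conj(e^{inθ}) = e^{−inθ}, so only e^{i(m−n)θ} survives in the product.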
📐 What's covered:
0:00 Introduction & roadmap
1:03 Recap — why replace sinusoidal PE?
3:01 Prerequisites — complex numbers & polar form
5:34 The goal — define f(q, m) so ⟨f(q,m), f(k,n)⟩ = g(q,k,m−n)
7:06 Reduce to ℂ — complex inner product form
8:21 Polar decomposition — split into magnitude & phase
10:18 Solve the magnitude equation → R_f = ‖q‖
11:47 Solve the phase equation → α(m) = mθ
14:29 Real form — 2D rotation matrix & block-diagonal extension to ℝ^d
17:44 Long-range decay & frequency choice θ_i = 10000^{−2i/d}
19:18 Production implementation — element-wise ⊗ trick
21:07 Linear attention — why RoPE works where others can't
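The real-valued steps in the chapter list (block-diagonal 2D rotations, the frequency schedule θ_i = 10000^{−2i/d}, and the element-wise form) can be sketched in plain Python as below. This is a hedged illustration under assumptions, not the code shown in the video: the function names, the 0-indexed pair convention i = 0, …, d/2 − 1, the adjacent-pair layout, and the sample vectors are all hypothetical choices.

```python
import math
from typing import List

def rope_frequencies(d: int, base: float = 10000.0) -> List[float]:
    """theta_i = base^(-2i/d) for i = 0, ..., d/2 - 1 (assumed 0-indexed pairs)."""
    return [base ** (-2.0 * i / d) for i in range(d // 2)]

def apply_rope(x: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """Rotate each adjacent pair (x[2i], x[2i+1]) by the angle pos * theta_i.

    Equivalent to x * cos + swap(x) * sin with swap(x) = [-x1, x0, -x3, x2, ...],
    so no d x d block-diagonal matrix is ever built.
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("feature dimension must be even")
    out = [0.0] * d
    for i, theta in enumerate(rope_frequencies(d, base)):
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s      # first row of the 2D rotation
        out[2 * i + 1] = b * c + a * s  # second row of the 2D rotation
    return out

def dot(u: List[float], v: List[float]) -> float:
    """Plain dot product, standing in for the attention score."""
    return sum(a * b for a, b in zip(u, v))

# Assumed sample vectors; positions (7, 4) and (3, 0) share the offset 3.
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]
lhs = dot(apply_rope(q, 7), apply_rope(k, 4))
rhs = dot(apply_rope(q, 3), apply_rope(k, 0))
print(lhs, rhs)  # should match up to rounding
```

Real implementations typically precompute the cos and sin tables once per sequence length and apply them to whole batches of queries and keys; the loop here only makes the per-pair rotation explicit.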
📄 Based on: "RoFormer: Enhanced Transformer with Rotary Position Embedding" (Su et al., 2021)
#deepmatharu #RoPE #RotaryPositionEncoding #Transformers #MachineLearning #Math
Deep Math Ep. 2: Rotary Position Encoding (RoPE), Derived from First Principles