
Why Rotating Vectors Solves Positional Encoding in Transformers | Rotary Positional Embeddings (RoPE)

4.0K views
Jan 13, 2026
23:06

Rotary Positional Embeddings (RoPE) explained from first principles. This video covers how transformers encode relative positional information using rotation, dot products, and attention, and how RoPE works mathematically. Unlike absolute positional encodings, Rotary Positional Embeddings allow transformers to reason about the relative distance between tokens, which is crucial for long-context models and large language models. We start by building intuition around relative positional information, then carefully derive how RoPE uses rotations to inject relative position into attention scores. From there, we generalize RoPE to d-dimensional embeddings and analyze how factors like base angles, frequency scaling parameters, and relative distance affect attention behavior.

⏱️ Timestamps
00:00 In this video
00:40 What and Why of Relative Positional Information
04:29 2D Rotation Review
06:40 Rotary Position Embeddings (RoPE) Explained
11:00 RoPE beyond 2D
13:29 Why & How Rotary Positional Encodings Work

📖 Resources
RoFormer: Enhanced Transformer with Rotary Position Embedding - https://arxiv.org/pdf/2104.09864
Round and Round We Go! What Makes Rotary Positional Encodings Useful? - https://arxiv.org/pdf/2410.06205

🔔 Subscribe: https://tinyurl.com/exai-channel-link
Email - [email protected]
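To make the core idea concrete, here is a minimal NumPy sketch of the rotation the video describes: consecutive pairs of embedding dimensions are rotated by position-dependent angles, and the query-key dot product then depends only on the relative offset between positions. This is an illustrative sketch, not the video's code; the pairing convention and the base of 10000 follow the RoFormer paper linked in the resources.

```python
# Minimal RoPE sketch (illustrative; base=10000 and dimension pairing follow
# the RoFormer convention, not code from the video).
import numpy as np

def rope_rotate(x: np.ndarray, position: int, base: float = 10000.0) -> np.ndarray:
    """Rotate consecutive dimension pairs of x by position-dependent angles.

    Pair i is rotated by angle position * theta_i, where theta_i = base**(-2i/d),
    so low-index pairs rotate quickly and high-index pairs rotate slowly.
    """
    d = x.shape[-1]
    assert d % 2 == 0, "RoPE pairs up dimensions, so d must be even"
    theta = base ** (-np.arange(0, d, 2) / d)        # (d/2,) per-pair frequencies
    angles = position * theta                        # rotation angle per pair
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    rotated = np.empty_like(x)
    rotated[..., 0::2] = x_even * cos - x_odd * sin  # standard 2D rotation per pair
    rotated[..., 1::2] = x_even * sin + x_odd * cos
    return rotated

# Key property: the dot product of a rotated query and key depends only on the
# relative offset m - n, not on the absolute positions m and n.
rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)
score_a = rope_rotate(q, 5) @ rope_rotate(k, 2)       # offset 3, positions 5 and 2
score_b = rope_rotate(q, 105) @ rope_rotate(k, 102)   # offset 3, shifted by 100
print(np.isclose(score_a, score_b))  # True: attention sees only relative distance
```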

