FlashAttention V1 Deep Dive By Google Engineer | Fast and Memory-Efficient LLM Training

Uploaded: May 26, 2025
Duration: 2078 s (34:38)

Martin Is A Dad (2.91K subscribers)

2.6K views

Slides are available at https://martinisadad.github.io/

Transformers are everywhere in AI these days and power almost all LLMs. The secret sauce is the attention mechanism. However, attention can be super slow and eat up tons of memory, especially as models get larger and larger. That's where FlashAttention comes in. It's a game-changer and one of the most important breakthroughs in recent years, making attention dramatically faster and more memory-efficient. As a regular SWE, I'd like to share this great technique with all of you :)

Related Video: Transformer Deep Dive https://youtu.be/TcKJMBZySj0

#ai #llm #transformers #attention #flash #maths #machinelearning

0:00 Intro
0:56 CPU and GPU Memory Hierarchy
4:29 Standard Attention
8:26 Flash Attention Intro
11:31 Softmax Algorithms
14:23 Tiling
16:32 Online Safe Softmax
21:13 Final Forward Pass Algorithm
23:39 Memory IO Analysis
26:47 Backward Pass
34:11 Ending
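The "Tiling" and "Online Safe Softmax" chapters name the trick FlashAttention is generally described as relying on: the softmax-weighted attention output can be computed block by block, keeping only a running maximum and a running normalizer instead of the full row of scores. Below is a minimal pure-Python sketch of that idea for a single query row. It assumes the attention scores are already computed. The function names are hypothetical, and it is not the code from the video or slides. It also omits the 1/sqrt(d) scaling, batching, the backward pass, and the actual GPU SRAM/HBM tiling.

```python
import math


def standard_attention_row(scores, values):
    """Reference (hypothetical helper): safe softmax over the whole score row,
    then a softmax-weighted sum of the value vectors."""
    m = max(scores)  # subtract the max for numerical safety
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def online_attention_row(scores, values, block_size=2):
    """Online safe softmax (hypothetical helper): visit scores/values one block
    at a time, keeping only a running max m, a running normalizer l, and an
    unnormalized output accumulator acc."""
    dim = len(values[0])
    m = -math.inf
    l = 0.0
    acc = [0.0] * dim
    for start in range(0, len(scores), block_size):
        block_s = scores[start:start + block_size]
        block_v = values[start:start + block_size]
        m_new = max(m, max(block_s))
        # Rescale earlier statistics to the new max; exp(-inf) == 0.0 on the first block.
        scale = math.exp(m - m_new)
        l *= scale
        acc = [a * scale for a in acc]
        # Add this block's contributions, measured relative to the new max.
        for s, v in zip(block_s, block_v):
            w = math.exp(s - m_new)
            l += w
            for d in range(dim):
                acc[d] += w * v[d]
        m = m_new
    # Normalize once at the end.
    return [a / l for a in acc]


if __name__ == "__main__":
    scores = [1.0, 3.0, -2.0, 0.5, 2.5]
    values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 1.0], [0.5, 0.5]]
    print(standard_attention_row(scores, values))
    print(online_attention_row(scores, values, block_size=2))
```

Because the accumulator and normalizer are rescaled by exp(m_old - m_new) whenever a larger score shows up, the blockwise result matches the reference for any `block_size`, up to floating-point rounding. That rescaling step is what lets the full score row stay out of memory.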