
FlashAttention V1 Deep Dive By Google Engineer | Fast and Memory-Efficient LLM Training

2.6K views
May 26, 2025
34:38

Slides are available at https://martinisadad.github.io/

Transformers are everywhere in AI and power almost all LLMs these days. The secret sauce is the attention mechanism. However, it can be very slow and eat up tons of memory, especially as models get larger and larger. That's where FlashAttention comes in. It is one of the most important breakthroughs of recent years, making attention dramatically faster and more memory-efficient. As a regular SWE, I'd like to share this great technique with all of you :)

Related Video: Transformer Deep Dive https://youtu.be/TcKJMBZySj0

#ai #llm #transformers #attention #flash #maths #machinelearning

0:00 Intro
0:56 CPU and GPU Memory Hierarchy
4:29 Standard Attention
8:26 Flash Attention Intro
11:31 Softmax Algorithms
14:23 Tiling
16:32 Online Safe Softmax
21:13 Final Forward Pass Algorithm
23:39 Memory IO Analysis
26:47 Backward Pass
34:11 Ending
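For readers skimming the chapter list, here is a minimal sketch of what "safe softmax" and "online safe softmax" usually refer to. It is not taken from the video or the slides; the function names `safe_softmax` and `online_softmax` are hypothetical, and the code assumes a non-empty list of finite floats. The online version keeps a running maximum and a running denominator, rescaling the denominator whenever a new maximum appears, which is the kind of trick that lets softmax be computed block by block.

```python
import math


def safe_softmax(xs):
    """Softmax with the maximum subtracted first, to avoid overflow in exp."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def online_softmax(xs):
    """Same result, but max and denominator are built in one pass over xs."""
    m = float("-inf")  # running maximum seen so far
    d = 0.0            # running denominator, expressed relative to m
    for x in xs:
        m_new = max(m, x)
        # Rescale the old denominator to the new maximum, then add the new term.
        d = d * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    # Second pass: normalize each element with the final max and denominator.
    return [math.exp(x - m) / d for x in xs]
```

Both functions should agree up to floating-point rounding, including on inputs such as `[1000.0, 1001.0, 1002.0]` where a naive `exp(x)` would overflow.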

