12-Month GPU Programming Course From Scratch
Join the kernel writing community on Skool: https://skool.com/become-ai-researcher-2669/about
GitHub (course): https://github.com/vukrosic/gpu-kernel-engineer-from-scratch

This full course teaches GPU programming and CUDA kernel engineering from scratch. You will learn how grids, blocks, threads, warps, memory layout, indexing, strides, memory bandwidth, coalesced access, benchmarking, reductions, shared memory, and warp-level reductions work, then connect those ideas to the kernels used in AI engineering.

Join Skool for:
- 7+ hours of from-scratch video courses: math fundamentals, PyTorch, neural networks, transformers, reinforcement learning, LLMs
- Code-first lessons: you build the thing, not just watch it
- Implementation notebooks, exercises, and walkthroughs
- Advanced breakdowns that go deeper than the YouTube tutorials
- Autonomous AI research systems that run experiments while you sleep
- A community of AI researchers: ask questions, share work, get feedback

Chapters:
0:00 GPU programming course overview
1:10 One-year GPU kernel engineering roadmap
2:34 Host, device, and CUDA kernels
7:09 Threads, blocks, and grids
13:28 Memory layout, indexing, and strides
23:06 Simple elementwise kernels
26:30 Memory bandwidth and arithmetic intensity
31:27 Contiguous vs. strided memory access
33:10 Warps and coalesced reads
36:18 Benchmarking and timing GPU kernels
47:17 Reductions: sum, max, mean, and shape shrink
52:33 Block-level reductions with shared memory
58:10 Warps and warp-level reductions
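As a taste of the thread/block indexing model the course covers, here is a minimal elementwise CUDA kernel (a hypothetical sketch for illustration, not code taken from the course):

```cuda
#include <cuda_runtime.h>

// Elementwise add: each thread computes one output element.
__global__ void add(const float* a, const float* b, float* c, int n) {
    // Global index from block and thread coordinates.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {              // guard: the last block may have extra threads
        c[i] = a[i] + b[i];
    }
}

// Launch with enough 256-thread blocks to cover n elements:
// add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);
```

Because consecutive threads touch consecutive elements, this access pattern is coalesced, which is one of the memory-bandwidth ideas the chapters above dig into.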