AI-Powered GPU Kernel Optimization (Mako.dev) + Distributed PyTorch with nbdistributed (Hugging Face)
Talk #0: Introductions and Meetup Updates
by Chris Fregly and Antje Barth
A new book on high-performance co-design of hardware (NVIDIA GPUs), software (PyTorch, vLLM), and algorithms is coming next month. Pre-order now and be the first to experience the magic!
https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Inference/dp/B0F47689K8

Talk #1: High-Performance, AI-Powered GPU Kernel and System Optimization Engine
by Mohammed Abdelfattah @ Mako.dev
In this talk, we explain the high-level architecture of Mako's Automated GPU Kernel and System Optimization Engine and present case studies of advanced optimizations enabled by our approach.

Talk #2: Distributed, Large-Scale PyTorch Training with Hugging Face accelerate and nbdistributed on NVIDIA GPUs
by Zachary Mueller @ Hugging Face and Scratch to Scale (Course on Large-Scale Training in the Modern World)
In this talk, Zachary will deliver part of his upcoming course, Scratch to Scale: Large-Scale Training in the Modern World, using PyTorch, Hugging Face accelerate, and nbdistributed (notebook distributed) for massive, in-notebook training jobs. Enrol now!
https://maven.com/walk-with-code/scratch-to-scale

Zoom link: https://us02web.zoom.us/j/82308186562

Related Links
GitHub Repo: http://github.com/cfregly/ai-performance-engineering/
O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/
YouTube: https://www.youtube.com/@AIPerformanceEngineering
Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm