Ready to move beyond single-GPU limits and master distributed systems? Join us for a webinar where ML and platform engineers will explore how to scale model training from a single node to a massive cluster using PyTorch and Ray.
In this virtual session, you will learn:
- What distributed training is and when you need it
- The fundamentals of Distributed Data Parallel (DDP)
- Advanced data-parallel techniques: ZeRO-1, ZeRO-2, ZeRO-3, and FSDP
- What Ray is and how to use Ray Train to train models at scale
- How to train a model at scale using Ray Train and PyTorch
This session is more than a demo. You’ll leave with a working understanding of Ray, a reusable project you can build on, and a clear view of how Ray and Anyscale work together to accelerate AI development.
Webinar: Getting Started with Distributed Training at Scale