Introduction to Primus
This talk highlights AMD's leadership in high-performance computing and large-scale AI training. Zhenyu introduces Primus, a modular training stack that accelerates time-to-market. It has three components: Primus-LM (comprehensive parallelism with compute–communication overlap), Primus-Turbo (general-purpose acceleration with ROCm software on AMD GPUs), and Primus-SaFE (a three-phase stability architecture: preflight, inflight, postflight). Benchmarks include a 2T-parameter MoE on 1,024 GPUs with DeepEP, end-to-end performance breakdowns, and DeepSeek V3 weak scaling on torchtitan. He also discusses the AMD Instella model family, trained at scale on AMD GPUs, and concludes with a full out-of-the-box lifecycle for training developers.

Find the resources you need to develop using AMD products: https://www.amd.com/en/developer.html

© 2025 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, EPYC, ROCm, and AMD Instinct and combinations thereof are trademarks of Advanced Micro Devices, Inc.