Introduction to Primus

Nov 10, 2025
20:43

This talk highlights AMD leadership in high-performance computing and large-scale AI training. Zhenyu introduces Primus, a modular training stack that accelerates time-to-market: Primus-LM (comprehensive parallelism with compute–communication overlap), Primus-Turbo (general-purpose acceleration with ROCm software on AMD GPUs), and Primus-SaFE (three-phase stability architecture: preflight, inflight, postflight).

Benchmarks include a 2T-parameter MoE on 1,024 GPUs with DeepEP, end-to-end performance breakdowns, and DeepSeek V3 weak scaling on torchtitan. He also discusses the AMD Instella model family trained at scale on AMD GPUs and concludes with a full out-of-box lifecycle for training developers.

Find the resources you need to develop using AMD products: https://www.amd.com/en/developer.html

© 2025 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, EPYC, ROCm, and AMD Instinct and combinations thereof are trademarks of Advanced Micro Devices, Inc.
