In this video, Benoit Côté and Aditya Tanikanti present the ALCF Inference Service, an HPC-scale Inference-as-a-Service platform that enables dynamic deployment of large language models across GPU clusters. The talk covers its architecture, scheduling framework, and real-world LLM use cases across scientific domains.
Thang Pham and Murat Keçeli will discuss LangGraph, a framework for building agentic AI systems, and provide a hands-on tutorial on constructing single-agent and multi-agent workflows. This part will demonstrate how these agents can be built, adapted, and applied across diverse scientific domains to automate research tasks.
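The core idea behind such agentic workflows is a graph of nodes that read and update a shared state, with edges deciding which node runs next. A minimal, library-free sketch of that pattern (plain Python, no LangGraph dependency; the node names and stubbed LLM/tool calls are hypothetical):

```python
# Minimal sketch of the agent-graph pattern that frameworks like
# LangGraph formalize. Node names and stubs are illustrative only.
from typing import Callable, Optional

State = dict  # shared state passed between nodes

def plan(state: State) -> State:
    # In a real agent, an LLM call would produce the plan.
    return {**state, "plan": f"search literature on {state['task']}"}

def act(state: State) -> State:
    # In a real agent, a tool call (search, code execution, ...) runs here.
    return {**state, "result": f"executed: {state['plan']}"}

# A workflow is just nodes plus edges; execution follows the edges.
nodes: dict[str, Callable[[State], State]] = {"plan": plan, "act": act}
edges: dict[str, Optional[str]] = {"plan": "act", "act": None}  # None ends the graph

def run(entry: str, state: State) -> State:
    node: Optional[str] = entry
    while node is not None:
        state = nodes[node](state)
        node = edges[node]
    return state

final = run("plan", {"task": "catalyst screening"})
print(final["result"])
```

Multi-agent workflows extend the same idea: several such graphs (or nodes wrapping whole agents) are composed, with conditional edges routing the state between them.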
Following that, Postdoctoral Appointee Reet Barik gives a science talk titled "Hybrid Pre-training of Large Models by Leveraging Low-rank Adapters." As large models grow in size, the cost of training them is becoming prohibitively expensive. In this talk, Barik highlights recent work that applied a fine-tuning technique during pre-training, reducing the parameter count of a Vision Transformer model to 10% of the original and saving 9 hours of training time on 64 GPUs while maintaining accuracy.
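The low-rank adapter (LoRA) idea behind this kind of parameter reduction is to freeze a full weight matrix W and train only a low-rank update B·A in its place. A small NumPy sketch of one adapted layer (the dimensions and rank are hypothetical, not the talk's actual configuration):

```python
import numpy as np

# Hypothetical hidden size and LoRA rank, for illustration only.
d_in, d_out, r = 768, 768, 8

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))        # frozen full weight
A = rng.standard_normal((r, d_in)) * 0.01     # trainable low-rank factor
B = np.zeros((d_out, r))                      # trainable; zero init => adapter starts as a no-op

x = rng.standard_normal(d_in)
y = W @ x + B @ (A @ x)                       # adapted forward pass

# Only A and B are trained, so the trainable fraction per layer is small.
full_params = d_out * d_in
lora_params = r * (d_in + d_out)
print(round(lora_params / full_params, 3))
```

With these illustrative numbers, the trainable parameters per layer drop to roughly 2% of the full matrix; the hybrid approach in the talk applies this kind of reduction during pre-training rather than only during fine-tuning.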
Chapters
00:00 Inference Workflows
1:04:42 Science Talk: Hybrid Pre-training of large models by leveraging Low-rank adapters