Inference Workflows

Nov 5, 2025
1:17:59

In this video, Benoit Côté and Aditya Tanikanti present the ALCF Inference Service, an HPC-scale Inference-as-a-Service platform that enables dynamic deployment of large language models across GPU clusters. The talk covers its architecture, scheduling framework, and real-world LLM use cases across scientific domains. Thang Pham and Murat Keçeli then discuss LangGraph, a framework for building agentic AI systems, and give a hands-on tutorial on constructing single-agent and multi-agent workflows, demonstrating how these agents can be built, adapted, and applied across diverse scientific domains to automate research tasks. The session closes with a science talk by Postdoctoral Appointee Reet Barik, titled "Hybrid Pre-training of Large Models by Leveraging Low-rank Adapters." As large models grow in size, the cost of training them is becoming prohibitively expensive. Barik highlights recent work that applied a fine-tuning technique during pre-training, reducing the parameter count of a Vision Transformer model to 10% of the original; this saved 9 hours of training time on 64 GPUs while maintaining accuracy.

Chapters
00:00 Inference Workflows
1:04:42 Science Talk: Hybrid Pre-training of large models by leveraging Low-rank adapters
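The parameter reduction behind the low-rank adapter approach mentioned above can be sketched in a few lines. This is a minimal illustration of the general LoRA-style idea, not code from the talk; the dimensions and rank chosen here are hypothetical:

```python
import numpy as np

# A dense d x k weight matrix is replaced (or its update is replaced) by
# the product of two small factors B (d x r) and A (r x k), with r << min(d, k).
d, k, r = 768, 768, 8  # hypothetical Vision-Transformer-like layer sizes

full_params = d * k          # parameters in the dense weight matrix
lora_params = r * (d + k)    # parameters in the two low-rank factors
print(full_params, lora_params, lora_params / full_params)

# The effective weight update is the rank-<=r product of the factors.
B = np.zeros((d, r))               # commonly initialized to zero
A = np.random.randn(r, k) * 0.01
delta_W = B @ A                    # same shape as the full weight matrix
print(delta_W.shape)
```

For these example sizes the factors hold about 2% of the dense matrix's parameters, consistent in spirit with the ~10% figure quoted for the full model (which also includes layers that are not factorized).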

