SageMaker Inference Components: Deploying Multiple LLMs on One Endpoint

403 views
Mar 12, 2025
20:57

In this video, we continue the SageMaker Inference series playlist and explore SageMaker Inference Components, using this feature to efficiently deploy multiple LLMs on a single SageMaker Real-Time Endpoint.

Prerequisites/Good To Know
- LLM Hosting Options on SageMaker: https://www.youtube.com/watch?v=ofh1Z4aW8Qk&t=453s

Video Resources
- Inference Components Blog: https://medium.com/towards-data-science/hosting-multiple-llms-on-a-single-endpoint-32eda0201832?sk=7095f78b4dbc7c19b0b338a88eeaeb96
- Notebook: https://github.com/RamVegiraju/SageMaker-Deployment/tree/master/SM-Inference-Video-Series/Part7-Multi-LLM-IC-Dept

Timestamps
0:00 Introduction
1:30 Sample Multi-LLM Use-Cases
3:00 Why not MME
5:09 What are ICs
11:20 Notebook Walkthrough

#aws #sagemaker #llms #generativeai #machinelearning #cloudcomputing
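For a rough idea of what the notebook covers, here is a minimal sketch (not taken from the video) of attaching one LLM to an existing Inference-Component-enabled endpoint and invoking it by name with boto3. The endpoint, component, and model names ("multi-llm-endpoint", "llama-7b-ic", "llama-7b-model") are hypothetical placeholders, and the resource sizes are illustrative only.

```python
import boto3

sm_client = boto3.client("sagemaker")
smr_client = boto3.client("sagemaker-runtime")

# Attach a model to the endpoint as an Inference Component with its own
# compute reservation; repeat this call for each LLM you want to co-host.
# Names and resource numbers below are placeholder assumptions.
sm_client.create_inference_component(
    InferenceComponentName="llama-7b-ic",
    EndpointName="multi-llm-endpoint",
    VariantName="AllTraffic",
    Specification={
        "ModelName": "llama-7b-model",  # a SageMaker Model created beforehand
        "ComputeResourceRequirements": {
            "NumberOfAcceleratorDevicesRequired": 1,
            "MinMemoryRequiredInMb": 1024,
        },
    },
    RuntimeConfig={"CopyCount": 1},  # copies of this model to keep loaded
)

# Route a request to a specific LLM by passing InferenceComponentName.
response = smr_client.invoke_endpoint(
    EndpointName="multi-llm-endpoint",
    InferenceComponentName="llama-7b-ic",
    ContentType="application/json",
    Body='{"inputs": "What is SageMaker?"}',
)
print(response["Body"].read().decode())
```

The per-component ComputeResourceRequirements and CopyCount are what let several LLMs share one endpoint's instances, each with its own slice of accelerators and memory.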
