In this video we continue the SageMaker Inference series playlist, exploring SageMaker Inference Components. We use this feature to efficiently deploy multiple LLMs on a single SageMaker real-time endpoint.
Prerequisites/Good To Know
- LLM Hosting Options on SageMaker: https://www.youtube.com/watch?v=ofh1Z4aW8Qk&t=453s
Video Resources
- Inference Components Blog: https://medium.com/towards-data-science/hosting-multiple-llms-on-a-single-endpoint-32eda0201832?sk=7095f78b4dbc7c19b0b338a88eeaeb96
- Notebook: https://github.com/RamVegiraju/SageMaker-Deployment/tree/master/SM-Inference-Video-Series/Part7-Multi-LLM-IC-Dept
Timestamps
0:00 Introduction
1:30 Sample Multi-LLM Use-Cases
3:00 Why Not Multi-Model Endpoints (MME)
5:09 What Are Inference Components (ICs)
11:20 Notebook Walkthrough
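
If you want a preview of the notebook walkthrough, here is a minimal sketch of the two core boto3 calls: attaching a model to an existing endpoint as an inference component, then routing a request to that specific model. The endpoint, model, and component names below are placeholders, and the memory/accelerator values are illustrative; see the linked notebook for the complete setup.

import boto3

sm = boto3.client("sagemaker")
smr = boto3.client("sagemaker-runtime")

# Attach an LLM to an existing endpoint as its own inference component.
# "multi-llm-endpoint", "llama-ic", and "llama-model" are placeholder names.
sm.create_inference_component(
    InferenceComponentName="llama-ic",
    EndpointName="multi-llm-endpoint",
    VariantName="AllTraffic",
    Specification={
        "ModelName": "llama-model",  # a SageMaker Model created beforehand
        "ComputeResourceRequirements": {
            "NumberOfAcceleratorDevicesRequired": 1,
            "MinMemoryRequiredInMb": 8192,
        },
    },
    RuntimeConfig={"CopyCount": 1},  # how many copies of this model to host
)

# Invoke that specific model on the shared endpoint by component name.
response = smr.invoke_endpoint(
    EndpointName="multi-llm-endpoint",
    InferenceComponentName="llama-ic",
    ContentType="application/json",
    Body=b'{"inputs": "What is Amazon SageMaker?"}',
)
print(response["Body"].read().decode())

Each additional LLM gets its own create_inference_component call with its own compute requirements, which is how several models share one endpoint's hardware.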
#aws #sagemaker #llms #generativeai #machinelearning #cloudcomputing