How do you identify the batch size and number of model instances that give the best inference performance? Triton Model Analyzer is an offline tool that can evaluate hundreds of configurations to find the ones that meet the latency, throughput, and memory requirements of your application.
Get started with Model Analyzer here: https://github.com/triton-inference-server/model_analyzer
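To make the idea concrete, here is a minimal Python sketch of the selection step described above: given a set of profiled configurations, keep those that fit a latency and memory budget and pick the one with the highest throughput. This is not Model Analyzer's API. The `Measurement` class, the `pick_best` helper, the field names, and all the numbers are hypothetical and exist only to illustrate the trade-off; in practice the measurements would come from the tool's own profiling runs.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Measurement:
    """One profiled configuration (hypothetical structure for illustration)."""
    max_batch_size: int
    instance_count: int
    throughput_infer_per_sec: float
    p99_latency_ms: float
    gpu_memory_mb: float


def pick_best(
    measurements: Iterable[Measurement],
    max_latency_ms: float,
    max_gpu_memory_mb: float,
) -> Optional[Measurement]:
    """Return the highest-throughput configuration within both budgets, or None."""
    feasible = [
        m for m in measurements
        if m.p99_latency_ms <= max_latency_ms
        and m.gpu_memory_mb <= max_gpu_memory_mb
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda m: m.throughput_infer_per_sec)


# Made-up numbers, NOT real benchmark results.
results = [
    Measurement(1, 1, 250.0, 4.0, 900.0),
    Measurement(8, 1, 900.0, 9.5, 1100.0),
    Measurement(8, 2, 1500.0, 11.0, 1900.0),
    Measurement(32, 2, 2100.0, 31.0, 2600.0),
]

best = pick_best(results, max_latency_ms=15.0, max_gpu_memory_mb=2000.0)
print(best)
```

With these placeholder values, the largest batch size has the highest raw throughput but exceeds both budgets, so the selection falls back to a smaller configuration. That is the kind of trade-off an offline sweep is meant to surface.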
#Triton #Inference #ModelAnalyzer #AI
Optimizing Model Deployments with Triton Model Analyzer