AI Performance Engineering

13.1K subscribers

190 videos


Latest Videos

55:21

Optimizing AI Inference for Heterogeneous Clusters by Natalie Serrino, Founder @ Gimlet Labs

AI Performance Engineering

886 views·3 weeks ago

1:16:45

NVIDIA GTC 2026 Conf Recap + Inference Engines + Scaling Disagg Prefill-Decode + RadixAttention

AI Performance Engineering

570 views·1 month ago

1:16:07

OpenClawMCP for AI Systems Performance Tuning + NVFP4 Low Precision AI System Optimizations

AI Performance Engineering

858 views·2 months ago

1:01:19

Mastering Nvidia Nsight GPU Profiling

AI Performance Engineering

2.1K views·2 months ago

54:35

Neurips 2025 AI Systems Recap by Chris Fregly

AI Performance Engineering

791 views·5 months ago

1:05:55

Advanced and Accelerated Data Curation + Visualizations for LLMs with NVIDIA CuML, DBSCAN, and tSNE

AI Performance Engineering

209 views·5 months ago

14:46

Automated Browser Use with Amazon AGI by Antje Barth

AI Performance Engineering

128 views·5 months ago

1:39:13

Speed of Light Inference w NVIDIA + AMD GPUs and Modular by Abdul Dakkak, Head of Gen AI @ Modular

AI Performance Engineering

1.1K views·6 months ago

1:30:36

AI-Powered GPU Kernel Optimization(Mako.dev) + Distributed PyTorch with nbdistributed (Hugging Face)

AI Performance Engineering

1.2K views·6 months ago

1:04:49

NVIDIA Dynamo + Disaggregated Prefill-Decode LLM Serving + PyTorch/CUDA Performance with Luminal

AI Performance Engineering

1.2K views·8 months ago

7:38

Auto Optimizer - PyTorch Code Optimizer

AI Performance Engineering

285 views·8 months ago

1:22:21

Maximize LLM Inference Performance + Auto-Profile/Optimize PyTorch/CUDA Code

AI Performance Engineering

1.7K views·8 months ago

1:25:13

DynamicAdaptive RL-based Inference CUDA Kernel Optimization +Accelerated PyTorch +Modular MojoMAX

AI Performance Engineering

926 views·9 months ago

1:22:57

AI Agent Inference Performance Optimizations + vLLM vs. SGLang vs. TensorRT w Charles Frye (Modal)

AI Performance Engineering

2.2K views·11 months ago

1:32:34

PyTorch Data Loader Tuning + GPU Cross-Architecture Optimizations CUDA and AMD

AI Performance Engineering

839 views·11 months ago

1:11:39

Nvidia GTC 2025 Recap + PyTorch Model Tuning +AI Systems Performance Engineering Tips

AI Performance Engineering

2.0K views·1 year ago

1:23:30

GPUs @ KubeCon 2024 + New DeepLearning.ai Data Engineering Course + LLMs with Amazon EKS/Ray Serve

AI Performance Engineering

743 views·1 year ago

1:21:34

Quantum Artificial General Intelligence (AGI) + Multi-Modal Chatbot + Karini GenAI SaaS Startup!

AI Performance Engineering

787 views·1 year ago

0:46

SummarizeMe Alpha Project Meeting

AI Performance Engineering

267 views·1 year ago

1:00:32

Hands-on with Devin.AI and Crew.AI + Text-to-Video GenAI Pipelines with Vikit.AI

AI Performance Engineering

814 views·1 year ago

2:13:20

Segment Anything Model 2 +Building XR Applications +Code-Savvy Assistants +Multi-Modal RAG Embedding

AI Performance Engineering

530 views·1 year ago

1:05:41

Multi-Modal RAG - Segment Anything Model v2, ImageBind, Embeddings, Fine-Tuning

AI Performance Engineering

933 views·1 year ago

1:24:37

Multi-Modal RAG w Milvus + Multi-Agent Optimization w DSPy + LLM Distillation

AI Performance Engineering

1.2K views·1 year ago

1:06:43

From RLHF with PPO/DPO to ORPO + How to build ORPO on Trainium/Neuron SDK

AI Performance Engineering

946 views·1 year ago

7:23

Databricks Data + AI Summit 2024 Highlights - LLM Performance, Tracing, Debugging, SQL

AI Performance Engineering

799 views·1 year ago

5:02

Apple Reveals Foundation Model Details Datasets, Frameworks, and Evaluation Benchmarks!

AI Performance Engineering

651 views·1 year ago

1:08:11

Mistral AI Updates incl Mixtral 8x22B + OpenLLMetry Evaluation Optimization

AI Performance Engineering

598 views·1 year ago

1:02:06

Anthropic 2024 Updates including Claude 3 + GenAI Observability and LLM Evaluation with Truera

AI Performance Engineering

1.2K views·2 years ago

1:05:45

Nvidia GTC 2024 Recap + Generative AI Live Demo w Nvidia Jetson Edge GPU Device + Nvidia H200, B200

AI Performance Engineering

450 views·2 years ago

1:00:13

Advanced RAG by Jay Alammar (Cohere) + Parameter-Efficient Fine-Tuning (PEFT)

AI Performance Engineering

2.9K views·2 years ago
