How We Solved the GPU Problem for Kubernetes
Most engineers know GPUs are expensive. What's less obvious is that roughly 50% of GPU costs are wasted: even when your GPUs report 100% utilization, you can often fit significantly more work onto them without sacrificing performance. GPU utilization is one of the worst metrics for understanding how efficiently your hardware is actually being used.

In this episode of 1 IDEA, Suresh Mathew sits down with Pooja Malik, Distinguished Engineer at Sedai, to talk through how Sedai's engineering team built GPU optimization from scratch, why the standard metrics fail, and what's still unsolved.

We cover:

- Why nvidia-smi GPU utilization measures activity, not efficiency, and what to use instead
- Time slicing vs. MIG vs. DRA: what's actually production-ready today
- How Sedai built per-application models to validate optimization when no industry standard exists
- The unsolved problem: making GPU partitions dynamic at runtime

CHAPTERS

00:00 Introduction
01:38 Why GPU utilization is a misleading metric
02:08 The complexity trap: no single metric works
05:17 Why memory bandwidth is the real bottleneck
06:48 Building the measurement algorithm from scratch
09:53 Validating without an industry standard
10:18 How Sedai's feedback loop works
11:27 Time slicing, MIG, DRA, and fractional GPUs
17:29 The unsolved problem: dynamic GPU partitioning
18:43 Autonomy vs. automation