π Description
In this video, I show how to set up Grafana and Prometheus inside a Kubernetes cluster running on a GPU node β using a single one-line installation script β‘
The script automatically:
π§© Creates a full Kubernetes cluster
π§ Installs Helm
π Deploys Prometheus + Grafana
π» Installs the NVIDIA GPU Operator for GPU support
At this stage, the cluster and monitoring stack are fully working β Grafana dashboards are live and ready.
π‘ GPU metrics (via DCGM Exporter) will be added in the next video!
π§ What Youβll Learn
β
How to bootstrap a K8s + monitoring stack with one command
βοΈ Installing Prometheus & Grafana using Helm
π‘ Setting up the NVIDIA GPU Operator
π Accessing Grafana dashboards
π Whatβs next: enabling GPU metrics (DCGM Exporter + ServiceMonitor)
π§© Next Video Preview
In Part 2, weβll connect DCGM Exporter and pull live GPU metrics β
π― Utilization | VRAM | Power | Temperature | Perf State β directly into Grafana!
π» Repo & Scripts
π GitHub: [https://github.com/saujandsre/GPU-VastAI]