Spark RAPIDS ML: GPU Accelerated Distributed ML in Spark Clusters

Name: Spark RAPIDS ML: GPU Accelerated Distributed ML in Spark Clusters
Uploaded: Jul 23, 2024
Duration: 2280 s

Channel: Databricks (158K subscribers)
Views: 1.1K

Spark MLlib is a key component of Apache Spark™ for large-scale machine learning and provides built-in implementations of many popular machine learning algorithms. These implementations were created a decade ago and do not leverage modern computing accelerators like GPUs.

In this talk, we present Spark RAPIDS ML (https://github.com/spark-rapids-ml), an open source Python package that enables GPU acceleration of distributed machine learning applications on Spark. It is built on the proven RAPIDS cuML C++/Python library (https://github.com/rapidsai/cuml), which implements GPU-accelerated versions of classical ML algorithms for regression, classification, clustering, and dimensionality reduction. For algorithms that are also available in Spark MLlib, Spark RAPIDS ML offers Spark MLlib DataFrame API compatibility that requires essentially no code changes. We share benchmark results demonstrating up to 100x speedup and 50x cost savings over baseline Spark MLlib in compute-intensive regimes.

Talk by: Erik Ordentlich, Sr. Manager, NVIDIA; Jinfeng Li, Senior Engineer, Machine Learning, NVIDIA

Here's more to explore:
LLM Compact Guide: https://dbricks.co/43WuQyb
Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us:
Website: https://databricks.com
Twitter: https://twitter.com/databricks
LinkedIn: https://www.linkedin.com/company/data…
Instagram: https://www.instagram.com/databricksinc
Facebook: https://www.facebook.com/databricksinc
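
To make the "essentially no-code-change" claim in the description concrete, here is a minimal sketch that is not taken from the talk. It assumes that Spark RAPIDS ML exposes a `KMeans` estimator in `spark_rapids_ml.clustering` that mirrors `pyspark.ml.clustering.KMeans`, so that only the import path differs between the CPU and GPU versions. The helper names (`estimator_module`, `load_kmeans`, `fit_kmeans`) and the `"features"` column name are hypothetical and exist only for illustration.

```python
import importlib

# Module paths for the two estimator families. The CPU path is Spark MLlib;
# the GPU path is the Spark RAPIDS ML package (assumed to mirror MLlib's layout).
CPU_MODULE = "pyspark.ml.clustering"
GPU_MODULE = "spark_rapids_ml.clustering"


def estimator_module(use_gpu: bool) -> str:
    """Return the module to import KMeans from (hypothetical helper)."""
    return GPU_MODULE if use_gpu else CPU_MODULE


def load_kmeans(use_gpu: bool):
    """Import the KMeans class lazily, so this file loads without Spark installed."""
    module = importlib.import_module(estimator_module(use_gpu))
    return getattr(module, "KMeans")


def fit_kmeans(df, k: int, use_gpu: bool):
    """Fit KMeans on a Spark DataFrame that has a 'features' vector column.

    The estimator calls below are identical for both backends; only the
    import path chosen in load_kmeans() changes.
    """
    KMeans = load_kmeans(use_gpu)
    estimator = KMeans().setK(k).setFeaturesCol("features")
    return estimator.fit(df)
```

With a running Spark session and a DataFrame `df`, calling `fit_kmeans(df, k=8, use_gpu=True)` would route the fit through the GPU package, while `use_gpu=False` would fall back to baseline Spark MLlib. Whether a particular estimator parameter is supported on the GPU path should be checked against the Spark RAPIDS ML documentation.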
