Dask Tutorial: In-Depth Tutorial from Dask Community Leader Jacob Tomlinson
This is a 90-minute Dask tutorial covering the basics of using Dask, from Dask community leader Jacob Tomlinson.

The materials are available at https://github.com/jacobtomlinson/dask-video-tutorial-2020
A transcript of the chat is available at https://gist.github.com/mrocklin/0553cdc745b2210179d793d7232af382

Dask is a parallel computing library that scales the existing Python ecosystem. This Dask tutorial will introduce Dask and parallel data analysis more generally.

Dask can scale down to your laptop and up to a cluster. Here, we’ll use an environment you set up on your laptop to analyze medium-sized datasets in parallel locally.

Dask provides multi-core and distributed parallel execution on larger-than-memory datasets. We can think of Dask at a high level and a low level:

- High-level collections: Dask provides high-level Array, Bag, and DataFrame collections that mimic NumPy, lists, and Pandas but can operate in parallel on datasets that don’t fit into memory. Dask’s high-level collections are alternatives to NumPy and Pandas for large datasets.
- Low-level schedulers: Dask provides dynamic task schedulers that execute task graphs in parallel. These execution engines power the high-level collections mentioned above but can also power custom, user-defined workloads. These schedulers are low-latency (around 1ms) and work hard to run computations in a small memory footprint. Dask’s schedulers are an alternative to direct use of the threading or multiprocessing libraries in complex cases, or to other task scheduling systems like Luigi or IPython parallel.

Different users operate at different levels, but it is useful to understand both.

Share your feedback on this Dask tutorial with us in the comments and let us know:

- Did you find this Dask tutorial helpful?
- Have you used Dask before?
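To make the two levels described above concrete, here is a minimal sketch that runs on a laptop. It is not taken from the tutorial materials: the array shape, the chunk sizes, and the helper functions `inc` and `add` are illustrative assumptions. The first part uses a high-level Dask Array, and the second builds a small custom task graph with `dask.delayed` and hands it to the scheduler.

```python
import dask
import dask.array as da

# High-level collection: a Dask Array mimics NumPy but is split into chunks.
# Shape and chunk size here are arbitrary choices for illustration.
x = da.ones((1000, 1000), chunks=(250, 250))  # 4 x 4 = 16 chunks, built lazily
total = x.sum()                 # still lazy: this only builds a task graph
total_value = total.compute()   # the scheduler executes the graph in parallel

# Low-level use: wrap ordinary Python functions so calls become tasks.
# `inc` and `add` are hypothetical helpers, not part of Dask.
@dask.delayed
def inc(n):
    return n + 1

@dask.delayed
def add(a, b):
    return a + b

result = add(inc(1), inc(2))    # nothing runs yet; this is a small task graph
result_value = result.compute() # the two inc() calls are independent tasks

print(total_value, result_value)
```

In both cases nothing is computed until `.compute()` is called, which is what lets the scheduler decide how to run the work in parallel.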
Learn more at dask.org

KEY MOMENTS
00:00:00 Intro
00:02:41 General Overview of Dask
00:04:07 Dask Natively Scales Python
00:07:41 Dask Dataframe
00:15:47 Dask History
00:20:37 Lab: Analytics Exercises
00:30:14 Lab: Exercise Solution
00:34:37 Dask Dashboard GUIs: Workers, Tasks, and Memory
00:40:07 Dask Array
00:48:57 Dask ML
00:54:29 Optional Hands-on ML Lab
01:07:01 ML Lab Solution
01:11:08 Dask Bags, Futures, and Bonus Features
01:22:20 Dask Distributed
01:32:10 Best Practices and Wrap-up