PySpark Full Course using Azure Databricks | Spark SQL and DataFrames

493 views
Oct 9, 2024
1:02:31

In this one-hour PySpark course, we'll learn the essentials of Spark SQL and DataFrames in PySpark. We will start with an introduction to PySpark and its architecture. We will also briefly look at the Azure Databricks platform, which is what we will be using in this course. We will learn different data manipulation techniques: creating new columns, renaming existing columns, filtering a dataset to create a new subset, and so on. We will also look at the differences between lazy evaluation and eager evaluation in DataFrames.

If you like this video, please support me in making more of these: https://buymeacoffee.com/algometica

Here is everything we will learn:
00:00 - PySpark Intro and Architecture
05:16 - Overview of Azure Databricks environment
07:50 - What is Azure Databricks (Optional)
12:24 - Initializing SparkSession
13:57 - PySpark DataFrame
17:02 - Lazy Evaluation
17:24 - Transformations and Actions
18:56 - DataFrame Schema
20:54 - Create PySpark DataFrame from Pandas DataFrame
24:32 - Viewing Data
26:06 - Eager Evaluation
27:40 - View Column Names
28:05 - Count Columns in dataset
28:42 - collect and take action
31:26 - Rename a column
32:45 - Add a new column
35:00 - Selecting Columns
35:33 - Filtering Rows
36:52 - GroupBy
42:39 - Writing to and Reading from external file types
46:53 - Working with SQL in PySpark
49:14 - UDFs
51:13 - Course Project
01:01:40 - Outro and future announcements

https://algometica.com (under construction) - stay tuned for Data Engineering courses.
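
The lazy evaluation idea mentioned above (transformations are only recorded, and an action triggers the actual work) can be illustrated with a small pure-Python toy. This is only a sketch and not PySpark's API or the course's code: the `LazyFrame` class, its method names, and the sample rows are all hypothetical, chosen to loosely mirror DataFrame operations such as filtering rows and adding a column.

```python
from typing import Callable, Iterable, List, Optional


class LazyFrame:
    """Hypothetical toy stand-in for a DataFrame (not a PySpark class).

    Transformations only append a step to a plan; actions run the plan.
    """

    def __init__(self, rows: Iterable[dict], plan: Optional[List[Callable]] = None):
        self._rows = list(rows)
        self._plan = list(plan or [])

    # Transformations: return a new LazyFrame with a longer plan.
    # No row is read or processed at this point.
    def filter(self, predicate: Callable[[dict], bool]) -> "LazyFrame":
        step = lambda rows: (r for r in rows if predicate(r))
        return LazyFrame(self._rows, self._plan + [step])

    def with_column(self, name: str, fn: Callable[[dict], object]) -> "LazyFrame":
        step = lambda rows: ({**r, name: fn(r)} for r in rows)
        return LazyFrame(self._rows, self._plan + [step])

    # Actions: execute every recorded step and return concrete results.
    def collect(self) -> List[dict]:
        rows = iter(self._rows)
        for step in self._plan:
            rows = step(rows)
        return list(rows)

    def count(self) -> int:
        return len(self.collect())


# Record every row the predicate sees, so we can observe when work happens.
calls: List[str] = []


def is_adult(row: dict) -> bool:
    calls.append(row["name"])
    return row["age"] >= 18


people = LazyFrame([{"name": "Ana", "age": 31}, {"name": "Ben", "age": 12}])

# Two transformations: only the plan grows, the predicate has not run yet.
adults = people.filter(is_adult).with_column("age_next_year", lambda r: r["age"] + 1)
calls_before_action = len(calls)

# An action: the whole plan runs now.
result = adults.collect()
```

In this toy, `calls_before_action` stays at zero because building `adults` does no work; the predicate only runs once `collect()` is called. Real PySpark additionally optimizes the recorded plan before running it, which this sketch does not attempt to model.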
