This video explains the following concepts in detail using PySpark programming:
What is Spark RDD?
Characteristics of Spark RDD - Distributed, Resilient, Immutable, and Fault-Tolerant
Resilient characteristic of RDD by simulating worker node failure
Transformations, Actions, and Lazy Evaluation
DAG and lineage of transformations on the Spark UI
We have used a 3-node Spark cluster, PySpark, and Jupyter Notebook in this video to explain the concepts better.
Kindly like, share, and post your comments to help us improve.
Regards,
Team K2 Analytics
www.k2analytics.co.in