2. PySpark Coding Databricks Scenario-Based Question | Interview Coding Question 2025

Uploaded: Feb 19, 2025
Duration: 433 s

The Data Engineering Edge (625 subscribers)

2.5K views

PySpark coding interview question: find students with the same marks in Math and Chemistry

In this video, we tackle a PySpark coding interview question commonly asked in Data Engineering interviews. We will:

✅ Generate a dataset of students with marks in multiple subjects.
✅ Implement an optimized PySpark solution to find students having the same marks in Math and Chemistry.
✅ Use window functions (LAG) and filtering for better performance, avoiding expensive operations like `pivot()`.
✅ Discuss optimization techniques to improve PySpark query performance.

This question is often asked in interviews at companies like PWC, KPIT, Accenture, and more!

📌 Code Used in the Video:

    from pyspark.sql import SparkSession
    from pyspark.sql.window import Window
    from pyspark.sql.functions import col, lag

    # Create DataFrame
    data = [
        (101, "Alice", "Math", 85),
        (101, "Alice", "Chemistry", 85),
        (101, "Alice", "Physics", 78),
        (102, "Bob", "Math", 90),
        (102, "Bob", "Physics", 88),
        (103, "Charlie", "Math", 75),
        (103, "Charlie", "Chemistry", 75),
        (104, "David", "Math", 88),
        (104, "David", "Chemistry", 88),
        (104, "David", "Physics", 95),
        (105, "Eve", "Chemistry", 91),
    ]
    columns = ["Student_ID", "Name", "Subject", "Marks"]
    df = spark.createDataFrame(data, columns)
    df.display()

📌 **Subscribe for More PySpark Interview Questions!**

#pyspark #databricks #dataengineering #interviewquestions #bigdata #spark #azuredataengineer #hadooptutorial #interviewquestionsandanswers #accentureinterview #kpittech
