
5. kpmg pyspark interview question & answer | databricks scenario based interview question & answer

21.4K views
Jan 27, 2024
13:31

#Databricks #PysparkInterviewQuestions #deltalake #AzureDatabricks #spark #pyspark #azuredatabricks #azure

In this video, I discuss KPMG PySpark scenario-based interview questions and answers, covering advanced PySpark and Databricks interview topics.

Create dataframes:
======================================================
# Employee salary info
data1 = [(100, "Raj", None, 1, "01-04-23", 50000),
         (200, "Joanne", 100, 1, "01-04-23", 4000),
         (200, "Joanne", 100, 1, "13-04-23", 4500),
         (200, "Joanne", 100, 1, "14-04-23", 4020)]
schema1 = ["EmpId", "EmpName", "Mgrid", "deptid", "salarydt", "salary"]
df_salary = spark.createDataFrame(data1, schema1)
display(df_salary)

# Department dataframe
data2 = [(1, "IT"), (2, "HR")]
schema2 = ["deptid", "deptname"]
df_dept = spark.createDataFrame(data2, schema2)
display(df_dept)
-----------------------------------------------------------------------
# Convert the salary date string to a proper date column
from pyspark.sql.functions import to_date

df = df_salary.withColumn('Newsaldt', to_date('salarydt', 'dd-MM-yy'))
display(df)
-----------------------------------------------------------------------
# Join with the department dataframe, then self-join to resolve each
# employee's manager name (a left join keeps employees with no manager)
from pyspark.sql.functions import col

df1 = df.join(df_dept, ['deptid'])
# display(df1)
df2 = df1.alias('a').join(df1.alias('b'), col('a.Mgrid') == col('b.EmpId'), 'left').select(
    col('a.deptname'),
    col('b.EmpName').alias('ManagerName'),
    col('a.EmpName'),
    col('a.Newsaldt'),
    col('a.salary')
)
display(df2)
-----------------------------------------------------------------------
# Aggregate salary per department, manager, employee, year and month
from pyspark.sql.functions import year, date_format

df3 = df2.groupBy('deptname', 'ManagerName', 'EmpName',
                  year('Newsaldt').alias('Year'),
                  date_format('Newsaldt', 'MMMM').alias('Month')).sum('salary')
display(df3)
============================================================
Learn PySpark, an interface for Apache Spark in Python. PySpark is often used for large-scale data processing and machine learning.
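The same pipeline can be checked without a Spark cluster. Below is a minimal plain-Python sketch (an assumption for illustration, not part of the video) that mirrors each step: parsing the dd-MM-yy date, looking up the department, resolving the manager name via an EmpId lookup (the role of the left self-join), and summing salary per department/manager/employee/year/month.

```python
# Plain-Python sketch of the interview question's logic, so the expected
# result of the PySpark pipeline can be verified by hand.
from datetime import datetime
from collections import defaultdict

salary = [
    (100, "Raj", None, 1, "01-04-23", 50000),
    (200, "Joanne", 100, 1, "01-04-23", 4000),
    (200, "Joanne", 100, 1, "13-04-23", 4500),
    (200, "Joanne", 100, 1, "14-04-23", 4020),
]
dept = {1: "IT", 2: "HR"}

# EmpId -> EmpName lookup, which plays the role of the left self-join on Mgrid
emp_name = {emp_id: name for emp_id, name, *_ in salary}

totals = defaultdict(int)
for emp_id, name, mgr_id, dept_id, sal_dt, sal in salary:
    dt = datetime.strptime(sal_dt, "%d-%m-%y")        # to_date(..., 'dd-MM-yy')
    key = (dept[dept_id], emp_name.get(mgr_id), name,
           dt.year, dt.strftime("%B"))                # date_format(..., 'MMMM')
    totals[key] += sal

print(totals[("IT", "Raj", "Joanne", 2023, "April")])  # 12520
```

With the sample data, Joanne's three April 2023 rows sum to 12520 under manager Raj, and Raj (no manager) keeps his single 50000 row — the same output the groupBy/sum step should produce.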
Azure data factory tutorial playlist: https://youtube.com/playlist?list=PLNRxk1s77zfjX_3ktp5sKsOh4Q2cWMMDX
ADF interview question & answer: https://youtube.com/playlist?list=PLNRxk1s77zfgXfQKyScXtbn2MdFkvJtgH
1. pyspark introduction | pyspark tutorial for beginners | pyspark tutorial for data engineers: https://youtu.be/hBDLfBILAuQ
2. what is dataframe in pyspark | dataframe in azure databricks | pyspark tutorial for data engineer: https://youtu.be/VNNlNlVKn98
3. How to read write csv file in PySpark | Databricks Tutorial | pyspark tutorial for data engineer: https://youtu.be/9kwxwCww4zI
4. Different types of write modes in Dataframe using PySpark | pyspark tutorial for data engineers: https://youtu.be/-0_LkRtD3Bo
5. read data from parquet file in pyspark | write data to parquet file in pyspark: https://youtu.be/B6wrbfLbaX0
6. datatypes in PySpark | pyspark data types | pyspark tutorial for beginners: https://youtu.be/LqTUjOOHwQU
7. how to define the schema in pyspark | structtype & structfield in pyspark | Pyspark tutorial: https://youtu.be/SqDlX_B7NmI
8. how to read CSV file using PySpark | How to read csv file with schema option in pyspark: https://youtu.be/s1HHtTVg9xU
9. read json file in pyspark | read nested json file in pyspark | read multiline json file: https://youtu.be/dOkPf_zVqaw
10. add, modify, rename and drop columns in dataframe | withcolumn and withcolumnrename in pyspark: https://youtu.be/2SzrgwVhsy0
11. filter in pyspark | how to filter dataframe using like operator | like in pyspark: https://youtu.be/4Hk8xmDPFZA
12. startswith in pyspark | endswith in pyspark | contains in pyspark | pyspark tutorial: https://youtu.be/8Bep9kk4JB8
13. isin in pyspark and not isin in pyspark | in and not in in pyspark | pyspark tutorial: https://youtu.be/bY86Et-uIcA
14. select in PySpark | alias in pyspark | azure Databricks: https://youtu.be/Ih9IlDO63CY
15. when in pyspark | otherwise in pyspark | alias in pyspark | case statement in pyspark: https://youtu.be/d1GVRCXZ64o
16. Null handling in pySpark DataFrame | isNull function in pyspark | isNotNull function in pyspark: https://youtu.be/si4bhjK1uB8
17. fill() & fillna() functions in PySpark | how to replace null values in pyspark | Azure Databrick: https://youtu.be/OgAry0H_P9c
18. GroupBy function in PySpark | agg function in pyspark | aggregate function in pyspark: https://youtu.be/_IaHywzYYFc
19. count function in pyspark | countDistinct function in pyspark | pyspark tutorial for beginners: https://youtu.be/wDNSgMkkwPM
20. orderBy in pyspark | sort in pyspark | difference between orderby and sort in pyspark: https://youtu.be/L3d6Eaxurz0
21. distinct and dropduplicates in pyspark | how to remove duplicate in pyspark | pyspark tutorial: https://youtu.be/HY54i2m4C0M

