Spark Scenario-Based Question: How to use explode, explode_outer and posexplode #interview #spark @datadnyan
client.csv
Id|Name|address
1|Dnyan| Pune,Mumbai,Latur
2|Rahul| Nanded,Mumbai
3|Sonali| Pune,Mumbai
4|Yogesh |
demo.ipynb
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
# Create SparkSession
spark = SparkSession.builder \
    .master("local[2]") \
    .appName("demo") \
    .getOrCreate()
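# Read the pipe-delimited file with a header row; trim stray spaces such as " Pune" and "Yogesh "
df = spark.read.option("delimiter", "|").option("header", True) \
    .option("ignoreLeadingWhiteSpace", True).option("ignoreTrailingWhiteSpace", True) \
    .csv(r"G:\Youtube\Docx\client.csv")
df.printSchema()  # all columns are strings because inferSchema is not set
df.show()  # Yogesh's empty address is read as null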
# explode_outer: one row per city; a null address still produces a row with City = null
df.withColumn("City", F.explode_outer(F.split(F.col("address"), ","))).drop("address").show()
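# posexplode_outer: also returns each element's 0-based position (pos) and value (col); col is renamed to City
df.select("*", F.posexplode_outer(F.split(F.col("address"), ","))).drop("address").withColumnRenamed("col", "City").show()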