I have an input data frame as below,
which I want to transform into the output below.
Can anyone please help me with how we can implement this using PySpark DataFrames?
I tried different ways but could not find an optimal way to do it.
Do a groupBy on the common columns (HCP ID and TERR ID) and collect the values of the column that varies between rows (PRODUCT) into a list.
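import pyspark.sql.functions as F

# collect_list must be referenced through the F alias; it gathers each group's PRODUCT values into one array column.
ans_df = df.groupBy(F.col('HCP ID'), F.col('TERR ID')).agg(F.collect_list(F.col('PRODUCT')).alias("LINEUP"))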
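To make the expected result concrete, here is a rough plain-Python sketch of what that aggregation computes. It does not use Spark at all; the sample rows and the `group_products` helper are made up for illustration, since the original input and output tables are not shown here. With `distinct=False` it behaves like `collect_list` (duplicates kept), and with `distinct=True` it approximates `collect_set` (duplicates dropped).

```python
from collections import defaultdict

# Hypothetical sample rows in the form (HCP ID, TERR ID, PRODUCT).
rows = [
    ("H1", "T1", "A"),
    ("H1", "T1", "B"),
    ("H2", "T2", "A"),
    ("H1", "T1", "A"),
]


def group_products(rows, distinct=False):
    """Group rows by (HCP ID, TERR ID) and gather PRODUCT values into a list."""
    grouped = defaultdict(list)
    for hcp_id, terr_id, product in rows:
        bucket = grouped[(hcp_id, terr_id)]
        # distinct=True skips values already seen in this group (like collect_set);
        # distinct=False keeps every value (like collect_list).
        if not distinct or product not in bucket:
            bucket.append(product)
    return dict(grouped)


lineup = group_products(rows)
lineup_distinct = group_products(rows, distinct=True)
```

Note that this sketch keeps values in input order, whereas Spark does not guarantee the order of elements returned by `collect_list` or `collect_set` after a shuffle, so sort the array afterwards if the order of LINEUP matters.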