I have the following DataFrame df in Spark:
+-------+-----+---+
|OrderID| Type|Qty|
+-------+-----+---+
| 571936|62800|  1|
| 571936|62800|  1|
| 571936|62802|  3|
| 661455|72800|  1|
| 661455|72801|  1|
+-------+-----+---+
I need to select the row with the largest value of Qty for each unique OrderID, or the last row per OrderID if all Qty values are the same (e.g. as for 661455). The expected result:
+-------+-----+---+
|OrderID| Type|Qty|
+-------+-----+---+
| 571936|62802|  3|
| 661455|72801|  1|
+-------+-----+---+
Any ideas how to get it?
This is what I tried:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.col
val partitionWindow = Window.partitionBy(col("OrderID")).orderBy(col("Qty").asc)
val result = df.over(partitionWindow) // does not compile: DataFrame has no `over` method
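Something along these lines seems closer (just a sketch on my part: it uses row_number over a window, and it assumes the "last row per OrderID" can be approximated by taking the highest Type value, since the rows have no explicit ordering column):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// Assumption: "last row" is stood in for by the highest Type,
// because the DataFrame carries no explicit row-order column.
val pickWindow = Window
  .partitionBy(col("OrderID"))
  .orderBy(col("Qty").desc, col("Type").desc)

val result = df
  .withColumn("rn", row_number().over(pickWindow)) // rank rows within each OrderID
  .filter(col("rn") === 1)                         // keep only the top-ranked row
  .drop("rn")

On the sample data above this would give (571936, 62802, 3) and (661455, 72801, 1), but I am not sure whether ordering by Type is the right tie-breaker in general.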