I have a directory, say /data, that contains partitions, e.g. /data/partition1 and /data/partition2.
I read new data from Kafka, and suppose I only need to update /data/partition2. I do:
    dataFrame
      .write
      .mode(SaveMode.Overwrite)
      .partitionBy("date", "key")
      .option("header", "true")
      .format(format)
      .save("/data")
This successfully updates /data/partition2, but /data/partition1 is gone. How can I make Spark's SaveMode.Overwrite leave untouched the HDFS partitions that don't need to be updated?