I have a basic Spark job that does a couple of joins. The three DataFrames that get joined are fairly large, nearly 2 billion records each. My Spark infrastructure automatically scales up nodes whenever necessary. It's a very simple Spark SQL query whose results I write to disk, but the job always gets stuck at 99% when I look at it in the Spark UI.
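For context, the query looks roughly like this (a minimal sketch with placeholder paths, table names, and join keys, since I can't share the real schema):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("three-way-join")
  .getOrCreate()

// Three DataFrames, each with ~2 billion records (paths are placeholders)
val dfA = spark.read.parquet("/data/table_a")
val dfB = spark.read.parquet("/data/table_b")
val dfC = spark.read.parquet("/data/table_c")

dfA.createOrReplaceTempView("a")
dfB.createOrReplaceTempView("b")
dfC.createOrReplaceTempView("c")

// A simple two-join query; the final write stage is where
// the job hangs at 99% in the Spark UI
val result = spark.sql("""
  SELECT a.*, b.extra_col, c.other_col
  FROM a
  JOIN b ON a.id = b.id
  JOIN c ON a.id = c.id
""")

result.write.mode("overwrite").parquet("/output/joined")
```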
A few things I have tried:

- Increase the number of executors and the executor memory.
- Use `repartition` while writing the file (see the sketch after this list).
- Use the native Spark `join` instead of the Spark SQL join, etc.
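Concretely, the attempts above looked roughly like this (the configuration values are illustrative, not the exact ones I used; `result`, `dfA`, `dfB`, and `dfC` refer to the sketch above):

```scala
// Attempt 1: more executors and more executor memory (values illustrative)
// spark-submit --num-executors 200 --executor-memory 16g --executor-cores 4 ...

// Attempt 2: repartition before the write, hoping to even out the final tasks
result.repartition(2000).write.mode("overwrite").parquet("/output/joined")

// Attempt 3: DataFrame API joins instead of the Spark SQL string query
val joined = dfA
  .join(dfB, Seq("id"))
  .join(dfC, Seq("id"))
joined.write.mode("overwrite").parquet("/output/joined")
```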
However, none of these has worked. It would be great if somebody could share their experience solving this problem. Thanks in advance.