- Coalesce merges the existing partitions instead of rebuilding them, so it moves much less data across the network. Use coalesce when you want to reduce the number of partitions.
- Repartition performs a full network shuffle and creates new partitions of roughly equal size. Use repartition when you want to increase the number of partitions.
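- Coalesce is not suited to increasing the number of partitions: without a shuffle it can only merge existing partitions, so a request for more partitions leaves the count unchanged. Because it merges partitions as they are, it can also leave partitions of unequal size, and a Spark job does not perform well with unequal partitions. Evening them out requires a network shuffle, which is what repartition does.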
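
The difference can be sketched with a toy model in plain Python. This is not Spark's implementation: the functions `coalesce` and `repartition` below are hypothetical stand-ins, partitions are modeled as lists of records, and the round-robin redistribution is an assumption chosen for illustration. In Spark itself the corresponding calls are `df.coalesce(n)` and `df.repartition(n)`.

```python
from typing import List

Partition = List[int]


def coalesce(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy narrow merge: every existing partition is moved whole into one of n groups."""
    if n >= len(partitions):
        # Without a shuffle there is nothing to split, so the count stays the same.
        return [list(p) for p in partitions]
    merged: List[Partition] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Map contiguous runs of old partitions onto the same new partition.
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy full shuffle: every record is redistributed round-robin over n partitions."""
    out: List[Partition] = [[] for _ in range(n)]
    records = [r for p in partitions for r in p]
    for i, record in enumerate(records):
        out[i % n].append(record)
    return out


def sizes(partitions: List[Partition]) -> List[int]:
    return [len(p) for p in partitions]


# Four skewed partitions holding 6, 2, 1 and 3 records (12 in total).
skewed = [list(range(0, 6)), list(range(6, 8)), list(range(8, 9)), list(range(9, 12))]

print(sizes(coalesce(skewed, 2)))     # [8, 4]  fewer partitions, sizes stay uneven
print(sizes(repartition(skewed, 2)))  # [6, 6]  fewer partitions, sizes evened out
print(sizes(coalesce(skewed, 6)))     # [6, 2, 1, 3]  count cannot grow without a shuffle
print(sizes(repartition(skewed, 6)))  # [2, 2, 2, 2, 2, 2]
```

In this model coalesce never touches individual records, which is why it is cheap but inherits the skew of its input, while repartition touches every record and pays for the even sizes with data movement.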