- Because the number of distinct keys is very small, hash partitioning would create only 10 partitions, each about 1 TB in size. All records sharing a key land in the same partition, so no amount of requested parallelism helps. Hash partitioning is therefore not a good choice here.
- You should instead use range partitioning, where partition boundaries are drawn over the sorted key space, so multiple partitions can be created per key. This increases the number of partitions, and with it the parallelism and performance of the Spark job.
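To make the first point concrete, here is a minimal, self-contained Python simulation of hash partitioning (Spark's `HashPartitioner` assigns a record to partition `hash(key) % numPartitions`). The dataset and key names are hypothetical, chosen to mirror the 10-key scenario above: no matter how many partitions we request, at most 10 can ever be non-empty, because every record with the same key hashes to the same partition.

```python
from collections import Counter

# Hypothetical dataset: 100,000 records but only 10 distinct keys
# (an assumption mirroring the scenario described above).
records = [(f"key_{i % 10}", i) for i in range(100_000)]

def hash_partition_counts(records, num_partitions):
    """Count records per partition when partitioning by hash(key) % n,
    the same scheme Spark's HashPartitioner uses."""
    counts = Counter()
    for key, _ in records:
        counts[hash(key) % num_partitions] += 1
    return counts

# Even if we ask for 200 partitions, at most 10 are non-empty:
counts = hash_partition_counts(records, 200)
print(len(counts))
```

Running this shows at most 10 non-empty partitions (possibly fewer if two keys collide), each holding a tenth of the data, which is exactly the skew problem the text describes.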