Let's say you have a cluster of three worker nodes, each with three cores.

In terms of parallelism, you can run at most nine tasks at the same time. Which of the following configurations is better in terms of Spark memory management?

  • Number of executors = 3; each executor has 3 cores.
  • Number of executors = 9; each executor has 1 core.
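
To make the two options concrete, here is a minimal sketch that expresses each layout as Spark properties and renders them as `spark-submit` arguments. `spark.executor.instances` and `spark.executor.cores` are standard Spark properties (the first applies when dynamic allocation is off); the helper names `executor_layout` and `to_submit_args` are hypothetical and exist only for this illustration.

```python
def executor_layout(num_executors, cores_per_executor):
    """Return the Spark properties describing a fixed executor layout."""
    return {
        "spark.executor.instances": str(num_executors),
        "spark.executor.cores": str(cores_per_executor),
    }


def to_submit_args(props):
    """Flatten a property dict into a list of spark-submit --conf arguments."""
    args = []
    for key, value in props.items():
        args += ["--conf", f"{key}={value}"]
    return args


def task_slots(props):
    """Maximum number of tasks that can run at once under this layout."""
    return int(props["spark.executor.instances"]) * int(props["spark.executor.cores"])


option_1 = executor_layout(3, 3)  # 3 JVMs, 3 task slots each
option_2 = executor_layout(9, 1)  # 9 JVMs, 1 task slot each

for name, props in [("option 1", option_1), ("option 2", option_2)]:
    print(name, " ".join(to_submit_args(props)), "| task slots:", task_slots(props))
```

Both layouts offer the same nine task slots, so they differ only in how many JVMs those slots are spread across.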

You should go with the first approach. Every executor runs in its own JVM, so increasing the number of executors increases the number of JVM instances, and each JVM has its own fixed memory requirements. Caching is also done at the executor level: with nine small executors, each one has less room for the DataFrames it tries to cache, so the total memory overhead goes up and cached RDD partitions are more likely to be evicted and recomputed.
It is therefore recommended that each executor have more than one core, so that several tasks can run in parallel within one JVM. It should not have more than 4-5 cores, though, because that increases the I/O burden on a single JVM.
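
The per-JVM cost can be estimated with some back-of-the-envelope arithmetic. The sketch below assumes the defaults used by recent Spark versions: about 300 MiB of reserved memory per executor, `spark.memory.fraction` = 0.6, and an executor memory overhead of 10% of the heap with a 384 MiB floor. The 9 GiB total heap budget and the function names are hypothetical, chosen only to make the comparison easy to follow; check the values against your own Spark version and cluster manager.

```python
RESERVED_MIB = 300       # memory Spark reserves in every executor JVM (assumed default)
MEMORY_FRACTION = 0.6    # assumed default of spark.memory.fraction
OVERHEAD_FACTOR = 0.10   # assumed default executor memory overhead factor
MIN_OVERHEAD_MIB = 384   # assumed floor for the executor memory overhead


def unified_memory_mib(heap_mib):
    """Heap available for execution and storage (caching) in one executor."""
    return (heap_mib - RESERVED_MIB) * MEMORY_FRACTION


def overhead_mib(heap_mib):
    """Extra non-heap memory requested per executor on top of the heap."""
    return max(MIN_OVERHEAD_MIB, heap_mib * OVERHEAD_FACTOR)


def cluster_totals(num_executors, heap_mib):
    """Sum usable memory and overhead across all executors."""
    return {
        "unified_mib": num_executors * unified_memory_mib(heap_mib),
        "overhead_mib": num_executors * overhead_mib(heap_mib),
    }


TOTAL_HEAP_MIB = 9 * 1024  # hypothetical heap budget for the whole cluster

few_large = cluster_totals(3, TOTAL_HEAP_MIB / 3)   # 3 executors, 3 GiB heap each
many_small = cluster_totals(9, TOTAL_HEAP_MIB / 9)  # 9 executors, 1 GiB heap each

for name, totals in [("3 x 3 cores", few_large), ("9 x 1 core", many_small)]:
    print(name,
          "| usable:", round(totals["unified_mib"]), "MiB",
          "| overhead:", round(totals["overhead_mib"]), "MiB")
```

Under these assumptions the same heap budget yields roughly 4,990 MiB of usable execution and storage memory with three executors but only about 3,910 MiB with nine, while the total overhead triples from 1,152 MiB to 3,456 MiB, because the fixed per-JVM costs are paid nine times instead of three.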