What are Partitions in Spark?

In Spark, a partition is a logical chunk of data. Spark data structures (RDDs, DataFrames, and Datasets) are partitioned and processed across different cores and nodes. Partitioning lets you control application parallelism, which results in better utilisation of the cluster. In general, work is distributed to more workers by creating more partitions; with fewer partitions, each worker node processes a larger chunk of data. However, partitioning is not helpful in every application. For example, if a given RDD is scanned only once, partitioning it in advance is pointless. Partitioning pays off only when a dataset is reused multiple times, especially in key-oriented operations such as join().
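The key-to-partition mapping behind this can be sketched in plain Python. This is a conceptual illustration of hash partitioning, not Spark's actual implementation; `assign_partition` and `partition_records` are hypothetical helpers, though Spark's HashPartitioner follows the same hash(key) mod numPartitions idea:

```python
def assign_partition(key, num_partitions):
    """Return the partition index for a key (hash partitioning sketch)."""
    return hash(key) % num_partitions

def partition_records(records, num_partitions):
    """Group (key, value) records into num_partitions buckets by key hash."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[assign_partition(key, num_partitions)].append((key, value))
    return partitions

records = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
parts = partition_records(records, 4)
# All records sharing a key land in the same partition, which is what
# makes key-oriented operations such as join() cheap once the data
# is partitioned: matching keys are already co-located.
```

Because every record with the same key hashes to the same partition, a join between two datasets partitioned the same way needs no further shuffling of those keys across the network.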