Why is RDD immutable?

Some of the advantages of having immutable RDDs in Spark are as follows:

  • In a distributed parallel processing environment, the immutability of Spark RDD rules out the possibility of inconsistent results. In other words, immutability solves the problems caused by concurrent use of the data set by multiple threads at once.
  • Additionally, immutable data can as easily live in memory as on disk in a multiprocessing environment.
  • The immutability of Spark RDDs also makes them a deterministic function of their input. This means that RDDs can be recreated at any time. This helps in making RDDs fault tolerance.