Limitations of PySpark

HARD TO EXPRESS

Some problems are hard to express in MapReduce fashion, so formulating them for PySpark can be awkward.
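Computing a per-key average is a small case where the MapReduce formulation is less direct than the plain description. Averages cannot be combined pairwise, so each record has to be mapped to a (sum, count) pair first and divided only at the end. The sketch below uses only the Python standard library rather than Spark itself; the data and the names map_phase, reduce_phase, and average_by_key are made up for illustration.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical input: (city, temperature) records.
records = [("Oslo", 4.0), ("Rome", 18.0), ("Oslo", 6.0), ("Rome", 22.0), ("Oslo", 8.0)]


def map_phase(record):
    """Turn one record into (key, (partial_sum, partial_count))."""
    city, temp = record
    return (city, (temp, 1))


def reduce_phase(a, b):
    """Combine two (sum, count) pairs; this is associative, unlike averaging."""
    return (a[0] + b[0], a[1] + b[1])


def average_by_key(records):
    # Group the mapped values by key (the "shuffle" step in MapReduce terms).
    grouped = defaultdict(list)
    for key, value in map(map_phase, records):
        grouped[key].append(value)

    # Reduce each group to a single (sum, count) pair, then divide once.
    result = {}
    for key, values in grouped.items():
        total, count = reduce(reduce_phase, values)
        result[key] = total / count
    return result


averages = average_by_key(records)
```

The point is the extra bookkeeping: a one-line idea ("average per city") turns into a map step, a grouping step, and a reduce step that carries intermediate state.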

LESS EFFICIENT

PySpark is less efficient than some other programming models. MPI, for example, is a better fit when a job needs a lot of communication.

SLOW

In terms of performance, Python is slower than Scala for Spark jobs, by roughly a factor of 10. If we need to do heavy processing, Python will therefore be slower than Scala.
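One commonly cited contributor to this gap is that records handled by Python functions have to be serialized on their way between the JVM and the Python worker processes. The sketch below is not Spark code; it uses only the standard library's pickle module to show the kind of round trip involved. The function names are hypothetical, and real Spark batches records and uses its own serializers, so this illustrates the idea rather than the actual cost.

```python
import pickle


def round_trip(record):
    """Serialize and deserialize one record, standing in for a process boundary."""
    return pickle.loads(pickle.dumps(record))


def process_directly(records):
    """Apply the function in-process, with no serialization."""
    return [x * 2 for x in records]


def process_with_round_trip(records):
    """Apply the same function, but serialize each record on the way in and out."""
    return [round_trip(round_trip(x) * 2) for x in records]


records = list(range(1000))
direct = process_directly(records)
via_pickle = process_with_round_trip(records)
```

Both versions produce the same result; the second simply does extra work per record. Timing the two with timeit on a larger input gives a rough feel for how that overhead grows with the number of records.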

IMMATURE

As of Spark 1.2, Python supports Spark Streaming, but that support is not yet as mature as Scala's. So if we need streaming, we should go with Scala.

CANNOT USE THE INTERNAL WORKINGS OF SPARK

Spark itself is written in Scala. If a project needs to use or change Spark's internal workings, we have to work in Scala; Python cannot be used for that.