What is the difference between Data Block and Input Split?

Data Block: A data block is the physical unit of storage in HDFS. HDFS splits a large file into fixed-size chunks called blocks (128 MB by default in Hadoop 2.x and later) and stores each file as a set of these blocks. The blocks are replicated and distributed across multiple DataNodes.

Input Split: An input split is a logical division of the input. It represents the chunk of data that a single Mapper processes. MapReduce launches one Map task per input split, so the number of input splits determines the number of Map tasks. A split does not hold the data itself; it only references where the data is stored.
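
To make the distinction concrete, here is a minimal Python sketch that counts blocks and splits for a file. It assumes the split size is chosen as max(minSize, min(maxSize, blockSize)), which is how Hadoop's FileInputFormat computes it, and it ignores finer details such as the small slack Hadoop allows on the last split. The function names and default values are illustrative, not part of any Hadoop API.

```python
MB = 1024 * 1024

def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)

def num_blocks(file_size, block_size=128 * MB):
    """Physical view: how many HDFS blocks are needed to store the file."""
    return ceil_div(file_size, block_size)

def split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Logical view: split size as max(minSize, min(maxSize, blockSize)).
    min_size and max_size stand in for the configurable split-size limits."""
    return max(min_size, min(max_size, block_size))

def num_splits(file_size, block_size=128 * MB, min_size=1, max_size=2**63 - 1):
    """Simplified split count (one Map task per split); ignores last-split slack."""
    return ceil_div(file_size, split_size(block_size, min_size, max_size))

one_gb = 1024 * MB
print(num_blocks(one_gb))                      # blocks stored in HDFS
print(num_splits(one_gb))                      # splits with default limits
print(num_splits(one_gb, min_size=256 * MB))   # splits after raising the minimum
```

With the default limits, the split size equals the block size, so a 1 GB file has the same number of splits as blocks. Raising the minimum split size reduces the number of splits (and therefore Map tasks) without changing how the file is stored in blocks.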
