HDFS is designed to store and process large files, so it handles large numbers of small files poorly. Every file, however small, adds metadata that the NameNode has to keep in memory and at least one block that a DataNode has to track, so many small files create a lot of overhead on both. Reading through small files also normally causes a lot of seeks and a lot of hopping from one DataNode to another, because each file has to be located and retrieved separately. All of this adds up to inefficient read and write operations.
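To get a feel for the NameNode side of that overhead, here is a rough back-of-the-envelope sketch. It assumes the commonly quoted rule of thumb of about 150 bytes of NameNode heap per namespace object (one object per file plus one per block) and a 128 MiB block size. Both numbers are assumptions for illustration, not measurements from a real cluster, and the function names are made up for this example.

```python
# Rough estimate of NameNode heap used by file and block metadata.
# ASSUMPTIONS (not from the original answer): ~150 bytes per namespace
# object is a rule of thumb, and 128 MiB is used as the block size.
BYTES_PER_OBJECT = 150
BLOCK_SIZE = 128 * 1024 * 1024


def namenode_objects(num_files, file_size, block_size=BLOCK_SIZE):
    """Count one object per file plus one object per block of that file."""
    # Ceiling division; even a tiny file occupies at least one block entry.
    blocks_per_file = max(1, -(-file_size // block_size))
    return num_files * (1 + blocks_per_file)


def namenode_heap_bytes(num_files, file_size, block_size=BLOCK_SIZE):
    """Approximate heap in bytes needed for the metadata of these files."""
    return namenode_objects(num_files, file_size, block_size) * BYTES_PER_OBJECT


if __name__ == "__main__":
    one_gib = 1024 ** 3
    # The same 1 GiB of data, stored two different ways.
    small = namenode_heap_bytes(num_files=10240, file_size=100 * 1024)
    large = namenode_heap_bytes(num_files=1, file_size=one_gib)
    print(f"10,240 files of 100 KiB: ~{small} bytes of NameNode heap")
    print(f"1 file of 1 GiB:         ~{large} bytes of NameNode heap")
```

Under these assumptions the same amount of data costs the NameNode a few thousand times more memory when it is split into 100 KiB files, and that is before counting the extra seeks and DataNode round trips on the read path.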