Skip to main content
Home
  • Tutorials
    • Quality Assurance
    • Software Development
    • Machine Learning
    • Data Science
  • About Us
  • Contact
programsbuzz facebook programsbuzz twitter programsbuzz linkedin
  • Log in

Main navigation

  • Tutorials
    • Quality Assurance
    • Software Development
    • Machine Learning
    • Data Science
  • About Us
  • Contact

Apache Hadoop HDFS Architecture

Profile picture for user shiksha.dahiya
Written by shiksha.dahiya on 07/15/2021 - 13:37

The structure of HDFS is a master/slave model. The HDFS cluster will have one single name node, the master server that organize the namespace and control the files that are accessed by clients. Then under the name node, there are several data nodes that manage storage attached to the nodes. They store and retrieve blocks as the name node or the clients requested and send back the set of blocks that carry those information.

The blocks are stored internally in the name node and they are much larger than a normal block in a disk. The default for the block is 64MB and files are broken into block-sized chunks to be stored. There are several benefits of having a block structure for the distributed system. First, since a file can be larger than the disk in the network, the file can be divided into several blocks and to be stored on different disks. This way, the file can actually be processed in parallel. In addition, for fault tolerance and recovery, block structure is easily replicated from another disk and bring the process back to normal.

hadoop

Namenode

The namenode is the commodity hardware that contains the GNU/Linux operating system and the namenode software. It is a software that can be run on commodity hardware. The system having the namenode acts as the master server and it does the following tasks −

  • Manages the file system namespace.

  • Regulates client’s access to files.

  • It also executes file system operations such as renaming, closing, and opening files and directories.

Datanode

The datanode is a commodity hardware having the GNU/Linux operating system and datanode software. For every node (Commodity hardware/System) in a cluster, there will be a datanode. These nodes manage the data storage of their system.

  • Datanodes perform read-write operations on the file systems, as per client request.

  • They also perform operations such as block creation, deletion, and replication according to the instructions of the namenode.

Block

Generally the user data is stored in the files of HDFS. The file in a file system will be divided into one or more segments and/or stored in individual data nodes. These file segments are called as blocks. In other words, the minimum amount of data that HDFS can read or write is called a Block. The default block size is 64MB, but it can be increased as per the need to change in HDFS configuration.

hadoop

Since HDFS is built using the Java language, any machine that supports Java can run the name node or the data node software. There exists a variety of other interfaces that are compatible using HDFS by different methods, this include Thrift, C, FUSE, WebDAV, HTTP and FTP. Usually, the other file system interfaces need additional integration in order to access HDFS. For example, for some non-Java applications that have Thrift bindings, they use the Thrift API in their implementation by accessing the Thrift service and ease the interaction to Hadoop.

HDFS interacts with other components of Apache Hadoop to distribute files and data as requested.

hadoop

Related Content
Apache Hadoop Tutorial
Introduction to HDFS (Hadoop Distributed File System)
Limitations of HDFS
Tags
Apache Hadoop
  • Log in or register to post comments

Choose Your Technology

  1. Agile
  2. Apache Groovy
  3. Apache Hadoop
  4. Apache HBase
  5. Apache Spark
  6. Appium
  7. AutoIt
  8. AWS
  9. Behat
  10. Cucumber Java
  11. Cypress
  12. DBMS
  13. Drupal
  14. GitHub
  15. GitLab
  16. GoLang
  17. Gradle
  18. HTML
  19. ISTQB Foundation
  20. Java
  21. JavaScript
  22. JMeter
  23. JUnit
  24. Karate
  25. Kotlin
  26. LoadRunner
  27. matplotlib
  28. MongoDB
  29. MS SQL Server
  30. MySQL
  31. Nightwatch JS
  32. PactumJS
  33. PHP
  34. Playwright
  35. Playwright Java
  36. Playwright Python
  37. Postman
  38. Project Management
  39. Protractor
  40. PyDev
  41. Python
  42. Python NumPy
  43. Python Pandas
  44. Python Seaborn
  45. R Language
  46. REST Assured
  47. Ruby
  48. Selenide
© Copyright By iVagus Services Pvt. Ltd. 2023. All Rights Reserved.

Footer

  • Cookie Policy
  • Privacy Policy
  • Terms of Use