Skip to main content
Home
  • Tutorials
    • Quality Assurance
    • Software Development
    • Machine Learning
    • Data Science
  • About Us
  • Contact
programsbuzz facebook programsbuzz twitter programsbuzz linkedin
  • Log in

Main navigation

  • Tutorials
    • Quality Assurance
    • Software Development
    • Machine Learning
    • Data Science
  • About Us
  • Contact

HBase Architecture and its components

Profile picture for user shiksha.dahiya
Written by shiksha.dahiya on 07/14/2021 - 09:10

HBase provides low-latency random reads and writes on top of HDFS. In HBase, tables are dynamically distributed by the system whenever they become too large to handle (Auto Sharding). The simplest and foundational unit of horizontal scalability in HBase is a Region. A continuous, sorted set of rows that are stored together is referred to as a region (subset of table data). HBase architecture has a single HBase master node (HMaster) and several slaves i.e. region servers. Each region server (slave) serves a set of regions, and a region can be served only by a single region server. Whenever a client sends a write request, HMaster receives the request and forwards it to the corresponding region server. HBase can be run in a multiple master setup, wherein there is only single active master at a time. HBase tables are partitioned into multiple regions with every region storing multiple table’s rows.

The HBase Physical Architecture consists of servers in a Master-Slave relationship as shown below. Typically, the HBase cluster has one Master node, called HMaster and multiple Region Servers called HRegionServer. Each Region Server contains multiple Regions – HRegions. Data in HBase is stored in Tables and these Tables are stored in Regions. When a Table becomes too big, the Table is partitioned into multiple Regions. These Regions are assigned to Region Servers across the cluster. Each Region Server hosts roughly the same number of Regions.

hbase

The HMaster in the HBase is responsible for

  • Performing Administration
  • Managing and Monitoring the Cluster
  • Assigning Regions to the Region Servers
  • Controlling the Load Balancing and Failover

On the other hand, the HRegionServer perform the following work

  • Hosting and managing Regions
  • Splitting the Regions automatically
  • Handling the read/write requests
  • Communicating with the Clients directly

Each Region Server contains a Write-Ahead Log (called HLog) and multiple Regions. Each Region in turn is made up of a MemStore and multiple StoreFiles (HFile). The data lives in these StoreFiles in the form of Column Families (explained below). The MemStore holds in-memory modifications to the Store (data).

The mapping of Regions to Region Server is kept in a system table called .META. When trying to read or write data from HBase, the clients read the required Region information from the .META table and directly communicate with the appropriate Region Server. Each Region is identified by the start key (inclusive) and the end key (exclusive)

Components of Apache HBase Architecture

HBase architecture has 3 important components- HMaster, Region Server and ZooKeeper.

i.) HMaster

HBase HMaster is a lightweight process that assigns regions to region servers in the Hadoop cluster for load balancing. Responsibilities of HMaster –

● Manages and Monitors the Hadoop Cluster 
● Performs Administration (Interface for creating, updating and deleting tables.)Controlling the failover
● DDL operations are handled by the HMaster
● Whenever a client wants to change the schema and change any of the metadata operations, HMaster is responsible for all these operations.

ii.) Region Server

These are the worker nodes which handle read, write, update, and delete requests from clients. Region Server process, runs on every node in the hadoop cluster. Region Server runs on HDFS DataNode and consists of the following components –

● Block Cache – This is the read cache. Most frequently read data is stored in the read cache and whenever the block cache is full, recently used data is evicted.
● MemStore- This is the write cache and stores new data that is not yet written to the disk. Every column family in a region has a MemStore.
● Write Ahead Log (WAL) is a file that stores new data that is not persisted to permanent storage.
● HFile is the actual storage file that stores the rows as sorted key values on a disk.

iii.) Zookeeper

HBase uses ZooKeeper as a distributed coordination service for region assignments and to recover any region server crashes by loading them onto other region servers that are functioning. ZooKeeper is a centralized monitoring server that maintains configuration information and provides distributed synchronization. Whenever a client wants to communicate with regions, they have to approach Zookeeper first. HMaster and Region servers are registered with ZooKeeper service, client needs to access ZooKeeper quorum in order to connect with region servers and HMaster. In case of node failure within an HBase cluster, ZKquoram will trigger error messages and start repairing failed nodes. ZooKeeper service keeps track of all the region servers that are there in an HBase clustertracking information about how many region servers are there and which region servers are holding which DataNode. HMaster contacts ZooKeeper to get the details of region servers. Various services that Zookeeper provides include –

● Establishing client communication with region servers.
● Tracking server failure and network partitions.
● Maintain Configuration Information
● Provides ephemeral nodes, which represent different region servers.

Related Content
Apache HBase Tutorial
Data Model of Apache HBase
HBase Advantages and Disadvantages
Tags
Apache HBase
  • Log in or register to post comments

Choose Your Technology

  1. Agile
  2. Apache Groovy
  3. Apache Hadoop
  4. Apache HBase
  5. Apache Spark
  6. Appium
  7. AutoIt
  8. AWS
  9. Behat
  10. Cucumber Java
  11. Cypress
  12. DBMS
  13. Drupal
  14. GitHub
  15. GitLab
  16. GoLang
  17. Gradle
  18. HTML
  19. ISTQB Foundation
  20. Java
  21. JavaScript
  22. JMeter
  23. JUnit
  24. Karate
  25. Kotlin
  26. LoadRunner
  27. matplotlib
  28. MongoDB
  29. MS SQL Server
  30. MySQL
  31. Nightwatch JS
  32. PactumJS
  33. PHP
  34. Playwright
  35. Playwright Java
  36. Playwright Python
  37. Postman
  38. Project Management
  39. Protractor
  40. PyDev
  41. Python
  42. Python NumPy
  43. Python Pandas
  44. Python Seaborn
  45. R Language
  46. REST Assured
  47. Ruby
  48. Selenide
© Copyright By iVagus Services Pvt. Ltd. 2023. All Rights Reserved.

Footer

  • Cookie Policy
  • Privacy Policy
  • Terms of Use