Data Model of Apache HBase

The HBase data model is designed to accommodate semi-structured data whose field sizes, data types, and columns can vary. Its layout also makes it easier to partition the data and distribute it across the cluster. The data model is made up of several logical components: Tables, Rows, Column Families, Columns, Cells, and Versions. HBase is a NoSQL, column-oriented database. Although its tables have rows and columns and so resemble those of a relational database, HBase is not relational. Relational databases are typically row-oriented, whereas HBase groups data on disk by column family, so the two store data differently.

[Figure: HBase table architecture]

The figure above illustrates the HBase table architecture; the components of an HBase table are described below:

(1) Table: HBase tables are column-oriented, i.e., data are stored grouped by column rather than by row. An HBase Table is a logical collection of rows stored in separate partitions called Regions. As shown above, every Region is served by exactly one Region Server. The figure above shows a representation of a Table.
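
The idea of a Table split into Regions can be sketched in a few lines of Python. This is only an illustration, not HBase code: it assumes each Region covers a contiguous range of row keys, and the start keys and server names below are made up for the example.

```python
from bisect import bisect_right

# Hypothetical region start keys, kept sorted. The first region starts at
# the empty key, which sorts before every other byte string.
REGION_START_KEYS = [b"", b"g", b"p"]

# Hypothetical assignment: each region is served by exactly one server,
# although one server may serve several regions.
REGION_SERVERS = ["server-a", "server-b", "server-a"]


def find_region(row_key: bytes) -> int:
    """Return the index of the region whose key range contains row_key."""
    return bisect_right(REGION_START_KEYS, row_key) - 1


def find_server(row_key: bytes) -> str:
    """Return the name of the server responsible for row_key."""
    return REGION_SERVERS[find_region(row_key)]
```

With these made-up boundaries, find_server(b"alice") falls in the first Region and find_server(b"henry") in the second. Because a key belongs to exactly one range, it is always routed to a single server.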

(2) Row Key: The row key is the most crucial component of an HBase table. A row is one instance of data in a table and is identified by its row key. Row keys are unique within a Table and are always treated as a byte[]. Data is searched and retrieved by row key, so a well-chosen key speeds up lookups.
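
Because row keys are plain byte arrays, HBase compares them byte by byte rather than as numbers or words. The sketch below imitates that comparison with Python's bytes type; the key names and the helper functions are hypothetical and only serve to show why numeric ids are often zero-padded.

```python
def sorted_row_keys(keys):
    """Sort row keys by their raw bytes, as a byte-ordered store would."""
    return sorted(k.encode("utf-8") for k in keys)


def padded_key(n: int, width: int = 6) -> bytes:
    """Zero-pad a numeric id so that byte order matches numeric order."""
    return str(n).zfill(width).encode("utf-8")


# "row10" sorts before "row2" because the byte for '1' is smaller than '2'.
unpadded = sorted_row_keys(["row2", "row10", "row1"])

# With padding, 2 sorts before 10 as one would expect.
padded = sorted([padded_key(10), padded_key(2)])
```

Here unpadded is [b"row1", b"row10", b"row2"], while padded is [b"000002", b"000010"].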

(3) Column Families: Columns that are related to each other are grouped into column families, so the data in a row is organized by Column Family. Each Column Family has one or more Columns, and the Columns in a family are stored together in low-level storage files known as HFiles. Column Families form the basic unit of physical storage to which certain HBase features, such as compression, are applied. Hence it is important to take proper care when designing the Column Families of a table. The table above shows the Customer and Sales Column Families. The Customer Column Family is made up of 2 columns, Name and City, whereas the Sales Column Family is made up of 2 columns, Product and Amount.

(4) Column Qualifiers: Each column within a Column Family is identified by its Column Qualifier. A Column Family is made of one or more columns. A Column is addressed by the Column Family name and the Column Qualifier joined with a colon, for example columnfamily:columnname. There can be multiple Columns within a Column Family, and Rows within a table can have varying numbers of Columns.
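
The columnfamily:columnname convention and the fact that rows may carry different columns can be illustrated with ordinary Python dictionaries. This is a sketch of the naming scheme only. The row keys and values are invented, and the Customer and Sales families follow the example used earlier.

```python
def split_column(column: str):
    """Split 'family:qualifier' into its two parts at the first colon."""
    family, _, qualifier = column.partition(":")
    return family, qualifier


# Two hypothetical rows of the same table with different sets of columns.
rows = {
    b"cust-001": {
        "Customer:Name": b"Alice",
        "Customer:City": b"Paris",
        "Sales:Product": b"Chair",
        "Sales:Amount": b"120",
    },
    b"cust-002": {
        "Customer:Name": b"Bob",  # this row has no Sales columns at all
    },
}


def columns_by_family(row):
    """Group one row's columns by family: {family: {qualifier: value}}."""
    grouped = {}
    for column, value in row.items():
        family, qualifier = split_column(column)
        grouped.setdefault(family, {})[qualifier] = value
    return grouped
```

Calling columns_by_family(rows[b"cust-002"]) yields only a Customer group. Columns that a row does not have are simply absent rather than stored as empty placeholders.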

(5) Cell: A Cell stores the actual data and is identified by a unique combination of row key, Column Family, and Column Qualifier. A cell can hold many versions of its data. The data stored in a Cell is called its value, and it is always treated as a byte[].
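
A rough way to picture a cell is as an entry in a map keyed by its coordinates. The Python sketch below is an in-memory stand-in, not the HBase client API: put and get are hypothetical helpers, and the row key and values are invented. It also shows that the application, not the store, decides how a value is encoded into bytes.

```python
import struct

# (row key, column family, column qualifier) -> value, everything as bytes.
cells = {}


def put(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> None:
    """Store a value at the given cell coordinates."""
    cells[(row, family, qualifier)] = value


def get(row: bytes, family: bytes, qualifier: bytes):
    """Return the value at the given coordinates, or None if absent."""
    return cells.get((row, family, qualifier))


# The store only ever sees bytes; the caller chooses the encoding.
put(b"cust-001", b"Customer", b"Name", "Alice".encode("utf-8"))
put(b"cust-001", b"Sales", b"Amount", struct.pack(">q", 120))  # 8-byte big-endian integer

name = get(b"cust-001", b"Customer", b"Name").decode("utf-8")
amount = struct.unpack(">q", get(b"cust-001", b"Sales", b"Amount"))[0]
```

The reader has to know which encoding was used for each column, since nothing in the stored bytes records whether they represent text or a number.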

(6) Time Stamp: A timestamp records the date and time of a write. By default it is the Region Server's current time in milliseconds when the data is stored, although a client can supply its own value. The timestamp is stored with the actual data, making it easy to find a particular version of the data. The data stored in a cell is versioned, and versions are identified by their timestamp. The number of versions retained is configurable per column family. The default was 3 in older releases and has been 1 since HBase 0.96.
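
Versioning by timestamp can be sketched as a small Python class that keeps only the newest few values of one cell. The class name, its methods, and the eager pruning are assumptions made for illustration. HBase itself removes excess versions later, during compaction, rather than on every write.

```python
import time
from typing import Optional


class VersionedCell:
    """Keeps at most max_versions values for one cell, keyed by timestamp."""

    def __init__(self, max_versions: int):
        self.max_versions = max_versions
        self._versions = {}  # timestamp in milliseconds -> value

    def put(self, value: bytes, ts: Optional[int] = None) -> None:
        """Store a value; use the current time if no timestamp is given."""
        if ts is None:
            ts = int(time.time() * 1000)
        self._versions[ts] = value
        # Drop the oldest versions once the limit is exceeded.
        for old in sorted(self._versions, reverse=True)[self.max_versions:]:
            del self._versions[old]

    def get(self, ts: Optional[int] = None):
        """Return the newest value, or the value at an exact timestamp."""
        if not self._versions:
            return None
        if ts is None:
            return self._versions[max(self._versions)]
        return self._versions.get(ts)

    def timestamps(self):
        """Return the retained timestamps, newest first."""
        return sorted(self._versions, reverse=True)


cell = VersionedCell(max_versions=3)
for ts in (1, 2, 3, 4):
    cell.put(b"v%d" % ts, ts=ts)
```

After the four writes, cell.timestamps() is [4, 3, 2]: the oldest version has been dropped, cell.get() returns the newest value, and cell.get(2) still retrieves an older one.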