How are deletions handled inside HBase?

In HBase, a delete is a special type of update: the values named in the delete request are not removed immediately. Instead, they are masked by assigning a tombstone marker to them. Every read of a value that carries a tombstone marker returns null to the client, so from the client's point of view the value is already deleted and reads stay consistent.

HBase works this way because HFiles are immutable (recall that HDFS does not allow the data in a file to be modified in place). All values with a tombstone marker are permanently removed during the next major compaction.

There are three types of tombstone markers:

  • Version Delete Marker: marks a single version of a column.
  • Column Delete Marker: marks all versions of a column.
  • Family Delete Marker: marks all versions of all columns in a column family.
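
To make the three marker types concrete, here is a minimal sketch of how a read could apply them. It is a toy in-memory model, not the HBase client API or its internals: TombstoneSketch, Cell, Kind, masks and read are all hypothetical names. The sketch assumes that column and family markers cover versions with a timestamp less than or equal to the marker's own timestamp.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of delete markers; hypothetical names, not the HBase client API. */
class TombstoneSketch {
    enum Kind { PUT, DELETE_VERSION, DELETE_COLUMN, DELETE_FAMILY }

    /** One immutable cell. Family markers leave column null. */
    static final class Cell {
        final Kind kind; final String family; final String column;
        final long ts; final String value;
        Cell(Kind kind, String family, String column, long ts, String value) {
            this.kind = kind; this.family = family; this.column = column;
            this.ts = ts; this.value = value;
        }
    }

    /** True if marker m hides the stored value p. */
    static boolean masks(Cell m, Cell p) {
        if (!m.family.equals(p.family)) return false;
        switch (m.kind) {
            case DELETE_VERSION: // exactly one version of one column
                return m.column.equals(p.column) && m.ts == p.ts;
            case DELETE_COLUMN:  // all versions of one column up to the marker's timestamp
                return m.column.equals(p.column) && p.ts <= m.ts;
            case DELETE_FAMILY:  // all columns of the family up to the marker's timestamp
                return p.ts <= m.ts;
            default:
                return false;    // a PUT never masks anything
        }
    }

    /** Newest visible value of family:column, or null if every version is masked. */
    static String read(List<Cell> store, String family, String column) {
        Cell newest = null;
        for (Cell c : store) {
            if (c.kind != Kind.PUT || !c.family.equals(family) || !c.column.equals(column)) continue;
            boolean hidden = false;
            for (Cell m : store) {
                if (masks(m, c)) { hidden = true; break; }
            }
            if (!hidden && (newest == null || c.ts > newest.ts)) newest = c;
        }
        return newest == null ? null : newest.value;
    }

    public static void main(String[] args) {
        List<Cell> store = new ArrayList<>();
        store.add(new Cell(Kind.PUT, "cf", "a", 1, "a1"));
        store.add(new Cell(Kind.PUT, "cf", "a", 2, "a2"));
        store.add(new Cell(Kind.PUT, "cf", "b", 1, "b1"));
        store.add(new Cell(Kind.DELETE_VERSION, "cf", "a", 2, null));
        System.out.println(read(store, "cf", "a")); // a1: only version 2 is masked
        store.add(new Cell(Kind.DELETE_COLUMN, "cf", "a", 2, null));
        System.out.println(read(store, "cf", "a")); // null: every version of a is masked
        store.add(new Cell(Kind.DELETE_FAMILY, "cf", null, 5, null));
        System.out.println(read(store, "cf", "b")); // null: the whole family is masked
    }
}
```

Note that the store is only ever appended to: a delete adds a marker cell and never touches the cells it hides, which mirrors the immutability constraint described earlier.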

Finally, during the next major compaction, the values with tombstone markers (deleted data) are removed from HBase, along with expired values (those whose TTL has elapsed).
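
The clean-up step can be sketched in the same spirit. The toy model below is not HBase's compaction code: CompactionSketch, Cell, isMasked and majorCompact are hypothetical names. To stay short it models only a column-style tombstone, and it assumes TTL is measured against the cell's timestamp. The point it illustrates is that compaction writes a new set of cells rather than editing the old ones.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a major compaction; hypothetical names, not HBase internals. */
class CompactionSketch {
    /** One immutable cell. A tombstone masks versions of its column with ts <= its own. */
    static final class Cell {
        final String column; final long ts; final String value; final boolean tombstone;
        Cell(String column, long ts, String value, boolean tombstone) {
            this.column = column; this.ts = ts; this.value = value; this.tombstone = tombstone;
        }
    }

    /** True if some tombstone in the store hides cell c. */
    static boolean isMasked(Cell c, List<Cell> store) {
        for (Cell m : store) {
            if (m.tombstone && m.column.equals(c.column) && c.ts <= m.ts) return true;
        }
        return false;
    }

    /** Builds a new cell list; the input stands in for the old immutable files and is not modified. */
    static List<Cell> majorCompact(List<Cell> store, long now, long ttl) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : store) {
            if (c.tombstone) continue;        // markers are dropped once they have been applied
            if (isMasked(c, store)) continue; // deleted data is physically removed
            if (now - c.ts > ttl) continue;   // expired values are removed as well
            out.add(c);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> store = new ArrayList<>();
        store.add(new Cell("a", 100, "old", false));   // expired under the TTL below
        store.add(new Cell("a", 900, "new", false));   // survives
        store.add(new Cell("b", 950, "b1", false));    // deleted by the tombstone
        store.add(new Cell("b", 960, null, true));     // tombstone for column b
        List<Cell> compacted = majorCompact(store, 1000, 500);
        for (Cell c : compacted) {
            System.out.println(c.column + "@" + c.ts + " = " + c.value); // a@900 = new
        }
    }
}
```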