Use cases for Apache HBase

Apache HBase carries all the features of the original Google Bigtable paper, such as Bloom filters, in-memory operations, and compression. An HBase table can serve as the input for MapReduce jobs in the Hadoop ecosystem, and it can also serve as the output after the data is processed by MapReduce. The data in HBase can be accessed through the Java API, the REST API, or the Thrift and Avro gateways.
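Conceptually, a Bloom filter lets HBase skip reading store files that definitely do not contain a requested row: it can report false positives but never false negatives. A minimal, illustrative sketch in Python (this toy class and its parameters are hypothetical, not HBase's actual implementation):

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: probabilistic set membership, no false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
bf.add("row-00042")
print(bf.might_contain("row-00042"))  # True: an added key is never missed
```

In HBase, a lookup for a key the filter reports as absent can skip the store file entirely, which is why Bloom filters cut read I/O for random gets.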

HBase is essentially a column-oriented key-value data store, and since it works extremely well with the kind of data that Hadoop processes, it is a natural fit for deployment as a layer on top of HDFS. It is very fast for both read and write operations and retains this quality even when data sets grow huge. Corporations therefore use HBase widely for its high throughput and low input/output latency. HBase cannot replace an SQL database, but it is perfectly possible to put an SQL layer on top of HBase to integrate it with various business intelligence and analytics tools.
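The column-oriented key-value layout can be pictured as a sorted map from (row key, column family, qualifier, timestamp) to a value. A toy Python sketch of that logical model (the `MiniTable` class and its method names are illustrative only, not an HBase API):

```python
import time
from collections import defaultdict

class MiniTable:
    """Toy model of HBase's logical layout: rows map
    (column family, qualifier) pairs to timestamp-versioned values."""

    def __init__(self):
        # row key -> {(family, qualifier): {timestamp: value}}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, family, qualifier, value, ts=None):
        ts = ts if ts is not None else time.time_ns()
        self.rows[row][(family, qualifier)][ts] = value

    def get(self, row, family, qualifier):
        # Return the newest version, mirroring HBase's default behaviour.
        versions = self.rows[row].get((family, qualifier), {})
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        # Rows come back in sorted key order, like an HBase scan.
        for row in sorted(self.rows):
            if start_row <= row < stop_row:
                yield row, dict(self.rows[row])

t = MiniTable()
t.put("user#1001", "info", "name", "Ada")
t.put("user#1001", "info", "name", "Ada L.")  # newer version wins
print(t.get("user#1001", "info", "name"))  # Ada L.
```

The sorted-by-row-key scan is what makes row key design matter so much in real HBase tables: rows that are read together should sort together.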

As an operational data store, you can run your applications on top of HBase. You can also integrate your application with HBase. You can use HBase in CDP alongside your on-prem HBase clusters for disaster recovery use cases.

Some of the other use cases of HBase in CDP include:

  • Support customers' mission-critical, scale-out applications
  • Query data with millisecond latency
  • Perform fraud model serving and detection
  • Enable serving analytics on mobile and web applications directly to end-customers
  • Operationalize Artificial Intelligence/Machine Learning to drive revenue or manage operational cost
  • Surface useful data in your applications (for example, customer 360 applications for customer support)
  • Use as a key-value store for applications
  • Bring together data spanning sources, schemas and data types and leverage in your applications
  • Use as a small file store, for example to store logs from various devices

The telecom industry faces the following technical challenges:

  • Storing billions of CDR (call detail record) logs generated in the telecom domain
  • Providing real-time access to CDR logs and customers' billing information
  • Providing a cost-effective solution compared to traditional database systems

Solution: HBase is used to store billions of rows of detailed call records. If 20 TB of data is added per month to an existing RDBMS database, performance deteriorates. HBase handles data volumes of this scale well, querying quickly and returning records fast.
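A common row-key pattern for a CDR table like this (a hypothetical sketch, not taken from the text) is to combine the subscriber ID with a reversed timestamp, so that a scan over one subscriber's key range returns the most recent calls first:

```python
MAX_TS = 10**13  # hypothetical upper bound on epoch milliseconds

def cdr_row_key(subscriber_id, call_ts_millis):
    """Subscriber ID + zero-padded reversed timestamp: newest calls sort first."""
    reversed_ts = MAX_TS - call_ts_millis
    return f"{subscriber_id}#{reversed_ts:013d}"

keys = sorted([
    cdr_row_key("5551234", 1_700_000_000_000),
    cdr_row_key("5551234", 1_700_000_600_000),  # a later call
])
# The later call has the smaller reversed timestamp, so it sorts first
# within the subscriber's key range.
print(keys[0])  # 5551234#8299999400000
```

Because HBase stores rows in sorted key order, "billing information for the latest calls" becomes a short prefix scan instead of a full-table query.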

The banking industry generates millions of records on a daily basis. In addition, it needs an analytics solution that can detect fraud in money transactions.

Solution: To store, process, and update vast volumes of data and perform analytics on them, an ideal solution is HBase integrated with several Hadoop ecosystem components.

HBase is mainly used for write-heavy applications and provides fast random access to the available data. Large companies such as Facebook and Tuenti have used HBase for their messaging platforms, and Twitter, Yahoo, and Adobe use HBase internally. HBase has a wide range of applications in areas such as the following:

Medical: In the medical field, HBase is used to store genome sequences and run MapReduce jobs on them, and to store patients' disease histories.

Sports: In the sports field, HBase is used to store match histories for better analytics and prediction.

Web: It is also used to store user history and preferences for better customer targeting.

Oil and petroleum industry: HBase is used in the oil and petroleum industry to store exploration data for analysis and predict probable places where oil can be found.

E-commerce: It is used to record and store logs of customer search history, and to run analytics on them for better-targeted advertising.

Other fields: HBase can be used in many other fields where there is a need to store petabytes of data and run analyses that traditional systems might take months to complete. We will discuss use cases and industry usability in more detail in later chapters.