Apache HBase is designed for random, real-time, relatively low-latency read/write access to big data. HBase's goal is to store very large tables, with billions of rows and millions of columns, on clusters of commodity hardware.

The following characteristics make an application suitable for HBase:

  • Large quantities of data, on the scale of hundreds of GBs to TBs and PBs. HBase is not suitable for small-scale data

  • Fast, random access to data

  • A variable, flexible schema, in which each row is (or could be) different

  • Key-based access to data when storing, loading, searching, retrieving, serving, and querying

  • Data stored in collections. For example, some metadata, message data, or binary data all keyed on the same value

  • High throughput, on the order of thousands of records per second

  • Horizontally scalable cache capacity. Capacity can be increased simply by adding nodes

  • The data layout is designed for key lookup with no overhead for sparse columns

  • Data-centric model rather than a relationship-centric model. Not suitable for an ERD (entity relationship diagram) model

  • Strong consistency and high availability are requirements, with consistency favored over availability

  • Lots of insertion, lookup, and deletion of records

  • Write-heavy applications

  • Append-style writing (inserting and overwriting) rather than heavy read-modify-write
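Several of the characteristics above (key-based access, a flexible per-row schema, and no overhead for sparse columns) follow from HBase's underlying data model: conceptually, a table is a sorted map from (row key, column, timestamp) to value, and absent cells simply are not stored. The following is a minimal, illustrative sketch of that model in plain Python; it is not the HBase API, and the table and column names are hypothetical.

```python
# Conceptual sketch of HBase's storage model (NOT the HBase client API):
# each cell is an entry in a map keyed by (row key, column, timestamp).
# Rows with different columns cost nothing extra, because a cell that
# was never written is simply not present in the map.

store = {}  # (row_key, column, timestamp) -> value

def put(row, column, value, ts):
    """Write one cell; older timestamps remain as earlier versions."""
    store[(row, column, ts)] = value

def get_row(row):
    """Return the latest-version value for each column of a row."""
    cells = {}
    for (r, col, ts), val in store.items():
        if r == row and (col not in cells or ts > cells[col][0]):
            cells[col] = (ts, val)
    return {col: val for col, (ts, val) in cells.items()}

# Rows may have entirely different columns (flexible schema):
put("user1", "info:name", "Alice", 1)
put("user1", "info:email", "alice@example.com", 1)
put("user2", "info:name", "Bob", 1)
put("user2", "stats:logins", "7", 1)  # absent on user1: no storage cost

print(get_row("user1"))
print(get_row("user2"))
```

Note how all access is by row key: there is no query planner and no secondary structure in this sketch, which mirrors why key-based access patterns suit HBase while ad hoc relational queries do not.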

Some use-cases for HBase are as follows:

  • Audit logging systems

  • Tracking user actions

  • Answering queries such as

    • What are the last 10 actions made by the user?

    • Which users logged into the system on a particular day?

  • Real-time analytics

    • Real-time counters

    • Interactive reports showing trends and breakdowns

    • Time series databases

  • Monitoring systems

  • Message-centered systems (Twitter-like messages and statuses)

  • Content management systems serving content out of HBase

  • Canonical use-cases such as storing web pages fetched while crawling the Web
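Queries such as "What are the last 10 actions made by the user?" are typically answered in HBase through row key design rather than indexes: a common pattern is a composite key of the form user_id plus a reverse timestamp, so that a prefix scan over one user's keys returns the newest actions first. The sketch below simulates that pattern over a plain sorted key space; it is illustrative only (not the HBase scan API), and all names and the timestamp bound are assumptions.

```python
# Illustrative row key design (NOT the HBase API): keys of the form
# <user_id>#<reverse_timestamp> sort newest-first lexicographically,
# so "last N actions" becomes a short prefix scan.

MAX_TS = 10**13  # assumed upper bound on millisecond timestamps

def row_key(user_id, ts_millis):
    # Zero-padding keeps lexicographic order equal to numeric order.
    return f"{user_id}#{MAX_TS - ts_millis:013d}"

table = {}  # a sorted key space standing in for an HBase table

def log_action(user_id, ts_millis, action):
    table[row_key(user_id, ts_millis)] = action

def last_actions(user_id, n):
    """Prefix-scan one user's keys; reversed timestamps yield newest first."""
    prefix = f"{user_id}#"
    keys = sorted(k for k in table if k.startswith(prefix))
    return [table[k] for k in keys[:n]]

log_action("u1", 1000, "login")
log_action("u1", 3000, "purchase")
log_action("u1", 2000, "view")
print(last_actions("u1", 2))  # ['purchase', 'view'] (newest first)
```

The same composite-key idea underlies the time-series and audit-logging use-cases above: the key orders the data so that the common query is a cheap, contiguous scan.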

HBase is not suitable for, or optimized for,

  • Classical transactional applications or relational analytics

  • Batch MapReduce (not a substitute for HDFS)

  • Cross-record transactions and joins

HBase is not a replacement for an RDBMS or for HDFS. HBase is suitable for

  • Large datasets

  • Sparse datasets

  • Loosely coupled (denormalized) records

  • Many concurrent clients

HBase is not suitable for

  • Small datasets (unless many of them)

  • Highly relational records

  • Schema designs requiring transactions

Summary

In this chapter, I discussed the characteristics that make an application suitable for Apache HBase, including fast, random access to large quantities of data with high throughput. Characteristics that make an application unsuitable for HBase were also discussed. In the next chapter, I will discuss the physical storage in HBase.