\(\mathcal{MD}\)-HBase: design and implementation of an elastic data infrastructure for cloud-scale location services

Distributed and Parallel Databases

Abstract

The ubiquity of location-enabled devices has led to a wide proliferation of location-based applications and services. To handle the growing scale, the database management systems driving such location-based services (LBSs) must cope with high insert rates for location updates from millions of devices while supporting efficient real-time analysis of the latest locations. Traditional DBMSs equipped with multi-dimensional index structures can handle spatio-temporal data efficiently. However, popular open-source relational database systems are overwhelmed by the high insertion rates, real-time querying requirements, and terabytes of data that these systems must handle. Key-value stores, on the other hand, can effectively support large-scale operation but do not natively provide the multi-attribute access needed for the rich querying functionality that LBSs require.

We present the design and implementation of \(\mathcal{MD}\)-HBase, a scalable data management infrastructure for LBSs that bridges this gap between scale and functionality. Our approach layers a multi-dimensional index structure over a key-value store. The underlying key-value store allows the system to sustain high insert throughput and large data volumes while ensuring fault tolerance and high availability; the index layer allows efficient multi-dimensional query processing. Our optimized query processing technique accesses only the index and storage level entries that intersect the query region. The design of \(\mathcal{MD}\)-HBase demonstrates how two standard index structures, the K-d tree and the Quad tree, can be layered over a range-partitioned key-value store to provide a scalable multi-dimensional data infrastructure. Our prototype implementation, built on HBase, a standard open-source key-value store, can handle hundreds of thousands of inserts per second on a modest 16-node cluster while processing multi-dimensional range queries and nearest neighbor queries in real time, with response times as low as a few hundred milliseconds.
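
The abstract does not say how multi-dimensional points are mapped to keys in the range-partitioned store. The following is a minimal sketch of one plausible way to do it, assuming Z-order (Morton) bit interleaving and an index whose entries are binary key prefixes naming disjoint subspaces. The names `z_key`, `prefix_bounds`, `range_query`, and the constant `BITS` are hypothetical, and a sorted in-memory list stands in for the key-value store. It illustrates the idea of scanning only the subspaces that intersect the query region; it is not the paper's implementation.

```python
import bisect
from typing import List, Sequence, Tuple

BITS = 4  # bits per dimension (hypothetical; kept small so keys are readable)

def z_key(point: Sequence[int], bits: int = BITS) -> str:
    """Interleave coordinate bits, most significant first, into one binary key."""
    return "".join(str((c >> b) & 1)
                   for b in range(bits - 1, -1, -1) for c in point)

def prefix_bounds(prefix: str, dims: int, bits: int = BITS):
    """Low and high corners of the subspace named by a key prefix."""
    def decode(key: str) -> Tuple[int, ...]:
        coords = [0] * dims
        for i, ch in enumerate(key):
            coords[i % dims] = (coords[i % dims] << 1) | int(ch)
        return tuple(coords)
    total = dims * bits
    return decode(prefix.ljust(total, "0")), decode(prefix.ljust(total, "1"))

def range_query(keys: List[str], points: List[Tuple[int, ...]],
                index: List[str], q_lo, q_hi, bits: int = BITS):
    """Scan only the index subspaces whose bounds intersect the query box."""
    dims, hits = len(q_lo), []
    for prefix in index:
        lo, hi = prefix_bounds(prefix, dims, bits)
        if any(h < ql or qh < l for l, h, ql, qh in zip(lo, hi, q_lo, q_hi)):
            continue  # subspace lies outside the query box: skip its key range
        start = bisect.bisect_left(keys, prefix.ljust(dims * bits, "0"))
        end = bisect.bisect_right(keys, prefix.ljust(dims * bits, "1"))
        for p in points[start:end]:
            # Filter: a subspace may only partially overlap the query box.
            if all(l <= c <= h for c, l, h in zip(p, q_lo, q_hi)):
                hits.append(p)
    return hits

# Toy data: 2D points stored in key order, as a sorted key-value store would.
raw = [(1, 2), (3, 5), (9, 9), (12, 3), (6, 14)]
points = sorted(raw, key=z_key)
keys = [z_key(p) for p in points]
index = ["00", "01", "1"]  # three disjoint subspaces covering the whole space
hits = range_query(keys, points, index, (0, 0), (4, 6))
```

With this toy index, only the subspace with prefix `00` intersects the query box, so the other two key ranges are never scanned. Fixed-length binary strings sort lexicographically in the same order as their numeric values, which is what lets a prefix correspond to one contiguous key range.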


Notes

  1. The name \(\mathcal {MD}\)-HBase signifies adding multi-dimensional data processing capabilities to HBase, a range partitioned key-value store.

  2. http://en.wikipedia.org/wiki/Distributed_hash_table.

  3. Because an HBase region can only be split into two sub-regions, we could not implement RPB for Quad trees: our experiments use a 3D space.
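
The mismatch behind note 3 can be made concrete with a small sketch. It assumes the standard definitions of the two splits: a K-d tree split halves a subspace along one dimension, while a Quad tree split halves it along every dimension at once, which in 3D yields eight children. The function names `kd_split` and `quad_split` are hypothetical and are not taken from the paper.

```python
from itertools import product
from typing import List, Tuple

Box = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (low corner, high corner)

def kd_split(box: Box, dim: int) -> List[Box]:
    """Halve a box along a single dimension: always two children."""
    lo, hi = box
    mid = (lo[dim] + hi[dim]) / 2
    left_hi = hi[:dim] + (mid,) + hi[dim + 1:]
    right_lo = lo[:dim] + (mid,) + lo[dim + 1:]
    return [(lo, left_hi), (right_lo, hi)]

def quad_split(box: Box) -> List[Box]:
    """Halve a box along every dimension at once: 2**d children."""
    lo, hi = box
    mid = tuple((l + h) / 2 for l, h in zip(lo, hi))
    children = []
    for choice in product((0, 1), repeat=len(lo)):
        # For each dimension, take the lower half (0) or the upper half (1).
        c_lo = tuple(m if c else l for c, l, m in zip(choice, lo, mid))
        c_hi = tuple(h if c else m for c, m, h in zip(choice, mid, hi))
        children.append((c_lo, c_hi))
    return children

space: Box = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
kd_children = kd_split(space, 0)
quad_children = quad_split(space)
```

Here `kd_children` has two boxes, which lines up with a two-way region split, while `quad_children` has eight, which a single two-way split cannot represent.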

Acknowledgements

This work is partly funded by NSF grants III 1018637 and CNS 1053594.

Author information

Corresponding author

Correspondence to Sudipto Das.

Additional information

Communicated by Dipanjan Chakraborty.

About this article

Cite this article

Nishimura, S., Das, S., Agrawal, D. et al. \(\mathcal{MD}\)-HBase: design and implementation of an elastic data infrastructure for cloud-scale location services. Distrib Parallel Databases 31, 289–319 (2013). https://doi.org/10.1007/s10619-012-7109-z

  • DOI: https://doi.org/10.1007/s10619-012-7109-z
