Skip to main content
Log in

Indexing Techniques of Distributed Ordered Tables: A Survey and Analysis

  • Survey
  • Published:
Journal of Computer Science and Technology Aims and scope Submit manuscript

Abstract

Many NoSQL (Not Only SQL) databases were proposed to store and query on a huge amount of data. Some of them like BigTable, PNUTS, and HBase, can be modeled as distributed ordered tables (DOTs). Many additional indexing techniques have been presented to support queries on non-key columns for DOTs. However, there was no comprehensive analysis or comparison of these techniques, which brings troubles to users in selecting or proposing a proper indexing technique for a certain workload. This paper proposes a taxonomy based on six indexing issues to classify indexing techniques on DOTs and provides a comprehensive review of the state-of-the-art techniques. Based on the taxonomy, we propose a performance model named QSModel to estimate the query time and storage cost of these techniques and run experiments on a practical workload from Tencent to evaluate this model. The results show that the maximum error rates of the query time and storage cost are 24.2% and 9.8%, respectively. Furthermore, we propose IndexComparator, an open source project that implements representative indexing techniques. Therefore, users can select the best-fit indexing technique based on both theoretical analysis and practical experiments.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Chang F, Dean J, Ghemawat S, Hsieh W C, Wallach D A, Burrows M, Chandra T, Fikes A, Gruber R E. BigTable: A distributed storage system for structured data. ACM Trans. Computer Systems (TOCS), 2008, 26(2): Article No. 4.

  2. Lakshman A, Malik P. Cassandra: A decentralized structured storage system. ACM SIGOPS Operating Systems Review, 2010, 44(2): 35-40.

    Article  Google Scholar 

  3. Cooper B F, Ramakrishnan R, Srivastava U, Silberstein A, Bohannon P, Jacobsen H A, Puz N, Weaver D, Yerneni R. PNUTS: Yahoo!’s hosted data serving platform. Proceedings of the VLDB Endowment, 2008, 1(2): 1277-1288.

    Article  Google Scholar 

  4. Harter T, Borthakur D, Dong S Y, Aiyer A, Tang L Y, Arpaci-Dusseau A C, Arpaci-Dusseau R H. Analysis of HDFS under HBase: A Facebook messages case study. In Proc. the 12th USENIX Conf. File and Storage Technologies, February 2014, pp.199-212.

  5. George L. HBase: The Definitive Guide. O’Reilly Media, Inc., 2011.

  6. Feng C, Yang X, Liang F, Sun X H, Xu Z W. LCIndex: A local and clustering index on distributed ordered tables for flexible multi-dimensional range queries. In Proc. the 44th Int. Conf. Parallel Processing, September 2015, pp.719-728.

  7. Nishimura S, Das S, Agrawal D, Abbadi A E. MD-HBase: A scalable multi-dimensional data infrastructure for location aware services. In Proc. the 12th IEEE Int. Conf. Mobile Data Management, June 2011, pp.7-16.

  8. Ma Y Z, Rao J, Hu W S, Meng X F, Han X, Zhang Y, Chai Y P, Liu C Q. An efficient index for massive IOT data in cloud environment. In Proc. the 21st ACM Int. Conf. Information and Knowledge Management, October 2012, pp.2129-2133.

  9. Zou Y Q, Liu J, Wang S C, Zha L, Xu Z W. CCIndex: A complemental clustering index on distributed ordered tables for multi-dimensional range queries. In Proc. the 7th IFIP Int. Conf. Network and Parallel Computing, September 2010, pp.247-261.

  10. Zhang C, Chen X Y, Shi Z L, Ge B. Algorithms for spatiotemporal queries in HBase. Journal of Chinese Computer Systems, 2016, 37(11): 2409-2415. (in Chinese)

    Google Scholar 

  11. Hsu Y T, Pan Y C, Wei L Y, Peng W C, Lee W C. Key formulation schemes for spatial index in cloud data managements. In Proc. the 13th IEEE Int. Conf. Mobile Data Management, July 2012, pp.21-26.

  12. Zhou X, Zhang X, Wang Y H, Li R, Wang S. Efficient distributed multi-dimensional index for big data management. In Proc. the 14th Int. Conf. Web-Age Information Management, June 2013, pp.130-141.

  13. O’Neil P, Cheng E, Gawlick D, O’Neil E. The log-structured merge-tree (LSM-tree). Acta Informatica, 1996, 33(4): 351-385.

    Article  MATH  Google Scholar 

  14. Guttman A. R-trees: A dynamic index structure for spatial searching. In Proc. ACM SIGMOD Int. Conf. Management of Data, June 1984, pp.47-57.

  15. Sellis T K, Roussopoulos N, Faloutsos C. The R+-tree: A dynamic index for multi-dimensional objects. In Proc. the 13th Int. Conf. Very Large Data Bases, September 1987, pp.507-518.

  16. Silberschatz A, Korth H F, Sudarshan S. Database System Concepts (3rd edition). McGraw-Hill, 1997.

  17. Wang L, Chen B, Liu Y H. Distributed storage and index of vector spatial data based on HBase. In Proc. the 21st Int. Conf. Geoinformatics, June 2013.

  18. GeW, Luo S M, ZhouWH, Zhao D, Tang Y, Zhou J, QuW W, Yuan C F, Huang Y H. HiBase: A hierarchical indexing mechanism and system for efficient HBase query. Chinese Journal of Computers, 2016, 39(1): 140-153. (in Chinese)

    MathSciNet  Google Scholar 

  19. Wang S C, Mares M A, Guo Y K. CGDM: Collaborative genomic data model for molecular profiling data using NoSQL. Bioinformatics, 2016, 32(23): 3654-3660.

    Google Scholar 

  20. Serrano D, Han D, Stroulia E. From relations to multidimensional maps: Towards an SQL-to-HBase transformation methodology. In Proc. the 8th IEEE Int. Conf. Cloud Computing, July 2015, pp.81-89.

  21. Chang C R, Hsieh M J, Wu J J, Wu P Y, Liu P F. HSQL: A highly scalable cloud database for multi-user query processing. In Proc. the 5th IEEE Int. Conf. Cloud Computing, June 2012, pp.943-944.

  22. Agrawal P, Silberstein A, Cooper B F, Srivastava U, Ramakrishnan R. Asynchronous view maintenance for VLSD databases. In Proc. ACM SIGMOD Int. Conf. Management of Data, June 2009, pp.179-192.

  23. Zhou X, Li H, Zhang X, Wang S, Ma Y Y, Liu K Y, Zhu M, Huang M L. ABR-Tree: An efficient distributed multidimensional indexing approach for massive data. In Proc. Int. Conf. Algorithms and Architectures for Parallel Processing, November 2015, pp.781-790.

  24. Zhang Y, Ma Y Z, Meng X F. Efficient processing of spatial keyword queries on HBase. Journal of Chinese Computer Systems, 2012, 33(10): 2141-2146. (in Chinese)

    Google Scholar 

  25. Zhang N Y, Zheng G Z, Chen H J, Chen J Y, Chen X. HBaseSpatial: A scalable spatial data storage based on HBase. In Proc. the 13th IEEE Int. Conf. Trust Security and Privacy in Computing and Communications, September 2014, pp.644-651.

  26. Han D, Stroulia E. HGrid: A data model for large geospatial data sets in HBase. In Proc. the 6th IEEE Int. Conf. Cloud Computing, June 28-July 3, 2013, pp.910-917.

  27. Zhou W, Han J Z, Zhang Z, Dai J, Xu Z Y. HDKV: Supporting efficient high-dimensional similarity search in keyvalue stores. Concurrency and Computation: Practice and Experience, 2013, 25(12): 1675-1698.

    Article  Google Scholar 

  28. Ma Y Z, Zhang Y, Meng X F. ST-HBase: A scalable data management system for massive geo-tagged objects. In Proc. the 14th Int. Conf. Web-Age Information Management, June 2013, pp.155-166.

  29. Tang X S, Han B D, Chen H. A hybrid index for multidimensional query in HBase. In Proc. the 4th Int. Conf. Cloud Computing and Intelligence Systems, August 2016, pp.332-336.

  30. Ma Y Z, Meng X F, Wang S Y, Hu W S, Han X, Zhang Y. An efficient index method for multi-dimensional query in cloud environment. In Proc. the 2nd Int. Conf. Cloud Computing and Big Data in Asia, June 2015, pp.307-318.

  31. Chen X Y, Zhang C, Ge B, Xiao W D. Efficient historical query in HBase for spatio-temporal decision support. International Journal of Computers Communications & Control, 2016, 11(5): 613-630.

    Article  Google Scholar 

  32. Lee K, Ganti R K, Srivatsa M, Liu L. Efficient spatial query processing for big data. In Proc. the 22nd ACM SIGSPATIAL Int. Conf. Advances in Geographic Information Systems, November 2014, pp.469-472.

  33. Wang H P, Ci X, Meng X F. Fast multi-fields query processing in bigtable based cloud systems. In Proc. the 14th Int. Conf. Web-Age Information Management, June 2013, pp.142-154.

  34. Huang S, Wang B T, Zhu J Y, Wang G R, Yu G. R-HBase: A multi-dimensional indexing framework for cloud computing environment. In Proc. IEEE Int. Conf. Data Mining Workshop, December 2014, pp.569-574.

  35. Li Q C, Lu Y, Gong X L, Zhang J. Optimizational method of HBase multi-dimensional data query based on Hilbert space-filling curve. In Proc. the 9th Int. Conf. P2P Parallel Grid Cloud and Int. Conf. Computing, November 2014, pp.469-474.

  36. Li S, Hu S H, Ganti R, Srivatsa M, Abdelzaher T. Pyro: A spatial-temporal big-data storage system. In Proc. the USENIX Annual Technical Conf., July 2015, pp.97-109.

  37. Du N B, Zhan J F, Zhao M, Xiao D R, Xie Y C. Spatiotemporal data index model of moving objects on fixed networks using HBase. In Proc. IEEE Int. Conf. Computational Intelligence & Communication Technology, February 2015, pp.247-251.

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Chen Feng.

Electronic supplementary material

Below is the link to the electronic supplementary material.

ESM 1

(PDF 56 kb)

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Feng, C., Li, CD. & Li, R. Indexing Techniques of Distributed Ordered Tables: A Survey and Analysis. J. Comput. Sci. Technol. 33, 169–189 (2018). https://doi.org/10.1007/s11390-018-1813-8

Download citation

  • Received:

  • Revised:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11390-018-1813-8

Keywords

Navigation