Abstract
Loosely defined schemas and data volumes in the trillions of records make it challenging for databases to store and retrieve semi-structured data. Most existing work addresses these issues with R-trees, KD-trees, and space-filling curves, but these structures are poorly suited to the default and discrete values found in semi-structured data, and some even require sampling the data before it is stored. We present MD-Index, a scalable multi-dimensional indexing system that supports high-throughput, real-time range queries. MD-Index builds a bitmap index over sliced data on top of a range-partitioned key-value store. The underlying key-value store provides high throughput, large storage capacity, high availability, and fault tolerance, while the bitmaps provide the multi-dimensional index over the data. In addition, MD-Index encodes the discrete values as the hash code of a slice and stores the data and the bitmap of each slice in the same region (a storage unit of the range-partitioned key-value store) to exploit distributed computing and data locality. Our prototype of MD-Index is built on HBase, a widely used key-value database. Experimental results show that MD-Index can store and retrieve trillions of semi-structured records and achieve a throughput of two million records per second.
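The slice-plus-bitmap idea summarized above can be illustrated with a small in-memory sketch. This is not the authors' implementation and does not use HBase: the names `slice_key` and `SliceBitmapIndex`, the use of an MD5 prefix as the slice hash, and the sample fields (`city`, `temp`, `humidity`) are all assumptions made for illustration. A plain dictionary stands in for the range-partitioned key-value store, and Python integers stand in for bitmaps.

```python
import hashlib
from collections import defaultdict


def slice_key(record, discrete_fields):
    """Hash the discrete values of a record into a slice key (hypothetical scheme)."""
    parts = [f"{f}={record.get(f)}" for f in discrete_fields]
    return hashlib.md5("|".join(parts).encode()).hexdigest()[:8]


class SliceBitmapIndex:
    """Records of one slice plus one bitmap per (field, value) pair."""

    def __init__(self):
        self.records = []                # row id -> record
        self.bitmaps = defaultdict(int)  # (field, value) -> int used as a bitset

    def add(self, record):
        row_id = len(self.records)
        self.records.append(record)
        # A field that is absent from the record simply sets no bit.
        for field, value in record.items():
            self.bitmaps[(field, value)] |= 1 << row_id

    def _range_bitmap(self, field, low, high):
        # OR together the bitmaps of every value inside [low, high].
        result = 0
        for (f, value), bitmap in self.bitmaps.items():
            if f == field and low <= value <= high:
                result |= bitmap
        return result

    def query(self, ranges):
        """ranges maps field -> (low, high); conditions are ANDed together."""
        mask = (1 << len(self.records)) - 1  # start with every row selected
        for field, (low, high) in ranges.items():
            mask &= self._range_bitmap(field, low, high)
        return [r for i, r in enumerate(self.records) if (mask >> i) & 1]


# A dict stands in for the key-value store: slice key -> data and bitmaps together.
store = {}
rows = [
    {"city": "Beijing", "temp": 21, "humidity": 40},
    {"city": "Beijing", "temp": 30},  # semi-structured: no humidity field
    {"city": "Shanghai", "temp": 25, "humidity": 70},
]
for row in rows:
    store.setdefault(slice_key(row, ["city"]), SliceBitmapIndex()).add(row)

beijing = store[slice_key({"city": "Beijing"}, ["city"])]
hits = beijing.query({"temp": (20, 25)})
with_humidity = beijing.query({"humidity": (0, 100)})
```

Keeping the records and bitmaps of a slice under one key mirrors, in a simplified way, the abstract's point about co-locating a slice's data and bitmap in the same region, so a range query only has to touch the slices whose hash matches the requested discrete values.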
Supported by 2016YFB1000604.
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Gao, X., Qi, Y., Hou, D. (2019). Multi-dimensional Index over a Key-Value Store for Semi-structured Data. In: Li, J., Meng, X., Zhang, Y., Cui, W., Du, Z. (eds) Big Scientific Data Management. BigSDM 2018. Lecture Notes in Computer Science, vol 11473. Springer, Cham. https://doi.org/10.1007/978-3-030-28061-1_18
DOI: https://doi.org/10.1007/978-3-030-28061-1_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28060-4
Online ISBN: 978-3-030-28061-1
eBook Packages: Computer Science, Computer Science (R0)