Abstract
In this study, we present a novel tree-based index scheme for efficiently indexing and serving large datasets in the cloud. It incorporates and extends the functionality of Hadoop to create a fully parallel index system. The scheme can be summarized as follows. First, we leverage the MapReduce framework to create an index, then publish the index meta information by writing it into a meta table. Second, we use the meta information to help the system adopt an efficient method for handling a given query. Finally, we optimize the system with a cache mechanism. We conduct extensive experiments on a Hadoop cluster to demonstrate the scalability, availability, and efficiency of the proposed index framework.
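The three steps in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation: the abstract gives no API, so every name here (`map_phase`, `reduce_phase`, `publish_meta`, `range_query`, the sample records, and the partitioning constants) is a hypothetical stand-in. Sorted runs take the place of the tree index, a dictionary takes the place of the meta table, and `functools.lru_cache` takes the place of the cache mechanism.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from functools import lru_cache

# Hypothetical input: (key, value) records that would live on the cluster.
RECORDS = [(17, "q"), (3, "a"), (42, "z"), (8, "c"), (25, "m"), (31, "p")]
NUM_PARTITIONS = 3   # assumed number of index partitions (reducers)
KEY_SPACE = 48       # assumed upper bound on keys, used for range partitioning


def map_phase(records):
    """Map: route each record to a partition by key range."""
    width = KEY_SPACE // NUM_PARTITIONS
    for key, value in records:
        yield min(key // width, NUM_PARTITIONS - 1), (key, value)


def reduce_phase(mapped):
    """Reduce: build one sorted run per partition (stand-in for a tree index)."""
    groups = defaultdict(list)
    for pid, record in mapped:
        groups[pid].append(record)
    return {pid: sorted(recs) for pid, recs in groups.items()}


def publish_meta(index):
    """Record each partition's (min key, max key) in a meta table."""
    return {pid: (recs[0][0], recs[-1][0]) for pid, recs in index.items()}


INDEX = reduce_phase(map_phase(RECORDS))
META = publish_meta(INDEX)


@lru_cache(maxsize=128)  # stand-in for the cache mechanism
def range_query(lo, hi):
    """Use the meta table to visit only partitions overlapping [lo, hi]."""
    hits = []
    for pid, (pmin, pmax) in sorted(META.items()):
        if pmax < lo or pmin > hi:
            continue  # partition pruned using meta information alone
        recs = INDEX[pid]
        keys = [k for k, _ in recs]
        hits.extend(recs[bisect_left(keys, lo):bisect_right(keys, hi)])
    return tuple(hits)
```

Calling `range_query(5, 30)` consults the meta table, skips the partition whose key range lies entirely above 30, and searches only the remaining two. A repeated call with the same bounds is answered from the cache.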
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yin, Y., Yao, B., Shen, Y., Guo, M., Xu, C. (2013). A Generic Tree-Like Index Framework in the Cloud. In: Lin, X., Manolopoulos, Y., Srivastava, D., Huang, G. (eds) Web Information Systems Engineering – WISE 2013. WISE 2013. Lecture Notes in Computer Science, vol 8180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41230-1_28
DOI: https://doi.org/10.1007/978-3-642-41230-1_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41229-5
Online ISBN: 978-3-642-41230-1
eBook Packages: Computer Science (R0)