Parallelizing Structural Joins to Process Queries over Big XML Data Using MapReduce

Wu, Huayu

doi:10.1007/978-3-319-10085-2_16


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8645)

Included in the following conference series:

International Conference on Database and Expert Systems Applications


Abstract

Processing XML queries over big XML data using MapReduce has been studied in recent years. However, existing works focus on partitioning XML documents and distributing the resulting XML fragments to different compute nodes. This approach may incur high overhead from transferring XML fragments from one node to another during MapReduce execution. Motivated by the structural-join-based approach to XML query processing, which reduces I/O cost by reading only the inverted lists relevant to a query, we propose a novel technique that uses MapReduce to distribute the labels in inverted lists across a computing cluster, so that structural joins can be performed in parallel to process queries. We also propose an optimization technique that reduces the computing space in our framework to improve query-processing performance. Finally, we conduct experiments to validate our algorithms.
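The abstract does not spell out how labels are distributed. The following is a minimal Python sketch, assuming region (containment) labels of the form (start, end) and a simple range partitioning of document positions; it is an illustration of the general idea, not the paper's algorithm, and the names `partition_of`, `map_phase`, `reduce_phase`, `parallel_ad_join` and `bounds` are hypothetical. It shows why shipping only labels can be enough for an ancestor-descendant structural join: each descendant label goes to the single partition holding its start position, while each ancestor label is replicated to every partition its interval overlaps, so each matching pair is produced exactly once.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Hypothetical region label: (start, end) positions of an element in document order.
Label = Tuple[int, int]
Bucket = Tuple[List[Label], List[Label]]  # (ancestor labels, descendant labels)


def partition_of(pos: int, bounds: List[int]) -> int:
    """Index of the range partition holding `pos`.

    `bounds[i]` is the exclusive upper bound of partition i (assumed sorted);
    positions past the last bound are clamped into the final partition.
    """
    return min(bisect_right(bounds, pos), len(bounds) - 1)


def map_phase(ancestors: List[Label], descendants: List[Label],
              bounds: List[int]) -> DefaultDict[int, Bucket]:
    """Assign labels to partitions, grouped here as the shuffle would group them."""
    buckets: DefaultDict[int, Bucket] = defaultdict(lambda: ([], []))
    for a in ancestors:
        # An ancestor may contain descendants in any partition its interval
        # overlaps, so it is replicated to each of them.
        for p in range(partition_of(a[0], bounds), partition_of(a[1], bounds) + 1):
            buckets[p][0].append(a)
    for d in descendants:
        # A descendant is sent only to the partition holding its start position,
        # so every (ancestor, descendant) pair is produced exactly once.
        buckets[partition_of(d[0], bounds)][1].append(d)
    return buckets


def reduce_phase(anc: List[Label], desc: List[Label]) -> List[Tuple[Label, Label]]:
    """Join one partition: a is an ancestor of d iff a's interval contains d's."""
    # Nested loop for brevity; a stack-based merge over sorted lists scales better.
    return [(a, d) for d in desc for a in anc if a[0] < d[0] and d[1] < a[1]]


def parallel_ad_join(ancestors: List[Label], descendants: List[Label],
                     bounds: List[int]) -> List[Tuple[Label, Label]]:
    """Run the map phase, then one reduce call per partition (sequentially here)."""
    buckets = map_phase(ancestors, descendants, bounds)
    results: List[Tuple[Label, Label]] = []
    for p in sorted(buckets):
        anc, desc = buckets[p]
        results.extend(reduce_phase(anc, desc))
    return sorted(results)


if __name__ == "__main__":
    # Fragment <a><b/><a><b/><b/></a></a><b/> labelled with start/end positions.
    a_list = [(1, 10), (4, 9)]                   # inverted list for tag a
    b_list = [(2, 3), (5, 6), (7, 8), (11, 12)]  # inverted list for tag b
    print(parallel_ad_join(a_list, b_list, bounds=[5, 13]))
```

Changing `bounds` changes how work is spread across reducers but not the join result. The cost of this simple scheme is that long ancestor intervals are replicated to many partitions, which is the kind of overhead a smarter distribution strategy would aim to reduce.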



Author information

Authors and Affiliations

Institute for Infocomm Research, A*STAR, Singapore
Huayu Wu


Editor information

Editors and Affiliations

Instituto Tecnológico de Informática, 46022, Valencia, Spain
Hendrik Decker
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, 166 27, Prague 6, Czech Republic
Lenka Lhotská
Department of Computer Science, The University of Auckland, 1010, Auckland, New Zealand
Sebastian Link
Knowledge Management, LMU University of Munich, Leopoldstraße 13, 80802, Munich, Germany
Marcus Spies
FAW, University of Linz, Altenbergerstrasse 69, 4040, Linz, Austria
Roland R. Wagner


About this paper

Cite this paper

Wu, H. (2014). Parallelizing Structural Joins to Process Queries over Big XML Data Using MapReduce. In: Decker, H., Lhotská, L., Link, S., Spies, M., Wagner, R.R. (eds) Database and Expert Systems Applications. DEXA 2014. Lecture Notes in Computer Science, vol 8645. Springer, Cham. https://doi.org/10.1007/978-3-319-10085-2_16


DOI: https://doi.org/10.1007/978-3-319-10085-2_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10084-5
Online ISBN: 978-3-319-10085-2
eBook Packages: Computer Science (R0)
