Abstract
The popularity of multi-core systems makes software parallelization become an important way to improve performance. As a mainstream XML query language, XQuery is the core of XML processing. It is critical to take full advantage of multi-core computing to improve XML processing performance through parallelization of XQuery. However, usually it is difficult to parallelize XQuery programs because of the nested style of XQuery expressions. Moreover, implicit parallelism is necessary to simplify the development of parallel XML application. In this paper, we propose an automatic parallelization approach, which can automatically select proper types of parallelism for a specific XQuery query. Specifically, we propose a functional intermediate language called pFXQL (parallel Functional XML Query Language) to describe parallel query plans. pFXQL has parallel semantics and is well complied with XQuery. We propose a cost model to effectively support plan generation and the selection of a preferred plan. The model estimates both computational cost and parallel cost. We implement our approach in XQuery engine and conduct experiments on various multi-core systems. Experimental results verify the effectiveness of our approach.
Similar content being viewed by others
References
Alexandrov A, Ewen S, Heimel M, Hueske F, Kao O, Markl V, Nijkamp E, Warneke D (2011) MapReduce and PACT-comparing data parallel programming models. In: Proceedings of the 14th conference on database systems for business, technology, and web (BTW), pp 25–44
Bamford R, Borkar V, Brantner M, Fischer PM, Florescu D, Graf D, Kossmann D, Kraska T, Muresan D, Nasoi S (2009) XQuery reloaded. Proc VLDB Endow 2(2):1342–1353
Berglund A, Boag S, Chamberlin D, Fernandez MF, Kay M, Robie J, Siméon J (2015) XML Path Language (XPath) 2.0 (Second Edition). http://www.w3.org/TR/xpath20/
Berglund A, Fernández M, Malhotra A, Marsh J, Nagy M, Walsh N (2015) XQuery 1.0 and XPath 2.0 Data Model (XDM) (Second Edition). http://www.w3.org/TR/xpath-datamodel/
Bidoit N, Colazzo D, Sartiani C, Solimando A, Ulliana F (2015) Andromeda: a system for processing queries and updates on big XML documents. In: Morzy T, Valduriez P, Bellatreche L (eds) New trends in databases and information systems: ADBIS 2015 short papers and workshops, BigDap, DCSA, GID, MEBIS, OAIS, SW4CH, WISARD. Springer International Publishing, Cham, pp 218–228. doi:10.1007/978-3-319-23201-0_24
Boag S, Chamberlin D, Fernández MF, Florescu D, Robie J, Siméon J, Stefanescu M (2015) XQuery 1.0: An XML Query Language (Second Edition). http://www.w3.org/TR/xquery/
Bordawekar R, Lim L, Kementsietsidis A, Kok BWL (2010) Statistics-based parallelization of XPath queries in shared memory systems. In: Proceedings of the 13th International Conference on Extending Database Technology (EDBT2010), pp 159–170
Camachorodríguez J, Colazzo D, Manolescu I (2014) PAXQuery: a massively parallel XQuery processor. In: Proceedings of workshop on data analytics in the cloud (Danac’14), pp 1–4. doi:10.1145/2627770.2627772
Carman EP Jr, Vinayak TW, Borkar R, Michael J, Carey, Tsotras VJ (2015) Apache VXQuery: a scalable XQuery implementation. arXiv:1504.00331 (CoRR)
Chamberlin D, Fankhauser P, Florescu D, Marchiori M, Robie J (2007) XML query use cases. http://www.w3.org/TR/xquery-use-cases/
Chen R, Liao H, Chen W (2011) XML parsing schema based on parallel sub-tree construction. Comput Sci 38(3):191–194
Damigos M, Gergatsoulis M, Plitsos S (2014) Distributed processing of XPath queries using mapReduce. In: Catania B, Cerquitelli T, Chiusano S et al. (eds) New trends in databases and information systems: 17th east european conference on advances in databases and information systems. Springer International Publishing, Cham, pp 69–77. doi:10.1007/978-3-319-01863-8_8
Dongarra J (2007) The promise and perils of the coming multicore revolution and its impact. CTWatch Q 3(1):1–33
Draper D, Fankhauser P, Fernández MF, Malhotra A, Rose K, Rys M, Siméon J, Wadler P (2015) XQuery 1.0 and XPath 2.0 Formal Semantics (Second Edition). http://www.w3.org/TR/xquery-semantics/
Fegaras L, Dash R, Wang YH (2006) A fully pipelined XQuery processor. In: XQuery Implementation, Experience and Perspectives (XIME-P’06) Workshop
Fernández M, Jim T, Morton K, Onose N, Simeon J (2007) DXQ: a distributed XQuery scripting language. In: Proceedings of the 4th international workshop on XQuery implementation, experience and perspectives, pp 1–6
Fluet M, Rainey M, Reppy J, Shaw A (2008) Implicitly-threaded parallelism in Manticore. ACM Sigplan Not 9:119–130
Frasincar F, Houben G-J, Pau C (2002) XAL: an algebra for XML query optimization. Aust Comput Sci Commun 24 (2):49–56. doi:10.1145/563932.563912
Hammond K (1994) Parallel functional programming: an introduction. In: International Symposium on Parallel Symbolic Computation, Citeseer
Herlihy M, Shavit N (2008) The art of multiprocessor programming. Morgan Kaufmann Publisher, Burlington
Hidaka S, Kato H, Yoshikawa M (2007) A relative cost model for XQuery. In: Proceedings of the 2007 ACM symposium on Applied computing, pp 1332–1333
Jones SP, Leshchinskiy R, Keller G, Chakravarty MMT (2008) Harnessing the multicores: nested data parallelism in Haskell. In: IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2008) Dagstuhl, Citeseer, Germany
Kahn G (1987) Natural semantics. Springer, Heidelberg
Kepser S (2004) A simple proof for the Turing-completeness of XSLT and XQuery. In: Extreme Markup Languages
Khatchadourian S, Consens MP, Siméon J (2011) Having a ChuQL at XML on the Cloud. In: Mendelzon Int’l A. Workshop
Kling P, Ozsu MT, Daudjee K (2010) Distributed XML query processing: Fragmentation, localization and pruning. University of Waterloo, Technical Report CS-2010-02
Li X (2006) Efficient and parallel evaluation of XQuery. The Ohio State University
Liao H (2013) Principles and implementation techniques of XQuery. Science Press, Beijing
Liao H, Shan W, Gao H (2013) Automatic parallelization of XQuery programs. J Softw 8(4):842–851
Liao H, Tang L (2009) A decorrelation method based on XQA Query algebra. J Beijing Univ Technol 35(8):1108–1114
Loogen R, Ortega-Mallén Y, Peña-Marí R (2005) Parallel functional programming in Eden. J Funct Program 15(03):431–475
Lu W, Gannon D (2007) Parallel XML processing by work stealing. In: Proceedings of the 2007 workshop on Service-oriented computing performance: aspects, issues, and approaches, pp 31–38
Machdi I, Amagasa T, Kitagawa H (2010) Parallel holistic twig joins on a multi-core system. Int J Web Inform Syst 6(2):149–177
Miao H, Nie T, Yue D, Zhang T, Liu J (2012) Algebra for parallel XQuery processing. In: Bao Z, Gao Y, Gu Y et al. (eds) Web-Age information management: WAIM 2012 international workshops: GDMM, IWSN, MDSP, USDM, and XMLDM. Springer, Berlin, Heidelberg, pp 1–10. doi:10.1007/978-3-642-33050-6_1
Quan Y, Liao H (2014) A Task Scheduling Algorithm for Automated Parallel Processing of XQuery. In: the Sixth International Symposium on Parallel Architectures, Algorithms and Programming (PAAP2014) 2014. pp 250–254
Re C, Brinkley J, Hinshaw K, Suciu D (2004) Distributed xquery. In: Workshop on Information Integration on the Web, Citeseer, pp 116–121
Re C, Siméon J, Fernandez M (2006) A complete and efficient algebraic compiler for XQuery. In: Proceedings of the 22nd International Conference on Data Engineering (ICDE’06), pp 14–25
Schmidt A, Waas F, Kersten M, Carey MJ, Manolescu I, Busse R (2002) XMark: a benchmark for XML data management. In: Proceedings of the 28th international conference on Very Large Data Bases, 2002. VLDB Endowment, pp 974–985
Seshadri P, Pirahesh H, Leung TYC (1996) Complex query decorrelation. In: Proceedings of the Twelfth International Conference on Data Engineering, pp 450–458
Shnaiderman L, Shmueli O (2015) Multi-Core Processing of XML Twig Patterns. IEEE Trans Knowl Data Eng 27(4):1057–1070
Sivaramakrishnan KC, Ziarek L, Prasad R, Jagannathan S (2010) Lightweight asynchrony using parasitic threads. In: Proceedings of the 5th ACM SIGPLAN workshop on Declarative aspects of multicore programming (DAMP2010), pp 63–72
Spoonhower D, Blelloch GE, Harper R (2007) A Semantic Framework for Scheduling Parallel Programs. Computer Science Department 886
Sutter H (2005) The free lunch is over: a fundamental turn toward concurrency in software. http://www.gotw.ca/publications/concurrency-ddj.htm
Wilkinson B, Allen M (2005) Parallel programming: techniques and applications using networked workstations and parallel computers, 2nd edn. Prentice hall, New Jersey
Yui M, Miyazaki J, Uemura S, Kato H (2008) XBird/D: distributed and parallel XQuery processing using remote proxy. In: Proceedings of the 2008 ACM symposium on Applied computing, pp 1003–1007
Zhang X, Liao H (2010) A framework for XQuery system with XML algebra and tree pattern query. J Front Comput Sci Technol 4(11):996–1004
Zhang Y, Tang N, Boncz P (2009) Efficient distribution of full-fledged XQuery. In: IEEE 25th International Conference on Data Engineering (ICDE’09) IEEE, pp 565–576
Acknowledgments
This research was supported by the National Natural Science Foundation of China (No. 41201462), the Natural Science Foundation of Fujian Province of China (No. 2014J01245) and the Science Foundation of Jimei University (No. ZQ2014003). Many thanks are due to Dr. Huang Bin for reading this paper and offering helpful advice.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Chen, R., Liao, H., Wang, Z. et al. Automatic parallelization of XQuery programs on multi-core systems. J Supercomput 72, 1517–1548 (2016). https://doi.org/10.1007/s11227-016-1679-x
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-016-1679-x