Parallel XPath query based on cost optimization

Abstract

XPath query performance is a key factor in XML processing capacity. Parallel processing that makes full use of multi-threaded computing resources is an important way to improve it. However, when XPath queries are parallelized, load imbalance and inefficient use of threads often degrade parallel performance. In this paper, we propose coPXQ, a parallel XPath query method based on cost optimization. The method improves the parallel processing of navigational XPath queries through a series of optimization measures. The main measures are as follows. First, the storage of the XML node relation index is optimized, which improves both the storage efficiency and the access efficiency of the index. Second, a new cost estimation method based on the number of XML node relations is used to balance load, optimizing both parallel relation index creation and parallel primitive execution. Third, the number of worker threads is determined by estimating parallel effectiveness, so that threads are used effectively during a query. Experimental results show that our method achieves better parallel performance than existing typical methods.
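As a rough illustration of the second and third measures, the sketch below assigns tasks to threads by estimated cost and then picks a thread count from an estimated parallel efficiency. It is not the coPXQ algorithm. The greedy longest-processing-time heuristic, the efficiency formula, the 0.5 threshold, and the names `balance` and `choose_threads` are all assumptions made for illustration; only the idea of using the number of node relations as the cost of a task comes from the abstract.

```python
import heapq


def balance(costs, n_threads):
    """Greedy LPT assignment (an assumed heuristic, not the paper's method).

    costs[i] is the estimated cost of task i, e.g. its number of node
    relations. Returns one (load, task_indices) pair per thread.
    """
    # Heap entries are (load, thread id, tasks); the unique thread id
    # breaks ties so the task lists are never compared.
    heap = [(0, t, []) for t in range(n_threads)]
    heapq.heapify(heap)
    for i in sorted(range(len(costs)), key=lambda i: -costs[i]):
        load, t, tasks = heapq.heappop(heap)  # least-loaded thread
        tasks.append(i)
        heapq.heappush(heap, (load + costs[i], t, tasks))
    return [(load, tasks) for load, t, tasks in sorted(heap, key=lambda e: e[1])]


def choose_threads(costs, max_threads, min_efficiency=0.5):
    """Largest thread count whose estimated efficiency is acceptable.

    Efficiency is estimated as serial_cost / (n * largest_thread_load);
    both the formula and the threshold are illustrative assumptions.
    """
    serial = sum(costs)
    best = 1
    for n in range(2, max_threads + 1):
        makespan = max(load for load, _ in balance(costs, n))
        if serial / (n * makespan) >= min_efficiency:
            best = n
    return best
```

With evenly sized tasks every additional thread keeps the estimated efficiency high, so all available threads are used. With one dominant task, adding threads beyond the point where that task sets the finishing time only lowers the estimate, so fewer threads are chosen.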

Acknowledgements

This research was supported by the Natural Science Foundation of Fujian Province of China (2018J01538, 2020J01697), the Science Foundation of Jimei University (ZQ2014003), and the Open Fund of the Digital Fujian Big Data Modeling and Intelligent Computing Institute.

Author information

Corresponding author

Correspondence to Rongxin Chen.

About this article

Cite this article

Chen, R., Wang, Z., Su, H. et al. Parallel XPath query based on cost optimization. J Supercomput (2021). https://doi.org/10.1007/s11227-021-04074-y

Keywords

  • XPath query
  • Relation index
  • Cost estimation
  • Load balancing
  • Parallel effectiveness