Evaluation of XPath Queries Over XML Documents Using SparkSQL Framework

  • Radoslav Hricov
  • Adam Šenk
  • Petr Kroha
  • Michal Valenta
Conference paper

DOI: 10.1007/978-3-319-58274-0_3

Part of the Communications in Computer and Information Science book series (CCIS, volume 716)
Cite this paper as:
Hricov R., Šenk A., Kroha P., Valenta M. (2017) Evaluation of XPath Queries Over XML Documents Using SparkSQL Framework. In: Kozielski S., Mrozek D., Kasprowski P., Małysiak-Mrozek B., Kostrzewa D. (eds) Beyond Databases, Architectures and Structures. Towards Efficient Solutions for Data Analysis and Knowledge Representation. BDAS 2017. Communications in Computer and Information Science, vol 716. Springer, Cham

Abstract

In this contribution, we present our approach to querying an XML document that is stored in a distributed system. The main goal of this paper is to describe how the Spark SQL framework can be used to implement a subset of expressions from the XPath query language. We introduce and compare five different methods within our approach, and in doing so we also demonstrate the current state of query optimization on the Spark SQL platform, which can be regarded as a further contribution of the paper. The subset of XPath expressions supported by the implemented methods covers all XPath axes except the attribute and namespace axes; predicates are not implemented in our prototype. We present our implemented system, data, measurements, tests, and results. The evaluation results support our belief that our method significantly decreases the data transfers in the distributed system that occur during query evaluation.
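As a rough illustration of what evaluating XPath axes with SQL can look like, the sketch below shreds a small XML document into a relational node table and answers the child and descendant axes with self-joins. The pre/post-order numbering, the table layout, and all names (`shred`, `load`, `axis`, `nodes`) are assumptions made for this example and are not taken from the paper; the five methods the paper compares may encode and query documents quite differently. SQLite from the Python standard library stands in for Spark SQL so that the example is self-contained; the joins use only standard SQL.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text):
    """Flatten an XML document into rows (pre, post, parent_pre, tag).

    pre/post are pre-order and post-order ranks of each element; this
    encoding is an assumption of the sketch, not the paper's schema.
    """
    rows = []
    counter = {"pre": 0, "post": 0}

    def visit(elem, parent_pre):
        pre = counter["pre"]
        counter["pre"] += 1
        for child in elem:
            visit(child, pre)
        post = counter["post"]
        counter["post"] += 1
        rows.append((pre, post, parent_pre, elem.tag))

    visit(ET.fromstring(xml_text), None)
    return rows


def load(rows):
    """Load the shredded rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes (pre INTEGER, post INTEGER, parent INTEGER, tag TEXT)"
    )
    conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", rows)
    return conn


# child axis: nodes whose parent is a context node
CHILD_AXIS = """
SELECT DISTINCT c.pre, c.tag
FROM nodes AS ctx JOIN nodes AS c ON c.parent = ctx.pre
WHERE ctx.tag = ?
ORDER BY c.pre
"""

# descendant axis: nodes that start after and finish before a context node
DESCENDANT_AXIS = """
SELECT DISTINCT d.pre, d.tag
FROM nodes AS ctx JOIN nodes AS d ON d.pre > ctx.pre AND d.post < ctx.post
WHERE ctx.tag = ?
ORDER BY d.pre
"""


def axis(conn, query, context_tag):
    """Return result tags in document order for the given axis query."""
    return [tag for _, tag in conn.execute(query, (context_tag,))]


DOC = "<a><b><c/><d/></b><e/></a>"
conn = load(shred(DOC))

print(axis(conn, CHILD_AXIS, "a"))       # child axis from <a>
print(axis(conn, DESCENDANT_AXIS, "a"))  # descendant axis from <a>
print(axis(conn, DESCENDANT_AXIS, "b"))  # descendant axis from <b>
```

In this sketch each axis step becomes one join over the node table, so the way a join is planned and executed determines how much data has to be moved in a distributed setting.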

Keywords

Spark SQL, XML, XPath, Big data

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Radoslav Hricov (1)
  • Adam Šenk (1)
  • Petr Kroha (1)
  • Michal Valenta (1)

  1. Faculty of Information Technology, Czech Technical University in Prague, Prague, Czech Republic
