The Journal of Supercomputing

, Volume 73, Issue 2, pp 810–836 | Cite as

A dynamic and parallel approach for repetitive prime labeling of XML with MapReduce

  • Jinhyun Ahn
  • Dong-Hyuk Im
  • Taewhi Lee
  • Hong-Gee Kim


A massive amount of extensible markup language (XML) data from various areas is available on the Web. Answering structural queries against XML data is important, as it is the core of information retrieval systems for XML data. Labeling scheme has been suggested for rapid query processing of massive XML data. Interval-based, prefix-based, and prime number labeling scheme exist. Of these, the prime number labeling scheme has the advantage of query processing by arithmetic operations. Recently, the repetitive prime number labeling scheme was proposed; this scheme produces a smaller label size than conventional prime number labeling using prime numbers repetitively. However, a parallel algorithm for the repetitive prime number labeling scheme does not exist; therefore, this scheme is difficult to apply to massive XML data. In this paper, a dynamic and parallel approach of XML labeling algorithm that works with MapReduce is proposed for, particularly, the repetitive prime number labeling scheme. Two optimization techniques are devised: the label assignment order adjustment to further reduce the label size and the upper tree compressing technique to reduce the memory requirements during the labeling process. Experiments over real-world XML data confirmed that the techniques are effective than the previous works.


Repetitive prime labeling MapReduce XML Tree 



This work was supported by Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. R0101-16-0054, WiseKB: Big data based self-evolving knowledge base and reasoning platform) and Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (NRF-2014R1A1A1002236) and ETRI R&D Program (“Development of Big Data Platform for Dual Mode Batch-Query Analytics, 16ZS1410”) funded by the Government of Korea.


  1. 1.
    Choi H, Lee KH, Lee YJ (2014) Parallel labeling of massive XML data with MapReduce. J Supercomput 67(2):408–437CrossRefGoogle Scholar
  2. 2.
    Christophides V, Karvounarakis G, Plexousakis D, Scholl M, Tourtounis S (2004) Optimizing taxonomic semantic web queries using labeling schemes. Web Semant Sci Serv Agents World Wide Web 1(2):207–228CrossRefGoogle Scholar
  3. 3.
    Clark J, DeRose S et al (2004) XML Path Language (XPath) Version 1.0, W3C Recommendation, 1999Google Scholar
  4. 4.
    Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113CrossRefGoogle Scholar
  5. 5.
    Klaib A, Joan L (2014) Investigation into indexing XML data techniques. In: Proceedings. The International Conference on Internet Computing (ICOMP), 21–24 July 2014, Las Vegas, USAGoogle Scholar
  6. 6.
    Leonardi E, Bhowmick SS, Madria S (2005) Xandy: detecting changes on large unordered XML documents using relational databases. In: Zhou L, Ooi BC, Meng X (eds) Database systems for advanced applications. Springer, Berlin, pp 711–723Google Scholar
  7. 7.
    Li C, Ling TW (2005) QED: a novel quaternary encoding to completely avoid re-labeling in XML updates. In: CIKM. ACMGoogle Scholar
  8. 8.
    Lin RR, Chang YH, Chao KM (2013) A compact and efficient labeling scheme for XML documents. In: Meng W, Feng L, Bressan S, Winiwarter W, Song W(eds) Database systems for advanced applications. Springer, Berlin, pp 269–283Google Scholar
  9. 9.
    Lu J, Meng X, Ling TW (2011) Indexing and querying XML using extended Dewey labeling scheme. Data Knowl Eng 70(1):35–59CrossRefGoogle Scholar
  10. 10.
    Morozov S, Saiedian H, Wang H (2014) Reusable prime number labeling scheme for hierarchical data representation in relational databases. J Comput Inf Technol 22(1):31–43Google Scholar
  11. 11.
    Subramaniam S, Haw SC, Soon LK (2014) Relab: a subtree based labeling scheme for efficient XML query processing. In: IEEE 2nd International Symposium on Telecommunication Technologies (ISTT). IEEE, pp 121–125Google Scholar
  12. 12.
    Sun DH, Hwang SC (2014) A labeling methods for keyword search over large XML documents. J KIISE 41(9):699–706Google Scholar
  13. 13.
    Tatarinov I, Viglas SD, Beyer K, Shanmugasundaram J, Shekita E, Zhang C (2002) Storing and querying ordered XML using a relational database system. In: Proceedings of the 2002 ACM SIGMOD international conference on management of data, pp 204–215. ACMGoogle Scholar
  14. 14.
    Wang Y, DeWitt DJ, Cai JY (2003) X-diff: an effective change detection algorithm for XML documents. In: ICDE, 2003. IEEE, pp 519–530Google Scholar
  15. 15.
    Wu X, Lee ML, Hsu W (2004) A prime number labeling scheme for dynamic ordered XML trees. In: ICDEGoogle Scholar
  16. 16.
    Xu L, Bao Z, Ling TW (2007) A dynamic labeling scheme using vectors. In: Wagner R, Revell N, Pernul G (eds) Database and expert systems applications. Springer, Berlin, pp 130–140Google Scholar
  17. 17.
    Xu L, Ling TW, Wu H (2012) Labeling dynamic XML documents: an order-centric approach. IEEE Trans Knowl Data Eng 24(1):100–113CrossRefGoogle Scholar
  18. 18.
    Xu L, Ling TW, Wu H, Bao Z (2009) DDE: from dewey to a fully dynamic XML labeling scheme. In: SIGMOD. ACMGoogle Scholar

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Jinhyun Ahn
    • 1
  • Dong-Hyuk Im
    • 2
  • Taewhi Lee
    • 3
  • Hong-Gee Kim
    • 4
  1. 1.Biomedical Knowledge Engineering Laboratory and Dental Research InstituteSeoul National UniversitySeoulKorea
  2. 2.Department of Computer and Information EngineeringHoseo UniversityAsanKorea
  3. 3.BigData Intelligence Research DepartmentElectronics and Telecommunications Research InstituteDaejonKorea
  4. 4.Biomedical Knowledge Engineering Laboratory, Dental Research Institute and Institute of Human-Environment Interface BiologySeoul National UniversitySeoulKorea

Personalised recommendations