A Distributed Keyword Search Algorithm in XML Databases Using MapReduce

Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 107)


With the extensive use of XML technology, XML databases, which consist of large collections of structured and semi-structured XML documents, have become increasingly popular. How to retrieve the data that meets users' needs from XML databases quickly and efficiently has become an urgent problem. Current research mainly focuses on XML streams and individual XML documents. By contrast, keyword search algorithms for XML databases have received little attention. In this chapter, we combine the SLCA (smallest lowest common ancestor) keyword search semantics used for XML documents with the characteristics of MapReduce to propose a distributed keyword search algorithm for XML databases, and we implement it on the open-source Hadoop framework. Finally, thorough experiments show that our method is efficient in practice in various respects.
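To make the SLCA semantics and the MapReduce decomposition concrete, here is a minimal Python sketch. It is not the authors' algorithm. The abstract does not say how the database is partitioned, so the sketch assumes (hypothetically) that every element arrives as a `(doc_id, Dewey label, text)` record and that each XML document is handled by a single reducer. The per-document SLCA step follows the indexed-lookup idea of Xu and Papakonstantinou: for each match of the rarest keyword, only its left and right closest matches in every other keyword list are examined. The names `map_phase`, `reduce_phase`, `run_job` and `SAMPLE` are illustrative, and `run_job` replaces Hadoop's shuffle with an in-memory dictionary.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Dewey = Tuple[int, ...]          # Dewey label: (0, 1, 0) = first child of second child of root
Record = Tuple[str, Dewey, str]  # hypothetical input record: (doc_id, dewey, element text)


def lca(a: Dewey, b: Dewey) -> Dewey:
    """LCA of two nodes = longest common prefix of their Dewey labels."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]


def closest_lca(x: Dewey, s: Sequence[Dewey]) -> Dewey:
    """Deepest LCA of x with any node of the sorted list s.

    Only the left match (largest label <= x) and the right match
    (smallest label >= x) in document order need to be checked.
    """
    found = []
    i = bisect_right(s, x)
    if i > 0:
        found.append(lca(x, s[i - 1]))
    j = bisect_left(s, x)
    if j < len(s):
        found.append(lca(x, s[j]))
    return max(found, key=len)  # both are ancestors-or-self of x; keep the deeper one


def slca(lists: List[List[Dewey]]) -> List[Dewey]:
    """SLCA nodes for one document, given one non-empty match list per keyword."""
    lists = sorted((sorted(m) for m in lists), key=len)  # drive from the shortest list
    cands = set()
    for v in lists[0]:
        x = v
        for s in lists[1:]:
            x = closest_lca(x, s)
        cands.add(x)
    # Keep only candidates that have no other candidate as a proper descendant.
    return sorted(c for c in cands
                  if not any(d != c and d[:len(c)] == c for d in cands))


def map_phase(record: Record, query: List[str]) -> Iterator[Tuple[str, Tuple[str, Dewey]]]:
    """Emit (doc_id, (keyword, dewey)) for every query keyword in the element text."""
    doc_id, dewey, text = record
    words = set(text.lower().split())
    for kw in query:
        if kw in words:
            yield doc_id, (kw, dewey)


def reduce_phase(doc_id: str, values: Iterable[Tuple[str, Dewey]],
                 query: List[str]) -> Iterator[Tuple[str, List[Dewey]]]:
    """Group one document's matches by keyword and compute its SLCA nodes."""
    by_kw: Dict[str, List[Dewey]] = defaultdict(list)
    for kw, dewey in values:
        by_kw[kw].append(dewey)
    if all(kw in by_kw for kw in query):  # every keyword must occur in the document
        yield doc_id, slca([by_kw[kw] for kw in query])


def run_job(records: Iterable[Record], query: List[str]) -> Dict[str, List[Dewey]]:
    """Simulate one MapReduce job; a dict stands in for Hadoop's shuffle."""
    query = [q.lower() for q in query]
    groups: Dict[str, List[Tuple[str, Dewey]]] = defaultdict(list)
    for rec in records:
        for key, value in map_phase(rec, query):
            groups[key].append(value)
    return {k: r for doc_id, vals in groups.items()
            for k, r in reduce_phase(doc_id, vals, query)}


# Hypothetical toy database: two documents, one bibliography with two books.
SAMPLE: List[Record] = [
    ("d1", (0,), "bib"),
    ("d1", (0, 0), "book"),
    ("d1", (0, 0, 0), "XML databases"),
    ("d1", (0, 0, 1), "Hadoop"),
    ("d1", (0, 1), "book"),
    ("d1", (0, 1, 0), "XML"),
    ("d1", (0, 1, 1), "MapReduce"),
    ("d2", (0,), "Hadoop notes"),
]

if __name__ == "__main__":
    print(run_job(SAMPLE, ["XML", "Hadoop"]))  # {'d1': [(0, 0)]}
```

In this toy run, document `d2` contains "Hadoop" but not "XML", so its reducer emits nothing, while in `d1` the first `book` element is the smallest subtree holding both keywords. Because each reducer sees a whole document, the SLCA step needs no communication between nodes under this partitioning assumption; a real Hadoop job would express the same two steps as `Mapper` and `Reducer` classes.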


Keywords: MapReduce · XML database · Keyword information retrieval · Hadoop · SLCA · Algorithm



Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  1. College of Computer Science and Information Engineering, Zhejiang Gongshang University, Hangzhou, China
