The VLDB Journal, Volume 23, Issue 1, pp 25–50

Efficient query processing for XML keyword queries based on the IDList index

  • Junfeng Zhou (Email author)
  • Zhifeng Bao
  • Wei Wang
  • Jinjia Zhao
  • Xiaofeng Meng
Regular Paper


Keyword search over XML data has attracted considerable research effort in the last decade, and one of the fundamental research problems is how to efficiently answer a given keyword query w.r.t. a certain query semantics. We found that the main cause of inefficiency in existing methods is that they all suffer heavily from the common-ancestor-repetition problem. In this paper, we propose a novel form of inverted list, namely the IDList; the IDList for keyword \(k\) consists of ordered nodes that directly or indirectly contain \(k\). We then show that finding keyword query results based on the smallest lowest common ancestor (SLCA) and exclusive lowest common ancestor (ELCA) semantics can be reduced to the ordered set intersection problem, which has been heavily optimized due to its application in areas such as information retrieval and database systems. We propose several algorithms that exploit set intersection in different directions, with or without additional indexes. We further propose several algorithms based on hash search that simplify the operation of finding the common nodes of all involved IDLists. We have conducted an extensive set of experiments using many state-of-the-art algorithms and several large-scale datasets. The results demonstrate that our proposed methods outperform existing methods by up to two orders of magnitude in many cases.
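The reduction described in the abstract can be illustrated with a minimal sketch. It is not one of the paper's algorithms: it assumes node IDs are integers assigned in document order, that the tree is given as a hypothetical `parent` map, and that SLCA is taken as the set of nodes whose subtree contains every query keyword while no descendant's subtree does. The names `build_idlists`, `intersect_sorted`, and `slca` are illustrative only.

```python
from typing import Dict, List, Optional, Set


def build_idlists(parent: Dict[int, Optional[int]],
                  keywords_at: Dict[int, List[str]]) -> Dict[str, List[int]]:
    """Map each keyword to the sorted IDs of nodes that contain it
    directly or through a descendant (an IDList-like structure)."""
    idsets: Dict[str, Set[int]] = {}
    for node, words in keywords_at.items():
        for word in words:
            seen = idsets.setdefault(word, set())
            cur: Optional[int] = node
            # Walk up to the root; stop early once an ancestor is already
            # recorded, because its own ancestors were recorded with it.
            while cur is not None and cur not in seen:
                seen.add(cur)
                cur = parent[cur]
    return {word: sorted(ids) for word, ids in idsets.items()}


def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    """Plain merge-style intersection of two sorted ID lists."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def slca(parent: Dict[int, Optional[int]],
         idlists: Dict[str, List[int]],
         query: List[str]) -> List[int]:
    """Return SLCA nodes: common ancestors that have no child which is
    itself a common ancestor."""
    if not query:
        return []
    # Start from the shortest list so intermediate results stay small.
    lists = sorted((idlists.get(k, []) for k in query), key=len)
    common = lists[0]
    for other in lists[1:]:
        common = intersect_sorted(common, other)
    # The common set is closed under ancestors, so checking children is
    # enough to rule out any descendant.
    has_common_child = {parent[v] for v in common if parent[v] is not None}
    return [v for v in common if v not in has_common_child]


# Toy tree (assumed for illustration): 1 is the root, 2 and 5 are its
# children, 3 and 4 are children of 2, 6 and 7 are children of 5.
parent = {1: None, 2: 1, 3: 2, 4: 2, 5: 1, 6: 5, 7: 5}
keywords_at = {3: ["xml"], 4: ["query"], 6: ["xml"], 7: ["index"]}
idlists = build_idlists(parent, keywords_at)

print(idlists["xml"])                            # [1, 2, 3, 5, 6]
print(slca(parent, idlists, ["xml", "query"]))   # [2]
print(slca(parent, idlists, ["query", "index"])) # [1]
```

Because every list already stores ancestors, one pass of sorted intersection yields all common ancestors without recomputing them per match, which is the intuition behind avoiding repeated common-ancestor work. The merge-style intersection here is the simplest choice; the adaptive or hash-based variants mentioned in the abstract would replace `intersect_sorted`.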


XML Keyword Query Processing LCA SLCA ELCA 



This research was partially supported by the grants from the Natural Science Foundation of China (No. 61073060, 60833005, 61070055, 91024032, 91124001), the National Science and Technology Major Project (No. 2010-ZX01042-002-003), the Fundamental Research Funds for the Central Univ., the Research Funds of Renmin Univ. (No. 11XNL010, 10XNI018), and the Research Funds from Education Department of Hebei Province (No. Y2012014). Zhifeng Bao’s research is carried out at the SeSaMe Centre. It is supported by the Singapore NRF under its IRC@SG Funding Initiative and administered by the IDMPO. Wei Wang was partially supported by ARC DP130103401 and DP130103405.

Supplementary material

Supplementary material 1: 778_2013_313_MOESM1_ESM.pdf (PDF, 422 KB)



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Junfeng Zhou (1), Email author
  • Zhifeng Bao (2)
  • Wei Wang (3)
  • Jinjia Zhao (1)
  • Xiaofeng Meng (4)

  1. The Key Laboratory for Computer Virtual Technology and System Integration of HeBei Province, School of Information Science and Engineering, Yanshan University, Qinhuangdao, China
  2. Interactive Digital Media Institute, Singapore, Singapore
  3. The University of New South Wales, Kensington, NSW, Australia
  4. Renmin University of China, Beijing, China
