The VLDB Journal, Volume 24, Issue 3, pp 441–465

Reasoning with patterns to effectively answer XML keyword queries

  • Cem Aksoy
  • Aggeliki Dimitriou
  • Dimitri Theodoratos
Regular Paper


Keyword search is a popular technique for searching tree-structured data on the Web because it frees the user from having to know a complex query language and the structure of the data sources. However, the imprecision of keyword queries usually yields a very large number of results, of which only a few are relevant to the query. Multiple previous approaches have tried to address this problem by exploiting the structural properties of the tree data in order to filter out irrelevant results. This is not an easy task, though, and in the general case these approaches show low precision and/or recall and low quality of result ranking. In this paper, we argue that exploiting the structural relationships of the query matches locally in the data tree is not sufficient, and that a global analysis of the keyword matches in the data tree is necessary in order to assign meaningful semantics to keyword queries. We present an original approach for answering keyword queries which extracts structural patterns of the query matches and reasons with them in order to return meaningful results ranked with respect to their relevance to the query. Comparisons between patterns are realized based on different types of homomorphisms between patterns. As the number of patterns is typically much smaller than that of the query matches, this global reasoning is feasible. We design an efficient stack-based algorithm for evaluating keyword queries on tree-structured data, and we also devise a heuristic extension which further improves its performance. We run comprehensive experiments on different datasets to evaluate the efficiency of the algorithms and the effectiveness of our ranking and filtering semantics. The experimental results show that our approach produces results of higher quality compared to previous ones and that our algorithms are fast and scale well with respect to the input and output size.
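
The idea of grouping query matches by their structural pattern can be illustrated with a small sketch. It is not the paper's stack-based algorithm: it enumerates match combinations by brute force and uses a simplified notion of pattern (the label of the lowest common ancestor plus the label path down to each keyword match). The sample document and all function names (`element_paths`, `matches`, `pattern_of`, `group_by_pattern`) are hypothetical and chosen only for illustration.

```python
import itertools
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data, not taken from the paper.
DOC = """
<bib>
  <paper><title>XML search</title><author>Smith</author></paper>
  <paper><title>XML ranking</title><author>Smith</author></paper>
  <paper><title>Graph data</title><author>Jones</author></paper>
  <book><title>XML basics</title><editor>Smith</editor></book>
</bib>
"""


def element_paths(root):
    """Map every element to its root-to-node path (a tuple of elements)."""
    paths = {}

    def walk(node, prefix):
        path = prefix + (node,)
        paths[node] = path
        for child in node:
            walk(child, path)

    walk(root, ())
    return paths


def matches(root, keyword):
    """Elements whose own text contains the keyword as a whole word."""
    kw = keyword.lower()
    return [n for n in root.iter() if n.text and kw in n.text.lower().split()]


def pattern_of(paths, nodes):
    """Simplified pattern: LCA label plus the label path to each match."""
    node_paths = [paths[n] for n in nodes]
    depth = 0
    for level in zip(*node_paths):
        if any(e is not level[0] for e in level):
            break
        depth += 1
    lca = node_paths[0][depth - 1]
    branches = tuple(tuple(e.tag for e in p[depth:]) for p in node_paths)
    return (lca.tag, branches)


def group_by_pattern(root, keywords):
    """Group every combination of keyword matches under its pattern."""
    paths = element_paths(root)
    groups = defaultdict(list)
    per_keyword = [matches(root, k) for k in keywords]
    for combo in itertools.product(*per_keyword):
        groups[pattern_of(paths, combo)].append(combo)
    return groups


if __name__ == "__main__":
    groups = group_by_pattern(ET.fromstring(DOC), ["XML", "Smith"])
    for pattern, instances in groups.items():
        print(pattern, len(instances))
```

On this toy document, the nine combinations of matches for the keywords "XML" and "Smith" collapse into five patterns, so any comparison carried out at the pattern level touches fewer objects than one carried out on individual matches. The homomorphism-based comparison and the ranking described in the abstract are not modeled here.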


Keywords: XML keyword search · Keyword query semantics · Patterns · Ranking

Supplementary material

Supplementary material 1: 778_2015_384_MOESM1_ESM.pdf (PDF, 3682 KB)



Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Cem Aksoy (1)
  • Aggeliki Dimitriou (2)
  • Dimitri Theodoratos (1)
  1. New Jersey Institute of Technology, Newark, USA
  2. National Technical University of Athens, Athens, Greece
