Processing keyword search on XML: a survey

World Wide Web

Abstract

Keyword search is a user-friendly way to retrieve information from XML data. Because an XML document can be large and contain a great deal of information, an XML keyword search result should be a fragment of the document constructed dynamically at query time, which is achievable because XML is structured. Processing keyword searches on XML poses several challenges: Which elements in the XML document are relevant to the query? How can results be generated efficiently and ranked meaningfully? How should results be presented so that the user can quickly find the desired information? In this survey, we review the papers in the literature that address these problems. We divide the existing approaches into several classes according to the problem they tackle, and present a comprehensive analysis of these works.

Author information

Corresponding author

Correspondence to Ziyang Liu.

About this article

Cite this article

Liu, Z., Chen, Y. Processing keyword search on XML: a survey. World Wide Web 14, 671–707 (2011). https://doi.org/10.1007/s11280-011-0128-2

