The VLDB Journal, Volume 24, Issue 4, pp 493–518

A general framework to resolve the MisMatch problem in XML keyword search

Regular Paper

Abstract

When users issue a query to a database, they have expectations about the results. If what they search for is unavailable in the database, the system will return an empty result or, worse, erroneous mismatch results. We call this problem the MisMatch problem. In this paper, we solve the MisMatch problem in the context of XML keyword search. Our solution is based on two novel concepts that we introduce: Target Node Type and Distinguishability. Target Node Type represents the type of node a query result intends to match, and Distinguishability measures the importance of the query keywords. Using these concepts, we develop a low-cost algorithm that post-processes the results of query evaluation to detect the MisMatch problem and generate helpful suggestions to users. Our approach has three noteworthy features: (1) for queries with the MisMatch problem, it outputs an explanation, suggested queries and sample results for the suggested queries, helping users judge whether the MisMatch problem is solved without reading all query results; (2) it is portable: it works with any lowest common ancestor-based matching semantics (for XML data without ID references) or minimal Steiner tree-based matching semantics (for XML data with ID references) that returns tree structures as results, and it is orthogonal to the choice of result retrieval method; (3) it is lightweight, occupying a very small proportion of the total query evaluation time. Extensive experiments on three real datasets verify the effectiveness, efficiency and scalability of our approach. A search engine called XClear has been built and is available at http://xclear.comp.nus.edu.sg.
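The abstract refers to lowest common ancestor-based matching semantics. As a rough illustration of what such a semantics returns, and of how a mismatch result can arise, the sketch below implements one well-known variant, smallest LCA (SLCA), by brute force over Dewey labels. The toy bibliography, the `index` dictionary and the `search` helper are hypothetical; this is not the paper's MisMatch detection algorithm, which the abstract describes as a post-processing step applied to results like these.

```python
from itertools import product
from typing import List, Tuple

# A node is identified by its Dewey label: the child positions on the path from the root,
# e.g. (0, 1, 0) is the first child of the second child of the root (0,).
Dewey = Tuple[int, ...]


def lca(a: Dewey, b: Dewey) -> Dewey:
    """Lowest common ancestor of two nodes = longest common prefix of their Dewey labels."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def is_proper_ancestor(a: Dewey, b: Dewey) -> bool:
    """True if a lies strictly above b on b's root-to-node path."""
    return len(a) < len(b) and b[: len(a)] == a


def slca(keyword_lists: List[List[Dewey]]) -> List[Dewey]:
    """Brute-force SLCA: nodes whose subtree covers every keyword while no
    descendant's subtree does. Exponential in the number of keywords; illustration only."""
    if not keyword_lists or any(not matches for matches in keyword_lists):
        return []  # some keyword has no match at all: the empty-result case
    candidates = set()
    for combo in product(*keyword_lists):
        node = combo[0]
        for other in combo[1:]:
            node = lca(node, other)
        candidates.add(node)
    # Keep only the lowest candidates (no other candidate lies below them).
    return sorted(c for c in candidates
                  if not any(is_proper_ancestor(c, d) for d in candidates))


# Hypothetical bibliography (Dewey label: tag and text):
# (0,)    bib
# (0, 0)  book -> (0, 0, 0) title "XML",       (0, 0, 1) author "Ling"
# (0, 1)  book -> (0, 1, 0) title "Databases", (0, 1, 1) author "Ling"
# (0, 2)  book -> (0, 2, 0) title "Graphs",    (0, 2, 1) author "Chen"
index = {
    "xml": [(0, 0, 0)],
    "databases": [(0, 1, 0)],
    "ling": [(0, 0, 1), (0, 1, 1)],
    "chen": [(0, 2, 1)],
}


def search(*keywords: str) -> List[Dewey]:
    """Look up each keyword in the toy inverted index and compute SLCA results."""
    return slca([index.get(k.lower(), []) for k in keywords])


print(search("XML", "Ling"))        # [(0, 0)]: the matching book
print(search("Databases", "Chen"))  # [(0,)]: only the whole bib covers both keywords
print(search("XML", "Widom"))       # []: "Widom" never occurs
```

In the second query the user presumably wanted a *Databases* book by Chen; no such book exists in the toy data, so the only node covering both keywords is the root `bib` rather than a `book`, which is the kind of erroneous mismatch result the abstract describes. The third query shows the empty-result case.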

Keywords

XML, Keyword search, MisMatch problem


Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. School of Computer Science and Information Technology, RMIT University, Melbourne, Australia
  2. School of Computing, National University of Singapore, Singapore
  3. Department of Computer Science and Technology, Tsinghua University, Beijing, China
  4. Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, USA
