The VLDB Journal, Volume 20, Issue 1, pp 1–19

Providing built-in keyword search capabilities in RDBMS

  • Guoliang Li
  • Jianhua Feng
  • Xiaofang Zhou
  • Jianyong Wang
Regular Paper

Abstract

A common approach to keyword search over relational databases is to find minimum Steiner trees in database graphs derived from the relational data. These methods are expensive, however, because the minimum Steiner tree problem is known to be NP-hard. Further, these methods are independent of the underlying relational database management system (RDBMS) and thus cannot benefit from its capabilities. As an alternative, in this paper we propose a new concept called the Compact Steiner Tree (CSTree), which can be used to approximate the Steiner tree problem for answering top-k keyword queries efficiently. We propose a novel structure-aware index, together with an effective ranking mechanism, for fast, progressive and accurate retrieval of the top-k highest-ranked CSTrees. The proposed techniques can be implemented using a standard RDBMS to benefit from its indexing and query-processing capabilities. We have implemented our techniques in MySQL, which can thereby provide built-in keyword-search capabilities using SQL. The experimental results show a significant improvement in both search efficiency and result quality compared to existing state-of-the-art approaches.
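To make the graph formulation concrete, the following is a minimal sketch of keyword search over a data graph in which tuples are nodes and foreign-key links are edges. It is not the CSTree algorithm or the structure-aware index proposed in the paper, whose definitions are not given in this abstract. Instead it uses a generic root-based heuristic: each candidate root is scored by the sum of its hop distances to the nearest tuple matching each keyword, which avoids enumerating Steiner trees. The toy graph, the `keyword_nodes` mapping, and the function names are all hypothetical.

```python
from collections import deque

def bfs_distances(graph, sources):
    """Hop distance from every reachable node to its nearest node in `sources`."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def top_k_roots(graph, keyword_nodes, query, k):
    """Rank candidate roots by total hop distance to the nearest match of each keyword."""
    per_keyword = [bfs_distances(graph, keyword_nodes[kw]) for kw in query]
    scored = []
    for node in graph:
        # A root qualifies only if it can reach a match for every query keyword.
        if all(node in d for d in per_keyword):
            scored.append((sum(d[node] for d in per_keyword), node))
    return sorted(scored)[:k]

# Hypothetical toy data graph: tuples are nodes, foreign-key links are undirected edges.
graph = {
    "a1": ["w1", "w2"],   # author tuples
    "a2": ["w3"],
    "p1": ["w1", "w3"],   # paper tuples
    "p2": ["w2"],
    "w1": ["a1", "p1"],   # author-paper link tuples
    "w2": ["a1", "p2"],
    "w3": ["a2", "p1"],
}
# Hypothetical inverted index from keyword to the tuples that contain it.
keyword_nodes = {"li": ["a1"], "steiner": ["p1"], "index": ["p2"]}

results = top_k_roots(graph, keyword_nodes, ["li", "steiner"], k=2)
print(results)
```

Each query keyword costs one breadth-first traversal, so the whole ranking runs in time linear in the graph size per keyword. That is the kind of saving an approximation buys over exact minimum Steiner tree search, at the price of possibly missing the optimal tree.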

Keywords

  • Keyword search
  • Relational databases
  • Steiner tree
  • Compact Steiner tree
  • Approximate algorithms
  • Structure-aware index
  • Progressive search

Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  • Guoliang Li (1)
  • Jianhua Feng (1)
  • Xiaofang Zhou (2)
  • Jianyong Wang (1)

  1. Department of Computer Science and Technology, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, China
  2. School of Information Technology and Electrical Engineering, The University of Queensland and NICTA Queensland Laboratory, Brisbane, Australia
