Keyword Search on Large-Scale Structured, Semi-Structured, and Unstructured Data

Chapter

Abstract

In the information era, huge amounts of structured, semi-structured, and unstructured data accumulate every day. A universal challenge is to retrieve interesting and meaningful information from these large collections in a way that captures users’ information needs. Keyword search looks for matching objects that contain one or more keywords specified by a user. It gives millions of users a simple but relatively powerful way to search large-scale data. Because many emerging applications must manage and process large collections of structured, semi-structured, and unstructured data, keyword search has become an important technique. In the past decade, many efficient and effective techniques for keyword search have been developed. In this chapter, we survey several representative techniques from the literature. These techniques have desirable characteristics that make them useful in different application scenarios.

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Department of Information Systems, University of Maryland, Baltimore County (UMBC), Baltimore, USA
