World Wide Web, Volume 16, Issue 3, pp 273–297

An efficient approach to suggesting topically related web queries using hidden topic model

  • Lin Li
  • Guandong Xu
  • Zhenglu Yang
  • Peter Dolog
  • Yanchun Zhang
  • Masaru Kitsuregawa

Abstract

Keyword-based Web search is a widely used approach for locating information on the Web. However, Web users often have difficulty organizing and formulating appropriate input queries because they lack sufficient domain knowledge, which greatly affects search performance. An effective way to meet the information needs of a search engine user is to suggest Web queries that are topically related to the initial query. Accurately computing query-to-query similarity scores is key to improving the quality of these suggestions. Because queries are short, traditional pseudo-relevance or implicit-relevance based approaches expand the expression of the queries before computing similarity. They explicitly use a search engine as a complementary source and directly extract additional features (such as terms or URLs) from the top-listed or clicked search results. In this paper, we propose a novel approach that uses the hidden topic as an expandable feature. The approach has two steps. In the offline model-learning step, a hidden topic model is trained, and for each candidate query, its posterior distribution over the hidden topic space is determined and used to re-express the query in place of its lexical expression. In the online query suggestion step, we infer the topic distribution of an input query in a similar way, calculate the similarity between each candidate query and the input query in terms of their corresponding topic distributions, and produce a suggestion list of candidate queries based on the similarity scores. Our experimental results on two real data sets show that the hidden topic based suggestion is much more efficient than the traditional term or URL based approaches, and is effective in finding topically related queries for suggestion.
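
The online suggestion step described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from the paper: the topic distributions are made-up three-topic vectors (in the paper they would be inferred from a trained hidden topic model), cosine similarity stands in for the similarity measure, which the abstract does not name, and the names `cosine_similarity`, `suggest`, and `CANDIDATES` are hypothetical.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two topic distributions of equal length.

    Assumption: cosine is used here only for illustration; the abstract
    does not specify which similarity measure the authors use.
    """
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # an all-zero vector carries no topical evidence
    return dot / (norm_p * norm_q)


def suggest(input_topics, candidate_topics, top_n=3):
    """Rank candidate queries by similarity to the input query's topics."""
    scored = [
        (query, cosine_similarity(input_topics, topics))
        for query, topics in candidate_topics.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical offline output: one topic distribution per candidate query.
# These numbers are invented for the example, not taken from the paper.
CANDIDATES = {
    "cheap flights": [0.80, 0.15, 0.05],
    "hotel deals": [0.70, 0.20, 0.10],
    "python tutorial": [0.05, 0.90, 0.05],
    "football scores": [0.05, 0.05, 0.90],
}

# Hypothetical topic distribution inferred for the user's input query.
ranked = suggest([0.85, 0.10, 0.05], CANDIDATES, top_n=2)
for query, score in ranked:
    print(f"{query}: {score:.3f}")
```

With these invented vectors, "cheap flights" ranks first and "hotel deals" second, because both concentrate their probability mass on the same topic as the input query. Each comparison touches only a short fixed-length topic vector, which is consistent with the efficiency argument the abstract makes against term or URL based features.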

Keywords

query suggestion, hidden topic model, latent Dirichlet allocation, Web search engine

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Lin Li (1, email author)
  • Guandong Xu (2)
  • Zhenglu Yang (3)
  • Peter Dolog (2)
  • Yanchun Zhang (4)
  • Masaru Kitsuregawa (3)

  1. School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
  2. Department of Computer Science, Aalborg University, Aalborg, Denmark
  3. Institute of Industrial Science, The University of Tokyo, Tokyo, Japan
  4. School of Engineering & Science, Victoria University, Melbourne, Australia
