Abstract
Conventional approaches to finding related search engine queries rely on the terms that two queries share to measure their relatedness. However, search engine queries are usually short, so the term overlap between two queries is very small, and query terms alone cannot serve as a feature space for accurately estimating relatedness. Alternative feature spaces are needed to enrich term-based search queries. In this paper, given a search query, we first extract the Web pages accessed by users from Japanese Web access logs, which record users' individual and collective behavior. From these accessed Web pages we can obtain two kinds of feature spaces, content-sensitive (e.g., nouns) and content-ignorant (e.g., URLs), to enrich the expressions of search queries. The relatedness between search queries can then be estimated on their enriched expressions. Our experimental results show that the URL feature space produces much lower precision scores than the noun feature space, which, however, is not applicable to non-text pages, dynamic pages, and so on. It is therefore crucial to improve the quality of the URL (content-ignorant) feature space, since it is generally available for all types of Web pages. We propose a novel content-ignorant feature space, called Web community, which is created from a Japanese Web page archive by exploiting link analysis. Experimental results show that the proposed Web community feature space generates much better results than the URL feature space.
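The enrichment idea in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the access log is a plain dictionary from query to accessed URLs, the URL-to-community mapping is hand-written toy data (the paper derives communities by link analysis over a Web archive), and cosine similarity stands in for whatever relatedness measure the authors actually use. The names `enrich`, `cosine`, `access_log`, and `url_to_community` are hypothetical.

```python
from collections import Counter
from math import sqrt


def enrich(query, access_log, url_to_feature=None):
    """Build a feature-count vector for a query from the pages users accessed.

    Without a mapping, the features are the raw URLs. With a mapping
    (e.g. URL -> community ID), each URL is replaced by its mapped
    feature; unmapped URLs fall back to the URL itself.
    """
    features = Counter()
    for url in access_log.get(query, []):
        feature = url_to_feature.get(url, url) if url_to_feature else url
        features[feature] += 1
    return features


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (an assumed measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy data, invented for illustration only.
access_log = {
    "digital camera": ["http://a.example/cam1", "http://b.example/review"],
    "dslr": ["http://c.example/dslr", "http://d.example/lens"],
}
url_to_community = {
    "http://a.example/cam1": "community_1",
    "http://c.example/dslr": "community_1",
    "http://b.example/review": "community_2",
    "http://d.example/lens": "community_2",
}

# URL feature space: the two queries share no URL, so they look unrelated.
url_sim = cosine(
    enrich("digital camera", access_log),
    enrich("dslr", access_log),
)

# Community feature space: different URLs collapse into shared communities.
community_sim = cosine(
    enrich("digital camera", access_log, url_to_community),
    enrich("dslr", access_log, url_to_community),
)

print(url_sim, community_sim)
```

In this toy setting the two queries have no URL in common, so the raw URL vectors do not overlap, while the community vectors do. That is one plausible reading of why a coarser content-ignorant feature space might help; the sketch does not reproduce the paper's experiments.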
Additional information
This work was done while Lin Li was a Ph.D. student at the University of Tokyo; she now works at Wuhan University of Technology.
About this article
Cite this article
Li, L., Otsuka, S. & Kitsuregawa, M. Finding Related Search Engine Queries by Web Community Based Query Enrichment. World Wide Web 13, 121–142 (2010). https://doi.org/10.1007/s11280-009-0077-1
DOI: https://doi.org/10.1007/s11280-009-0077-1