Finding Related Search Engine Queries by Web Community Based Query Enrichment

World Wide Web

Abstract

Conventional approaches to finding related search engine queries rely on the terms shared by two queries to measure their relatedness. However, search engine queries are usually short, and the term overlap between two queries is very small, so using query terms as a feature space cannot accurately estimate relatedness. Alternative feature spaces are needed to enrich term-based search queries. In this paper, given a search query, we first extract the Web pages accessed by users from Japanese Web access logs, which store the users' individual and collective behavior. From these accessed Web pages we can usually obtain two kinds of feature spaces, i.e., content-sensitive (e.g., nouns) and content-ignorant (e.g., URLs), to enrich the expressions of search queries. The relatedness between search queries can then be estimated on their enriched expressions. Our experimental results show that the URL feature space produces much lower precision scores than the noun feature space, which, however, is not applicable to non-text pages, dynamic pages, and so on. It is therefore crucial to improve the quality of the URL (content-ignorant) feature space, since it is generally available for all types of Web pages. We propose a novel content-ignorant feature space, called Web community, which is created from a Japanese Web page archive by exploiting link analysis. Experimental results show that the proposed Web community feature space generates much better results than the URL feature space.
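The enrichment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy access log, the hard-coded `COMMUNITY_OF` mapping (standing in for communities that the paper derives by link analysis), and the use of cosine similarity as the relatedness measure are all assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy access log: (query, URL accessed after issuing the query).
ACCESS_LOG = [
    ("cheap flights", "http://air-a.example/deals"),
    ("cheap flights", "http://air-b.example/fares"),
    ("airline tickets", "http://air-a.example/booking"),
    ("airline tickets", "http://air-c.example/tickets"),
    ("sushi recipe", "http://cook.example/sushi"),
]

# Hypothetical URL -> Web community mapping. In the paper the communities come
# from link analysis of a Web archive; here they are simply hard-coded.
COMMUNITY_OF = {
    "http://air-a.example/deals": "travel",
    "http://air-b.example/fares": "travel",
    "http://air-a.example/booking": "travel",
    "http://air-c.example/tickets": "travel",
    "http://cook.example/sushi": "cooking",
}


def enrich(query, feature_of):
    """Represent a query as counts of features of the pages accessed for it."""
    return Counter(feature_of(url) for q, url in ACCESS_LOG if q == query)


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (an assumed measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def relatedness(q1, q2, feature_of):
    """Relatedness of two queries on their enriched expressions."""
    return cosine(enrich(q1, feature_of), enrich(q2, feature_of))


# URL feature space: each distinct URL is its own feature.
url_score = relatedness("cheap flights", "airline tickets", lambda u: u)
# Web community feature space: URLs are collapsed into their community.
community_score = relatedness("cheap flights", "airline tickets", COMMUNITY_OF.get)
unrelated_score = relatedness("cheap flights", "sushi recipe", COMMUNITY_OF.get)

print(url_score, community_score, unrelated_score)
```

In this toy data the two flight queries share no URL, so the URL feature space finds no overlap, while mapping URLs to communities lets the same pages contribute a shared feature. This mirrors, in a very simplified way, why a coarser content-ignorant feature space might be expected to help.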

Author information

Correspondence to Lin Li.

Additional information

This work was done while Lin Li was a Ph.D. student at the University of Tokyo; she now works at Wuhan University of Technology.

About this article

Cite this article

Li, L., Otsuka, S. & Kitsuregawa, M. Finding Related Search Engine Queries by Web Community Based Query Enrichment. World Wide Web 13, 121–142 (2010). https://doi.org/10.1007/s11280-009-0077-1

  • DOI: https://doi.org/10.1007/s11280-009-0077-1
