Abstract
We present a novel method for clustering Web search results based on Word Sense Induction. First, we acquire the meanings of a query by means of a graph-based clustering algorithm that calculates the maximum spanning tree of the co-occurrence graph of the query. Then we cluster the search results based on their semantic similarity to the induced word senses. We show that our approach improves on classical search result clustering methods in terms of both clustering quality and degree of diversification.
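The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how senses are read off the maximum spanning tree or how similarity is measured. Here the tree is assumed to be split by dropping its weakest edges, and similarity is assumed to be plain word overlap. All function names, the toy co-occurrence weights, and the `num_senses` parameter are hypothetical.

```python
from collections import defaultdict


def maximum_spanning_tree(nodes, edges):
    """Kruskal's algorithm over edges sorted by decreasing weight.

    edges is an iterable of (weight, u, v) tuples; returns the tree edges.
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components, so it creates no cycle
            parent[ru] = rv
            tree.append((w, u, v))
    return tree


def induce_senses(nodes, edges, num_senses):
    """Assumption: drop the (num_senses - 1) weakest tree edges and treat
    each remaining connected component as one induced sense."""
    tree = sorted(maximum_spanning_tree(nodes, edges), reverse=True)
    kept = tree[: max(len(tree) - (num_senses - 1), 0)]
    adj = defaultdict(set)
    for _, u, v in kept:
        adj[u].add(v)
        adj[v].add(u)
    senses, seen = [], set()
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:  # depth-first traversal of one component
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adj[x] - comp)
        seen |= comp
        senses.append(comp)
    return senses


def cluster_results(snippets, senses):
    """Assumption: similarity is word overlap; snippets that overlap with
    no sense go to cluster -1."""
    clusters = defaultdict(list)
    for s in snippets:
        words = set(s.lower().split())
        overlaps = [len(words & sense) for sense in senses]
        best = max(range(len(senses)), key=overlaps.__getitem__)
        clusters[best if overlaps[best] > 0 else -1].append(s)
    return dict(clusters)


# Toy co-occurrence graph for the query "jaguar" (made-up weights).
nodes = ["car", "engine", "speed", "cat", "jungle", "prey"]
edges = [
    (0.9, "car", "engine"), (0.8, "engine", "speed"), (0.5, "car", "speed"),
    (0.7, "cat", "jungle"), (0.6, "jungle", "prey"), (0.1, "speed", "cat"),
]
senses = induce_senses(nodes, edges, num_senses=2)
clusters = cluster_results(
    ["Jaguar car engine review",
     "The jaguar is a cat of the jungle",
     "Jaguar official site"],
    senses,
)
```

In this toy graph the weakest tree edge is the one linking "speed" and "cat", so cutting it separates an automotive sense from an animal sense. The snippet that shares no word with either sense ends up in the fallback cluster.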
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Di Marco, A., Navigli, R. (2011). Clustering Web Search Results with Maximum Spanning Trees. In: Pirrone, R., Sorbello, F. (eds) AI*IA 2011: Artificial Intelligence Around Man and Beyond. AI*IA 2011. Lecture Notes in Computer Science, vol 6934. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23954-0_20
DOI: https://doi.org/10.1007/978-3-642-23954-0_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23953-3
Online ISBN: 978-3-642-23954-0
eBook Packages: Computer Science, Computer Science (R0)