WISE 2013: Web Information Systems Engineering – WISE 2013, pp 111-120
Exploiting User Queries for Search Result Clustering
Abstract
Search Result Clustering (SRC) groups the results of a user query so that each cluster represents a set of related results. To be useful to the user, different clusters should contain the results corresponding to different possible meanings of the user query, and the cluster labels should reflect these meanings. However, existing SRC algorithms often ignore the user query and group the results based solely on the similarity of the search results. This can lead to two problems: low-quality clusters, where the results within a single cluster relate to different meanings of the query; and poor cluster labels, where the label of a cluster does not reflect the query meaning associated with the results in that cluster.
This paper presents a new SRC algorithm called QSC that exploits the user query and uses both syntactic and semantic features of the search results to construct clusters and labels. Experiments show that the query senses are good candidates for cluster labels and that the algorithm produces higher-quality clusters and more semantically meaningful labels than other state-of-the-art algorithms.
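The general idea of using query senses as cluster labels can be illustrated with a toy sketch. It is not the QSC algorithm, whose details the abstract does not give. The sense descriptions, the example query "jaguar", and the function names are all hypothetical, and a plain bag-of-words cosine similarity stands in for the syntactic and semantic features the paper mentions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and return its word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_by_sense(results, senses):
    """Assign each result to its most similar query sense.

    The sense name doubles as the cluster label. Results that share
    no words with any sense description go to an "other" cluster.
    """
    sense_vectors = {label: tokenize(desc) for label, desc in senses.items()}
    clusters = {label: [] for label in senses}
    clusters["other"] = []
    for result in results:
        vec = tokenize(result)
        best_label, best_score = "other", 0.0
        for label, sense_vec in sense_vectors.items():
            score = cosine(vec, sense_vec)
            if score > best_score:
                best_label, best_score = label, score
        clusters[best_label].append(result)
    return clusters


# Hypothetical senses for the ambiguous query "jaguar" (assumed, not from the paper).
senses = {
    "jaguar (animal)": "big cat animal wildlife rainforest predator",
    "jaguar (car)": "car luxury vehicle engine sedan",
}
results = [
    "The jaguar is a big cat found in the rainforest",
    "Jaguar unveils a new luxury sedan with a hybrid engine",
    "Jaguar stock prices today",
]
clusters = cluster_by_sense(results, senses)
for label, members in clusters.items():
    print(label, "->", members)
```

Because each cluster is built around one sense of the query, the label describes the same meaning that was used to select the cluster's members, which is the property the abstract argues similarity-only clustering tends to lose.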
Keywords
Web Clustering Engine, Search Result Clustering, Query Senses, Document Clustering