Exploiting User Queries for Search Result Clustering

  • Abdul Wahid
  • Xiaoying Gao
  • Peter Andreae
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8180)

Abstract

Search Result Clustering (SRC) groups the results of a user query so that each cluster represents a set of related results. To be useful to the user, different clusters should contain the results corresponding to different possible meanings of the user query, and the cluster labels should reflect these meanings. However, existing SRC algorithms often ignore the user query and group the results based only on the similarity of the search results. This can lead to two problems: low-quality clusters, where the results within a single cluster relate to different meanings of the query; and poor cluster labels, where the label of a cluster does not reflect the query meaning associated with the results in that cluster.

This paper presents a new SRC algorithm called QSC that exploits the user query and uses both syntactic and semantic features of the search results to construct clusters and labels. Experiments show that the query senses are good candidates for cluster labels and that the algorithm can produce high-quality clusters and more semantically meaningful labels than other state-of-the-art algorithms.
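The general idea of letting query senses drive both clustering and labelling can be illustrated with a toy sketch. This is not the QSC algorithm, whose details the abstract does not give; it only assumes that each sense of the query comes with a few context words, and it assigns each result snippet to the sense it shares the most words with. The function names, the hand-written sense lists, and the sample snippets are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def cluster_by_query_sense(snippets, senses, other_label="other"):
    """Assign each snippet to the query sense whose context words it overlaps most.

    `senses` maps a sense label to a list of context words (an assumption of
    this sketch). Snippets that share no word with any sense go into a
    catch-all cluster named `other_label`.
    """
    sense_tokens = {
        label: tokenize(" ".join(words)) for label, words in senses.items()
    }
    clusters = defaultdict(list)
    for snippet in snippets:
        tokens = tokenize(snippet)
        # Count shared words per sense; on ties the first sense listed wins.
        best_label, best_overlap = other_label, 0
        for label, words in sense_tokens.items():
            overlap = len(tokens & words)
            if overlap > best_overlap:
                best_label, best_overlap = label, overlap
        clusters[best_label].append(snippet)
    return dict(clusters)


# Hypothetical senses of the ambiguous query "jaguar".
senses = {
    "jaguar (animal)": ["cat", "wildlife", "habitat", "prey"],
    "jaguar (car)": ["car", "engine", "dealer", "sedan"],
}
snippets = [
    "The jaguar is a big cat whose habitat spans the Americas",
    "Find a Jaguar dealer and compare sedan engine options",
    "Jaguar wildlife conservation protects prey species",
    "Jaguar: a name used in many contexts",
]
clusters = cluster_by_query_sense(snippets, senses)
for label, members in clusters.items():
    print(label, "->", members)
```

Because each cluster is keyed by a sense of the query, its label reflects a query meaning by construction, which is the property the abstract argues for. Plain word overlap is only a syntactic stand-in here; a semantic relatedness measure would be needed to place snippets that share no surface words with a sense.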

Keywords

Web Clustering Engine; Search Result Clustering; Query Senses; Document Clustering



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Abdul Wahid (1)
  • Xiaoying Gao (1)
  • Peter Andreae (1)
  1. School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
