Word Clouds of Multiple Search Results

  • Rianne Kaptein
  • Jaap Kamps
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6653)

Abstract

Search engine result pages (SERPs) are known as the most expensive real estate on the planet. Most queries yield millions of organic search results, yet searchers seldom look beyond the first handful of results. To make things worse, different searchers with different query intents may issue the exact same query. An alternative to showing individual web pages summarized by snippets is to represent whole groups of results. In this paper we investigate whether word clouds can be used to summarize groups of documents, e.g. to give a preview of the next SERP or of clusters of topically related documents. We experiment with three word cloud generation methods (full-text, query-biased, and anchor-text-based clouds) and evaluate them in a user study. Our findings are as follows. First, biasing the cloud towards the query does not help test persons better distinguish the relevance and topic of the search results, but test persons prefer query-biased clouds because differences between the clouds are emphasized. Second, anchor text clouds are to be preferred over full-text clouds, since anchor text contains fewer noisy words than the full text of documents. Third, we obtain moderately positive results on the relation between the selected word clouds and the underlying search results: there is exact correspondence in 70% of the subtopic matching judgments and in 60% of the relevance assessment judgments. Our initial experiments open up new possibilities to have SERPs reflect a far larger number of results by using word clouds to summarize groups of search results.
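
To make the difference between a full-text cloud and a query-biased cloud concrete, the following is a minimal Python sketch. It is not the method used in the paper: the raw term-frequency weighting, the fixed token window around query terms, the tiny stopword list, and the function names are all assumptions chosen for illustration only.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; not taken from the paper).
STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to", "for"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def full_text_cloud(docs, size=5):
    """Return the `size` most frequent non-stopword terms over all documents."""
    counts = Counter(
        tok for doc in docs for tok in tokenize(doc) if tok not in STOPWORDS
    )
    return counts.most_common(size)


def query_biased_cloud(docs, query, size=5, window=3):
    """Count only terms that occur within `window` tokens of a query term.

    The query terms themselves are left out of the cloud, since they are
    shared by every group of results and do not help tell groups apart.
    """
    query_terms = set(tokenize(query))
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for i, tok in enumerate(tokens):
            if tok in STOPWORDS or tok in query_terms:
                continue
            context = tokens[max(0, i - window): i + window + 1]
            if query_terms.intersection(context):
                counts[tok] += 1
    return counts.most_common(size)


if __name__ == "__main__":
    # Hypothetical group of search results for the ambiguous query "jaguar".
    docs = [
        "The jaguar is a big cat of the Americas",
        "Jaguar cars unveiled a new electric sedan",
        "Big cat sanctuaries protect the jaguar habitat",
    ]
    print(full_text_cloud(docs, size=3))
    print(query_biased_cloud(docs, "jaguar", size=3, window=1))
```

In a cloud rendering, the returned counts would typically drive font size. An anchor text cloud could reuse `full_text_cloud` by passing the anchor texts of incoming links in place of the document bodies, again as an assumption about how such a cloud might be built.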

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Rianne Kaptein (1)
  • Jaap Kamps (1, 2)

  1. Archives and Information Studies, University of Amsterdam, The Netherlands
  2. ISLA, Informatics Institute, University of Amsterdam, The Netherlands
