Building a Microblog Corpus for Search Result Diversification

  • Ke Tao
  • Claudia Hauff
  • Geert-Jan Houben
Conference paper

DOI: 10.1007/978-3-642-45068-6_22

Part of the Lecture Notes in Computer Science book series (LNCS, volume 8281)
Cite this paper as:
Tao K., Hauff C., Houben GJ. (2013) Building a Microblog Corpus for Search Result Diversification. In: Banchs R.E., Silvestri F., Liu TY., Zhang M., Gao S., Lang J. (eds) Information Retrieval Technology. AIRS 2013. Lecture Notes in Computer Science, vol 8281. Springer, Berlin, Heidelberg

Abstract

Queries that users pose to search engines are often ambiguous, either because different users express different intents with the same query terms or because the query is underspecified and it is unclear which aspect of the topic the user is interested in. In the Web search setting, search result diversification, which aims to produce a ranking that covers a range of query intents or a range of aspects of a single topic, has been shown in recent years to be an effective strategy for satisfying search engine users. We hypothesize that such a strategy will also be beneficial for search on microblogging platforms. Currently, progress in this direction is limited by the lack of a microblog-based diversification corpus. In this paper we address this shortcoming and present our work on creating such a corpus. We show that this corpus fulfils a number of diversification criteria described in the literature. We also report initial search and retrieval experiments that evaluate the benefits of de-duplication in the diversification setting.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Ke Tao (1)
  • Claudia Hauff (1)
  • Geert-Jan Houben (1)
  1. Web Information Systems, TU Delft, Delft, The Netherlands
