Evaluating Text Representations for Retrieval of the Best Group of Documents

Liu, Xiaoyong; Croft, W. Bruce

doi:10.1007/978-3-540-78646-7_43

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4956)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Cluster retrieval assumes that the probability of relevance of a document should depend on the relevance of other, similar documents to the same query. The goal is to find the best group of documents. Many studies have examined the effectiveness of this approach by employing different retrieval methods or clustering algorithms, but few have investigated text representations. This paper revisits the problem of retrieving the best group of documents from the language-modeling perspective. We analyze the advantages and disadvantages of a range of representation techniques, derive features that characterize good document groups, and experiment with a new probabilistic representation as a first step toward incorporating these features. Empirical evaluation demonstrates that the relationship between documents can be leveraged in retrieval when a good representation technique is available, and that retrieving the best group of documents can be more effective than retrieving individual documents.
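
To make the idea of ranking document groups under a language model concrete, the following is a minimal sketch, not the representation proposed in this paper. It assumes one simple, commonly used representation (each group is treated as the concatenation of its documents) and scores groups by Dirichlet-smoothed query likelihood. The function names, the whitespace tokenizer, and the smoothing parameter `mu` are illustrative assumptions.

```python
import math
from collections import Counter


def term_counts(text):
    """Lowercase whitespace tokenization; a stand-in for real preprocessing."""
    return Counter(text.lower().split())


def query_log_likelihood(query, group_counts, collection_counts, mu=1000.0):
    """log P(Q | group) under a Dirichlet-smoothed unigram model (assumed setup)."""
    group_len = sum(group_counts.values())
    coll_len = sum(collection_counts.values())
    score = 0.0
    for term in query.lower().split():
        p_coll = collection_counts[term] / coll_len
        if p_coll == 0.0:
            continue  # term never occurs in the collection: skip it
        p = (group_counts[term] + mu * p_coll) / (group_len + mu)
        score += math.log(p)
    return score


def rank_groups(query, groups, mu=1000.0):
    """Rank group ids best-first; `groups` maps a group id to a list of documents."""
    collection_counts = Counter()
    group_models = {}
    for gid, docs in groups.items():
        counts = Counter()
        for doc in docs:
            counts.update(term_counts(doc))  # concatenation representation
        group_models[gid] = counts
        collection_counts.update(counts)
    scored = [
        (query_log_likelihood(query, counts, collection_counts, mu), gid)
        for gid, counts in group_models.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [gid for _, gid in scored]


if __name__ == "__main__":
    toy_groups = {
        "a": ["language models for retrieval", "cluster retrieval with language models"],
        "b": ["cooking pasta at home", "fresh tomato sauce recipe"],
    }
    print(rank_groups("language models retrieval", toy_groups))
```

Concatenation is only one possible group representation; the choice of representation is exactly what the paper sets out to evaluate, so this sketch should be read as a baseline illustration under the stated assumptions.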

Author information

Authors and Affiliations

CIIR, Computer Science Department, University of Massachusetts, 140 Governors Drive, Amherst, MA 01003, USA
Xiaoyong Liu & W. Bruce Croft

Editor information

Craig Macdonald, Iadh Ounis, Vassilis Plachouras, Ian Ruthven, Ryen W. White

Cite this paper

Liu, X., Croft, W.B. (2008). Evaluating Text Representations for Retrieval of the Best Group of Documents. In: Macdonald, C., Ounis, I., Plachouras, V., Ruthven, I., White, R.W. (eds) Advances in Information Retrieval. ECIR 2008. Lecture Notes in Computer Science, vol 4956. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78646-7_43

DOI: https://doi.org/10.1007/978-3-540-78646-7_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78645-0
Online ISBN: 978-3-540-78646-7
eBook Packages: Computer Science, Computer Science (R0)
