Abstract
We present a novel approach to fusing document lists that are retrieved in response to a query. Our approach uses information induced from inter-document similarities. Specifically, the key insight guiding the derivation of our methods is that similar documents from different lists can provide relevance-status support to each other. We use a graph-based method to model relevance-status propagation between documents. The propagation is governed by inter-document similarities and by the retrieval scores of documents in the lists. Empirical evaluation shows that our methods are effective in fusing TREC runs.
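The propagation idea described above can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the graph construction, the similarity measure, or the update rule, so everything below is an assumption made for illustration. The sketch assumes cosine similarity over bag-of-words vectors, a prior built by sum-normalizing retrieval scores within each list, edge weights equal to similarity times the target document's prior, and a PageRank-style power iteration. The function names `cosine` and `fuse` and the parameters `damping` and `iterations` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def fuse(runs, texts, damping=0.85, iterations=50):
    """Fuse retrieved lists by propagating relevance-status support.

    runs:  list of dicts mapping doc id -> non-negative retrieval score
    texts: dict mapping doc id -> document text
    Returns (ranking, scores). All modelling choices here are assumptions.
    """
    # Prior: sum-normalize scores within each list, then add across lists.
    prior = Counter()
    for run in runs:
        total = sum(run.values())
        for doc, s in run.items():
            prior[doc] += s / total if total else 0.0
    docs = sorted(prior)
    z = sum(prior.values())
    prior = {d: (prior[d] / z if z else 1.0 / len(docs)) for d in docs}

    # Edge u -> v is weighted by similarity and by v's retrieval-based prior.
    vecs = {d: Counter(texts[d].lower().split()) for d in docs}
    out = {}
    for u in docs:
        w = {v: cosine(vecs[u], vecs[v]) * prior[v] for v in docs if v != u}
        total = sum(w.values())
        out[u] = {v: x / total for v, x in w.items()} if total else {}

    # Power iteration with a prior-biased restart (PageRank-style).
    score = dict(prior)
    for _ in range(iterations):
        nxt = {d: (1.0 - damping) * prior[d] for d in docs}
        for u in docs:
            # Documents with no similar neighbours spread mass by the prior.
            targets = out[u] if out[u] else prior
            for v, p in targets.items():
                nxt[v] += damping * score[u] * p
        score = nxt

    ranking = sorted(docs, key=lambda d: score[d], reverse=True)
    return ranking, score


# Toy input: "a" and "c" come from different lists but have similar text.
runs = [{"a": 2.0, "b": 1.0}, {"c": 2.0, "d": 1.0}]
texts = {
    "a": "graph based fusion of retrieved lists",
    "b": "cooking pasta recipes",
    "c": "fusion of retrieved lists using a graph",
    "d": "football match results",
}
ranking, scores = fuse(runs, texts)
print(ranking)
```

In this toy input, documents "a" and "c" never appear in the same list, yet each reinforces the other through their textual similarity, which is the kind of cross-list support the abstract describes. A real system would likely replace the bag-of-words cosine with a stronger similarity estimate.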
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kozorovitzky, A.K., Kurland, O. (2009). From “Identical” to “Similar”: Fusing Retrieved Lists Based on Inter-document Similarities. In: Azzopardi, L., et al. Advances in Information Retrieval Theory. ICTIR 2009. Lecture Notes in Computer Science, vol 5766. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04417-5_19
DOI: https://doi.org/10.1007/978-3-642-04417-5_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04416-8
Online ISBN: 978-3-642-04417-5
eBook Packages: Computer Science, Computer Science (R0)