European Conference on Information Retrieval

ECIR 2002: Advances in Information Retrieval pp 284-302

The Accessibility Dimension for Structured Document Retrieval

  • Thomas Roelleke
  • Mounia Lalmas
  • Gabriella Kazai
  • Ian Ruthven
  • Stefan Quicker
Conference paper

DOI: 10.1007/3-540-45886-7_19

Volume 2291 of the book series Lecture Notes in Computer Science (LNCS)
Cite this paper as:
Roelleke T., Lalmas M., Kazai G., Ruthven I., Quicker S. (2002) The Accessibility Dimension for Structured Document Retrieval. In: Crestani F., Girolami M., van Rijsbergen C.J. (eds) Advances in Information Retrieval. ECIR 2002. Lecture Notes in Computer Science, vol 2291. Springer, Berlin, Heidelberg

Abstract

Structured document retrieval aims at retrieving the document components that best satisfy a query, instead of merely retrieving pre-defined document units. This paper reports on an investigation of a tf-idf-acc approach, where tf and idf are the classical term frequency and inverse document frequency, and acc is a new parameter, called accessibility, that captures the structure of documents. The tf-idf-acc approach is defined using a probabilistic relational algebra. To investigate the retrieval quality and estimate the acc values, we developed a method that automatically constructs diverse test collections of structured documents from a standard test collection, and we carried out experiments on these collections. The analysis of the experiments provides estimates of the acc values.
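The abstract does not give the formal definition, which the paper states in a probabilistic relational algebra. As a rough illustration only, the sketch below shows one possible reading of how an accessibility weight could combine with tf and idf: a component's score is the tf-idf score of its own text plus the scores of its children, damped by acc. The `Component` class, the `idf` and `score` functions, the single global `acc` value, and the additive propagation rule are all assumptions made for this example, not the authors' method.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Component:
    """A node in a document tree (hypothetical structure for illustration)."""
    name: str
    terms: list = field(default_factory=list)      # terms in the node's own text
    children: list = field(default_factory=list)   # nested Component objects


def idf(term, term_lists):
    """Classical inverse document frequency over a list of term lists."""
    df = sum(1 for terms in term_lists if term in terms)
    return math.log(len(term_lists) / df) if df else 0.0


def score(node, query, idf_values, acc):
    """Assumed rule: own tf-idf score plus acc-damped scores of the children."""
    own = sum(node.terms.count(t) * idf_values.get(t, 0.0) for t in query)
    inherited = sum(score(c, query, idf_values, acc) for c in node.children)
    return own + acc * inherited


# A two-section document whose root has no text of its own.
sec1 = Component("sec1", ["retrieval", "structure"])
sec2 = Component("sec2", ["algebra"])
doc = Component("doc", [], [sec1, sec2])

idf_values = {"retrieval": idf("retrieval", [sec1.terms, sec2.terms])}
section_score = score(sec1, ["retrieval"], idf_values, acc=0.5)
document_score = score(doc, ["retrieval"], idf_values, acc=0.5)
```

Under this assumed rule, an acc value below 1 ranks the section that contains the query term above the whole document, which matches the stated goal of retrieving the best-fitting component rather than a pre-defined unit. In the paper, the acc values are estimated from experiments rather than fixed by hand.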

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Thomas Roelleke (1, 2)
  • Mounia Lalmas (2)
  • Gabriella Kazai (2)
  • Ian Ruthven (3)
  • Stefan Quicker (4)

  1. HySpirit GmbH, Dortmund, Germany
  2. Department of Computer Science, Queen Mary, University of London, London, England
  3. Department of Computer and Information Sciences, University of Strathclyde, Glasgow, Scotland
  4. Informatik VI, University of Dortmund, Dortmund, Germany