Chapter

Advances in Information Retrieval

Volume 2291 of the series Lecture Notes in Computer Science, pp. 284-302

The Accessibility Dimension for Structured Document Retrieval

  • Thomas Roelleke, HySpirit GmbH; Department of Computer Science, Queen Mary, University of London
  • Mounia Lalmas, Department of Computer Science, Queen Mary, University of London
  • Gabriella Kazai, Department of Computer Science, Queen Mary, University of London
  • Ian Ruthven, Department of Computer and Information Sciences, University of Strathclyde
  • Stefan Quicker, Karlsruhe University; Informatik VI, University of Dortmund

Abstract

Structured document retrieval aims to retrieve the document components that best satisfy a query, rather than only pre-defined document units. This paper investigates a tf-idf-acc approach, where tf and idf are the classical term frequency and inverse document frequency, and acc is a new parameter, called accessibility, that captures the structure of documents. The tf-idf-acc approach is defined using a probabilistic relational algebra. To investigate retrieval quality and estimate the acc values, we developed a method that automatically constructs diverse test collections of structured documents from a standard test collection, and we carried out experiments on these collections. The analysis of the experiments provides estimates of the acc values.
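
The abstract does not state the tf-idf-acc formula, so the following Python sketch is only an illustration of the general idea, not the paper's probabilistic relational algebra definition. It assumes, purely for the example, that acc is a single multiplicative factor in [0, 1] that discounts term frequencies when the content of a child component is propagated to its parent, and that a component's score is the sum of normalised tf times idf over the query terms. The names Component, term_frequencies and score, the idf values and the toy document are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A node in a structured document (hypothetical helper type)."""
    name: str
    terms: list = field(default_factory=list)
    children: list = field(default_factory=list)


def term_frequencies(node, acc):
    """Term frequencies of a component: its own terms count fully,
    terms of each child are discounted by acc (assumed propagation rule)."""
    freqs = {}
    for term in node.terms:
        freqs[term] = freqs.get(term, 0.0) + 1.0
    for child in node.children:
        for term, freq in term_frequencies(child, acc).items():
            freqs[term] = freqs.get(term, 0.0) + acc * freq
    return freqs


def score(node, query, idf, acc):
    """Sum of normalised tf times idf over the query terms (assumed scoring rule)."""
    freqs = term_frequencies(node, acc)
    total = sum(freqs.values())
    if total == 0:
        return 0.0
    return sum(freqs.get(term, 0.0) / total * idf.get(term, 0.0) for term in query)


# Toy document: a root with no text of its own and two sections.
sec1 = Component("sec1", ["retrieval", "retrieval", "structure"])
sec2 = Component("sec2", ["algebra"])
doc = Component("doc", [], [sec1, sec2])

# Made-up idf values, for illustration only.
idf = {"retrieval": 1.0, "structure": 1.0, "algebra": 2.0}
query = ["retrieval"]

for component in (doc, sec1, sec2):
    print(component.name, score(component, query, idf, acc=0.5))
```

Under these assumptions, a smaller acc makes a parent component rely less on the content of its children, so the focused section outranks the whole document for a query that only it addresses; with acc set to 0 the root, which has no text of its own, scores 0.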