The Accessibility Dimension for Structured Document Retrieval

  • Thomas Roelleke
  • Mounia Lalmas
  • Gabriella Kazai
  • Ian Ruthven
  • Stefan Quicker
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2291)

Abstract

Structured document retrieval aims to retrieve the document components that best satisfy a query, rather than only pre-defined document units. This paper reports on an investigation of a tf-idf-acc approach, where tf and idf are the classical term frequency and inverse document frequency, and acc is a new parameter, called accessibility, that captures the structure of documents. The tf-idf-acc approach is defined using a probabilistic relational algebra. To investigate retrieval quality and estimate the acc values, we developed a method that automatically constructs diverse test collections of structured documents from a standard test collection, and carried out experiments on these collections. The analysis of the experiments provides estimates of the acc values.
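
As a rough illustration of how an accessibility parameter could interact with tf and idf, the following Python sketch scores a component by its own tf·idf weight plus the weights of its children, each scaled by acc. This is an assumption made for illustration only: the abstract does not give the formula, the paper defines the approach in a probabilistic relational algebra rather than in code, and the names `Component` and `term_weight`, the fixed idf value, and the single global acc value are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """A node in a document tree (hypothetical structure for this sketch)."""
    name: str
    text: str = ""
    children: List["Component"] = field(default_factory=list)


def term_weight(component: Component, term: str, idf: float, acc: float) -> float:
    """Illustrative weight: own tf * idf plus acc-scaled weights of the children.

    Assumption: one global acc value in [0, 1] applies to every parent-child link.
    """
    tf = component.text.lower().split().count(term)
    own = tf * idf
    inherited = sum(acc * term_weight(child, term, idf, acc)
                    for child in component.children)
    return own + inherited


# A document with two sections; the document node has no text of its own.
sec1 = Component("sec1", "retrieval of structured documents")
sec2 = Component("sec2", "probabilistic retrieval retrieval")
doc = Component("doc", "", [sec1, sec2])

doc_weight = term_weight(doc, "retrieval", idf=1.0, acc=0.5)
sec2_weight = term_weight(sec2, "retrieval", idf=1.0, acc=0.5)
```

Under these assumptions, acc = 0.5 gives the whole document a lower weight than the second section, so the section would be ranked first, whereas acc = 1.0 would favour the whole document. This is the kind of trade-off between components and their containers that an accessibility value is meant to control.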

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Thomas Roelleke (1, 2)
  • Mounia Lalmas (2)
  • Gabriella Kazai (2)
  • Ian Ruthven (3)
  • Stefan Quicker (4)

  1. HySpirit GmbH, Dortmund, Germany
  2. Department of Computer Science, Queen Mary, University of London, London, England
  3. Department of Computer and Information Sciences, University of Strathclyde, Glasgow, Scotland
  4. Informatik VI, University of Dortmund, Dortmund, Germany
