Construction of a Test Collection for the Focussed Retrieval of Structured Documents

  • Conference paper
Advances in Information Retrieval (ECIR 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2633)

Abstract

In this paper, we examine the methodological issues involved in constructing test collections of structured documents and obtaining best entry points for the evaluation of the focussed retrieval of document components. We describe a pilot test of the proposed test collection construction methodology performed on a document collection of Shakespeare plays. In our analysis, we examine the effect of query complexity and type on overall query difficulty, the use of multiple relevance judges for each query, the problem of obtaining exhaustive relevance assessments from participants, and the method of eliciting relevance assessments and best entry points. Our findings indicate that the methodology is indeed feasible in this small-scale context, and merits further investigation.

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kazai, G., Lalmas, M., Reid, J. (2003). Construction of a Test Collection for the Focussed Retrieval of Structured Documents. In: Sebastiani, F. (ed.) Advances in Information Retrieval. ECIR 2003. Lecture Notes in Computer Science, vol 2633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36618-0_7

  • DOI: https://doi.org/10.1007/3-540-36618-0_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-01274-0

  • Online ISBN: 978-3-540-36618-8

  • eBook Packages: Springer Book Archive
