
Part of the book series: The Kluwer International Series on Information Retrieval (INRE, volume 16)




Copyright information

© 2005 Springer

About this chapter

Cite this chapter

Harman, D. (2005). The Importance of Focused Evaluations: A Case Study of TREC and DUC. In: Tait, J.I. (ed.) Charting a New Course: Natural Language Processing and Information Retrieval. The Kluwer International Series on Information Retrieval, vol 16. Springer, Dordrecht. https://doi.org/10.1007/1-4020-3467-9_11


  • DOI: https://doi.org/10.1007/1-4020-3467-9_11

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-3343-8

  • Online ISBN: 978-1-4020-3467-1

  • eBook Packages: Computer Science (R0)
