
Part of the book series: The Kluwer International Series on Information Retrieval (INRE, volume 16)




Copyright information

© 2005 Springer

About this chapter

Cite this chapter

Harman, D. (2005). The Importance of Focused Evaluations: A Case Study of TREC and DUC. In: Tait, J.I. (ed.) Charting a New Course: Natural Language Processing and Information Retrieval. The Kluwer International Series on Information Retrieval, vol 16. Springer, Dordrecht. https://doi.org/10.1007/1-4020-3467-9_11


  • DOI: https://doi.org/10.1007/1-4020-3467-9_11

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-3343-8

  • Online ISBN: 978-1-4020-3467-1

  • eBook Packages: Computer Science (R0)
