Tasks, topics and relevance judging for the TREC Genomics Track: five years of experience evaluating biomedical text information retrieval systems

Abstract

With the help of a team of expert biologist judges, the TREC Genomics track generated four large “gold standard” test collections, comprising more than one hundred unique topics, two kinds of ad hoc retrieval tasks, and their corresponding relevance judgments. As the track’s tasks grew more complex over the years, judging tools and training guidelines were developed to accommodate teams of part-time, short-term workers from a variety of specialized biological backgrounds and to address the consistency and reproducibility of the assessment process. Important lessons were learned about factors that influenced the utility of the test collections, including topic design, annotations provided by judges, methods used for identifying and training judges, and the provision of a central moderator “meta-judge”.

Acknowledgements

The TREC Genomics Track was funded by grant ITR-0325160 to W.R.H. from the U.S. National Science Foundation. The authors would like to thank the Genomics track steering committee, especially Kevin Bretonnel Cohen and Anna Divoli, for helpful discussions about relevance judgments and guidelines.

Author information

Corresponding author

Correspondence to Phoebe M. Roberts.

About this article

Cite this article

Roberts, P.M., Cohen, A.M. & Hersh, W.R. Tasks, topics and relevance judging for the TREC Genomics Track: five years of experience evaluating biomedical text information retrieval systems. Inf Retrieval 12, 81–97 (2009). https://doi.org/10.1007/s10791-008-9072-x

Keywords

  • Reference standards
  • Evaluation
  • Inter-annotator agreement
  • Text mining
  • Information retrieval