A Machine Learning Approach for the Curation of Biomedical Literature

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2633)

Abstract

The biomedical sciences hold a vast repository of information spread across a large body of research papers. Researchers often need to spend considerable time reading entire papers before they can determine whether a paper should be curated (archived). In this paper, we present an automated text classification system for biomedical papers. A paper is classified according to whether it contains experimental evidence for the expression of molecular gene products for specified genes. The system performs preprocessing and data cleaning, followed by feature extraction from the raw text. It then classifies the paper from the extracted features using a Naïve Bayes classifier. Our approach makes it possible to classify (and curate) biomedical papers automatically, potentially saving considerable time and resources. The system proved to be highly accurate and won an honourable mention in Task 1 of the KDD Cup 2002.
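
The pipeline outlined in the abstract (preprocessing, feature extraction, Naïve Bayes classification) can be illustrated with a minimal sketch. The abstract does not state which features or which Naïve Bayes variant the authors used, so everything below is an assumption made for illustration: bag-of-words token counts as features, a multinomial model with Laplace smoothing, the hypothetical labels "curate" and "skip", and a handful of invented toy sentences in place of real papers.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens.

    This stands in for the preprocessing and cleaning step; the real
    system's cleaning rules are not described in the abstract.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over token counts (an assumed variant)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                 # Laplace smoothing constant
        self.class_doc_counts = Counter()  # number of documents per class
        self.word_counts = {}              # class label -> Counter of tokens
        self.vocabulary = set()

    def fit(self, documents, labels):
        for text, label in zip(documents, labels):
            self.class_doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for token in tokenize(text):
                counts[token] += 1
                self.vocabulary.add(token)
        return self

    def log_scores(self, text):
        """Return log P(class) + sum of log P(token | class) for each class."""
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        # Tokens never seen in training carry no information and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        scores = {}
        for label, doc_count in self.class_doc_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(doc_count / total_docs)
            for token in tokens:
                score += math.log((counts[token] + self.alpha) / denominator)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy training data; real input would be full-text papers.
train_docs = [
    "In situ hybridization shows the transcript is expressed in the embryo",
    "Northern blot analysis detected the mRNA in larval tissue",
    "Western blot confirmed the protein is expressed in adult flies",
    "We review the history of genome sequencing projects",
    "A computational model predicts regulatory motifs from sequence alone",
    "This commentary discusses funding policy for genetics research",
]
train_labels = ["curate", "curate", "curate", "skip", "skip", "skip"]

model = NaiveBayesTextClassifier().fit(train_docs, train_labels)
print(model.predict("Northern blot detected the transcript in the embryo"))
print(model.predict("A commentary on funding policy for sequencing projects"))
```

Working in log space avoids numeric underflow when many token probabilities are multiplied, and the smoothing constant keeps a token that is absent from one class from driving that class's probability to zero.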

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shi, M. et al. (2003). A Machine Learning Approach for the Curation of Biomedical Literature. In: Sebastiani, F. (ed.) Advances in Information Retrieval. ECIR 2003. Lecture Notes in Computer Science, vol 2633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36618-0_47

  • DOI: https://doi.org/10.1007/3-540-36618-0_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-01274-0

  • Online ISBN: 978-3-540-36618-8

  • eBook Packages: Springer Book Archive
