A Machine Learning Approach for the Curation of Biomedical Literature

Shi, Min; Edwin, David S.; Menon, Rakesh; Shen, Lixiang; Lim, Jonathan Y. K.; Loh, Han Tong; Sathiya Keerthi, S.; Ong, Chong Jin

doi:10.1007/3-540-36618-0_47


Min Shi⁵,
David S. Edwin⁵,
Rakesh Menon⁵,
Lixiang Shen⁵,
Jonathan Y. K. Lim⁵,
Han Tong Loh⁵,⁶,
S. Sathiya Keerthi⁶ &
Chong Jin Ong⁶

Conference paper
First Online: 01 January 2003


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2633)

Abstract

In the biomedical sciences, a vast body of information is spread across a large number of research papers. Researchers often spend considerable time reading entire papers before they can determine whether a paper should be curated (archived). In this paper, we present an automated text classification system for biomedical papers. A paper is classified according to whether it contains experimental evidence for the expression of molecular gene products for specified genes. The system performs preprocessing and data cleaning, followed by feature extraction from the raw text. It then classifies the paper from the extracted features using a Naïve Bayes classifier. Our approach makes it possible to classify (and curate) biomedical papers automatically, potentially saving considerable time and resources. The system proved highly accurate and received an honourable mention in Task 1 of the KDD Cup 2002.
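The pipeline outlined in the abstract (preprocessing, feature extraction, Naïve Bayes classification) can be illustrated with a minimal Python sketch. This is not the authors' system: the abstract does not specify the features or the cleaning steps, so the sketch assumes a plain bag-of-words representation, a multinomial Naïve Bayes model with add-one smoothing, and invented toy documents. The names `tokenize`, `NaiveBayesTextClassifier`, and the labels `curate` and `skip` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Assumed preprocessing: lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)      # documents per class (for priors)
        self.total_docs = len(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(text))
        # The vocabulary is the set of all words seen during training.
        self.vocabulary = set()
        for counts in self.word_counts.values():
            self.vocabulary.update(counts)
        return self

    def log_scores(self, text):
        """Return the unnormalised log posterior of each class for one document."""
        vocab_size = len(self.vocabulary)
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / self.total_docs)
            total_words = sum(self.word_counts[c].values())
            for token in tokenize(text):
                if token not in self.vocabulary:
                    continue  # words never seen in training are ignored
                smoothed = (self.word_counts[c][token] + 1) / (total_words + vocab_size)
                score += math.log(smoothed)
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy data; real input would be full-text papers with curation labels.
train_docs = [
    "in situ hybridization showed the transcript expressed in embryos",
    "northern blot analysis detected the transcript in larval tissue",
    "we review the history of genetic screens",
    "a computational model of population dynamics is proposed",
]
train_labels = ["curate", "curate", "skip", "skip"]

classifier = NaiveBayesTextClassifier().fit(train_docs, train_labels)
prediction = classifier.predict("the transcript was detected by in situ hybridization")
print(prediction)
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and add-one smoothing keeps a word that is absent from one class from forcing that class's probability to zero.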



Author information

Authors and Affiliations

Design Technology Institute Ltd, Faculty of Engineering, National University of Singapore, 10 Kent Ridge Crescent, Singapore, 119260
Min Shi, David S. Edwin, Rakesh Menon, Lixiang Shen, Jonathan Y. K. Lim & Han Tong Loh
ME Department, National University of Singapore, 10 Kent Ridge Crescent, Singapore
Han Tong Loh, S. Sathiya Keerthi & Chong Jin Ong


Editor information

Editors and Affiliations

Instituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Via Giuseppe Moruzzi, 1, 56124, Pisa, Italy
Fabrizio Sebastiani


About this paper

Cite this paper

Shi, M. et al. (2003). A Machine Learning Approach for the Curation of Biomedical Literature. In: Sebastiani, F. (eds) Advances in Information Retrieval. ECIR 2003. Lecture Notes in Computer Science, vol 2633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36618-0_47


DOI: https://doi.org/10.1007/3-540-36618-0_47
Published: 15 April 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-01274-0
Online ISBN: 978-3-540-36618-8
eBook Packages: Springer Book Archive
