PubMiner: Machine Learning-Based Text Mining System for Biomedical Information Mining

Eom, Jae-Hong; Zhang, Byoung-Tak

doi:10.1007/978-3-540-30106-6_22

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3192)

Included in the following conference series:

International Conference on Artificial Intelligence: Methodology, Systems, and Applications

Abstract

PubMiner, an intelligent machine learning-based text mining system for mining biological information from the literature, is introduced. PubMiner utilizes natural language processing and machine learning-based data mining techniques to mine useful biological information, such as protein-protein interactions, from massive literature data. The system recognizes biological terms such as genes, proteins, and enzymes and extracts the interactions between them described in a document through natural language analysis. The extracted interactions are further analyzed with a set of features for each entity, constructed from related public databases, to infer additional interactions from the original ones. Both the inferred interactions and the original (native) interactions are provided to the user with links to the literature sources. System performance was evaluated on protein interaction data for S. cerevisiae (baker's yeast) from MIPS and SGD.
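As a rough illustration of the term recognition and interaction extraction steps described above, the following minimal sketch matches protein names from a small lexicon and reports a pair when an interaction verb appears between two names in the same sentence. It is not PubMiner's implementation: the lexicons `PROTEIN_NAMES` and `INTERACTION_VERBS`, the function names, and the example sentences are all hypothetical, and simple dictionary lookup stands in for the machine learning-based recognition the abstract refers to.

```python
import re
from itertools import combinations

# Hypothetical toy lexicons (assumptions for illustration only);
# a real system would learn or load far larger resources.
PROTEIN_NAMES = {"Cdc28", "Cln2", "Sic1", "Swi4"}
INTERACTION_VERBS = {"binds", "phosphorylates", "activates", "inhibits"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_interactions(text):
    """Return (protein_a, verb, protein_b) triples found within one sentence."""
    triples = []
    for sentence in split_sentences(text):
        tokens = re.findall(r"[A-Za-z0-9]+", sentence)
        proteins = [(i, t) for i, t in enumerate(tokens) if t in PROTEIN_NAMES]
        verbs = [(i, t) for i, t in enumerate(tokens) if t in INTERACTION_VERBS]
        for (i, a), (j, b) in combinations(proteins, 2):
            # Keep a pair only if an interaction verb sits between the two names.
            for k, v in verbs:
                if i < k < j:
                    triples.append((a, v, b))
    return triples


# Illustrative input, not taken from the paper's evaluation data.
example = "Cdc28 binds Cln2 in late G1. Sic1 inhibits Cdc28 until it is degraded."
print(extract_interactions(example))
```

A pattern of this kind only covers the "native" interactions stated in the text; inferring further interactions from entity features, as the abstract describes, would need a separate learning step that this sketch does not attempt.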

Author information

Authors and Affiliations

Biointelligence Lab., School of Computer Science and Engineering, Seoul National University, Seoul, 151-744, South Korea
Jae-Hong Eom & Byoung-Tak Zhang

Editor information

Editors and Affiliations

Cisco Systems, Inc, 95134, San Jose, CA, USA
Christoph Bussler
DERI Innsbruck, University of Innsbruck, Austria
Dieter Fensel

About this paper

Cite this paper

Eom, JH., Zhang, BT. (2004). PubMiner: Machine Learning-Based Text Mining System for Biomedical Information Mining. In: Bussler, C., Fensel, D. (eds) Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2004. Lecture Notes in Computer Science, vol 3192. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30106-6_22

DOI: https://doi.org/10.1007/978-3-540-30106-6_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22959-9
Online ISBN: 978-3-540-30106-6
eBook Packages: Springer Book Archive
