Abstract
PubMiner, an intelligent machine learning-based text mining system for mining biological information from the literature, is introduced. PubMiner utilizes natural language processing and machine learning-based data mining techniques to mine useful biological information, such as protein-protein interactions, from the massive body of literature data. The system recognizes biological terms such as genes, proteins, and enzymes and extracts the interactions among them described in a document through natural language analysis. The extracted interactions are further analyzed with a set of features for each entity, constructed from related public databases, to infer additional interactions from the original ones. Both the inferred interactions and the originally extracted interactions are provided to the user with links to the literature sources. System performance was evaluated on protein interaction data for S. cerevisiae (baker's yeast) from MIPS and SGD.
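As a rough illustration of the term recognition and interaction extraction steps described in the abstract, the sketch below matches protein names from a small lexicon and reports pairs that flank an interaction verb within one sentence. It is a toy under stated assumptions, not PubMiner's implementation: the lexicons, the function names, and the sample text are all hypothetical, and the system described here relies on trained recognizers and natural language analysis rather than fixed word lists.

```python
import re
from itertools import combinations

# Hypothetical toy lexicons (assumptions for illustration only).
PROTEIN_TERMS = {"Cdc28", "Cln2", "Sic1", "Clb5"}
INTERACTION_VERBS = {"binds", "phosphorylates", "inhibits", "activates"}


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_interactions(text):
    """Return (term_a, verb, term_b, sentence) tuples.

    A tuple is emitted when two lexicon terms occur in the same sentence
    with an interaction verb somewhere between them.
    """
    results = []
    for sentence in split_sentences(text):
        tokens = re.findall(r"[A-Za-z0-9]+", sentence)
        # Positions of recognized terms within the sentence.
        term_positions = [(i, t) for i, t in enumerate(tokens) if t in PROTEIN_TERMS]
        for (i, a), (j, b) in combinations(term_positions, 2):
            # Use the first interaction verb found between the two terms.
            for k in range(i + 1, j):
                if tokens[k] in INTERACTION_VERBS:
                    results.append((a, tokens[k], b, sentence))
                    break
    return results


if __name__ == "__main__":
    sample = "Cdc28 binds Cln2 during G1. Sic1 inhibits Clb5 and Cdc28. The cell divides."
    for a, verb, b, source in extract_interactions(sample):
        print(a, verb, b, "| source:", source)
```

Each tuple keeps the source sentence, mirroring the abstract's point that interactions are shown to the user together with a link to where they were found. The inference step over database-derived features is not covered by this sketch.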
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Eom, JH., Zhang, BT. (2004). PubMiner: Machine Learning-Based Text Mining System for Biomedical Information Mining. In: Bussler, C., Fensel, D. (eds) Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2004. Lecture Notes in Computer Science, vol 3192. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30106-6_22
DOI: https://doi.org/10.1007/978-3-540-30106-6_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22959-9
Online ISBN: 978-3-540-30106-6
eBook Packages: Springer Book Archive