Abstract
The number of research articles published each year is overwhelming and continues to rise. This rapid growth, combined with the unstructured nature of natural-language text, has created a need for tools and methods that aid information extraction and make the extracted information more accessible and usable. In this work, we present an approach for acquiring language patterns from the biomedical literature. Our method first generates all possible patterns (candidate enumeration) and then selects those that have a match in the training corpus. Equipped with glossaries of gene and protein names and a keyword database, we achieved a recall of 52.2% and a precision of 40.9%, identifying 321 Gene Ontology terms.
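The generate-then-filter idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how patterns are represented, so everything here is an assumption made for illustration: patterns are fixed-length token sequences, `*` is a one-token wildcard, `<GENE>` stands for an entity already tagged via a glossary, and the names `enumerate_candidates`, `matches`, and `select_patterns` are hypothetical.

```python
from itertools import product

WILDCARD = "*"  # assumption: matches exactly one arbitrary token


def enumerate_candidates(vocabulary, length):
    """Yield every token sequence of the given length over vocabulary plus the wildcard."""
    symbols = list(vocabulary) + [WILDCARD]
    return product(symbols, repeat=length)


def matches(pattern, tokens):
    """Return True if the pattern matches some contiguous window of tokens."""
    n = len(pattern)
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        if all(p == WILDCARD or p == t for p, t in zip(pattern, window)):
            return True
    return False


def select_patterns(vocabulary, length, corpus):
    """Keep only the candidates that match at least one training sentence."""
    return [
        pattern
        for pattern in enumerate_candidates(vocabulary, length)
        if any(matches(pattern, sentence) for sentence in corpus)
    ]


# Toy training corpus (invented); gene names are assumed to be pre-tagged as <GENE>.
corpus = [
    ["<GENE>", "regulates", "apoptosis"],
    ["<GENE>", "is", "involved", "in", "transcription"],
]
# Toy keyword vocabulary (invented), standing in for the keyword database.
vocabulary = ["<GENE>", "regulates", "involved", "in"]

selected = select_patterns(vocabulary, 2, corpus)
```

Exhaustive enumeration grows exponentially with pattern length (here, 5 symbols at length 2 give 25 candidates), so a sketch like this is only practical for short patterns and small vocabularies.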
© 2013 Springer International Publishing Switzerland
Cite this paper
Alborzi, S.Z. (2013). Automatically Language Patterns Elicitation from Biomedical Literature. In: Nagamalai, D., Kumar, A., Annamalai, A. (eds) Advances in Computational Science, Engineering and Information Technology. Advances in Intelligent Systems and Computing, vol 225. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00951-3_15
DOI: https://doi.org/10.1007/978-3-319-00951-3_15
Publisher Name: Springer, Heidelberg
Print ISBN: 978-3-319-00950-6
Online ISBN: 978-3-319-00951-3
eBook Packages: Engineering, Engineering (R0)