Automatically Language Patterns Elicitation from Biomedical Literature

  • Conference paper
Advances in Computational Science, Engineering and Information Technology

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 225)

Abstract

The number of research articles published each year is overwhelming and continues to rise. This rapid growth, combined with the unstructured nature of natural-language text, has created a need for tools and methods that aid information extraction and make the extracted information more accessible and usable. In this work, we present an approach for acquiring language patterns from the biomedical literature. Our method first generates all possible patterns (candidate enumeration) and then selects those patterns that have a match in the training corpus. Equipped with glossaries of gene and protein names and a keyword database, we achieved a recall of 52.2% with a precision of 40.9%, identifying 321 Gene Ontology terms.
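
The enumerate-then-filter idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the pattern shape (`<GENE> keyword [connector] <GENE>`), the keyword and connector lists, the toy corpus, and all function names are assumptions made purely for illustration.

```python
from itertools import product

# Placeholder that stands in for any gene or protein name (assumed convention).
ENTITY = "<GENE>"

def enumerate_candidates(keywords, connectors):
    """Generate every candidate pattern: <GENE> keyword [connector] <GENE>."""
    patterns = []
    for keyword, connector in product(keywords, connectors):
        middle = [keyword] + ([connector] if connector else [])
        patterns.append(tuple([ENTITY] + middle + [ENTITY]))
    return patterns

def tag_entities(sentence, gene_names):
    """Replace glossary names with the placeholder; lowercase other tokens."""
    return [ENTITY if tok in gene_names else tok.lower() for tok in sentence.split()]

def matches(pattern, tokens):
    """True if the pattern occurs as a contiguous token sequence."""
    n = len(pattern)
    return any(tuple(tokens[i:i + n]) == pattern for i in range(len(tokens) - n + 1))

def select_patterns(candidates, corpus, gene_names):
    """Keep only the candidates that match at least one training sentence."""
    tagged = [tag_entities(sentence, gene_names) for sentence in corpus]
    return [p for p in candidates if any(matches(p, t) for t in tagged)]

# Toy inputs (hypothetical; not taken from the paper's glossaries or corpus).
gene_names = {"RAD51", "BRCA2", "TP53", "CDKN1A"}
keywords = ["activates", "binds", "inhibits"]
connectors = ["", "to", "with"]
corpus = ["RAD51 binds to BRCA2", "TP53 activates CDKN1A"]

candidates = enumerate_candidates(keywords, connectors)
selected = select_patterns(candidates, corpus, gene_names)
```

With these toy inputs, nine candidates are enumerated and only the two that occur in the corpus survive the filter. A real system would need proper tokenization and named-entity tagging in place of the whitespace split and exact glossary lookup used here.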

Author information

Corresponding author

Correspondence to Seyed Ziaeddin Alborzi.

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Alborzi, S.Z. (2013). Automatically Language Patterns Elicitation from Biomedical Literature. In: Nagamalai, D., Kumar, A., Annamalai, A. (eds) Advances in Computational Science, Engineering and Information Technology. Advances in Intelligent Systems and Computing, vol 225. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00951-3_15

  • DOI: https://doi.org/10.1007/978-3-319-00951-3_15

  • Publisher Name: Springer, Heidelberg

  • Print ISBN: 978-3-319-00950-6

  • Online ISBN: 978-3-319-00951-3

  • eBook Packages: Engineering, Engineering (R0)
