Towards an Automated Analysis of Biomedical Abstracts

  • Barbara Gawronska
  • Björn Erlendsson
  • Björn Olsson
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4075)


An essential part of bioinformatic research is the iterative process of validating hypotheses by analyzing facts stored in databases and in the published literature. This process can be enhanced by language technology methods, in particular by automatic text understanding. Since it is becoming increasingly difficult to keep up with the vast number of scientific articles being published, there is a need for more easily accessible representations of current knowledge. The goal of the research described in this paper is to develop a system that supports large-scale research on metabolic and regulatory pathways by extracting relations between biological objects from descriptions found in the literature. We present and evaluate procedures for semantico-syntactic tagging, for dividing the text into parts concerning previous research and current research, for syntactic parsing, and for transforming syntactic trees into logical representations similar to the pathway graphs used in the Kyoto Encyclopaedia of Genes and Genomes.
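
The kind of output such a system aims at can be illustrated with a toy sketch. The code below is not the authors' method: the abstract names a grammar-based pipeline whose details are not given here, so the cue phrases, the verb list, the regular expression, and the function names (`classify`, `extract_relations`, `build_graph`) are all hypothetical stand-ins. It only shows how sentences might be labelled as previous or current research and how extracted relations could be stored as labelled graph edges.

```python
import re
from collections import defaultdict

# Hypothetical cue phrases marking statements about the current study;
# the paper's actual criteria are not stated in the abstract.
CURRENT_CUES = ("we show", "we report", "here we", "in this study", "our results")

# Hypothetical toy pattern: "<object> <relation verb> <object>".
# A real system would rely on tagging and syntactic parsing instead.
RELATION = re.compile(
    r"\b([A-Za-z0-9-]+) (activates|inhibits|phosphorylates|binds) ([A-Za-z0-9-]+)\b"
)

def classify(sentence):
    """Label a sentence as 'current' or 'previous' research by cue phrase."""
    lowered = sentence.lower()
    return "current" if any(cue in lowered for cue in CURRENT_CUES) else "previous"

def extract_relations(sentence):
    """Return (subject, relation, object) triples matched by the toy pattern."""
    return [(m.group(1), m.group(2), m.group(3)) for m in RELATION.finditer(sentence)]

def build_graph(sentences):
    """Collect triples as outgoing edges: subject -> [(relation, object, section)]."""
    graph = defaultdict(list)
    for sentence in sentences:
        section = classify(sentence)
        for subj, rel, obj in extract_relations(sentence):
            graph[subj].append((rel, obj, section))
    return dict(graph)

# Invented example sentences, used only to exercise the sketch.
abstract = [
    "It is known that RAF activates MEK.",
    "Here we show that MEK phosphorylates ERK.",
]
graph = build_graph(abstract)
for source, edges in graph.items():
    print(source, edges)
```

Storing the section label on each edge keeps claims attributed to earlier work separate from claims made by the analysed abstract itself, which is the distinction the abstract draws between previous and current research.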


Keywords: Biological Object, Training Corpus, Proper Noun, Common Noun, Lexical Database
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Barbara Gawronska (1)
  • Björn Erlendsson (1)
  • Björn Olsson (1)
  1. School of Humanities and Informatics, University of Skövde, Skövde, Sweden
