Towards Automatic Pathway Generation from Biological Full-Text Publications

  • Ekaterina Buyko
  • Jörg Linde
  • Steffen Priebe
  • Udo Hahn
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7014)


We introduce an approach to the automatic generation of biological pathway diagrams from the scientific literature. It consists of two steps: the automatic extraction of single interaction relations, which are typically found in the full text (rather than the abstract) of a scientific publication, and their subsequent integration into a complex pathway diagram. Here, we focus on relation extraction from full-text documents. We compare the performance of automatic full-text extraction procedures with a manually generated gold standard to validate the extracted data, which serve as input for the pathway integration procedure.
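The abstract does not state how extracted relations are represented, matched against the gold standard, or merged into a diagram. As a minimal sketch, assuming each relation is reduced to an (agent, relation type, theme) triple and matched exactly, the validation step can be scored with precision, recall and F1, and the extracted triples can be merged into a directed graph. The `Relation` type, the `evaluate` and `integrate` functions, and the placeholder gene names are hypothetical illustrations, not the authors' implementation or data.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable
from typing import NamedTuple


class Relation(NamedTuple):
    """One extracted interaction, reduced to a hypothetical (agent, type, theme) triple."""
    agent: str
    relation: str
    theme: str


def evaluate(predicted: set[Relation], gold: set[Relation]) -> tuple[float, float, float]:
    """Score predicted relations against a gold standard (assumes exact triple matching)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def integrate(relations: Iterable[Relation]) -> dict[str, set[tuple[str, str]]]:
    """Merge relations (possibly from many documents) into a directed adjacency map.

    Each agent maps to a set of (relation type, theme) edges, so duplicate
    triples extracted from different documents collapse into one edge.
    """
    graph = defaultdict(set)
    for rel in relations:
        graph[rel.agent].add((rel.relation, rel.theme))
    return dict(graph)


# Toy data with placeholder gene names; these are not results from the paper.
gold = {
    Relation("geneA", "activates", "geneB"),
    Relation("geneB", "represses", "geneC"),
    Relation("geneC", "activates", "geneD"),
}
predicted = {
    Relation("geneA", "activates", "geneB"),
    Relation("geneB", "represses", "geneC"),
    Relation("geneA", "represses", "geneD"),  # false positive
}

p, r, f1 = evaluate(predicted, gold)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")  # P=0.67 R=0.67 F1=0.67

pathway = integrate(predicted)
for agent, edges in sorted(pathway.items()):
    print(agent, sorted(edges))
```

Exact matching is the strictest choice; a real evaluation might also accept normalized synonyms or partial matches, which this sketch does not attempt.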


Keywords: relation extraction, biological text mining, automatic database generation, pathway generation





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Ekaterina Buyko (1)
  • Jörg Linde (2)
  • Steffen Priebe (2)
  • Udo Hahn (1)

  1. Jena University Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Germany
  2. Research Group Systems Biology / Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology, Hans Knöll Institute (HKI), Germany
