Supporting Biological Pathway Curation Through Text Mining

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 706)

Abstract

Text mining technology automatically analyses large document collections to detect various aspects of their structure and meaning. This information can be used to build systems that make it much easier for researchers to locate information relevant to their needs in huge volumes of text than standard search mechanisms do. Focusing on the challenging task of constructing biological pathway models, which typically involves gathering, interpreting and combining complex information from a large number of publications, we show how text mining applications can provide various levels of support to ease the burden placed on pathway curators. Such support ranges from applications that help curators search and explore the literature for evidence relevant to pathway reactions, to applications that can automatically suggest how to construct and update pathway models.
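The abstract describes applications that help curators find literature evidence for pathway reactions. The following is a minimal, hypothetical Python sketch of that idea, assuming an upstream event extractor has already turned sentences into typed events with named participants. The `Reaction` and `TextEvent` classes, the `rank_evidence` function and the scoring scheme (Jaccard overlap of participant sets plus a fixed bonus for a matching event type) are illustrative assumptions, not the method described in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """A pathway reaction: its type and the molecules it involves (hypothetical schema)."""
    reaction_type: str
    participants: frozenset


@dataclass(frozen=True)
class TextEvent:
    """An event assumed to have been extracted from one sentence by an upstream event extractor."""
    event_type: str
    participants: frozenset
    sentence: str


def score(event: TextEvent, reaction: Reaction, type_bonus: float = 0.5) -> float:
    """Jaccard overlap of participant sets, plus a bonus if the event type matches.

    Events that share no participants with the reaction score 0.0.
    """
    shared = event.participants & reaction.participants
    if not shared:
        return 0.0
    overlap = len(shared) / len(event.participants | reaction.participants)
    bonus = type_bonus if event.event_type == reaction.reaction_type else 0.0
    return overlap + bonus


def rank_evidence(reaction: Reaction, events: list, top_k: int = 3) -> list:
    """Return up to top_k (score, sentence) pairs, best first, skipping zero scores."""
    scored = [(score(e, reaction), e.sentence) for e in events]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(round(s, 3), sentence) for s, sentence in scored[:top_k]]


if __name__ == "__main__":
    # Toy input: one reaction and three pre-extracted events (illustrative only).
    reaction = Reaction("Phosphorylation", frozenset({"mTOR", "S6K1"}))
    events = [
        TextEvent("Binding", frozenset({"mTOR", "Raptor"}), "mTOR binds Raptor."),
        TextEvent("Phosphorylation", frozenset({"mTOR", "S6K1"}), "mTOR phosphorylates S6K1."),
        TextEvent("Phosphorylation", frozenset({"AKT", "TSC2"}), "AKT phosphorylates TSC2."),
    ]
    for s, sentence in rank_evidence(reaction, events):
        print(s, sentence)
```

With these toy sentences, the phosphorylation sentence mentioning both participants ranks first (score 1.5), the binding sentence sharing only mTOR ranks second (0.333), and the AKT sentence, which shares no participants, is dropped. A practical system would also need to normalise molecule names to database identifiers before comparing participant sets.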

Notes

  1. http://www.nactem.ac.uk/Kleio/.
  2. http://www.nactem.ac.uk/facta/.
  3. http://www.nactem.ac.uk/facta-visualizer/.
  4. http://www.nactem.ac.uk/medie/.
  5. http://www.nactem.ac.uk/big_mechanism/.

Acknowledgements

The work described in this article has been supported by the BBSRC-funded EMPATHY project (Grant No. BB/M006891/1) and by the DARPA-funded Big Mechanism project (Grant No. DARPA-BAA-14-14).

Author information

Corresponding author

Correspondence to Sophia Ananiadou.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Ananiadou, S., Thompson, P. (2017). Supporting Biological Pathway Curation Through Text Mining. In: Kalinichenko, L., Kuznetsov, S., Manolopoulos, Y. (eds) Data Analytics and Management in Data Intensive Domains. DAMDID/RCDL 2016. Communications in Computer and Information Science, vol 706. Springer, Cham. https://doi.org/10.1007/978-3-319-57135-5_5

  • DOI: https://doi.org/10.1007/978-3-319-57135-5_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57134-8

  • Online ISBN: 978-3-319-57135-5

  • eBook Packages: Computer Science, Computer Science (R0)