Abstract
Text mining technology performs automated analysis of large document collections to detect information about their structure and meaning. This information can be used to develop systems that make it much easier for researchers to locate relevant information in huge volumes of text than standard search mechanisms allow. Focusing on the challenging task of constructing biological pathway models, which typically involves gathering, interpreting and combining complex information from a large number of publications, we show how text mining applications can provide various levels of support to ease the burden placed on pathway curators. Such support ranges from applications that help in searching and exploring the literature for evidence relevant to pathway reactions to those that make automated suggestions about how to construct and update pathway models.
Acknowledgements
The work described in this article has been supported by the BBSRC-funded EMPATHY project (Grant No. BB/M006891/1) and by the DARPA-funded Big Mechanism project (Grant No. DARPA-BAA-14-14).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Ananiadou, S., Thompson, P. (2017). Supporting Biological Pathway Curation Through Text Mining. In: Kalinichenko, L., Kuznetsov, S., Manolopoulos, Y. (eds) Data Analytics and Management in Data Intensive Domains. DAMDID/RCDL 2016. Communications in Computer and Information Science, vol 706. Springer, Cham. https://doi.org/10.1007/978-3-319-57135-5_5
DOI: https://doi.org/10.1007/978-3-319-57135-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57134-8
Online ISBN: 978-3-319-57135-5
eBook Packages: Computer Science (R0)