Facilitating the Analysis of Discourse Phenomena in an Interoperable NLP Platform

Batista-Navarro, Riza Theresa; Kontonatsios, Georgios; Mihăilă, Claudiu; Thompson, Paul; Rak, Rafal; Nawaz, Raheel; Korkontzelos, Ioannis; Ananiadou, Sophia

doi:10.1007/978-3-642-37247-6_45

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7816)

Abstract

The analysis of discourse phenomena is essential in many natural language processing (NLP) applications. The growing diversity of available corpora and NLP tools brings a multitude of representation formats. In order to alleviate the problem of incompatible formats when constructing complex text mining pipelines, the Unstructured Information Management Architecture (UIMA) provides a standard means of communication between tools and resources. U-Compare, a text mining workflow construction platform based on UIMA, further enhances interoperability through a shared system of data types, allowing free combination of compliant components into workflows. Although U-Compare and its type system already support syntactic and semantic analyses, support for the analysis of discourse phenomena was previously lacking. In response, we have extended the U-Compare type system with new discourse-level types. We illustrate processing and visualisation of discourse information in U-Compare by providing several new deserialisation components for corpora containing discourse annotations. The new U-Compare is downloadable from http://nactem.ac.uk/ucompare.
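
The idea of a shared type system extended with discourse-level types, fed by a deserialisation component, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the class names (`Annotation`, `DiscourseSegment`, `DiscourseRelation`), the `deserialise` function and the tab-separated input format are hypothetical and are not the actual U-Compare types, the UIMA API, or any corpus format handled by the paper's components.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Annotation:
    """Stand-off annotation: character offsets into the document text."""
    begin: int
    end: int

    def covered_text(self, text: str) -> str:
        return text[self.begin:self.end]


@dataclass(frozen=True)
class DiscourseSegment(Annotation):
    """Hypothetical discourse-level type: a span taking part in a relation."""
    role: str = "segment"


@dataclass(frozen=True)
class DiscourseRelation:
    """Hypothetical discourse-level type linking two segments."""
    label: str
    source: DiscourseSegment
    target: DiscourseSegment


def deserialise(lines: List[str]) -> List[DiscourseRelation]:
    """Read an invented tab-separated format: label, b1, e1, b2, e2."""
    relations = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        label, b1, e1, b2, e2 = line.rstrip("\n").split("\t")
        relations.append(DiscourseRelation(
            label,
            DiscourseSegment(int(b1), int(e1), role="source"),
            DiscourseSegment(int(b2), int(e2), role="target"),
        ))
    return relations


text = "The protein was absent. Therefore, transcription stopped."
relations = deserialise(["Causality\t0\t23\t35\t56"])
for rel in relations:
    print(rel.label, "|", rel.source.covered_text(text),
          "|", rel.target.covered_text(text))
```

Because both segment types derive from one base annotation type, any downstream component written against the base type could consume them unchanged, which is the kind of interoperability a shared type system is meant to provide.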

Author information

Authors and Affiliations

The National Centre for Text Mining, The University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK
Riza Theresa Batista-Navarro, Georgios Kontonatsios, Claudiu Mihăilă, Paul Thompson, Rafal Rak, Raheel Nawaz, Ioannis Korkontzelos & Sophia Ananiadou

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Mexico D.F., Mexico
Alexander Gelbukh

About this paper

Cite this paper

Batista-Navarro, R.T. et al. (2013). Facilitating the Analysis of Discourse Phenomena in an Interoperable NLP Platform. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2013. Lecture Notes in Computer Science, vol 7816. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37247-6_45

DOI: https://doi.org/10.1007/978-3-642-37247-6_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37246-9
Online ISBN: 978-3-642-37247-6
eBook Packages: Computer Science, Computer Science (R0)
