Facilitating the Analysis of Discourse Phenomena in an Interoperable NLP Platform

  • Riza Theresa Batista-Navarro
  • Georgios Kontonatsios
  • Claudiu Mihăilă
  • Paul Thompson
  • Rafal Rak
  • Raheel Nawaz
  • Ioannis Korkontzelos
  • Sophia Ananiadou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7816)

Abstract

The analysis of discourse phenomena is essential in many natural language processing (NLP) applications. The growing diversity of available corpora and NLP tools brings a multitude of representation formats. In order to alleviate the problem of incompatible formats when constructing complex text mining pipelines, the Unstructured Information Management Architecture (UIMA) provides a standard means of communication between tools and resources. U-Compare, a text mining workflow construction platform based on UIMA, further enhances interoperability through a shared system of data types, allowing free combination of compliant components into workflows. Although U-Compare and its type system already support syntactic and semantic analyses, support for the analysis of discourse phenomena was previously lacking. In response, we have extended the U-Compare type system with new discourse-level types. We illustrate processing and visualisation of discourse information in U-Compare by providing several new deserialisation components for corpora containing discourse annotations. The new U-Compare is downloadable from http://nactem.ac.uk/ucompare.

Keywords

UIMA interoperabilty U-Compare discourse causality coreference meta-knowledge 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Kim, J.D., Ohta, T., Tsujii, J.: Corpus annotation for mining biomedical events from literature. BMC Bioinformatics 9, 10 (2008)CrossRefGoogle Scholar
  2. 2.
    Thompson, P., Nawaz, R., McNaught, J., Ananiadou, S.: Enriching a biomedical event corpus with meta-knowledge annotation. BMC Bioinformatics 12, 393 (2011)CrossRefGoogle Scholar
  3. 3.
    Marcu, D.: The Theory and Practice of Discourse Parsing and Summarization. MIT Press, Cambridge (2000)MATHGoogle Scholar
  4. 4.
    Sun, M., Chai, J.Y.: Discourse processing for context question answering based on linguistic knowledge. Knowledge-Based Systems 20, 511–526 (2007)CrossRefGoogle Scholar
  5. 5.
    Ferrucci, D., Lally, A.: UIMA: an architectural approach to unstructured information processing in the corporate research environment. Natural Language Engineering 10, 327–348 (2004)CrossRefGoogle Scholar
  6. 6.
    Kolluru, B., Hawizy, L., Murray-Rust, P., Tsujii, J., Ananiadou, S.: Using Workflows to Explore and Optimise Named Entity Recognition for Chemistry. PLoS ONE 6, e20181 (2011)CrossRefGoogle Scholar
  7. 7.
    Kano, Y., Baumgartner Jr., W.A., McCrochon, L., Ananiadou, S., Cohen, K.B., Hunter, L., Tsujii, J.: U-Compare: share and compare text mining tools with UIMA. Bioinfomatics 25, 1997–1998 (2009)CrossRefGoogle Scholar
  8. 8.
    Kleinberg, S., Hripcsak, G.: A review of causal inference for biomedical informatics. Journal of Biomedical Informatics 44, 1102–1112 (2011)CrossRefGoogle Scholar
  9. 9.
    Thompson, P., Iqbal, S., McNaught, J., Ananiadou, S.: Construction of an annotated corpus to support biomedical information extraction. BMC Bioinformatics 10, 349 (2009)CrossRefGoogle Scholar
  10. 10.
    Mihăilă, C., Ohta, T., Pyysalo, S., Ananiadou, S.: BioCause: Annotating and analysing causality in the biomedical domain. BMC Bioinformatics 14, 2 (2013)CrossRefGoogle Scholar
  11. 11.
    Prasad, R., McRoy, S., Frid, N., Joshi, A., Yu, H.: The Biomedical Discourse Relation Bank. BMC Bioinformatics 12, 188 (2011)CrossRefGoogle Scholar
  12. 12.
    Jurafsky, D., Martin, J.H.: Speech and Language Processing, 2nd edn. Prentice Hall Series in Artificial Intelligence. Prentice Hall (2008)Google Scholar
  13. 13.
    Grosz, B.J., Weinstein, S., Joshi, A.K.: Centering: A Framework for Modeling the Local Coherence of Discourse. Comp. Ling. 21, 203–225 (1995)Google Scholar
  14. 14.
    Walker, C.: ACE 2005 Multilingual Training Corpus (2006)Google Scholar
  15. 15.
    Su, J., Yang, X., Hong, H., Tateisi, Y., Tsujii, J.: Coreference Resolution in Biomedical Texts: a Machine Learning Approach. In: Ashburner, M., Leser, U., Rebholz-Schuhmann, D. (eds.) Ontologies and Text Mining for Life Sciences: Current Status and Future Perspectives. Dagstuhl Seminar Proceedings, vol. 08131 (2008)Google Scholar
  16. 16.
    Batista-Navarro, R.T.B., Ananiadou, S.: Building a coreference-annotated corpus from the domain of biochemistry. In: Proceedings of BioNLP 2011, pp. 83–91 (2011)Google Scholar
  17. 17.
    Stenetorp, P., Topić, G., Pyysalo, S., Ohta, T., Kim, J.D., Tsujii, J.: BioNLP Shared Task 2011: Supporting Resources. In: Proceedings of the BioNLP Shared Task 2011 Workshop, pp. 112–120. ACL (2011)Google Scholar
  18. 18.
    Sandor, A., de Waard, A.: Identifying Claimed Knowledge Updates in Biomedical Research Articles. In: Proceedings of the Workshop on Detecting Structure in Scholarly Discourse (DSSD), pp. 7–10 (2012)Google Scholar
  19. 19.
    Oda, K., Kim, J.D., Ohta, T., Okanohara, D., Matsuzaki, T., Tateisi, Y., Tsujii, J.: New challenges for text mining: mapping between text and manually curated pathways. BMC Bioinformatics 9, S5 (2008)CrossRefGoogle Scholar
  20. 20.
    Yeh, A., Hirschman, L., Morgan, A.: Evaluation of text data mining for database curation: lessons learned from the KDD Challenge Cup. Bioinformatics 19, 331–339 (2003)CrossRefGoogle Scholar
  21. 21.
    Medlock, B., Briscoe, T.: Weakly supervised learning for hedge classification in scientific literature. In: Proceedings of ACL, pp. 992–999 (2007)Google Scholar
  22. 22.
    McKnight, L., Srinivasan, P.: Categorization of sentence types in medical abstracts. In: Proceedings of the AMIA Annual Symposium, pp. 440–444 (2003)Google Scholar
  23. 23.
    Mizuta, Y., Korhonen, A., Mullen, T., Collier, N.: Zone analysis in biology articles as a basis for information extraction. Int. J. Med. Inf. 75, 468–487 (2006)CrossRefGoogle Scholar
  24. 24.
    Liakata, M., Teufel, S., Siddharthan, A., Batchelor, C.: Corpora for the conceptualisation and zoning of scientific papers. In: Proceedings of LREC, pp. 2054–2061 (2010)Google Scholar
  25. 25.
    Wilbur, W.J., Rzhetsky, A., Shatkay, H.: New directions in biomedical text annotation: definitions, guidelines and corpus construction. BMC Bioinformatics 7, 356 (2006)CrossRefGoogle Scholar
  26. 26.
    Miwa, M., Thompson, P., McNaught, J., Kell, D., Ananiadou, S.: Extracting semantically enriched events from biomedical literature. BMC Bioinformatics 13, 108 (2012)CrossRefGoogle Scholar
  27. 27.
    Savova, G., Masanz, J., Ogren, P., Zheng, J., Sohn, S., Kipper-Schuler, K., Chute, C.: Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. Journal of the American Medical Informatics Association 17, 507–513 (2010)CrossRefGoogle Scholar
  28. 28.
    Cunningham, H., Hanbury, A., Rüger, S.: Scaling Up High-Value Retrieval to Medium-Volume Data. In: Cunningham, H., Hanbury, A., Rüger, S. (eds.) IRFC 2010. LNCS, vol. 6107, pp. 1–5. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  29. 29.
    Schäfer, U.: Middleware for creating and combining multi-dimensional NLP markup. In: Proceedings of the 5th Workshop on NLP and XML: Multi-Dimensional Markup in Natural Language Processing, pp. 81–84. ACL (2006)Google Scholar
  30. 30.
    Rak, R., Rowley, A., Black, W., Ananiadou, S.: Argo: an integrative, interactive, text mining-based workbench supporting curation. Database: The Journal of Biological Databases and Curation 2012 (2012)Google Scholar
  31. 31.
    Settles, B.: Biomedical named entity recognition using conditional random fields and rich feature sets. In: Proceedings of the International Joint Workshop on NLP in Biomedicine and its Applications, pp. 104–107. ACL (2004)Google Scholar
  32. 32.
    Gabbard, R., Freedman, M., Weischedel, R.: Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 288–293. Association for Computational Linguistics, Portland (2011)Google Scholar
  33. 33.
    Tsuruoka, Y., Tsujii, J., Ananiadou, S.: Accelerating the annotation of sparse named entities by dynamic sentence selection. BMC Bioinformatics 9, S8 (2008)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Riza Theresa Batista-Navarro
    • 1
  • Georgios Kontonatsios
    • 1
  • Claudiu Mihăilă
    • 1
  • Paul Thompson
    • 1
  • Rafal Rak
    • 1
  • Raheel Nawaz
    • 1
  • Ioannis Korkontzelos
    • 1
  • Sophia Ananiadou
    • 1
  1. 1.The National Centre for Text MiningThe University of ManchesterManchesterUK

Personalised recommendations