Facilitating the Analysis of Discourse Phenomena in an Interoperable NLP Platform

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7816)

Abstract

The analysis of discourse phenomena is essential in many natural language processing (NLP) applications. The growing diversity of available corpora and NLP tools brings a multitude of representation formats. In order to alleviate the problem of incompatible formats when constructing complex text mining pipelines, the Unstructured Information Management Architecture (UIMA) provides a standard means of communication between tools and resources. U-Compare, a text mining workflow construction platform based on UIMA, further enhances interoperability through a shared system of data types, allowing free combination of compliant components into workflows. Although U-Compare and its type system already support syntactic and semantic analyses, support for the analysis of discourse phenomena was previously lacking. In response, we have extended the U-Compare type system with new discourse-level types. We illustrate processing and visualisation of discourse information in U-Compare by providing several new deserialisation components for corpora containing discourse annotations. The new U-Compare is downloadable from http://nactem.ac.uk/ucompare.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Batista-Navarro, R.T. et al. (2013). Facilitating the Analysis of Discourse Phenomena in an Interoperable NLP Platform. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2013. Lecture Notes in Computer Science, vol 7816. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37247-6_45

  • DOI: https://doi.org/10.1007/978-3-642-37247-6_45

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37246-9

  • Online ISBN: 978-3-642-37247-6

  • eBook Packages: Computer Science, Computer Science (R0)
