Enhancing Search: Events and Their Discourse Context

  • Sophia Ananiadou
  • Paul Thompson
  • Raheel Nawaz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7817)

Abstract

Event-based search systems have attracted increasing interest. This paper provides an overview of recent advances in event-based text mining, with an emphasis on biomedical text. We focus particularly on enriching events with information about how they should be interpreted in light of their surrounding textual and discourse context. We describe the annotation scheme we use to capture this information at the event level, report on the corpora that have so far been enriched according to this scheme, and provide details of our experiments to recognise this information automatically.

Keywords

event extraction, text mining, semantic search, discourse analysis

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Sophia Ananiadou
    • 1
  • Paul Thompson
    • 1
  • Raheel Nawaz
    • 1
  1. National Centre for Text Mining, Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
