BioTSA: Annotating token semantic association to support biomedical text mining

  • Computer Science
  • Published:
Wuhan University Journal of Natural Sciences

Abstract

Corpora are an important resource for knowledge acquisition in natural language processing (NLP). In the biomedical domain, however, comparatively few corpora focus on the semantic associations among all tokens in a sentence. We propose an annotation scheme, based on feature structure theory, for enriching biomedical corpora with token semantic association (TSA). We annotated 227 documents of the BioNLP GE ST training data to form the TSA corpus, in which each annotated item records one token semantic association as a triple. Annotating token semantic association has the potential to significantly advance biomedical text mining by providing rich token-level semantic information to NLP systems, especially sophisticated information extraction (IE) systems such as those for bio-event extraction.
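To make the idea of an annotated item as a triple concrete, the following is a minimal sketch of how such items might be held in memory. The abstract does not specify the components of the triple, so the field names (`governor`, `feature`, `dependent`), the helper `associations_for`, and the example sentence fragment are all hypothetical illustrations rather than the corpus's actual format.

```python
from typing import NamedTuple


class TokenAssociation(NamedTuple):
    """One annotated item, modelled as a hypothetical (token, label, token) triple."""
    governor: str   # assumed: the token the association starts from
    feature: str    # assumed: the label of the semantic association
    dependent: str  # assumed: the token the association points to


def associations_for(token: str, triples: list[TokenAssociation]) -> list[TokenAssociation]:
    """Return every triple in which the given token takes part."""
    return [t for t in triples if token in (t.governor, t.dependent)]


# Invented fragment for illustration only: "IL-2 activates STAT5"
triples = [
    TokenAssociation("activates", "agent", "IL-2"),
    TokenAssociation("activates", "theme", "STAT5"),
]

for item in associations_for("activates", triples):
    print(item.governor, item.feature, item.dependent)
```

Storing each item as an immutable tuple keeps one association per record, so a downstream system such as an event extractor could filter or group associations by token without parsing nested structures.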

Author information

Corresponding author

Correspondence to Donghong Ji.

Additional information

Foundation item: Supported by the National Natural Science Foundation of China (61202304, 61173095, 61173062, 61202193)

Biography: WEI Xiaomei, female, Ph.D. candidate, research direction: biomedical informatics and natural language processing.

About this article

Cite this article

Wei, X., Huang, S., Chen, B. et al. BioTSA: Annotating token semantic association to support biomedical text mining. Wuhan Univ. J. Nat. Sci. 20, 134–140 (2015). https://doi.org/10.1007/s11859-015-1071-3

  • DOI: https://doi.org/10.1007/s11859-015-1071-3
