BioTSA: Annotating token semantic association to support biomedical text mining

Wei, Xiaomei; Huang, Sixing; Chen, Bo; Ji, Donghong

doi:10.1007/s11859-015-1071-3

Computer Science
Published: 13 May 2015

Volume 20, pages 134–140, (2015)
Wuhan University Journal of Natural Sciences

Xiaomei Wei^1,2, Sixing Huang^2, Bo Chen^1 & Donghong Ji^1

Abstract

Corpora are an important resource for knowledge acquisition in natural language processing (NLP). To date, however, comparatively few corpora in the biomedical domain focus on the semantic associations among all tokens in a sentence. We propose an annotation scheme, based on feature structure theory, for enriching biomedical corpora with token semantic association (TSA). We annotated 227 documents from the BioNLP GE ST training data to form the TSA corpus, in which each annotated item represents one token semantic association expressed as a triple. Annotating token semantic associations has the potential to significantly advance biomedical text mining by providing rich token-level semantic information to NLP systems, especially sophisticated information extraction (IE) systems such as bio-event extraction.
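
The abstract says only that each annotated item is a triple; it does not give the fields or the label set. As a minimal sketch, assuming a triple of the form (token, association label, token), such items could be stored and queried as below. The class name `TokenAssociation`, the field names, the helper `associations_for`, the example tokens, and the labels "agent" and "theme" are all hypothetical illustrations, not the corpus format.

```python
from typing import List, NamedTuple


class TokenAssociation(NamedTuple):
    """One annotated item, modelled as a hypothetical (token, label, token) triple."""
    head: str       # token the association starts from (assumed role)
    relation: str   # label of the semantic association (assumed label set)
    dependent: str  # token the association points to (assumed role)


def associations_for(token: str, triples: List[TokenAssociation]) -> List[TokenAssociation]:
    """Return every triple in which the given token takes part, in input order."""
    return [t for t in triples if token in (t.head, t.dependent)]


# Illustrative tokens and labels only; they are not taken from the TSA corpus.
triples = [
    TokenAssociation("activates", "agent", "IL-2"),
    TokenAssociation("activates", "theme", "expression"),
    TokenAssociation("expression", "theme", "NF-kB"),
]

# Collect all associations that involve the token "expression".
linked = associations_for("expression", triples)
```

Keeping every token pair as an explicit triple makes it cheap to look up all associations of a given token, which is the kind of information an event extraction system might consume.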

Author information

Authors and Affiliations

School of Computer, Wuhan University, Wuhan, 430072, Hubei, China
Xiaomei Wei, Bo Chen & Donghong Ji
College of Informatics, Huazhong Agriculture University, Wuhan, 430070, Hubei, China
Xiaomei Wei & Sixing Huang

Corresponding author

Correspondence to Donghong Ji.

Additional information

Foundation item: Supported by the National Natural Science Foundation of China (61202304, 61173095, 61173062, 61202193)

Biography: WEI Xiaomei, female, Ph.D. candidate, research direction: biomedical informatics and natural language processing.

About this article

Cite this article

Wei, X., Huang, S., Chen, B. et al. BioTSA: Annotating token semantic association to support biomedical text mining. Wuhan Univ. J. Nat. Sci. 20, 134–140 (2015). https://doi.org/10.1007/s11859-015-1071-3

Received: 05 November 2014
Published: 13 May 2015
Issue Date: April 2015
DOI: https://doi.org/10.1007/s11859-015-1071-3

Keywords

CLC number

TP 391.1
