Meaning Inference of Abbreviations Appearing in Clinical Studies
The number of publicly available clinical studies is constantly increasing, forming a promising corpus of documents for clinical research. However, the abbreviations used in these studies pose a serious barrier to text mining techniques. This paper presents a study in this domain, which used specifically developed tools and mechanisms to process a number of randomly selected documents from clinicaltrialsregister.eu. The analysis indicated that abbreviations appear on a large scale without their long form (also known as their expansion). To determine an abbreviation's true meaning, it is necessary to use an appropriate corpus of documents, apply innovative algorithms and techniques to detect its possible expansions, and then select the appropriate one. Furthermore, the discrimination power of tokens plays a distinctive role in how abbreviations are constructed, and hence it can facilitate the detection of acronym-type abbreviations. Additionally, the expressions in which abbreviations appear, as well as the preceding or following text, are of primary importance for selecting the appropriate meaning.
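To make the notion of an abbreviation appearing "without its long form" concrete, the following minimal Python sketch separates candidate abbreviations into those that are introduced with a parenthesised expansion and those that are not. It is an illustration only and not the tooling used in the study: the function names, the uppercase-token pattern, and the simple initial-letter matching rule are all assumptions made for this example.

```python
import re

# Assumption: a candidate abbreviation is a 2-6 character token that starts
# with an uppercase letter and continues with uppercase letters or digits.
ABBR_RE = re.compile(r"\b[A-Z][A-Z0-9]{1,5}\b")


def find_abbreviations(text):
    """Return the set of candidate abbreviations found in the text."""
    return set(ABBR_RE.findall(text))


def find_expansion(text, abbr):
    """Look for 'long form (ABBR)' and return the long form, or None.

    Naive heuristic (hypothetical, for illustration): the words directly
    before the parenthesis must have initials that spell the abbreviation.
    """
    for match in re.finditer(r"\(" + re.escape(abbr) + r"\)", text):
        words = re.findall(r"[A-Za-z0-9]+", text[:match.start()])
        candidate = words[-len(abbr):]
        initials = "".join(w[0] for w in candidate).upper()
        if len(candidate) == len(abbr) and initials == abbr:
            return " ".join(candidate)
    return None


def split_by_definition(text):
    """Split candidates into (expanded: dict, unexpanded: sorted list)."""
    expanded, unexpanded = {}, []
    for abbr in sorted(find_abbreviations(text)):
        long_form = find_expansion(text, abbr)
        if long_form is None:
            unexpanded.append(abbr)
        else:
            expanded[abbr] = long_form
    return expanded, unexpanded


sample = (
    "Patients with chronic kidney disease (CKD) were enrolled. "
    "ECG and CKD markers were recorded at baseline."
)
expanded, unexpanded = split_by_definition(sample)
print(expanded)    # abbreviations introduced together with a long form
print(unexpanded)  # abbreviations that appear without any long form
```

In the sample text, `CKD` is introduced with its long form while `ECG` is not, so the latter would have to be resolved from an external corpus and from its surrounding context, as the abstract describes. A heuristic based purely on initial letters misses many real abbreviation patterns; it only serves to show the distinction.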
Keywords: Abbreviations; Expansion; Clinical studies; Semantic analysis; Corpus annotation
This work is being supported by the OpenScienceLink project and has been partially funded by the European Commission’s CIP-PSP under contract number 325101. This paper expresses the opinions of the authors and not necessarily those of the European Commission. The European Commission is not liable for any use that may be made of the information contained in this paper.