Abstract
The automatic processing of documents created in clinical settings has become a focus of research in natural language processing. However, standard tools developed for general text are either not applicable to this type of document or perform poorly on it. Moreover, several crucial tasks require lexical resources and relational thesauri or ontologies to identify relevant concepts and their connections. For less-resourced languages such as Hungarian, no such lexicons are available, and constructing and organizing annotated data requires the work of human experts. In this paper we show how statistical methods can transform the raw documents into a preprocessed, semi-structured form that can be used to aid this human work. The modules detect and resolve abbreviations, identify multiword terms, and derive their similarity, all based on the corpus itself.
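To make the idea of deriving term similarity "from the corpus itself" concrete, the following is a minimal sketch of one generic distributional approach: cosine similarity over co-occurrence counts. The abstract does not state which similarity measure the paper uses, so this should not be read as the authors' method. The function names (`context_vectors`, `cosine`), the window size, and the toy English sentences are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """Count, for each token, the tokens seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, term in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[term][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[ctx] for ctx, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy, made-up corpus (not from the paper): pre-tokenized sentences.
sentences = [
    ["left", "eye", "shows", "mild", "cataract"],
    ["right", "eye", "shows", "mild", "cataract"],
    ["patient", "reports", "no", "pain"],
]

vecs = context_vectors(sentences)
# "left" and "right" occur in the same contexts, so they score high;
# "left" and "pain" share no context tokens, so they score zero.
print(cosine(vecs["left"], vecs["right"]))
print(cosine(vecs["left"], vecs["pain"]))
```

In this sketch, terms that appear in similar surroundings receive similar vectors without any external lexicon, which is the property that matters when no thesaurus or ontology exists for the language.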
Acknowledgement
This work was partially supported by TÁMOP – 4.2.1.B – 11/2/KMR-2011-0002 and TÁMOP-4.2.2./B-10/1-2010-0014.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Siklósi, B., Novák, A. (2014). Identifying and Clustering Relevant Terms in Clinical Records Using Unsupervised Methods. In: Besacier, L., Dediu, A.-H., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2014. Lecture Notes in Computer Science, vol 8791. Springer, Cham. https://doi.org/10.1007/978-3-319-11397-5_18
DOI: https://doi.org/10.1007/978-3-319-11397-5_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11396-8
Online ISBN: 978-3-319-11397-5
eBook Packages: Computer Science (R0)