Abstract
This chapter presents the basic building blocks of clinical text processing and relates them to the building blocks of standard text processing with natural language processing techniques.
Rights and permissions
This chapter is published under an open access license. Please check the 'Copyright Information' section either on this page or in the PDF for details of this license and what re-use is permitted. If your intended use exceeds what is permitted by the license or if you are unable to locate the licence and re-use information, please contact the Rights and Permissions team.
Copyright information
© 2018 The Author(s)
About this chapter
Cite this chapter
Dalianis, H. (2018). Basic Building Blocks for Clinical Text Processing. In: Clinical Text Mining. Springer, Cham. https://doi.org/10.1007/978-3-319-78503-5_7
DOI: https://doi.org/10.1007/978-3-319-78503-5_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78502-8
Online ISBN: 978-3-319-78503-5
eBook Packages: Computer Science, Computer Science (R0)