Deep Text Prior: Weakly Supervised Learning for Assertion Classification

  • Vadim Liventsev
  • Irina Fedulova
  • Dmitry Dylov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11731)


The success of neural networks is typically attributed to their ability to closely mimic relationships between features and labels observed in the training dataset. This, however, is only part of the answer: in addition to being fit to data, neural networks have been shown to be useful priors on the conditional distribution of labels given features, and can be used as such even in the absence of trustworthy training labels. This property can be harnessed to train high-quality models on low-quality training data in tasks for which large, high-quality ground-truth datasets do not exist. One such task is assertion classification in biomedical texts: discriminating between positive, negative and speculative statements about certain pathologies a patient may have. We present an assertion classification methodology based on recurrent neural networks, an attention mechanism and two flavours of transfer learning (language modelling and heuristic annotation) that achieves state-of-the-art results on MIMIC-CXR radiology reports.
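As an illustration of the heuristic annotation idea mentioned above, the following is a minimal sketch of a rule-based weak labeller that assigns a noisy positive, negative or speculative label to a finding in a sentence. It is not the authors' annotator: the cue lists, the function names `weak_label` and `_has_cue`, and the rule of looking only at the words preceding the finding are all assumptions made for illustration. Labels produced this way are expected to be imperfect, and that kind of low-quality supervision is what a neural model would then be trained on.

```python
import re

# Hypothetical trigger phrases, chosen for illustration only (not from the paper).
NEGATION_CUES = ("no", "without", "negative for", "free of")
SPECULATION_CUES = ("possible", "may represent", "cannot exclude", "suspicious for")


def _has_cue(text, cues):
    """Return True if any cue occurs in the text as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(cue) + r"\b", text) for cue in cues)


def weak_label(sentence, finding):
    """Assign a noisy assertion label to `finding` from the words that precede it."""
    lowered = sentence.lower()
    position = lowered.find(finding.lower())
    if position == -1:
        return None  # the finding is not mentioned in this sentence
    context = lowered[:position]
    # Speculation is checked first so that hedged phrases are not read as negations.
    if _has_cue(context, SPECULATION_CUES):
        return "speculative"
    if _has_cue(context, NEGATION_CUES):
        return "negative"
    return "positive"
```

For example, `weak_label("No evidence of pneumothorax.", "pneumothorax")` falls into the negation branch, while a sentence with no cue before the finding defaults to a positive label. A rule this simple will mislabel many sentences, which is the motivation for treating its output as weak supervision rather than ground truth.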


Assertion classification, Natural language processing, Biomedical texts, Deep learning, Transfer learning, Weakly supervised learning



The authors would like to acknowledge Artem Shelmanov and Ilya Sochenkov for sharing their expertise in natural language processing, mentorship and support.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Philips Innovation Labs RUS, Moscow, Russia
  2. Skolkovo Institute of Science and Technology, Moscow, Russia
