Extracting Structured Data from Physician-Patient Conversations by Predicting Noteworthy Utterances

Explainable AI in Healthcare and Medicine

Part of the book series: Studies in Computational Intelligence (SCI, volume 914)

Abstract

Despite concerted efforts to mine various modalities of medical data, the conversations between physicians and patients at the time of care remain an untapped source of insights. In this exploratory study, we investigate whether these conversations can be used to extract structured information that might assist physicians with post-visit documentation in electronic health records, potentially lightening the clerical burden. We describe a new dataset consisting of conversation transcripts, post-visit summaries, corresponding supporting evidence (in the transcript), and structured labels. We focus on the tasks of recognizing relevant diagnoses and abnormalities in the review of organ systems (RoS). One methodological challenge is that the conversations are long (around 1500 words), making it difficult for modern deep-learning models to use them as input. To address this challenge, we extract noteworthy utterances: parts of the conversation likely to be cited as evidence supporting some summary sentence. We find that by first filtering for (predicted) noteworthy utterances, we can significantly boost predictive performance for recognizing both diagnoses and RoS abnormalities.
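The filter-then-classify idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the threshold, and the cue-word scorer are all hypothetical stand-ins for a trained noteworthiness model, and the example conversation is invented for illustration only.

```python
from typing import Callable, List


def filter_noteworthy(
    utterances: List[str],
    score: Callable[[str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Keep utterances whose predicted noteworthiness meets the threshold."""
    return [u for u in utterances if score(u) >= threshold]


# Hypothetical stand-in for a trained noteworthiness model:
# an utterance scores 1.0 if it contains any cue word, else 0.0.
CUE_WORDS = {"pain", "diabetes", "cough", "medication"}


def toy_score(utterance: str) -> float:
    words = {w.strip(".,?!").lower() for w in utterance.split()}
    return 1.0 if words & CUE_WORDS else 0.0


# Invented example conversation (not taken from the dataset).
conversation = [
    "Good morning, how are you?",
    "I have had a cough for two weeks.",
    "Okay, let me take a look.",
    "Are you still taking your diabetes medication?",
]

# Stage 1: keep only the utterances predicted to be noteworthy.
filtered = filter_noteworthy(conversation, toy_score)

# Stage 2 (not shown): the shortened text would be passed to a
# diagnosis or RoS classifier instead of the full transcript.
classifier_input = " ".join(filtered)
print(classifier_input)
```

The point of the sketch is only the shape of the pipeline: a scoring step shortens the roughly 1500-word transcript before the downstream classifier sees it. In practice the scorer would be a learned model rather than a cue-word lookup.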

Acknowledgements

We gratefully acknowledge support from the Center for Machine Learning and Health, a joint venture between UPMC and Carnegie Mellon University, and from Abridge AI, who created the dataset that we used for this research.

Author information

Corresponding author

Correspondence to Kundan Krishna.

Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Krishna, K., Pavel, A., Schloss, B., Bigham, J.P., Lipton, Z.C. (2021). Extracting Structured Data from Physician-Patient Conversations by Predicting Noteworthy Utterances. In: Shaban-Nejad, A., Michalowski, M., Buckeridge, D.L. (eds) Explainable AI in Healthcare and Medicine. Studies in Computational Intelligence, vol 914. Springer, Cham. https://doi.org/10.1007/978-3-030-53352-6_14
