Making Effective Use of Healthcare Data Using Data-to-Text Technology

Data Science for Healthcare

Abstract

Healthcare organizations continuously strive to improve health outcomes, reduce costs, and enhance the patient experience of care. Data is essential for measuring and helping to achieve these improvements in healthcare delivery. Consequently, an influx of data from various clinical, financial, and operational sources is now overwhelming healthcare organizations and their patients. Using this data effectively, however, is a major challenge. Text is an important medium for making data accessible. Financial reports are produced to assess healthcare organizations on key performance indicators and so steer their healthcare delivery. Similarly, at a clinical level, data on patient status is conveyed through textual descriptions to facilitate patient review, shift handover, and care transitions. Likewise, doctors inform patients about data on their health status and treatments via text, in the form of reports or via e-health platforms. Unfortunately, when healthcare professionals write such text, it is the outcome of a highly labor-intensive process. It is also prone to incompleteness and subjectivity, and it is hard to scale up to different domains, wider audiences, and varying communication purposes. Data-to-text is a recent breakthrough technology in artificial intelligence that automatically generates natural language, in the form of text or speech, from data. This chapter provides a survey of data-to-text technology, with a focus on how it can be deployed in a healthcare setting. It will (1) give an up-to-date synthesis of data-to-text approaches, (2) give a categorized overview of use cases in healthcare, (3) seek to make a strong case for evaluating and implementing data-to-text in a healthcare setting, and (4) highlight recent research challenges.

Author information

Corresponding author

Correspondence to Steffen Pauws.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Pauws, S., Gatt, A., Krahmer, E., Reiter, E. (2019). Making Effective Use of Healthcare Data Using Data-to-Text Technology. In: Consoli, S., Reforgiato Recupero, D., Petković, M. (eds) Data Science for Healthcare. Springer, Cham. https://doi.org/10.1007/978-3-030-05249-2_4

  • DOI: https://doi.org/10.1007/978-3-030-05249-2_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05248-5

  • Online ISBN: 978-3-030-05249-2

  • eBook Packages: Computer Science, Computer Science (R0)
