What Happened to Me while I Was in the Hospital? Challenges and Opportunities for Generating Patient-Friendly Hospitalization Summaries

  • Sabita Acharya
  • Andrew D. Boyd
  • Richard Cameron
  • Karen Dunn Lopez
  • Pamela Martyn-Nemeth
  • Carolyn Dickens
  • Amer Ardati
  • Jose D. Flores Jr
  • Matt Baumann
  • Betty Welland
  • Barbara Di Eugenio
Research Article
Part of the following topical collections:
  1. Special Issue on Health Behavior in the Information Age


Comprehending medical information is a challenging task, especially for people who have not received formal medical education. When patients are discharged from the hospital, they are provided with lengthy medical documents that contain intricate terminologies. Studies have shown that if people do not understand the content of their health documents, they will neither look for new information regarding their illness nor will they take actions to prevent or recover from their health issue. In this article, we highlight the need for generating personalized hospital-stay summaries and several research challenges associated with this task. The proposed directions are directly informed by our ongoing work in generating concise and comprehensible hospitalization summaries that are tailored to suit the patient’s understanding of medical terminologies and level of engagement in improving their own health. Our preliminary evaluation shows that our summaries effectively present required medical concepts.


Discharge summaries; Natural language generation; Personalization; Simplification; Patient-centric information

1 Introduction

From personal health records and mobile health apps, to the hundreds of health websites that are readily available online, patients have access to an abundance of resources that provide information about their health issues, possible causes, and various treatments [1, 2, 3]. However, more sources of information do not always ensure that patients can understand and utilize all the information they receive. For all the investment in consumer health technology, only 12% of US adults have proficient health literacy (i.e., capacity to obtain, process, and understand basic health information and services needed to make appropriate health decisions [4]), while over a third of US adults have difficulty with common health tasks such as following directions on a prescription drug label or adhering to medical instructions [5, 6]. More often, patients end up discarding the health documents that are provided to them, either because they get overwhelmed with too much information or because they find it hard to comprehend the medical terminologies that such documents are usually flooded with [7].

In addition, most health documents are created using a “one size fits all” approach, with the “general” population in mind [8]. However, few people benefit from such documents because they fail to address the factors (such as the patient’s concerns, interests, and health literacy) that play a significant role in determining the amount and kind of information a patient can comprehend. It is estimated that patients with chronic issues like heart failure (our patient population) are responsible for around 95% of their chronic illness care, and their daily decisions have a huge impact on their quality of life [9]. Hence, it is essential for such patients to be able to understand and follow their health instructions because they still need to continue much of the care that was provided to them in the hospital [10]. Studies have shown that the patients’ perspective is essential for patient education [11] and that engaging patients in their own care reduces hospitalizations, improves quality of life, and prevents further deterioration of their health [12, 13]. According to a recent survey of patients with chronic disease conducted by West [14], 91% of the patients said that they need help managing their disease. Similarly, two in every three patients (66%) said that they do not get valuable personalized information from their provider and receive very general information instead.

In this article, we describe our efforts on providing hospitalization information to heart failure patients in an engaging and understandable manner. After briefly describing our previous work on automatically generating hospital-stay summaries [15] and simplifying difficult terminologies [16], we discuss the next steps in developing personalized summaries targeted to the patient’s level of understanding and engagement. These personalized summaries will be provided to patients when they are discharged from the hospital.

2 Background

Discharge notes have been used by secondary care providers as a means of communicating patient’s diagnosis and related information to primary care providers for a very long time [17]. Often, discharge notes and patient education materials are the only form of communication that accompany a patient to the next setting of care [18]. An effective way of helping patients take care of themselves after discharge is to provide them with a summary of what happened to them, how they were taken care of in the hospital, and what they should do next to maintain their health [19]. Since our summaries are generated for patients with heart issues, we contend that the information from both the physician and nursing documentation should be included, so that even after being discharged, patients can still follow the care regimen that was provided to them by the nurses in the hospital. One of the most frequently used techniques for summarizing information is to identify the “important” sentences from text and arrange them together [20, 21]. However, since our physician and nursing documents are heterogeneous, we need to understand and interpret the content of the source text and convey the information by generating concise sentences. This approach for generating summaries is considered to be harder than the more common sentence extraction approach and has hence been explored less frequently [22, 23]. Existing summarization systems, including the recent neural network-based approaches [24, 25], have usually been applied to non-medical datasets (like newswire, weather data, etc.) [26, 27]. Most of the systems that summarize medical information provide literature abstracts for health professionals [28, 29], while the few that produce content for patients focus on data-to-text summarization, where data mostly consists of test results and observational notes [30, 31].

In summarization, there has been work on producing user-centric content. Diaz and Gervas [32] produce a personalized summary by calculating the similarity between the user model for a specific individual and each of the sentences in the document that are relevant to a given user model. Kumar et al. [33] compare sentences to the information found on the internet about the person as a basis for identifying relevant sentences. A system that develops patient-friendly health content should also take the patient’s ability to understand medical information and their personal concerns into consideration. Even though several studies have shown that patients prefer tailored interventions [34] and personalized health information over generic ones [35, 36], only a few of the existing systems generate personalized content for patients [30, 37]. The seminal work by Jimison et al. [38] takes a patient’s medical history into account and provides generic materials on the health issue and explanations of treatment that are gathered from textbooks, medical experts, and patient-oriented literature. The drawback of this approach is that it depends upon a manually populated knowledge base that needs to be constantly updated. DiMarco et al. [39] use a selection and repair approach, which, depending upon the specific conditions of the patient, selects different portions of text and arranges them in a proper order before providing them to the patient. However, this approach assumes that a master document, annotated with when to include what content, is already present. The BabyTalk system [40] generates different documents for people occupying different roles in the Neonatal Intensive Care Unit by making use of handcrafted ontologies, which are very time and resource intensive to construct.
Our approach to personalization uses several parameters that determine the content to be included in our summary, similarly to the PERSONAGE system [41], a parameterizable language generator that takes the user’s linguistic style into account and generates restaurant recommendations. To the best of our knowledge, there are no existing systems that generate comprehensible and personalized hospital-stay summaries for patients. Moreover, the combination of the four different factors that guide our personalization process has not been explored before.

Apart from summarizing information, a system that produces patient-centric content should also deal with situations where a patient is not able to understand the content of the health document. While more than 50% of the patients that are admitted to hospital due to heart issues are found to be readmitted within 6 months of discharge [42], research has shown that patients who understand their after-hospital care instructions are 30% less likely to be readmitted [43]. Medical content can be made more understandable by performing tasks like ruling out acronyms and ambiguous words, replacing difficult terminologies with simpler ones, providing explanations, etc. One of the most common approaches for performing this task is to consider the terms that are present in some medical vocabularies [44, 45] or occur less frequently in a corpus [46, 47] as being difficult and replacing them with explanations obtained from the vocabularies. However, these approaches work only for single-word terms and completely rely on existing medical vocabularies, which are not comprehensive enough.

3 Materials and Method

3.1 Data Sources

Motivated by studies which demonstrate that doctors and nurses focus on different aspects of care [48, 49, 50], and that patients need to continue much of the direct care that is provided by hospital nurses [10], we set out to summarize the information from both the physician and nursing documents. We used the physician and nursing documentation for 58 de-identified patients that were discharged with a medical diagnosis of heart failure. The doctor’s discharge summary of the medical aspects of the hospitalization is in free text format (as shown in Fig. 1b), while the nurses use structured nursing terminologies in a plan of care (POC) software called Hands-on Automated Nursing Data System (HANDS) [51] (as shown in Fig. 1a). HANDS uses the following taxonomies [52]: NANDA-I for nursing diagnoses, NOC for outcomes, and NIC for nursing interventions.
Fig. 1

Two heterogeneous sources of information used for generating summary

3.2 Workflow

The workflow of our personalization system, which we will henceforth refer to as PatientNarr, is shown in Fig. 2. PatientNarr is based on our previous work [15], where we set up the core of our Natural Language Generation pipeline, including the extraction module that extracts the medical terms from physician and nursing documents and explores the relation between them. The simplification module is responsible for identifying difficult medical concepts and providing explanations for them [16]. The roles of the extraction and simplification modules in generating concise and comprehensible summaries are described briefly in Sects. 3.2.1 and 3.2.2, respectively. The four factors shown in Fig. 2 (Patient Activation Measure (PAM) score, i.e. the patient’s degree of engagement; health literacy score; familiarity with health issues; and concerns and strengths) guide our personalization process and are described in detail in Sect. 3.2.3.
Fig. 2

Schematic representation of PatientNarr

3.2.1 Extraction Module

In this module, all the medical concepts that are present in the hospital course section of the doctor’s discharge notes are extracted by using MedLEE [53], a medical information extraction tool. MedLEE maps the entities to concepts in the Unified Medical Language System (UMLS) knowledge source [54]. UMLS consists of more than 3 million concepts, each of which is identified by a unique ID called a concept unique identifier (CUI). For example, the CUI for the concept cerebrovascular accident is C0038454. Since the terminologies from the nursing POC are already included in the UMLS, they also have corresponding CUIs. To generate the summary, for each patient, we build a graph by querying UMLS for CUIs that are related to the CUIs extracted from the doctor’s note and nursing POC. From the graph, we extract only those CUIs that either belong to the original documents or connect a pair of doctor-originated and nurse-originated concepts that would otherwise remain unconnected. These concepts are candidates for inclusion in our summaries and are first passed through the Simplification module.
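The selection rule described above can be illustrated with a small sketch. It is not the PatientNarr implementation: the identifiers (`D1`, `N1`, `X1`, ...) are hypothetical placeholders rather than real CUIs, the `related` pairs stand in for the result of querying UMLS, and "connecting" is assumed here to mean a single intermediate concept between a doctor-originated and a nurse-originated concept.

```python
from itertools import product


def select_candidate_cuis(doctor_cuis, nurse_cuis, related):
    """Keep CUIs from the source documents plus one-hop bridging CUIs.

    `related` is an iterable of (cui_a, cui_b) pairs, a stand-in for the
    relations that would be returned by querying UMLS.
    """
    doctor, nurse = set(doctor_cuis), set(nurse_cuis)
    original = doctor | nurse

    # Build an undirected adjacency map from the relation pairs.
    graph = {}
    for a, b in related:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    selected = set(original)
    for d, n in product(doctor, nurse):
        if d == n or n in graph.get(d, set()):
            continue  # this pair is already connected without a bridge
        # Concepts adjacent to both ends bridge the otherwise unconnected pair.
        bridges = (graph.get(d, set()) & graph.get(n, set())) - original
        selected |= bridges
    return selected


# Hypothetical toy data: X1 bridges D1 and N1; X2 touches only D1.
relations = [("D1", "X1"), ("X1", "N1"), ("D2", "N1"), ("D1", "X2")]
candidates = select_candidate_cuis({"D1", "D2"}, {"N1"}, relations)
```

In this toy graph `X1` is kept because it links a doctor-originated concept to a nurse-originated one, while `X2` is dropped because it is neither in the source documents nor a bridge.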

3.2.2 Simplification Module

In this module, concepts are first classified according to whether or not they need to be simplified. Our concepts can consist of a single word (e.g. headache) or multiple words (e.g. cerebral tissue perfusion). Existing metrics for assessing health literacy (REALM, TOFHLA, NAALS) and reading level (Flesch, Fry Graph, SMOG) work only on sentences and not on words [55]. Hence, we developed a new metric for assessing the complexity of a medical concept. We extracted several features (the complete list is shown in Table 1) from our training dataset and performed linear regression with the complexity of the concept (i.e., a 0 or 1 label) as the dependent variable. This gave us a linear regression function and helped us identify the features that are significant for predicting the complexity of a concept. We then performed clustering on the entire dataset with our training data as cluster seeds. We supplied the feature values of the concepts in each cluster to the linear regression function to obtain their scores. These scores were used to identify thresholds, consistent across all the clusters, for distinguishing between simple and complex concepts.
Table 1

Different features that are extracted for predicting complexity of a concept



Lexical features (shallow) | Number of vowels, consonants, prefixes, suffixes, letters, syllables per word, nouns, verbs, adjectives, prepositions, conjunctions, determiners, adverbs, numerals (extracted by the Stanford parser)
Vocabulary-based features | Normalized frequency from the Google n-gram corpus; presence/absence of the word in WordNet
UMLS-derived features | Number of semantic types, synonyms, and CUIs that are identified for the term; whether the term is present in CHV; whether the entire term has a CUI; whether the semantic type of the term is one of the 16 semantic types listed by Ramesh et al. [56]
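To make the regression-plus-threshold idea concrete, the following is a deliberately reduced sketch. It fits ordinary least squares on a single feature (syllable count) against 0/1 complexity labels and then applies a cut-off. The training values, the choice of one feature, and the threshold of 0.5 are all assumptions made for illustration; the metric in the text uses the full feature set of Table 1 and thresholds derived from clustering.

```python
def fit_simple_ols(xs, ys):
    """Closed-form least squares for one feature: y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def classify(feature_value, slope, intercept, threshold=0.5):
    """Label a concept Complex when its regression score reaches the threshold."""
    score = intercept + slope * feature_value
    return "Complex" if score >= threshold else "Simple"


# Hypothetical training data: syllable counts with 0 (simple) / 1 (complex) labels.
syllables = [1, 2, 3, 6, 7, 8]
labels = [0, 0, 0, 1, 1, 1]
slope, intercept = fit_simple_ols(syllables, labels)

short_term = classify(2, slope, intercept)
long_term = classify(7, slope, intercept)
```

With these toy numbers a two-syllable term scores well below the cut-off and a seven-syllable term well above it; a real deployment would replace both the single feature and the fixed threshold.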

Hence, given a new medical concept, our metric extracts the features from the concept and supplies their values to the linear regression function that was obtained during the training phase. It then uses the thresholds to determine whether or not the concept needs to be simplified (for details, please refer to [16]). Once a concept is identified as being Complex, it is sent to the definition extractor and ranker unit (see Fig. 2). This unit retrieves the definitions of the concept from three different knowledge sources (Wikipedia, WordNet [57], and UMLS) and performs the following steps: (a) extracts all the medical concepts present in each definition by mapping them to UMLS, (b) uses our metric to determine the score for each term, (c) adds up the scores of the terms in the definition, and (d) chooses the definition with the lowest total score as the simplest definition. For instance, given the concept cerebrovascular accident, which is identified as being Complex by our metric, the definition extractor and ranker unit extracts definitions of the term from the three knowledge sources and ranks them. In this case, the definition from Wikipedia was identified as the least complex, and hence the first occurrence of the term cerebrovascular accident in our summary will have the definition “when poor blood flow to the brain results in cell death” attached to it. All the concepts that are underlined in Fig. 3 were found to be complex, and a corresponding definition is provided by PatientNarr.
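Steps (a) to (d) amount to summing per-term complexity scores for each candidate definition and keeping the minimum. The sketch below illustrates this under stated assumptions: the term scores are invented numbers, simple substring matching stands in for mapping terms to UMLS, and the second definition (`source_b`) is a made-up placeholder, not text taken from WordNet or UMLS.

```python
# Hypothetical per-term complexity scores; in the text these come from the metric.
TERM_SCORES = {
    "blood flow": 0.2,
    "brain": 0.1,
    "cell death": 0.4,
    "ischemia": 0.9,
    "infarction": 0.9,
    "hemorrhage": 0.8,
}


def definition_score(text, term_scores):
    """Sum the scores of all known terms found in the definition text."""
    lowered = text.lower()
    return sum(score for term, score in term_scores.items() if term in lowered)


def simplest_definition(definitions, term_scores):
    """Return the source whose definition has the lowest total score."""
    return min(
        definitions,
        key=lambda source: definition_score(definitions[source], term_scores),
    )


definitions = {
    # Definition quoted in the text above.
    "wikipedia": "when poor blood flow to the brain results in cell death",
    # Invented placeholder for a more technical definition.
    "source_b": "brain infarction caused by ischemia or hemorrhage",
}
best_source = simplest_definition(definitions, TERM_SCORES)
```

Summing scores favors definitions that are both short and built from familiar terms, which matches the intent of step (d); normalizing by definition length would be a different design choice that the text does not describe.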
Fig. 3

Example of our summary for Patient 149. The underlined terms were identified as being Complex by our metric; they are linked to a definition

These concepts, along with relevant verbs, are supplied as features of phrasal constituents via the operations provided by the SimpleNLG API [58], which assembles grammatical phrases in the correct order. The summary thus generated (see Fig. 3) is further enriched during our personalization process, to be explained in detail in Sect. 3.2.3.

3.2.3 Personalization

Personalization is the process of incorporating the perspectives and interest of different users in the same text. We hypothesize that including patient-specific information such as social-emotional status, preferences, and needs in a summary will encourage patients to read and understand its content, thereby making them more informed and involved in understanding and improving their health status. Depending upon the intended application, personalization can be focused on different aspects. While the health literacy of the patient is the most commonly used measure, some studies suggest that the learning style of the patient and his/her familiarity with the health issue should also be taken into account [59, 60, 61]. Our personalization algorithm takes the patient’s health literacy and their level of engagement (i.e., their motivation to take care of themselves) into consideration. In order to gather a preliminary inventory of the issues that patients usually are concerned about, we conducted open-ended interviews with 18 patients who were hospitalized with heart issues. The findings from these interviews led to the conceptualization of two other factors: the patient’s familiarity with the health issue, and their strengths/concerns, which will also be used for guiding our personalization process. Details on these four factors and other parameters that are derived from them are provided below.
  1. Patient’s health literacy

The World Health Organization defines health literacy as the skill that determines an individual’s ability to gain access to and use information in ways that promote and maintain good health [62]. Studies have shown that 99% of Americans can read, but nearly nine out of ten adults may lack the skills needed to manage their health and prevent disease [63]. In order to assess the health literacy of the patient, we use the Rapid Estimate of Adult Literacy in Medicine (REALM) test [64]. REALM is a 66-item word recognition and pronunciation test. A score is assigned depending on how many of the words in the list a participant pronounces correctly. This score tells us whether the health literacy level of the patient is at third grade or below, fourth to sixth grade, seventh to eighth grade, or at high school level. The health literacy level of the patient determines the value of the literacy score parameter as shown in Table 2.
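As a rough illustration of the banding described above, the sketch below maps a raw REALM score (0 to 66) to a grade band and then to a literacy score. The raw-score cut-offs (18, 44, 60) are the commonly published REALM bands and are not stated in this article; the numeric literacy score values 1/2/3 are purely an assumption, chosen only to respect the grouping of the two middle bands in Table 2.

```python
def realm_level(raw_score):
    """Map a raw REALM score (0-66) to a grade band (assumed cut-offs)."""
    if not 0 <= raw_score <= 66:
        raise ValueError("REALM raw score must be between 0 and 66")
    if raw_score <= 18:
        return "third grade or below"
    if raw_score <= 44:
        return "fourth to sixth grade"
    if raw_score <= 60:
        return "seventh to eighth grade"
    return "high school level"


# Assumed encoding: the two middle bands share one value, as grouped in Table 2.
LITERACY_SCORE = {
    "third grade or below": 1,
    "fourth to sixth grade": 2,
    "seventh to eighth grade": 2,
    "high school level": 3,
}


def literacy_score(raw_score):
    """Return the (assumed) literacy score parameter for a raw REALM score."""
    return LITERACY_SCORE[realm_level(raw_score)]
```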
Table 2

Mapping of health literacy level to the values of literacy score parameter

Health literacy level | Literacy score
Third grade or below |
Fourth to sixth grade, seventh to eighth grade |
High school level |


  2. Patient engagement


In order to quantify what it means for a patient to be “engaged” in taking care of their health, we use a metric called the patient activation measure (PAM) [65]. PAM consists of 13 questions; based on the patient’s responses to these questions, the patient’s level of motivation, also called the “stage of activation,” can be determined. In the first stage, patients believe that their role is important but are overwhelmed and hence unable to actively participate in taking care of themselves. Patients in the second stage have the necessary knowledge for self-care but lack the confidence for self-management. Those in the third stage are more involved in maintaining lifestyle changes and know how to prevent further problems. Patients who believe that they can handle their health issues and can maintain their lifestyle even during times of stress are said to be in the fourth stage. Several studies have shown that patients with a high PAM score follow their prescribed treatment and engage in self-monitoring activities at home [65, 66]. The parameter PAM score is assigned a value of 1/2/3/4 depending upon the activation stage of the patient.

Apart from the single score that we obtain after the patient answers the PAM questions, we infer other parameters from the patient’s responses to individual questions. For instance, three questions (Q4, Q8, and Q9) ask patients to rate their knowledge and understanding of their health issue. We take the average of the scores for these three questions and assign it to a variable called self-efficacy score. Moreover, the developers of PAM provide several guidelines on how a patient with a particular score should be treated. These guidelines are based on extensive research into the relationships between patient activation, patient behavior, and health-related outcomes [67, 68]. For instance, a low PAM score means that the patient is overwhelmed by his/her health status and that we need to show more empathy. On the other hand, patients who have a high PAM score are willing to learn new concepts and should be encouraged to focus on maintaining their behavior. We directly map these guidelines to several parameters that take a yes/no value (encoded as 1/0): verbosity/conciseness, positive content first, show empathy, show encouragement, open-mindedness/willingness to consider new topics, and reinforce the importance of patient participation. Hence, for patients who are willing to learn new concepts, the open-mindedness/willingness to consider new topics parameter will have a value of 1. Similarly, for patients with a high PAM score, the value of the show encouragement parameter is set to 1, which indicates that sentences encouraging the patient to continue taking care of their health will be included in the summary.
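A minimal sketch of how these PAM-derived parameters might be computed is given below. Several details are assumptions rather than facts from the text: item answers are taken to be numeric ratings keyed by question label, stages 3 and 4 are treated as "high" PAM, and only four of the yes/no parameters are shown. The parameter names are hypothetical identifiers.

```python
from statistics import mean


def pam_parameters(stage, answers):
    """Derive personalization parameters from the PAM stage and item answers.

    `stage` is the activation stage (1-4); `answers` maps question labels
    such as "Q4" to numeric ratings (the rating scale is an assumption).
    """
    if stage not in (1, 2, 3, 4):
        raise ValueError("PAM stage must be 1, 2, 3, or 4")

    # Average of the three knowledge/understanding items named in the text.
    self_efficacy = mean(answers[q] for q in ("Q4", "Q8", "Q9"))

    high_pam = stage >= 3  # assumed cut-off between low and high PAM
    return {
        "pam_score": stage,
        "self_efficacy_score": self_efficacy,
        "show_empathy": int(not high_pam),
        "show_encouragement": int(high_pam),
        "open_to_new_topics": int(high_pam),
        "reinforce_participation": int(not high_pam),
    }


low = pam_parameters(1, {"Q4": 2, "Q8": 1, "Q9": 3})
high = pam_parameters(4, {"Q4": 3, "Q8": 4, "Q9": 2})
```

Keeping each guideline as its own 0/1 flag means the later content rules can test one flag at a time instead of re-deriving behavior from the raw stage.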
  3. Patient’s familiarity with the health issue

After thoroughly analyzing the transcripts of the 18 patient interviews we collected, we found that patients who have had the health issue for some time, or who have a family member who has had the same problem, use basic disease-specific terminology more frequently in conversation. Based on this observation, we introduce two parameters: (a) number of years since first diagnosis and (b) history of the health issue in the family. Number of years since first diagnosis can take the values 0/less than 3/greater than 3 and less than 10/greater than 10, where 3 and 10 are arbitrary thresholds that appear consistent with our interviews and that we will verify in the future. History of the health issue in the family has a yes/no value.
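The bucketing of years since first diagnosis can be sketched as follows. The text does not say which bucket the boundary values 3 and 10 fall into; placing both in the middle bucket is an assumption made here, and the bucket labels are hypothetical.

```python
def years_since_diagnosis_bucket(years):
    """Map years since first diagnosis to one of the four value ranges."""
    if years < 0:
        raise ValueError("years must be non-negative")
    if years == 0:
        return "0"
    if years < 3:
        return "less than 3"
    if years <= 10:
        # Assumption: exactly 3 and exactly 10 belong to the middle bucket.
        return "3 to 10"
    return "greater than 10"
```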
  4. Strengths/concerns of the patient

We are also interested in identifying the patient’s sources of motivation and how the disease has affected their lives. For this purpose, we coded 9 patient interviews for topics that patients talk about using a pure inductive, grounded theory method. We found several categories of strength/concern that most of the patients frequently mention, which are represented by the following four parameters and their possible values:
  1. Priorities in life (family/friends/health/career/employment/finance/religion)
  2. Changes in life because of the health issue (lifestyle/diet/mobility/physical activity/daily routine)
  3. Means of support/sources of strength or courage (desire to get back to normal life/family/friends/support groups/caretaker)
  4. Ability to cope with health issues (great/good/able to manage it so far/could have been better/bad)


These categories are not comprehensive and will be updated as we code more interviews.

Hence, at discharge time, the patient will take the health literacy test and answer the following: (1) the 13 questions from PAM, (2) questions that assess the patient’s familiarity with the health issue, and (3) questions derived from the categories that emerged through grounded theory coding. We also introduce a parameter called familiarity with medical terms, whose value depends upon 4 other parameters: literacy score, number of years since first diagnosis, history of the health issue in the family, and self-efficacy score. Currently, the 4 constituent parameters are combined in such a way that familiarity with medical terms takes one of five possible values, from 1 to 5. We give the most weight to literacy score, then to self-efficacy score, and comparatively lower weight to number of years since first diagnosis and history of the health issue in the family. The range of values that familiarity with medical terms can have, as well as the weights given to the constituent parameters, can be fine-tuned through several iterations of evaluation.
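One way such a weighted combination could look is sketched below. The text only gives the ordering of the weights, so the specific weights (0.4, 0.3, 0.15, 0.15), the input ranges (literacy score 1 to 3, self-efficacy 1 to 4, a years bucket index 0 to 3), and the rounding onto the 1 to 5 scale are all assumptions for illustration.

```python
# Assumed weights: only their ordering (literacy > self-efficacy > the rest)
# comes from the text.
WEIGHTS = {"literacy": 0.4, "self_efficacy": 0.3, "years": 0.15, "family": 0.15}


def familiarity_with_medical_terms(literacy_score, self_efficacy, years_bucket, family_history):
    """Combine four parameters into a familiarity value from 1 to 5.

    Assumed input ranges: literacy_score 1-3, self_efficacy 1-4,
    years_bucket 0-3 (index of the years-since-diagnosis range),
    family_history True/False.
    """
    # Normalize every input to the interval [0, 1] before weighting.
    literacy = (literacy_score - 1) / 2
    efficacy = (self_efficacy - 1) / 3
    years = years_bucket / 3
    family = 1.0 if family_history else 0.0

    combined = (
        WEIGHTS["literacy"] * literacy
        + WEIGHTS["self_efficacy"] * efficacy
        + WEIGHTS["years"] * years
        + WEIGHTS["family"] * family
    )
    # Map the weighted average in [0, 1] onto the integer scale 1..5.
    return 1 + round(combined * 4)


lowest = familiarity_with_medical_terms(1, 1, 0, False)
middle = familiarity_with_medical_terms(2, 3, 1, False)
highest = familiarity_with_medical_terms(3, 4, 3, True)
```

Because the weights and ranges are explicit constants, they can be adjusted between evaluation rounds without changing the surrounding logic, in line with the fine-tuning the text anticipates.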

We have developed several rules which, depending upon the values of familiarity with medical terms, PAM score, and the parameters derived from the answers to the PAM questions (as described in the Patient engagement section above), inform our model about what content should and should not be included in the summary. While most of the phrases and sentences used for expressing our parameters have been derived from the literature on physician-patient and nurse-patient communication [69, 70] as well as some online sources [71, 72, 73], we have also collected sample phrases from working nursing professionals. Familiarity with medical terms controls the amount of medical detail that will be included in the summary. The value of open-mindedness/willingness to consider new topics determines whether or not an additional link that provides more information about the health issue should be included in the summary. The patient’s responses to the questions derived from the coded categories are reflected in the summary in a relatable manner and, in some cases, take the PAM score into consideration. For instance, given the question “How would you rate your ability to cope with your health issues?” with possible answers “great/good/able to manage it so far/could have been better/too bad,” if the patient chooses a positive response, we include the phrase “We appreciate your effort in handling your health issues”; otherwise we say “We are sorry it has been such a tough time for you”.
Similarly, for the question “What changes did you have to make to your lifestyle because of the health issue?” with options “lifestyle/diet/mobility/physical activity/daily routine,” we take the patient’s choices into account and appreciate the efforts made by a patient with a high PAM score by including a sentence such as “We appreciate that you have been making changes in your way of living, diet and physical activity for maintaining your health.” On the other hand, for a patient with a low PAM score, we show empathy and reinforce the importance of their participation by using sentences like “We can understand that you have to make changes in your way of living, diet and physical activity as a result of your health condition. It is great that you are taking these important steps to improve your health”. Two versions of the personalized summaries generated by PatientNarr are shown in Fig. 4. Both versions begin by informing the patient about the reason for hospital admission. The version for patients who have high familiarity with medical terms (i.e., Fig. 4a) includes a summary of the other diagnoses in the first paragraph. It also includes details on the interventions that were made and the outcomes of the interventions in the second and third paragraphs. On the other hand, the version for patients who have low familiarity with medical terms (i.e., Fig. 4b) only provides information about the health issues that were treated, in the second paragraph. Similarly, the high PAM score version appreciates the patient’s efforts and encourages them to continue taking care of themselves by using a sentence like “Keep up the good work” (see fourth paragraph in Fig. 4a). It also includes a link to an additional resource that the patient can refer to for more information about the health issue. The low PAM score version includes more empathy (see last sentence of first paragraph in Fig. 4b) and reinforces the importance of the patient’s participation in improving their health.
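The two rules quoted above can be written as a small rule-based sketch. The phrases are taken from the text; which coping answers count as "positive", the function names, and the mapping of the option "lifestyle" to the wording "way of living" are assumptions inferred from the examples.

```python
# Assumption: these three answers are treated as positive responses.
POSITIVE_COPING = {"great", "good", "able to manage it so far"}

# Assumed rewording of option labels, inferred from the example sentences.
READABLE = {"lifestyle": "way of living"}


def join_items(items):
    """Join option labels as 'a, b and c', using the readable wording."""
    if not items:
        raise ValueError("at least one item is required")
    items = [READABLE.get(item, item) for item in items]
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def coping_phrase(answer):
    """Pick an appreciative or empathetic phrase for the coping question."""
    if answer in POSITIVE_COPING:
        return "We appreciate your effort in handling your health issues."
    return "We are sorry it has been such a tough time for you."


def lifestyle_phrase(changes, high_pam):
    """Acknowledge reported lifestyle changes, depending on the PAM level."""
    listed = join_items(changes)
    if high_pam:
        return (
            f"We appreciate that you have been making changes in your {listed} "
            "for maintaining your health."
        )
    return (
        f"We can understand that you have to make changes in your {listed} "
        "as a result of your health condition. It is great that you are "
        "taking these important steps to improve your health."
    )


sentence = lifestyle_phrase(["lifestyle", "diet", "physical activity"], high_pam=True)
```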
Fig. 4

Two versions of the summary generated by PatientNarr for Patient 149 under the assumption that the values of the parameters are already known. Definitions are provided for concepts that are underlined

4 Discussion and Future Work

In this paper, we discussed some challenges that are associated with automatically generating personalized discharge summaries and outlined possible research directions for addressing these problems. The personalized summary that is generated by PatientNarr is not intended to replace the physician discharge note and education materials that are currently provided to patients during discharge. Instead, it should be considered as a supplementary material that can enhance the patient’s ability to understand their health issues, treatments they received, and the possible next steps that they need to undertake. We have used a modular approach for personalization, where the patient’s familiarity with the health terms and their response to PAM questions are separately responsible for regulating different aspects of our summaries. We provide more empathy and encouragement to patients who are overwhelmed, while we encourage those patients who are already confident in their ability to take care of themselves to continue with their efforts. Decisions on whether a term should be explained or not and the amount of detail that should be provided in the summary are influenced by the patient’s familiarity with health terms. Our motivation behind incorporating these features into our summaries is to empower patients by providing their hospitalization information in such a form that they can understand and relate to it. Since our summaries also contain information about the procedures that were performed by the nurses for taking care of the patient in the hospital, patients who need to continue self-care after being discharged can benefit from them.

In addition to the tasks that we have focused on so far, several future research directions can further contribute towards the goal of generating patient-friendly hospitalization summaries. In order to make the summaries natural and non-repetitive, we need to create more variation in the style and structure of the sentences that are present in the summary. One step towards this is to identify and/or create the resources that can provide examples of the phrases/sentences that reflect different intensities of empathy and encouragement. We can also introduce additional personalization parameters and consider fine-grained value ranges so that many more versions of summaries can be generated. However, one should always be mindful of the amount of time and effort that a patient will need to invest for providing all the information that is required by the personalization parameters. Further, assessing the generalizability of some existing work in non-medical domains in recognizing the personality of the users through conversation and producing affective text [74, 75] may result in better user models and user-friendly content. Similarly, identifying important content from the Electronic Health Records of the patients (as is done in [76]) and using it to add relevant context or to inform patients about the progress of their health issues over time can help make the summaries more coherent.

Apart from deciding which content to include in the summaries, we should also be careful about their length. Even though the summaries contain significantly less medical content than the original physician and nursing documentation, their length will increase because of the additional content that reflects the patient's preferences. This can be intimidating for patients, who may prefer not to read the summaries [77] even though the content is comparatively easier. Hence, we need to determine a reasonable method for prioritizing the content that should be included in the summaries.

Another important aspect to consider is the evaluation of the automatically generated summaries. Even though metrics like BLEU, METEOR, and ROUGE are widely used in existing machine translation and summarization systems [78, 79], these methods are corpus-based and require a large amount of gold-standard data against which comparisons can be made. Preparing such gold standards is not trivial in cases like ours, where the content depends on the collective influence of a large number of parameters, so we can instead make use of subjective human judgments. It is very important to test the summaries on the real patients for whom they are intended. We can conduct randomized controlled trials that contrast personalized summaries with non-personalized discharge summaries, stratified by patient characteristics such as health literacy and education level. However, while we are still developing and improving our summarization system, we can make use of other resources that are more readily accessible. For instance, to determine whether our summaries properly represent the important medical concepts, we can ask experts (i.e., doctors and nurses in our case) to highlight the content in the original document that they consider important to include in the automatically generated summaries. We can then check for the presence or absence of those concepts in the automatically generated summaries and improve our summarization algorithm to include more information, if necessary. Similarly, to assess the general preferences of patients, we can recruit patient advisors (i.e., individuals who are patients themselves and are involved in research) and obtain their opinions on aspects such as readability, consistency of style and formatting, and clarity of the language used in the summaries. We can also make use of crowd-sourcing platforms (like CrowdFlower or Amazon Mechanical Turk) to collect the opinions of laypeople on other aspects of the summaries that do not require medical knowledge.
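The expert-based check of concept presence and absence can be sketched as a simple set comparison. The function name and the example concepts below are hypothetical and invented purely for illustration; in practice the concepts would more likely be matched through normalized identifiers (for example, UMLS concept identifiers) than through lowercased strings, which is a simplifying assumption here.

```python
def concept_coverage(expert_concepts, summary_concepts):
    """Return the fraction of expert-highlighted concepts found in the summary,
    together with the sorted list of concepts that are missing.

    Matching is done on lowercased, whitespace-trimmed strings (an assumption)."""
    expert = {c.strip().lower() for c in expert_concepts}
    summary = {c.strip().lower() for c in summary_concepts}
    if not expert:
        raise ValueError("at least one expert-highlighted concept is required")
    missing = expert - summary
    coverage = (len(expert) - len(missing)) / len(expert)
    return coverage, sorted(missing)


# Made-up example: concepts highlighted by experts vs. concepts in a generated summary.
highlighted = ["Heart failure", "Furosemide", "Echocardiogram", "Fluid restriction"]
in_summary = ["heart failure", "furosemide", "low-salt diet"]

coverage, missing = concept_coverage(highlighted, in_summary)
print(f"coverage = {coverage:.2f}")  # 2 of 4 highlighted concepts are present
print("missing:", missing)
```

The list of missing concepts is the more actionable output, since it points to the content that the summarization algorithm may need to include.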


  1. Since the Consumer Health Vocabulary (CHV), a resource for translating technical terms into consumer-friendly language, is already included in UMLS, some of the definitions that we obtain come from this source. However, since CHV was found to cover only 14% of the medical concepts present in our dataset, we also use the definitions from other vocabularies in UMLS.

  2.


Compliance with Ethical Standards

Conflict of Interest

The authors declare that they have no conflict of interest.


  1. Sbaffi L, Rowley J (2017) Trust and credibility in web-based health information: a review and agenda for future research. J Med Internet Res 19(6):e2018
  2. Anthony F, Holden RJ (2016) Consumer health informatics: empowering healthy-lifestyle-seekers through mHealth. Prog Cardiovasc Dis
  3. Gordon NP, Hornbrook MC (2016) Differences in access to and preferences for using patient portals and other eHealth technologies based on race, ethnicity, and age: a database and survey study of seniors in a large health plan. J Med Internet Res 18(3):e50
  4. U.S. Department of Health and Human Services (2000) Healthy people 2010. Washington, DC: U.S. government printing office. Originally developed for Ratzan SC, Parker RM. 2000. Introduction. In National Library of Medicine Current Bibliographies in Medicine: Health Literacy. Selden CR, Zorn M, Ratzan SC, Parker RM, Editors. NLM pub. No. CBM 2000–1. Bethesda, MD: National Institutes of Health, U.S. Department of Health and Human Services
  5. Kutner M, Greenburg E, Jin Y, Paulsen C (2006) The health literacy of America's adults: results from the 2003 National Assessment of adult literacy, NCES 2006–483, National Center for Education Statistics
  6. US Department of Health and Human Services. America’s health literacy (2008) Why we need accessible health information. In: An issue brief from the US Department of Health and Human Services
  7. Choudhry AJ, Baghdadi YM, Wagie AE, Habermann EB, Heller SF, Jenkins DH, Cullinane DC, Zielinski MD (2016) Readability of discharge summaries: with what level of information are we dismissing our patients? Am J Surg 211(3):631–636
  8. Centers for Disease Control and Prevention (2009) Simply put: a guide for creating easy-to-understand materials. Centers for Disease Control and Prevention, Atlanta, Georgia
  9. Funnell MM (2000) Helping patients take charge of their chronic illnesses. Fam Pract Manag 7(3):47–51
  10. Cain CH, Neuwirth E, Bellows J, Zuber C, Green J (2012) Patient experiences of transitioning from hospital to home: an ethnographic quality improvement project. J Hosp Med 7(5):382–387
  11. Shapiro J (1993) The use of narrative in the doctor-patient encounter. Family Syst Med 11(1):47–53
  12. Riegel B, Lee CS, Albert N, Lennie T, Chung M, Song EK, Bentley B, Heo S, Worrall-Carter L, Moser DK (2011) From novice to expert: confidence and activity status determine heart failure self-care performance. Nurs Res 60(2):132–138
  13. McGinnis JM, Stuckhardt L, Saunders R, Smith M (eds) (2013) Best care at lower cost: the path to continuously learning health care in America. National Academies Press, Washington, D.C.
  14. West (2017) Strengthening chronic care patient engagement strategies for better management of chronic conditions. Last accessed on 3/27/2018
  15. Di Eugenio B, Boyd AD, Lugaresi C, Balasubramanian A, Keenan GM, Burton M, Macieira TGR, Lopez KD, Friedman C, Li J et al (2014) PatientNarr: towards generating patient-centric summaries of hospital stays. INLG 2014:6
  16. Acharya S, Di Eugenio B, Boyd AD, Lopez KD, Cameron R, Keenan GM (2016) Generating summaries of hospitalizations: a new metric to assess the complexity of medical terms and their definitions. In The 9th international natural language generation conference, page 26
  17. Pocklington C, Al-Dhahir L (2011) A comparison of methods of producing a discharge summary: handwritten vs. electronic documentation. BJMP 4(3):a432
  18. Kind AJ, Smith MA (2008) Documentation of mandated discharge summary components in transitions from acute to subacute care
  19. Paul S (2008) Hospital discharge education for patients with heart failure: what really works and what is the evidence? Crit Care Nurse 28(2):66–82
  20. García-Hernández RA, Ledeneva Y (2009) Word sequence models for single text summarization. In Advances in Computer-Human Interactions, 2009. ACHI'09. Second International Conferences on (pp. 44–48). IEEE
  21. Zhang Y, Zincir-Heywood N, Milios E (2004) Term-based clustering and summarization of web page collections. In: Conference of the Canadian Society for Computational Studies of Intelligence. Springer, Berlin, Heidelberg, pp 60–74
  22. Allahyari M, Pouriyeh S, Assefi M, Safaei S, Trippe ED, Gutierrez JB, Kochut K (2017) Text summarization techniques: a brief survey. arXiv preprint arXiv:1707.02268
  23. Lloret E, Palomar M (2012) Text summarisation in progress: a literature review. Artif Intell Rev 37(1):1–41
  24. Nallapati R, Zhou B, Gulcehre C, Xiang B (2016) Abstractive text summarization using sequence-to-sequence rnns and beyond. arXiv preprint arXiv:1602.06023
  25. Paulus R, Xiong C, Socher R (2017) A deep reinforced model for abstractive summarization. arXiv preprint arXiv:1705.04304
  26. Belz A (2008) Automatic generation of weather forecast texts using comprehensive probabilistic generation-space models. Nat Lang Eng 14(4):431–455
  27. Kumar M, Das D, Agarwal S, Rudnicky AI (2009) Non-textual event summarization by applying machine learning to template-based language generation. In Proceedings of the 2009 Workshop on Language Generation and Summarisation (pp. 67–71). Association for Computational Linguistics
  28. Becher M, Endres-Niggemeyer B, Fichtner G (2002) Scenario forms for web information seeking and summarizing in bone marrow transplantation. In proceedings of the 2002 conference on multilingual summarization and question answering-Volume 19 (pp. 1–8). Association for Computational Linguistics
  29. Feblowitz JC, Wright A, Singh H, Samal L, Sittig DF (2011) Summarization of clinical information: a conceptual model. J Biomed Inform 44(4):688–699
  30. Mahamood S, Reiter E (2011) Generating affective natural language for parents of neonatal infants. In: Proceedings of the 13th European workshop on natural language generation, pages 12–21. September. Association for Computational Linguistics, Nancy
  31. Scott D, Hallett C, Fettiplace R (2013) Data-to-text summarisation of patient records: using computer-generated summaries to access patient histories. Patient Educ Couns 92(2):153–159
  32. Díaz A, Gervás P (2007) User-model based personalized summarization. Inf Process Manag 43(6):1715–1734
  33. Kumar C, Pingali P, Varma V (2008) Generating personalized summaries using publicly available web documents. In: proceedings of the 2008 IEEE/WIC/ACM international conference on web intelligence and international conference on intelligent agent technology. pp 103–106
  34. Hibbard JH, Greene J, Tusler M (2009) Improving the outcomes of disease management by tailoring care to the patient's level of activation. Am J Managed Care 15(6):353–360
  35. Jones R, Pearson J, McGregor S, Cawsey AJ, Barrett A, Craig N, McEwen J (1999) Randomised trial of personalised computer based information for cancer patients. Bmj 319(7219):1241–1247
  36. Rodgers H, Bond S, Curless R (2001) Inadequacies in the provision of information to stroke patients and their families. Age Ageing 30(2):129–133
  37. Buchanan BG, Moore JD, Forsythe DE, Carenini G, Ohlsson S, Banks G (1995) An intelligent interactive system for delivering individualized information to patients. Artif Intell Med 7(2):117–154
  38. Jimison HB, Fagan LM, Shachter R, Shortliffe EH (1992) Patient-specific explanation in models of chronic disease. Artif Intell Med 4(3):191–205
  39. DiMarco C, Hirst G, Wanner L, Wilkinson J (1995) HealthDoc: customizing patient information and health education by medical condition and personal characteristics. In Workshop on Artificial Intelligence in Patient Education
  40. Gatt A, Portet F, Reiter E, Hunter J, Mahamood S, Moncur W, Sripada S (2009) From data to text in the neonatal intensive care unit: using NLG technology for decision support and information management. AI Commun 22(3):153–186
  41. Mairesse F, Walker MA (2011) Controlling user perceptions of linguistic style: trainable generation of personality traits. Comput Linguist 37(3):455–488
  42. Chun S, Tu JV, Wijeysundera HC (2012) Lifetime analysis of hospitalizations and survival of patients newly-admitted with heart failure. Circ Heart Fail 5(4):414–421
  43. Education, K. P. (2010). Reducing hospital readmissions with enhanced patient education
  44. Ong E, Damay J, Lojico G, Lu K, Tarantan D (2007) Simplifying text in medical literature. J Res Sci Comput Eng 4(1)
  45. Zeng-Treitler Q, Goryachev S, Kim H, Keselman A, Rosendale D (2007) Making texts in electronic health records comprehensible to consumers: a prototype translator. In AMIA, pages 846–50
  46. Kauchak D, Leroy G (2016) Moving beyond readability metrics for health-related text simplification. IT Professional 18(3):45–51
  47. Kauchak D, Mouradi O, Pentoney C, Leroy G (2014) Text simplification tools: using machine learning to discover features that identify difficult text. In 2014 47th Hawaii international conference on system sciences, pages 2616–2625. IEEE
  48. Di Eugenio B, Lugaresi C, Keenan GM, Lussier YA, Li J, Burton MD, Friedman C, Boyd AD (2013) HospSum: integrating physician discharge notes with coded nursing care data to generate patient-centric summaries. In AMIA
  49. Roussi K, Soussa V, Lopez KD, Balasubramanian A, Keenan GM, Burton M, Bahroos N, Di Eugenio B, Boyd A (2015) Are we talking about the same patient? In IOS Press
  50. Boyd AD, Lopez KD, Lugaresi C, Macieira T, Sousa V, Acharya S et al (2018) Physician nurse care: a new use of UMLS to measure professional contribution: are we talking about the same patient a new graph matching algorithm? Int J Med Inform 113:63–71
  51. Keenan GM, Stocker JR, Geo-Thomas AT, Soparkar NR, Barkauskas VH, Lee JL (2002) The hands project: studying and refining the automated collection of a cross-setting clinical data set. Comput Inform Nurs 20(3):89–100
  52. NNN (2014) Knowledge-based terminologies defining nursing,
  53. Friedman C, Shagina L, Lussier Y, Hripcsak G (2004) Automated encoding of clinical documents based on natural language processing. J Am Med Inform Assoc 11(5):392–402
  54. NIH (2011) Umls quick start guide. Unified Medical Language System, research/ umls/ quickstart.html, Last accessed on 10/10/2016
  55. CHIRR. Health literacy (2012) Consumer health informatics research resource, health-literacy.Php.
  56. Ramesh BP, Houston TK, Brandt C, Fang H, Yu H (2013) Improving patients’ electronic health record comprehension with NoteAid. In MedInfo, pages 714–718
  57. Miller GA (1995) WordNet: a lexical database for English. Commun ACM 38(11):39–41
  58. Gatt A, Reiter E (2009) Simplenlg: a realisation engine for practical applications. In Proceedings of the 12th European Workshop on Natural Language Generation, pages 90–93. Association for Computational Linguistics
  59. Binsted K, Cawsey A, Jones R (1995) Generating personalised patient information using the medical record. In Conference on Artificial intelligence in medicine in Europe, pages 29–41. Springer
  60. Giuse NB, Koonce TY, Storrow AB, Kusnoor SV, Ye F (2012) Using health literacy and learning style preferences to optimize the delivery of health information. J Health Commun 17(sup3):122–140
  61. Ascione FJ, Kirscht JP, Shimp LA (1986) An assessment of different components of patient medication knowledge. Med Care 24:1018–1028
  62. Nutbeam D (1998) Health promotion glossary. Health Promot Int 13(4):349–364
  63. Quick guide to health literacy, factsbasic.htm. Last accessed on 11/20/2017
  64. Davis TC, Long SW, Jackson RH, Mayeaux E, George RB, Murphy PW, Crouch MA (1993) Rapid estimate of adult literacy in medicine: a shortened screening instrument. Fam Med 25(6):391–395
  65. Hibbard JH, Mahoney ER, Stockard J, Tusler M (2005) Development and testing of a short form of the patient activation measure. Health Serv Res 40(6p1):1918–1930
  66. Lorig K, Ritter PL, Laurent DD, Plant K, Green M, Jernigan VBB, Case S (2010) Online diabetes self-management program a randomized study. Diabetes Care 33(6):1275–1281
  67. Mosen DM, Schmittdiel J, Hibbard J, Sobel D, Remmers C, Bellows J (2007) Is patient activation associated with outcomes of care for adults with chronic conditions? J Ambul Care Manage 30(1):21–29
  68. Greene J, Hibbard JH (2012) Why does patient activation matter? An examination of the relationships between patient activation and health-related outcomes. J Gen Intern Med 27(5):520–526
  69. Arnold E, Boggs K (2006) Interpersonal relationships: professional communication skills for nurses. Elsevier Science Health Science 504:29–30
  70. Cassell EJ (1985) Talking with patients: the theory of doctor-patient communication. Vol:1
  71. Examples-Empathetic Statements to Use, PatientSafety/DisclosureResources/Appendix-2-Examples-Empathetic-Statements-to-Use. Last accessed on 11/20/2017
  72. Empathy 101: how to sound like you give a damn, 24/empathy-101. Last accessed on 11/20/2017
  73. Incorporate empathy in patient interactions, incorporate-empathy-patient-interactions.html. Last accessed on 11/20/2017
  74. Ghosh S, Chollet M, Laksana E, Morency LP, Scherer S (2017) Affect-lm: a neural language model for customizable affective text generation. arXiv preprint arXiv:1704.06851
  75. Li J, Galley M, Brockett C, Spithourakis GP, Gao J, Dolan B (2016) A persona-based neural conversation model. arXiv preprint arXiv:1603.06155
  76. Van Vleck TT, Stein DM, Stetson PD, Johnson SB (2007) Assessing data relevance for automated generation of a clinical summary. In AMIA annual symposium proceedings (Vol. 2007, p. 761). American Medical Informatics Association
  77. Williams MV, Parker RM, Baker DW, Parikh NS, Pitkin K, Coates WC, Nurss JR (1995) Inadequate functional health literacy among patients at two public hospitals. Jama 274(21):1677–1682
  78. Inouye D, Kalita JK (2011) Comparing twitter summarization algorithms for multiple post summaries. In Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing (SocialCom), 2011 IEEE Third International Conference pp. 298–306. IEEE
  79. Rush AM, Chopra S, Weston J (2015) A neural attention model for abstractive sentence summarization. arXiv preprint arXiv:1509.00685

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Sabita Acharya (1)
  • Andrew D. Boyd (2)
  • Richard Cameron (3)
  • Karen Dunn Lopez (2)
  • Pamela Martyn-Nemeth (4)
  • Carolyn Dickens (4)
  • Amer Ardati (5)
  • Jose D. Flores Jr (6)
  • Matt Baumann (6)
  • Betty Welland (6)
  • Barbara Di Eugenio (1)

  1. Department of Computer Science, College of Engineering, University of Illinois at Chicago, Chicago, USA
  2. Department of Biomedical and Health Information Sciences, College of Applied Health Sciences, University of Illinois at Chicago, Chicago, USA
  3. Department of Linguistics, College of Liberal Arts and Sciences, University of Illinois at Chicago, Chicago, USA
  4. Department of Biobehavioral Health Science, College of Nursing, University of Illinois at Chicago, Chicago, USA
  5. Division of Cardiology, College of Medicine, University of Illinois at Chicago, Chicago, USA
  6. Cardiology patient advisors, College of Applied Health Sciences, University of Illinois at Chicago, Chicago, USA
