Introduction

In clinical practice, consultations with healthcare providers are often short. When emotional well-being is poor, e.g. in the case of depressive symptoms, there is limited time available for in-depth discussion. Questionnaires that measure patient-reported outcomes (PROs), so-called patient-reported outcome measures (PROMs), can help. A PRO was defined by the US Food and Drug Administration (FDA) as ‘any report of the status of a patient’s health condition that comes directly from the patient, without interpretation of the patient’s response by a clinician or anyone else’ [1] (Text box 1). PROMs measuring physical and psychosocial aspects of health and quality of life (QOL), such as physical function or depression, offer information complementary to clinical outcomes such as HbA1c. They can be used to inform people with diabetes about the expected course of disease and treatment, to support shared decision making, to monitor outcomes and to improve healthcare [2]. Using PROMs need not lengthen the consultation time [3].

[Text box 1: Commonly used terms and definitions]

To gain the most benefit from PROMs in research or clinical practice, PROMs should measure the outcomes that are most relevant to people with diabetes. Several initiatives have tried to identify which PROs are most relevant for people with diabetes. An international consortium of people with diabetes, healthcare providers and other relevant stakeholders developed an agreed minimum set of outcomes to be measured in all clinical trials in people with type 2 diabetes (called a core outcome set [COS]). They recommend measuring global QOL and activities of daily living in all clinical trials [4]. The International Consortium for Health Outcomes Measurement (ICHOM) developed a standard set of outcomes to be measured in clinical practice in people with type 1 or type 2 diabetes. They recommend measuring psychological well-being, diabetes distress and depression [5]. Other initiatives recommend yet different PROs [6,7,8,9]. Although ‘quality of life’ is often recommended [8], this concept is defined very differently by different people [10]. Many questionnaires are available that aim to measure QOL or (aspects of) health-related QOL (HRQL); some are generic, some are disease-specific, and they measure many different things, not always restricted to PROs [11]. Furthermore, the validity, reliability and responsiveness to change over time of many of these questionnaires are often unclear or insufficient [11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26].

The aim of this review is to provide recommendations on the PROs that are most commonly relevant to measure in adults with diabetes in clinical practice and research, and on good-quality PROMs to measure these PROs. We first provide a general conceptual framework of PROs and PROMs. Second, we present a narrative overview of the literature on which PROs are most relevant to measure in people with diabetes. Third, we present an overview of which PROMs have been used in studies involving people with diabetes and what is known about the quality of these PROMs in terms of validity, reliability and responsiveness. In addition, we suggest several well-validated generic PROMs that could be used in people with diabetes. Finally, we provide recommendations and suggestions for the use of PROMs in clinical practice and research.

A conceptual framework of PROs and PROMs

There is considerable heterogeneity in the definition and operationalisation of the terms ‘QOL’, ‘HRQL’, ‘PRO’ and ‘PROM’ between and within studies [10]. In Text box 1, we provide an overview of commonly used terms and the definitions we adopt in this paper. We adopt the FDA's original definition of a PRO [1], which has also been adopted by the European Medicines Agency (EMA). PROs therefore refer to health outcomes, including physical, mental and social symptoms and functioning. Non-health-related constructs, such as overall QOL (which is broader than health), satisfaction, eating behaviour and stigma, are not PROs according to the original FDA definition.

It is important to conceptualise how different health and QOL outcomes interrelate. Different models have been proposed in the literature. One commonly used model was developed by Wilson and Cleary [27], who distinguish different levels of health and QOL outcomes and relate them to characteristics of the individual and the environment. For illustration, we placed several relevant health and QOL outcomes and characteristics of the individual and the environment for people with diabetes in the Wilson and Cleary model (Fig. 1).

Fig. 1 Several examples of relevant health and QOL outcomes for people with diabetes (list is not exhaustive) placed in the model of Wilson and Cleary [27]

In this model, biological and physiological variables, symptoms, functional status and general health perceptions are considered aspects of health status. Overall QOL is broader than health.

Aspects of health can be thought of as existing on a continuum of increasing biological, social and psychological complexity. Starting from the left-hand side (Fig. 1) are biological and physiological aspects of health such as HbA1c, hypoglycaemia, glucose variability or blood pressure. These are measured with clinical measurement instruments, such as glucose sensors, laboratory tests, physical examination, vision tests and imaging techniques.

A biological or physiological abnormality or defect, such as the inability of the pancreas to make insulin, can lead to symptoms, referring to how a patient feels. These could be physical symptoms, such as pain or blurred vision, or emotional or psychological symptoms, such as fear, worry and depressive symptoms. Symptoms are PROs and should be measured with PROMs.

Symptoms can lead to limitations in how an individual functions, in terms of physical function, mental function and social/role function (e.g. performing a job). Functional status can be measured with PROMs, by asking about perceived limitations in functioning, but also with performance-based tests, such as walking tests.

General health perceptions, which refer to the PRO ‘perceived overall health’, are often measured with a single question, e.g. ‘how would you rate your overall health?’, which is a PROM. Finally, overall QOL includes aspects of health, but is broader and also includes factors not related to health, such as material comforts, personal safety and satisfaction with life in general. Overall QOL is actually not a PRO, although (some of) its components can be PROs. Therefore, questionnaires measuring overall QOL are not considered PROMs according to the FDA definition. Wilson and Cleary use the term HRQL as an umbrella term, including symptoms, functional status and general health perceptions [27].

In addition, the model shows that health and QOL outcomes are influenced by contextual factors, i.e. personal factors, such as personality, behaviour (diet, medication adherence and physical activity) and coping mechanisms, and environmental factors, such as social support, social stigma and financial aspects.

Commonly used questionnaires in the diabetes field measure different things. Some questionnaires focus on only one level, e.g. symptoms, while others measure outcomes at multiple levels of health, especially questionnaires that aim to measure ‘QOL’ or ‘HRQL’. Some questionnaires assign different outcomes to different subscales, e.g. one subscale for symptoms and another for physical function, but others (undesirably) combine outcomes from different levels into one scale. Many questionnaires include questions or subscales measuring not only PROs but also contextual factors [11]. These questionnaires are therefore not (entirely) PROMs. Failing to distinguish between health and non-health outcomes, between health outcomes and contextual factors, and between PROMs and other questionnaires results in confusion about what is being measured, lack of content validity of PROMs, difficulty selecting the best PROM for a given study or clinical application, and inability to study causal relationships between health outcomes or the relationships between contextual factors, health outcomes and overall QOL. Text box 2 illustrates measurement issues we encountered in performing systematic reviews of PROMs in people with diabetes [11, 24, 25].

[Text box 2: Measurement issues encountered in systematic reviews of PROMs in people with diabetes]

Researchers and clinicians should be aware of the differences between clinical outcomes, PROs, contextual factors and patient experiences, and of the fact that all of these concepts are often included in questionnaires or subscales that aim to measure HRQL, QOL or PROs. An illustration is provided in Text box 3. This situation hampers clear interpretation of what is being measured and is a threat to the validity of diabetes research. We cannot, for example, study the influence of self-care behaviour on physical and psychological functioning in people with diabetes if these concepts are measured in one scale and summarised into one score. Nor can we appropriately perform or interpret meta-analyses of studies on the effects of certain medications on HRQL if the HRQL instruments measure all kinds of different concepts, some of them not even related to health. All the concepts shown in Fig. 1 can be important to measure, but it is confusing if they are all called PROs or HRQL, and they should not be combined into one scale score.

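[Text box 3: Illustration of clinical outcomes, PROs, contextual factors and patient experiences combined in questionnaires]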

Most relevant PROs to measure in people with diabetes

It is not clear which PROs are most relevant to measure in diabetes research and clinical practice. Qualitative studies revealed a large number of outcomes considered important by people with diabetes [28,29,30]. No explicit distinction was made in these studies between PROs, contextual factors and other outcomes, although Dodd et al classified outcomes using the Core Outcome Measures in Effectiveness Trials (COMET) taxonomy [31], where PROs are classified in the category ‘life impact’.

Relevant international guidelines differ in the PROs they recommend. Many recommendations stress the importance of psychosocial problems, but they do not distinguish between psychosocial functioning (a PRO) and psychosocial well-being (broader than a PRO). Harman et al developed a COS to be measured and reported, as a minimum, in all clinical trials in people with type 2 diabetes [4]. A COS often contains PROs but also other relevant (clinical) outcomes. The COS was developed in a Delphi survey with healthcare professionals, people with type 2 diabetes, researchers in the field and healthcare policymakers. The recommended core outcomes to be reported by patients were ‘global QOL’ and ‘activities of daily living’. Global QOL was defined as ‘someone’s overall quality of life, including physical, mental and social well-being’ [4]. This is not a PRO, because it is broader than health. Activities of daily living was defined as ‘being able to complete usual everyday tasks and activities, including those related to personal care, household tasks or community-based tasks’ [4]. This refers to a PRO.

The ICHOM consortium developed a standard set for people with diabetes types 1 and 2, to be used in clinical practice, also using a consensus approach among experts. It does not state whether people with diabetes were involved. Recommended outcomes are psychological well-being, diabetes distress and depression. Only the latter two are PROs [5]. This recommendation is in line with recommendations from the ADA and the EASD, which state that providers should consider diabetes distress, depression, anxiety, disordered eating (which is not a PRO), cognitive capacities and chronic pain [6, 32,33,34,35].

Differences in recommendations are at least partly due to different aims and methodology, and to (lack of) involvement of people with diabetes. For example, a COS includes only a minimum set of outcomes to be measured and reported in every clinical trial, while other guidelines might include outcomes that could additionally be relevant to measure in specific trials or in clinical practice.

In summary, there is consensus that the PRO ‘activities of daily living’ (which is conceptually similar to physical function) should be measured in all diabetes trials. There is less consensus on which PROs are additionally relevant to measure in specific trials and which PROs are relevant to measure routinely in clinical practice.

At the same time, there is increasing evidence from several initiatives that some PROs are relevant for many people, irrespective of their disease (Text box 4) [36,37,38]. Symptoms such as pain, fatigue or depression are common across diseases. Furthermore, being able to carry out daily activities and social roles is important to most people. The Patient-Reported Outcomes Measurement Information System (PROMIS) domain framework was developed to capture commonly relevant PROs across three broad aspects of health (physical, mental and social), based on the WHO definition of health [37, 39]. It was developed through literature reviews of well-established instruments, a consensus-building Delphi process among health outcomes experts, and statistical analysis. Patients were not involved in the development of the conceptual model, although patient input was captured by reviewing instruments that had been developed with patient input [40]. Five subdomains were selected as the initial areas for PROMIS item bank construction: fatigue, pain, emotional distress (later divided into depression, anxiety and anger), physical functioning and social role participation [37].

[Text box 4: PROs commonly relevant across diseases]

Kroenke et al developed a taxonomy of key pragmatic decisions related to PROM implementation based on a literature review, but without patient input [38]. One of the pragmatic issues they address is the choice between generic and disease-specific PROMs. They noted that some domains are cross-cutting, in that they occur frequently and often cluster across the majority of medical and mental health disorders; these include fatigue, pain, depression, anxiety, sleep and physical function [38].

Terwee et al extracted all PROs and recommended PROMs from 39 ICHOM Standard Sets [36]. Many of these sets were developed with patient input, but not all. More than 300 PROs were categorised into 22 unique PRO concepts. The most commonly included PROs were ability to participate in social roles, physical function, HRQL, pain, depression, general mental health, anxiety and fatigue [36]. The COMET initiative identified similar common PROs included in COS for trials (Text box 4).

In the Netherlands, a national consensus set of PROs and PROMs was recently developed for routine use in Dutch medical specialist care, based on the above-mentioned initiatives and others, as well as on input from patients, healthcare providers and representatives of healthcare organisations. The selected PROs were fatigue, pain, depression, anxiety, physical functioning, social role participation and overall health [41].

Based on these initiatives, we recommend considering the commonly relevant PROs mentioned above to measure in people with diabetes (both type 1 and type 2). These PROs can be supplemented with relevant diabetes-specific symptoms. For example, the WHO report on diabetes lists frequent urination, thirst, feeling hungry (even though you are eating), blurry vision, weight loss (type 1) and tingling hands/feet (type 2) as relevant symptoms of diabetes [42]. In addition, other relevant PROs that are commonly measured could be considered, such as diabetes distress and fear of hypoglycaemia.

Best PROMs for use in people with diabetes

It is very challenging to identify the best PROMs to measure the PROs suggested above in people with diabetes. At least 16 systematic reviews have been published summarising the available PROMs and their measurement properties in people with diabetes [11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26]. These reviews vary in quality and completeness: some included only selected groups (e.g. only people with type 1 or type 2 diabetes, or people with amputations), some focused on only one PRO (e.g. depression), and some were conducted over 10 years ago. As a result, the identified PROMs, evaluation methods, conclusions and recommendations of these reviews vary.

Our systematic review by Langendoen-Gort et al provides the most recent overview of existing PROMs, published up to 31 December 2021, that aim to measure (aspects of) HRQL and that have been validated to at least some extent in people with type 2 diabetes [11]. We identified 116 questionnaires, not all of which actually measure PROs. About half (61) of the 116 questionnaires (also) include items or subscales measuring characteristics of the individual (e.g. aspects of personality and coping) or the environment (e.g. social or financial support), or patient experiences and treatment satisfaction. Eight of the 116 questionnaires measured no PRO at all, even though they claimed to measure HRQL [11]. No recommendations were provided on the best PROMs because the measurement properties of the PROMs were not assessed in this review.

The international COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative developed consensus-based standards and criteria for assessing the quality of PROMs [43, 44]. Nine measurement properties are considered important for PROMs: content validity, structural validity, internal consistency, construct validity, reliability, measurement error, cross-cultural validity, criterion validity (only for comparing different versions of the same PROM) and responsiveness (Table 1) [45]. According to COSMIN, the most important measurement property is content validity [46]. In a second review, we assessed the content validity of the 54 of the above-mentioned 116 PROMs that were specifically developed for people with type 2 diabetes, together containing 150 subscales. Using COSMIN methodology [46], we assessed whether all PROM items measure relevant aspects of the construct that the PROM (scale) aims to measure, whether any important aspects are missing and whether people interpret the items as intended. Most previous reviews did not evaluate content validity, or did not evaluate it in as much detail as the COSMIN methodology recommends. We found that content validity was rated as sufficient for only 41 of the 150 (27%) PROM subscales [25]. In Table 1, we provide a narrative summary of the relevant evidence on all measurement properties of these PROM subscales, excluding single items and scales developed for subgroups of people with diabetes (e.g. people with foot ulcers), classified according to the Wilson and Cleary model [27]. Evidence on measurement properties other than content validity was extracted from the 16 reviews described above and from several main validation papers of the PROMs. We did not find a comparable evidence synthesis for type 1 diabetes.
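Internal consistency, one of the nine COSMIN measurement properties listed above, is most often summarised with Cronbach's alpha. As a minimal illustration (not the analysis performed in our reviews), the Python sketch below computes alpha from a respondents-by-items matrix of item scores; the function name and the example data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a matrix with one row per respondent and one column per item."""
    k = len(responses[0])  # number of items in the scale
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of the total (sum) score across respondents
    total_var = variance([sum(row) for row in responses])
    # alpha = k / (k - 1) * (1 - sum of item variances / total-score variance)
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores of four respondents on a three-item scale
example = [[1, 2, 2], [2, 2, 3], [3, 4, 4], [4, 4, 5]]
print(round(cronbach_alpha(example), 2))
```

Alpha only reflects how strongly the items covary. A high value does not show that the scale measures a single construct, which is why evidence on structural validity (unidimensionality) is needed before internal consistency results can be interpreted [47].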

Table 1 Measurement properties of type 2 diabetes-specific (subscales of) PROMs with sufficient content validity

Table 1 shows that none of the existing diabetes-specific PROM scales have been sufficiently validated. COSMIN states that PROMs with evidence for sufficient content validity (any level) and at least low evidence for sufficient structural validity and internal consistency have the potential to be recommended for use [44]. In addition, evidence on reliability (small measurement error) is important, especially for PROMs used in clinical practice. All PROMs measuring disease-specific symptoms showed positive results for internal consistency, but these results cannot be interpreted properly if evidence that the scale is unidimensional is lacking [47]. Also, important information on test–retest reliability and responsiveness is lacking. The Diabetes Symptom Self-Care Inventory (DSSCI) is most promising for measuring diabetes-specific symptoms because it has the best evidence for content validity. For measuring diabetes distress, the Diabetes Distress Scale (DDS) and the Problem Areas in Diabetes (PAID) scale are most promising based on content validity.
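The COSMIN criterion quoted above can be expressed as a simple decision rule. The sketch below is a simplified illustration under our own assumptions: the data structure, the rating labels and the function names are hypothetical, evidence quality is graded on the four levels of the modified GRADE approach used by COSMIN (very low, low, moderate, high), and only the criterion stated in the text is encoded, not the full COSMIN guideline.

```python
from dataclasses import dataclass
from typing import Optional

# Modified GRADE levels used by COSMIN, ordered from weakest to strongest
EVIDENCE_LEVELS = ("very low", "low", "moderate", "high")


@dataclass
class PooledEvidence:
    """Pooled result for one measurement property (hypothetical structure)."""
    rating: str   # e.g. "sufficient", "insufficient", "inconsistent", "indeterminate"
    quality: str  # one of EVIDENCE_LEVELS


def sufficient_with_at_least_low(ev: Optional[PooledEvidence]) -> bool:
    """True if the property is rated sufficient with low-quality evidence or better."""
    if ev is None:  # no evidence available: criterion not met
        return False
    return (ev.rating == "sufficient"
            and EVIDENCE_LEVELS.index(ev.quality) >= EVIDENCE_LEVELS.index("low"))


def potentially_recommendable(content_validity: Optional[PooledEvidence],
                              structural_validity: Optional[PooledEvidence],
                              internal_consistency: Optional[PooledEvidence]) -> bool:
    """Sufficient content validity (any level of evidence) plus at least
    low-quality evidence for sufficient structural validity and internal consistency."""
    return (content_validity is not None
            and content_validity.rating == "sufficient"
            and sufficient_with_at_least_low(structural_validity)
            and sufficient_with_at_least_low(internal_consistency))


# Hypothetical scale with sufficient content validity and internal consistency
# but no structural validity study
print(potentially_recommendable(PooledEvidence("sufficient", "low"), None,
                                PooledEvidence("sufficient", "moderate")))  # prints False
```

Treating missing evidence as failing the criterion mirrors the situation described above, in which positive internal consistency results cannot compensate for absent evidence on unidimensionality.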

For the diabetes-specific PROMs or subscales measuring fatigue, anxiety, physical function, sexual function, emotional function, social function and overall health, evidence on structural validity, test–retest reliability and responsiveness is missing. Therefore, none of these PROMs can be recommended. Considering that these PROs are commonly relevant across medical conditions (Text box 4) and that the content of disease-specific PROMs and generic PROMs measuring the same PRO are often very similar, we recommend using generic PROMs for these PROs.

For these commonly relevant PROs, high-quality generic PROMs exist that are applicable across populations and diseases (Table 2). Not all of these generic PROMs have been validated in people with diabetes (Table 2), but since they showed good measurement properties in other chronic conditions, it may be reasonable to assume that they will also perform well in people with diabetes. We discuss three generic PROMs that are widely used and tested: the 36-Item Short Form Health Survey (SF-36) [48], the WHO Disability Assessment Schedule 2.0 (WHODAS 2.0) [49] and the PROMIS measurement system [39].

Table 2 Commonly used, well-validated generic PROMs for measuring common PROs

The SF-36, developed in 1992, is the most commonly used generic PROM in the world. It has been extensively validated across medical conditions, as illustrated by more than 300 systematic reviews of measurement properties that include this PROM [50]. The SF-36 contains 36 items, divided into eight subscales measuring physical functioning, bodily pain, role limitations due to physical health problems, role limitations due to personal or emotional problems, emotional well-being, social functioning, energy/fatigue and general health perceptions. Although content validity, structural validity and internal consistency have not been assessed in people with diabetes, evidence for sufficient construct validity and responsiveness has been found in people with diabetes (e.g. Huang et al [51] and Ahroni and Boyko [52]).

The WHODAS 2.0 is a generic instrument covering several domains of function and participation, directly linked to the International Classification of Functioning, Disability and Health (ICF). The original WHO/DAS was published in 1988 and WHODAS 2.0 in 2010 [49]. It includes 36 items, divided into six subscales measuring cognition, mobility, self-care, getting along, life activities and participation [53]. WHODAS 2.0 has been used in large epidemiological studies in people with diabetes [54, 55] but has not been validated in people with diabetes.

The development of PROMIS started in 2004. PROMIS consists of ‘item banks’ instead of fixed PROMs, which has many advantages. An item bank is a large set of items that all measure one PRO (e.g. physical function) and that are ordered on a metric using psychometric methods based on item response theory (IRT) [56]. For example, the item ‘are you able to run 5 miles?’ is considered more difficult than the item ‘are you able to get in and out of bed?’ and is therefore placed higher on a physical function metric (if higher scores indicate better function). Individuals obtain a score on the same metric based on their answers. With item banks, it is not necessary to administer all items; instead, a score can be obtained by administering only a subset of items as a short form. The ultimate advantage of item banks is the possibility of computerised adaptive testing (CAT), in which, after a starting question, the computer selects each subsequent question based on the answers to previous questions. This process continues until a predefined precision or a maximum number of items is reached. CAT reduces patient burden compared with fixed-item questionnaires [57]. The responsiveness of measures derived from item banks is generally higher than that of traditional generic PROMs [58,59,60]. This is important because generic PROMs such as the SF-36 and WHODAS 2.0 generally have limited responsiveness for measuring change over time, as their questions are broadly formulated. Item banks and CAT are therefore considered by some to be the future of outcome measurement [56]. Item banks are also sustainable, because items can be adapted, removed or added without changing the underlying metric.

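To make the CAT procedure described above concrete, the sketch below implements it for a hypothetical bank of yes/no items under a two-parameter logistic (2PL) IRT model. This is a deliberate simplification: PROMIS item banks use polytomous items calibrated with the graded response model, and real CAT engines differ in their item selection and scoring details. The item parameters, the standard normal prior, the stopping thresholds and all function names are assumptions made for illustration. Each next item is the one with maximum Fisher information at the current score estimate, the score is re-estimated by expected a posteriori (EAP) estimation, and testing stops once the standard error falls below a target or a maximum number of items has been given.

```python
import math


def p_endorse(theta: float, a: float, b: float) -> float:
    """2PL probability of endorsing an item with discrimination a and location b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at theta."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


GRID = [i / 10 for i in range(-40, 41)]  # quadrature grid for theta from -4 to 4


def eap_estimate(responses, items):
    """EAP score and posterior SD given (item index, 0/1 answer) pairs and a N(0, 1) prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalised standard normal prior
        for idx, x in responses:
            p = p_endorse(t, *items[idx])
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(items, answer, se_target=0.3, max_items=12):
    """Administer items adaptively; answer(i) returns the 0/1 response to item i."""
    responses = []
    theta, se = 0.0, 1.0  # start at the prior mean
    remaining = set(range(len(items)))
    while remaining and len(responses) < max_items:
        # Select the most informative remaining item at the current estimate
        nxt = max(remaining, key=lambda i: item_information(theta, *items[i]))
        remaining.remove(nxt)
        responses.append((nxt, answer(nxt)))
        theta, se = eap_estimate(responses, items)
        if se < se_target:  # predefined precision reached
            break
    return theta, se, [i for i, _ in responses]


# Hypothetical bank of nine items: (discrimination a, location b)
bank = [(1.5, b / 2) for b in range(-4, 5)]
# Simulated respondent who endorses every item located below 0.7 (illustration only)
theta, se, used = run_cat(bank, lambda i: int(bank[i][1] < 0.7))
print(f"theta={theta:.2f}, SE={se:.2f}, items used={used}")
```

With a realistically large bank, the precision rule usually ends testing well before all items are used, which is where the reduction in patient burden comes from.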
The PROMIS initiative developed a large variety of item banks for measuring key symptoms (fatigue, pain, sleep disturbance, anxiety and depression), functional status (physical function and the ability to perform social roles and activities) and general health perceptions (global health), which have been translated into more than 60 languages and can be administered as short forms or CAT across a wide range of chronic conditions, enabling efficient and interpretable clinical trial and clinical practice applications of PROs [61]. PROMIS uses a T-score metric, where a mean of 50 represents the average of a reference population (usually a general population). Although content validity, structural validity and internal consistency have not been assessed in people with diabetes, Groeneveld et al were the first to show sufficient construct validity, test–retest reliability and responsiveness of seven PROMIS CATs for measuring physical function, pain interference, fatigue, sleep disturbance, anxiety, depression and ability to participate in social roles and activities in 314 people with type 2 diabetes (F. Rutters, unpublished results).
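The T-score metric is a linear transformation of the IRT score. A minimal sketch, assuming theta is expressed in standard deviation units of the reference population (mean 0, SD 1) and using the PROMIS convention of 10 T-score points per standard deviation; the function names are hypothetical.

```python
def theta_to_t(theta: float) -> float:
    """Convert a standardised IRT score to a T-score (reference mean 50, SD 10)."""
    return 50.0 + 10.0 * theta


def t_to_theta(t: float) -> float:
    """Inverse conversion: T-score back to reference-population SD units."""
    return (t - 50.0) / 10.0


def se_theta_to_t(se_theta: float) -> float:
    """The standard error scales by the same factor of 10."""
    return 10.0 * se_theta


# A person with theta = 1.0 scores one SD above the reference mean
print(theta_to_t(1.0))  # prints 60.0
```

Because the scale is anchored to a reference population, a T-score can be read directly as a distance from that population's average, whichever short form or CAT produced it.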

Finally, there are also high-quality generic PROMs available that measure only one PRO, such as the Functional Assessment of Chronic Illness Therapy (FACIT)–Fatigue Scale, or two PROs, such as the Hospital Anxiety and Depression Scale (HADS) for anxiety and depression, that could be considered. Table 2 describes the relevant SF-36 and WHODAS subscales, the PROMIS measures and some other commonly used generic PROMs that focus on only one or a few PROs and that we consider to have good content validity. Table 2 also gives a narrative summary of the evidence on their measurement properties in general, and of any evidence we could find on their measurement properties in people with diabetes.

We recommend selecting a relevant PROM, or a subscale of a PROM, from Table 2 for each PRO that one aims to measure in a study or clinical application. The SF-36 and WHODAS 2.0 do not need to be administered in full, and scales from different PROMs can be mixed based on preferences for a specific context of use. The PROMIS measures are attractive because they take advantage of the modern psychometric technique of IRT, which makes them precise, patient-friendly and short, and because they allow comparisons between disease groups, including people who have other chronic conditions but not diabetes. Another advantage is that these scales are unidimensional, in contrast to several of the other measures in Table 2. Unidimensional scales measure only one construct, so their scores are easier and more valid to interpret.

Generic PROM scales do not assess diabetes-specific constructs. For many studies, it can therefore be important to add disease-specific PROMs that measure diabetes-specific symptoms and other relevant diabetes-specific PROs, such as diabetes distress. A combination of disease-specific PROMs for measuring disease-specific symptoms and generic PROMs for measuring general symptoms, functioning and perceived overall health seems most useful.

Future: where should we be going?

There is a need for further standardisation of PROs and PROMs in the field of diabetes. We recommend that researchers and clinicians consider measuring disease-specific symptoms, general symptoms, functional status and general health perceptions. We recommend further validation of the diabetes-specific PROMs that have sufficient content validity for measuring diabetes-specific symptoms and diabetes distress. In addition, we recommend using generic PROMs for measuring commonly relevant PROs. In particular, item banks and CAT, such as those of the PROMIS system, offer many potential benefits for measuring commonly relevant PROs. The main advantages are efficient measurement, with a minimal number of items yet reliable scores; flexible measurement, because items can be used interchangeably; and precise measurement, due to low measurement error. It is also possible to convert scores of many traditional PROMs to the corresponding PROMIS metric (see for example Bingham et al [62]). PROMIS is rapidly being adopted across diseases and countries [63]. Koh et al suggested that PROMIS might provide a generic solution for measuring PROs in the field of diabetes: PROMIS covered five of six themes, 15 of 30 subthemes and 19 of 35 codes that people living with diabetes identified as important [28].

PROMs are not yet routinely used in the field of diabetes. A systematic review showed sparse use of PROMs to assess depressive symptoms and distress during routine clinical care in adults with type 2 diabetes [64]. Scholle et al [65] were the first to study the effect of implementing the PROMIS-29 in routine care for people with diabetes. Participants reported some challenges in understanding the PROMIS scales, but also saw the PROM process as an opportunity to increase their engagement in the treatment and management of their diabetes [65]. Preliminary qualitative data from our group showed that Dutch people living with type 2 diabetes found PROMIS CATs acceptable and indicated that they could be an efficient way to start the conversation with a healthcare provider, as well as give people with diabetes more confidence (F. Rutters, unpublished results). However, all participants felt that ‘questionnaires should never replace personal consultations with the physician’ (F. Rutters, unpublished results). Several practical guidelines exist to support healthcare providers in selecting and implementing PROs and PROMs in clinical practice (e.g. van der Wees et al [66] and Aaronson et al [67]).

The COS developed for clinical trials in people with type 2 diabetes recommended core outcomes but not yet core outcome measurement instruments. This paper suggests the Impact of Weight on Activities of Daily Living questionnaire (IWADL), the SF-36 physical functioning subscale and, in particular, the PROMIS Physical Function measures for measuring the core outcome ‘activities of daily living’. The core outcome ‘global QOL’ is not considered a PRO, but it is nevertheless relevant to measure, for example with the WHO well-being index (WHO-5, a short self-reported measure of current mental well-being [68]) or the PROMIS Global02 item (a single item addressing overall QOL, included in the PROMIS Global Health [69]). However, consensus among people with diabetes and healthcare providers is needed before a final recommendation can be made.

This study has limitations. First, no people with diabetes were involved; our recommendations are based on the literature and on our own experience as researchers and clinicians with different backgrounds. Second, this is not a systematic review of all disease-specific and generic PROs and PROMs that could be used in people with diabetes and of their measurement properties. Third, our review focuses predominantly on PROMs for adults with diabetes. Nevertheless, we hope this paper provides sufficient evidence and recommendations to improve the current use of PROs and PROMs in the field of diabetes, to improve healthcare and, ultimately, to improve the QOL of people living with diabetes.