1 Introduction

Due to the high global prevalence of type 2 diabetes (approximately 400 million people) combined with the chronic nature of the disease, it is important to measure outcomes that matter most to patients [1, 2]. This can be done by measuring patient-reported outcomes (PROs). PROs are health outcomes directly reported by patients about how they feel or function in relation to a health condition. In clinical research and care, an important PRO to measure is (aspects of) health-related quality of life (HRQOL), including symptom status, functional status and general health perceptions [3]. The terms HRQOL and Quality of Life (QOL) are often used interchangeably. However, many authors state that (overall) QOL is a broader concept, referring to how happy or satisfied a person is with his/her life as a whole [4,5,6]. Clinicians and researchers in the medical field generally prefer to measure only those aspects of QOL related to health (often referred to as HRQOL) instead of QOL, because the non-medical aspects of QOL are outside the scope of health care interventions. The measurement of HRQOL is becoming increasingly important, not only in care but also in clinical trials.

One of the most frequently used conceptual models of HRQOL was developed by Wilson and Cleary [4]. The model contains five levels of outcomes, namely biological and physiological variables, symptom status (including disease-specific symptoms, physical symptoms and mental symptoms), functional status (including physical function, psychological function and social/role function), general health perceptions and overall quality of life (including overall quality of life, well-being and life satisfaction). In this review, we define HRQOL as symptoms, functional status and general health perceptions.

To date, many different PRO measurement instruments (PROMs) are available that measure HRQOL in people with type 2 diabetes, as identified by previous reviews [7,8,9,10,11,12,13,14,15,16]. However, these reviews included studies in people with both type 1 and type 2 diabetes, which represent different pathologies with large differences in age, so that different PROs may be relevant and the validity and reliability of PROMs may differ between people with type 1 and type 2 diabetes [7, 8, 11]. Other reviews only included patients with amputations [14], only included PROMs measuring one aspect of HRQOL, e.g. depressive symptoms [11], or were conducted over 10 years ago [9, 12]. A recent review by Wee et al. 2021 aimed to identify all PROMs used for people with diabetes [15]. However, Wee et al. did not classify (subscales of) PROMs according to which specific aspects of HRQOL, based on the Wilson & Cleary model, they measure. This classification is important because instrument selection should be based on the relevant aspects of HRQOL to measure, not on the available PROMs, which are mostly multi-dimensional instruments that measure many different things. Therefore, the content and quality of PROMs should be evaluated for each PROM separately. Furthermore, questionnaires that are referred to as HRQOL PROMs often include (subscales) that measure non-HRQOL aspects, such as characteristics of the individual, overall quality of life, or even patient-reported experience measures (PREMs), which are not part of the HRQOL construct according to the Wilson and Cleary model. This has not been made clear in previous reviews. Because of these research gaps, we aimed to systematically describe and classify the content of all PROMs that have specifically been developed or validated to measure (aspects of) HRQOL in people with type 2 diabetes.

2 Methods

This systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [17] and the COSMIN guideline for conducting systematic reviews [18]. The protocol was registered in the PROSPERO database on 2 July 2017 (registration number CRD42017071012).

2.1 Literature search

The databases PubMed and EMBASE were searched from date of inception until May 2019, and the search was then updated until 31 December 2021. The literature search was performed by researcher CBT in cooperation with a medical librarian from the Amsterdam UMC, Amsterdam, the Netherlands. The search strategy was built around three blocks of search terms, namely type 2 diabetes, measurement properties (i.e. different search terms for reliability, validity, responsiveness and interpretability) and PROMs (i.e. different search terms for report, questionnaire and survey). For the type 2 diabetes block, search terms were used to identify studies that focused on people with type 2 diabetes. For finding studies on measurement properties, a highly sensitive validated search filter was used [19], and a comprehensive PROM filter, developed by the Patient Reported Outcomes Measurement Group, University of Oxford and available through the COSMIN website, was used to search for PROMs [20]. An overview of the search strategy can be found in Appendix 1.
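As an illustration of how a three-block strategy of this kind is typically assembled, the sketch below joins the synonyms within each block with OR and the blocks themselves with AND. Both the AND/OR combination and the function names are assumptions made for illustration, and the terms are placeholders (the abbreviation "T2DM" is not taken from the text); the actual search strings and filters are those in Appendix 1 and references [19, 20].

```python
def or_block(terms):
    """Join the synonyms of one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(blocks):
    """Combine concept blocks with AND so that a record must match every block."""
    return " AND ".join(or_block(terms) for terms in blocks)


# Placeholder terms only; the real filters are far more extensive (see Appendix 1).
diabetes_block = ["type 2 diabetes", "T2DM"]
measurement_block = ["reliability", "validity", "responsiveness", "interpretability"]
prom_block = ["report", "questionnaire", "survey"]

query = build_query([diabetes_block, measurement_block, prom_block])
print(query)
```

Keeping each block as a separate list makes it straightforward to swap in a validated filter for one block without touching the others.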

All identified studies were uploaded in Covidence [21], an online platform that supports researchers in conducting systematic reviews by enabling them to upload all identified studies, screen the studies on title and abstract and on full text, resolve disagreements, and export data. Covidence was used to remove duplicates and to screen and select the retrieved studies.

3 Study selection

Pairs of researchers (JWB, PJME, AAH, IH, MLG, GM, CACP, FR, CBT and MW) independently reviewed the identified studies based on title and abstract and on the full-text article. In case of disagreement between two researchers, a third researcher was consulted to reach consensus. The reference lists of the included articles were checked by one of the researchers (MLG or FR) to search for additional eligible studies, after which pairs of researchers reviewed the studies found through this reference search. The screening and selection process was conducted based on pre-defined eligibility criteria.

A study was included when it met all five of the following inclusion criteria: (I) the authors aimed to develop a PROM, evaluate the measurement properties or evaluate the interpretability (e.g. floor and ceiling effects) of a PROM, (II) it concerned a PROM that aims (according to the authors of the included papers) to measure at least (aspects of) symptom status, functional status, general health perceptions or HRQOL based on the model of Wilson and Cleary [4], (III) the PROM is filled in by the patient in self-report, interview or diary form or is completed on behalf of the patient (proxy), (IV) > 50% of the study population consisted of people with type 2 diabetes, as reported in the article or when it could be assumed based on age and type of diabetes medication, or studies that reported measurement properties specifically for a subgroup of people with type 2 diabetes, and (V) the article is available in full-text. There were no restrictions on language in which the article was written.

A study was excluded when any of the following exclusion criteria were met: the PROM (I) was only used as a determinant or outcome measure or was used as a comparison instrument in a validation study of another instrument, (II) solely measured characteristics of the individual or behaviors (e.g. aspects of personality, self-efficacy, coping and eating behavior), characteristics of the environment (e.g. social support and financial support), patient-reported experience measures (PREM, i.e. a measure of a patient's perception of their personal experience of the healthcare they have received, e.g. treatment satisfaction) or overall quality of life (QOL) (e.g. well-being or satisfaction with life in general), or (III) was primarily developed for screening, diagnostic or prognostic purposes. PROMs that measure a combination of (aspects of) HRQOL and other constructs were included if the main aim was to measure (aspects of) HRQOL.
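The decision rule described above (all five inclusion criteria met, and none of the three exclusion criteria) can be sketched as follows. This is only an illustration: the record type and its field names are hypothetical, and in the review itself these judgements were made by pairs of researchers, not by software.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StudyRecord:
    """Hypothetical screening record; field names are illustrative only."""
    # Inclusion criteria (I)-(V)
    develops_or_evaluates_prom: bool    # (I)
    aims_to_measure_hrqol: bool         # (II)
    patient_or_proxy_completed: bool    # (III)
    proportion_type2: float             # (IV) fraction of sample with type 2 diabetes
    reports_type2_subgroup: bool        # (IV) separate results for a type 2 subgroup
    full_text_available: bool           # (V)
    # Exclusion criteria (I)-(III)
    prom_only_used_as_variable: bool    # determinant, outcome or comparator only
    solely_non_hrqol_constructs: bool   # e.g. only PREM or overall QOL
    screening_or_diagnostic_tool: bool  # screening, diagnostic or prognostic purpose


def is_eligible(study: StudyRecord) -> bool:
    """Return True when every inclusion criterion and no exclusion criterion is met."""
    meets_all_inclusion = all([
        study.develops_or_evaluates_prom,
        study.aims_to_measure_hrqol,
        study.patient_or_proxy_completed,
        study.proportion_type2 > 0.5 or study.reports_type2_subgroup,
        study.full_text_available,
    ])
    meets_any_exclusion = any([
        study.prom_only_used_as_variable,
        study.solely_non_hrqol_constructs,
        study.screening_or_diagnostic_tool,
    ])
    return meets_all_inclusion and not meets_any_exclusion


example = StudyRecord(
    develops_or_evaluates_prom=True,
    aims_to_measure_hrqol=True,
    patient_or_proxy_completed=True,
    proportion_type2=0.8,
    reports_type2_subgroup=False,
    full_text_available=True,
    prom_only_used_as_variable=False,
    solely_non_hrqol_constructs=False,
    screening_or_diagnostic_tool=False,
)
print(is_eligible(example))                                 # True
print(is_eligible(replace(example, proportion_type2=0.5)))  # False: not more than 50%
```

Note that criterion (IV) uses a strict inequality, so a sample with exactly 50% type 2 diabetes qualifies only if subgroup results are reported.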

3.1 Data synthesis

Information from the included studies was systematically synthesized by one of the researchers (LG, MLG or FR). In case of any uncertainties, a second researcher (CBT) was consulted. The characteristics of the PROM, including official name, language in which the PROM was developed, target population for which the PROM was developed (including type 1 or type 2 diabetes), construct(s) being measured, name of (sub)scales and number of items, were extracted using a study-specific and pilot-tested PROM characteristics table. If necessary, relevant comments were also recorded. With regard to the (sub)scales, we extracted the number of items per subscale and the original names when possible. However, some studies did not clearly mention the number of items per subscale or the names of the subscales. In those cases, we noted the total number of items. For the names, we either used a name that matched the authors’ description of the subscales or, when the authors had added or eliminated only a few items (without changing the scales), used the subscale name of the original instrument.

All PROMs (subscales) were classified according to the constructs of HRQOL measured, based on the Wilson & Cleary model [4]. This classification was based on reviewing the names of the (sub)scales and not the content of the PROMs. Some (sub)scales did not measure aspects of HRQOL, but were classified as measures of overall quality of life (including well-being and life satisfaction), characteristics of the individual/environment or PREM. If information on PROM characteristics could not be found in the paper, additional resources such as other articles, Google (e.g. manuals or websites) or the PROQOLID database [22] were consulted.

4 Results

Figure 1 represents the flowchart of the screening and selection process.

Fig. 1 Flowchart of the screening and selection process

4.1 Characteristics of the PROMs

A total of 116 unique HRQOL PROMs were identified, of which 82 (70.7%) were specifically developed for people with (type 1 and 2) diabetes (Table 1). The other PROMs were validated in people with type 2 diabetes but were originally developed for 21 different target populations, the most common being the general population (20/116, 17.2%). The PROMs were developed in 32 different languages, most often in English (N = 68), Dutch (N = 9), Japanese (N = 7) and Spanish (N = 7). Seven of the 116 PROMs (6.0%) were developed in more than one language at the same time, such as the World Health Organisation Quality of Life (WHOQOL-100) [23] and the World Health Organisation Quality of Life (WHOQOL)-BREF [24, 25]. Across all 116 PROMs, the number of (sub)scales varied from 1 to 21.

Table 1 Characteristics of the included HRQOL PROMs

We identified numerous different versions of the same PROM; for example, 17 different versions of the Diabetes Quality of Life questionnaire (DQOL) were identified. For many PROMs, these versions arose from translations that were modified during the validation process by removing or adding items. Such a modification results in a new PROM, because it cannot be assumed that the measurement properties are the same for different versions. When a PROM was only translated, with the same number of subscales and items per subscale, we counted it as one and the same version and added the reference to the row of that PROM in Table 1. Finally, two studies concerned non-standard PROMs, namely a decision tree [26] and a visual interactive PROM [27].

4.2 Levels of HRQOL measured with the PROMs

Table 2 and Supplemental Table 1 provide an overview of the specific levels of HRQOL that the included PROMs measure based on the Wilson and Cleary model [4]. Of the 116 unique HRQOL PROMs, 91 measured symptom status with at least one of their subscales, 60 measured functional status and 26 measured general health perceptions. With regard to symptom status, 22/91 measured diabetes-related symptoms, which included problems with vision, hearing, speaking, neuropathy, hypoglycemia, hyperglycemia, motor agitation and vasomotor function disturbance as well as cardiovascular disease. There was overlap between the diabetes-related symptom subscales and the general symptom status scales referring to physical and mental symptoms, such as pain or depressive feelings. For example, the Patient-reported outcomes in Thai patients with type 2 diabetes mellitus (PRO-DM-Thai) is stated to measure diabetes-related symptoms, but these include sleep problems, sexual problems and pain, which could be considered generic symptoms [28].

Table 2 Overview of the specific levels of HRQOL that the included PROMs measure based on the Wilson and Cleary model [4]

Within the symptom status level, 31/91 of the PROMs (subscales) measured physical symptoms, including pain, energy/fatigue and sleep, and 69/91 measured mental symptoms, including distress, anxiety/worry and depression. With regard to the functional status level, 40/60 of the PROMs measured physical function, including activities of daily living and sexual function, 28/60 measured psychological function and 38/60 measured social/role function. There was considerable heterogeneity; for example, within the social function level many different constructs were measured, such as social well-being, restriction of social function, social role fulfillment and psychosocial disabilities, but also having friends, work and relationships, alienation, barriers and social burden.

In addition, 16/116 of the PROMs measured global quality of life. Furthermore, 61/116 of the HRQOL PROMs also included characteristics of the individual or environment, or even PREMs, rather than only aspects of HRQOL. Examples are characteristics of the individual such as positive attitude [29,30,31], characteristics of the environment such as financial situation [31,32,33,34], and PREMs such as treatment satisfaction [28, 34,35,36,37]. For one PROM, namely the Diabetes Quality of Life Clinical Trial Questionnaire (DQLCTQ) [35], it was specifically mentioned that demographics were also assessed as part of the PROM.

Finally, only 9/116 of the HRQOL PROMs measured all aspects of HRQOL based on the Wilson & Cleary model. These PROMs include the DQLCTQ [35], Health Status Questionnaire (HSQ) 2.0 [38], PRO-DM-Thai [28], Quality of Life for Indian diabetes Patients (QOLID) [34], Self-perception of health [39], 12-Item Short Form Health Survey (SF-12) [40,41,42], 20-item Short Form Health Survey (SF-20) [43], 36-Item Short Form Health Survey (SF-36) [29, 44,45,46,47,48,49,50,51,52,53,54,55] and Well-being Enquiry for Diabetics (WED) [56]. Also, despite the fact that the authors of the included papers claimed that the PROM aims to measure at least (aspects of) symptom status, functional status, general health perceptions or HRQOL, 8/116 of the PROMs measured only global quality of life or PREMs and no HRQOL construct(s).
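The distinction drawn above between PROMs that cover all aspects of HRQOL, some aspects, or none can be expressed as a small set operation over the levels assigned to a PROM's subscales. The sketch below assumes each subscale has already been classified by a reviewer; the enum, the function name and the example level lists are hypothetical and do not describe any specific instrument in Table 2.

```python
from enum import Enum


class Level(Enum):
    SYMPTOM_STATUS = "symptom status"
    FUNCTIONAL_STATUS = "functional status"
    GENERAL_HEALTH_PERCEPTIONS = "general health perceptions"
    OVERALL_QOL = "overall quality of life"
    INDIVIDUAL_OR_ENVIRONMENT = "characteristics of the individual/environment"
    PREM = "patient-reported experience measure"


# HRQOL as defined in this review: symptoms, functional status, general health perceptions.
HRQOL_LEVELS = frozenset({
    Level.SYMPTOM_STATUS,
    Level.FUNCTIONAL_STATUS,
    Level.GENERAL_HEALTH_PERCEPTIONS,
})


def hrqol_coverage(subscale_levels):
    """Summarise which part of the HRQOL construct a PROM's subscales cover."""
    measured = set(subscale_levels) & HRQOL_LEVELS
    if measured == HRQOL_LEVELS:
        return "all HRQOL aspects"
    if not measured:
        return "no HRQOL aspect"
    return "some HRQOL aspects"


# Made-up examples of reviewer-assigned levels for three hypothetical PROMs.
print(hrqol_coverage([Level.SYMPTOM_STATUS, Level.FUNCTIONAL_STATUS,
                      Level.GENERAL_HEALTH_PERCEPTIONS, Level.PREM]))  # all HRQOL aspects
print(hrqol_coverage([Level.SYMPTOM_STATUS, Level.OVERALL_QOL]))       # some HRQOL aspects
print(hrqol_coverage([Level.OVERALL_QOL, Level.PREM]))                 # no HRQOL aspect
```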

5 Discussion

In our systematic review of the literature, from a total of 220 studies, we identified 116 unique PROMs aiming to measure (aspects of) HRQOL in people with type 2 diabetes. Of these HRQOL PROMs, 91 (78%) measured symptom status (with at least one subscale), 60 (52%) measured functional status and 26 (22%) measured general health perceptions. In addition, 16 (14%) of the PROMs (subscales) measured global quality of life. Furthermore, 61 (53%) of the 116 PROMs (subscales) also included characteristics of the individual (e.g. aspects of personality, coping) or environment (e.g. social or financial support) and patient-reported experience measures (PREMs, i.e. measures of a patient's perception of their personal experience of the healthcare they have received, e.g. treatment satisfaction), which are not part of the HRQOL construct. The (sub)scales of these PROMs thus showed great heterogeneity of constructs, with only 9 (8%) of the PROMs measuring all aspects of HRQOL based on the Wilson & Cleary model and 8 (7%) not measuring HRQOL (constructs) at all. This review shows the large number of PROMs that have been developed. Furthermore, some PROMs are very long, which may suggest poor acceptability.
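The shares quoted in this paragraph can be recomputed from the counts reported in the Results. The short calculation below is a convenience sketch only; the counts are those given in the text, while the dictionary labels are shorthand introduced here.

```python
TOTAL_PROMS = 116

# Counts as reported in the Results section.
counts = {
    "symptom status": 91,
    "functional status": 60,
    "general health perceptions": 26,
    "global quality of life": 16,
    "individual/environment characteristics or PREMs": 61,
    "all HRQOL aspects": 9,
    "no HRQOL construct": 8,
}

# Percentage of the 116 PROMs, rounded to one decimal place.
percentages = {label: round(100 * n / TOTAL_PROMS, 1) for label, n in counts.items()}

for label, pct in percentages.items():
    print(f"{label}: {counts[label]}/{TOTAL_PROMS} = {pct}%")
```

Because one PROM can contribute to several categories through different subscales, these percentages are not expected to sum to 100.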

When conducting this review, we faced multiple challenges. First, the terminology used for the constructs that the (subscales of the) PROMs measure was unclear, and definitions of the constructs were mostly lacking. It was therefore unclear to us whether the names of the PROMs and subscales represent different or the same concepts. This large variability in the operationalization of HRQOL made it difficult to classify the PROMs. This lack of clarity about what a PROM actually measures also makes it difficult or even impossible to know whether a PROM has good validity (i.e. whether it measures what it is supposed to measure). A second challenge was that information regarding the characteristics of the PROMs was often lacking or misleading. For example, the availability of the PROM, the number and names of (sub)scales and the number of items per (sub)scale were often not presented in the paper. As a result, we had to consult additional resources, such as other articles, Google (e.g. manuals or websites) or the PROQOLID database [22]. However, even this strategy sometimes failed, which may have resulted in an incomplete overview of the PROMs (Table 1). This poor reporting is possibly due to older papers not meeting modern-day standards, but it hampers researchers and health care providers in selecting the best PROM for their purpose. The poor state of information and the very large heterogeneity in PROMs (subscales) are not unique to the diabetes field [243]. PROMs are increasingly used as primary outcome measures in studies and as tools for clinical decision making. This poor state of information makes it very difficult, and potentially even impossible, to compare study results or cohorts directly, since the PROMs measure different constructs and thus different outcomes. In this review, we did not systematically evaluate the measurement properties of the PROMs, such as content validity, construct validity, reliability and responsiveness. Therefore, researchers should be careful when using this review to select PROMs, as we cannot guarantee that the content of the PROMs or subscales really matches the intended construct, nor that the PROMs are reliable and responsive to change [244].

This review highlights the large number of PROMs that have been developed and used and the heterogeneity of their content. We feel there is a need to reach consensus on which PROMs to use to measure HRQOL as well as on which HRQOL aspects are most important to measure for people with type 2 diabetes. One solution is the development of Core Outcome Sets (COS) or Standard Sets, which are agreed sets of outcomes (and associated measurement instruments) to be measured in all trials or in clinical practice. International organizations such as COMET (https://www.comet-initiative.org/) and ICHOM (www.ichom.org) have developed such COSs for type 2 diabetes [245,246,247]. However, the value of these COSs is limited, because they have a strong focus on biological outcomes, such as glycemic control [199,200,201], and there was limited input from people with expertise in PRO measurement or from people with type 2 diabetes. This resulted in dissimilar recommendations regarding PROMs between the initiatives, as well as in the inclusion of the ‘Diabetes Treatment Satisfaction Questionnaire’ (which is a PREM) and the inclusion of only activities of daily living and overall quality of life, and no other aspects of HRQOL [245,246,247]. Qualitative studies show the importance of ‘To live a good life with diabetes’ for people with type 2 diabetes [248].

6 Limitations and strengths

This systematic review has several limitations and strengths. The first limitation is that the classification of the constructs was based on reviewing the names of (sub)scales and not their content. We acknowledge that this may have resulted in misclassification, because construct names can be misleading and may not reflect the content. It would have been better to look at the content of the PROMs to determine what aspects of HRQOL they measure, rather than using the names of the instrument (scales). We have done so for part of the PROMs, i.e. only the disease-specific HRQOL PROMs, in a separate review [244] in which we did a full content validity assessment of these PROMs. However, the fact that there might be a mismatch between our classification and what the PROMs actually measure is a striking finding of this review. It is problematic that the name and description of a PROM as published in the literature do not tell us, or may even mislead us about, what the PROM actually measures. This strongly hampers researchers and clinicians in selecting the optimal PROM for their purpose. Second, despite using an extensive search string, we identified 27% of the included studies from reference lists. However, by using this extensive search strategy, our review is more complete than previous reviews specifically on HRQOL in those with type 2 diabetes. For example, we identified over 50 HRQOL PROMs with our search that were not found in the Wee et al. review [15]. We speculate that this discrepancy is due to their lack of reference checking. The first strength of this systematic review is the extensive search, with no restrictions on publication date or language, combined with reference checking. Second, the use of a conceptual model to assess which aspects of HRQOL were measured by PROMs (subscales) provides helpful information for researchers and health care providers searching for a PROM to measure one or more specific aspects of HRQOL, information that was not provided in previous reviews. As stated before, instrument selection should be based on which relevant aspects of HRQOL one wants to measure, and different aspects of HRQOL can be measured with subscales from different PROMs. Even though the Wilson and Cleary model is the most frequently used, other conceptual models are available that might be preferred by other researchers [4]. However, our conclusion on the heterogeneity and lack of clarity of the constructs being measured with PROMs in the diabetes field would not have been different. Finally, in addition to providing an overview and identifying the difficulties of the field, our systematic review urges caution and provides food for thought regarding the use of the PROMs. Future studies are needed to provide definitive recommendations on which PROMs to use in people with type 2 diabetes.

7 Conclusion

A large number of PROMs that intend to measure (aspects of) HRQOL are available for people with type 2 diabetes. These PROMs measure a large variety of (sub)constructs, not all of which are HRQOL constructs, and a small number of PROMs do not measure HRQOL at all. There is a need for consensus on which aspects of HRQOL should be measured in people with type 2 diabetes and which PROMs to use in research and daily practice.