Patient-reported outcomes for people with diabetes: what and how to measure? A narrative review

Patient-reported outcomes (PROs) are valuable for shared decision making and research. Patient-reported outcome measures (PROMs) are questionnaires used to measure PROs, such as health-related quality of life (HRQL). Although core outcome sets for trials and clinical practice have been developed separately, they, as well as other initiatives, recommend different PROs and PROMs. In research and clinical practice, different PROMs are used (some generic, some disease-specific), which measure many different things. This is a threat to the validity of research and clinical findings in the field of diabetes. In this narrative review, we aim to provide recommendations for the selection of relevant PROs and psychometrically sound PROMs for people with diabetes for use in clinical practice and research. Based on a general conceptual framework of PROs, we suggest that relevant PROs to measure in people with diabetes are: disease-specific symptoms (e.g. worries about hypoglycaemia and diabetes distress), general symptoms (e.g. fatigue and depression), functional status, general health perceptions and overall quality of life. Generic PROMs such as the 36-Item Short Form Health Survey (SF-36), WHO Disability Assessment Schedule (WHODAS 2.0), or Patient-Reported Outcomes Measurement Information System (PROMIS) measures could be considered to measure commonly relevant PROs, supplemented with disease-specific PROMs where needed. However, none of the existing diabetes-specific PROM scales has been sufficiently validated, although the Diabetes Symptom Self-Care Inventory (DSSCI) for measuring diabetes-specific symptoms and the Diabetes Distress Scale (DDS) and Problem Areas in Diabetes (PAID) for measuring distress showed sufficient content validity. 
Standardisation and use of relevant PROs and psychometrically sound PROMs can help inform people with diabetes about the expected course of disease and treatment, for shared decision making, to monitor outcomes and to improve healthcare. We recommend further validation studies of diabetes-specific PROMs that have sufficient content validity for measuring disease-specific symptoms, and we recommend considering generic item banks developed based on item response theory for measuring commonly relevant PROs.
Supplementary Information: The online version contains supplementary material available at 10.1007/s00125-023-05926-3.


Introduction
In clinical practice, consultations with healthcare providers are often short. In the case of poor emotional well-being, e.g. depressive symptoms, there is limited time available for in-depth discussion. Questionnaires that measure patient-reported outcomes (PROs), so-called patient-reported outcome measures (PROMs), can be of help. A PRO was defined by the US Food and Drug Administration (FDA) as 'any report of the status of a patient's health condition that comes directly from the patient, without interpretation of the patient's response by a clinician or anyone else' [1] (Text box 1). PROMs measuring physical and psychosocial aspects of health and quality of life (QOL), such as physical function or depression, offer complementary information to clinical outcomes such as HbA1c, and can be used to inform people with diabetes about the expected course of disease and treatment, for shared decision making, for monitoring outcomes and to improve healthcare [2]. Using PROMs does not need to lengthen the consultation time [3].
To optimally benefit from using PROMs in research or clinical practice, PROMs should measure those outcomes that are most relevant to people with diabetes. Several initiatives have tried to identify which PROs are most relevant for people with diabetes. An international consortium of people with diabetes, healthcare providers and other relevant stakeholders developed an agreed minimum set of outcomes to be measured in all clinical trials in people with type 2 diabetes (called a core outcome set [COS]). They recommend measuring global QOL and activities of daily living in all clinical trials [4]. The International Consortium for Health Outcomes Measurement (ICHOM) developed a standard set of outcomes to be measured in clinical practice in people with type 1 or type 2 diabetes. They recommend measuring psychological well-being, diabetes distress and depression [5]. Other initiatives recommend yet different PROs [6][7][8][9]. Although 'quality of life' is often recommended [8], this concept is defined very differently by different people [10]. There are many different questionnaires available that aim to measure QOL or (aspects of) health-related QOL (HRQL); some are generic, some are disease-specific, and they measure many different things, not always restricted to PROs [11]. Furthermore, the validity, reliability and responsiveness to change over time of many of the questionnaires are often unclear or insufficient [11][12][13][14][15][16][17][18][19][20][21][22][23][24][25][26].
The aim of this review is to provide recommendations on the most commonly relevant PROs for adult people with diabetes to measure in clinical practice and research, and good quality PROMs to measure these PROs. We first provide a general conceptual framework of PROs and PROMs. Second, we present a narrative overview of the literature on which PROs are most relevant to measure in people with diabetes. Third, we present an overview of which PROMs have been used in studies involving people with diabetes and what is known about the quality of these PROMs in terms of validity, reliability and responsiveness. In addition, we suggest several well-validated generic PROMs that could be used in people with diabetes. Finally, we provide recommendations and suggestions for the use of PROMs in clinical practice and research.

A conceptual framework of PROs and PROMs
There is considerable heterogeneity in the definition and operationalisation of the terms 'QOL', 'HRQL', 'PRO' and 'PROM' between and within studies [10]. In Text box 1, we provide an overview of commonly used terms and definitions, which we adopted in this paper. We adopt the original definition of a PRO of the FDA [1], which has also been adopted by the European Medicines Agency (EMA). PROs therefore refer to health outcomes, including physical, mental, and social symptoms and functioning. Non-health-related constructs, such as overall QOL (which is broader than health), satisfaction, eating behaviour and stigma, are not considered PROs according to the original FDA definition.
It is important to conceptualise how different health and QOL outcomes interrelate. Different models have been proposed in the literature. One commonly used model was developed by Wilson and Cleary [27], who distinguish different levels of health and QOL outcomes and relate them to characteristics of the individual and the environment. For illustration, we placed several relevant health and QOL outcomes and characteristics of the individual and the environment for people with diabetes in the Wilson and Cleary model (Fig. 1).
In this model, biological and physiological variables, symptoms, functional status and general health perceptions are considered aspects of health status. Overall QOL is broader than health.
Aspects of health can be thought of as existing on a continuum of increasing biological, social and psychological complexity. Starting from the left-hand side (Fig. 1) are biological and physiological aspects of health such as HbA1c, hypoglycaemia, glucose variability or blood pressure. These are measured with clinical measurement instruments, such as glucose sensors, laboratory tests, physical examination, vision tests and imaging techniques. A biological or physiological abnormality or defect, such as the inability of the pancreas to make insulin, can lead to symptoms, referring to how a patient feels. These could be physical symptoms, such as pain or blurred vision, or emotional or psychological symptoms, such as fear, worry and depressive symptoms. Symptoms are PROs and should be measured with PROMs.
Symptoms can lead to limitations in how an individual functions, in terms of physical function, mental function and social/role function (e.g. performing a job). Functional status can be measured with PROMs, by asking about perceived limitations in functioning, but also with performance-based tests, such as walking tests.
General health perceptions, which refer to the PRO 'perceived overall health', are often measured with a single question, e.g. 'how would you rate your overall health?', which is a PROM. Finally, overall QOL includes aspects of health, but is broader and also includes factors not related to health, such as material comforts, personal safety and satisfaction with life in general. Overall QOL is actually not a PRO, although (some of) its components can be PROs. Therefore, questionnaires measuring overall QOL are not considered PROMs according to the FDA definition. Wilson and Cleary use the term HRQL as an umbrella term, including symptoms, functional status and general health perceptions [27]. Finally, the model shows that health and QOL outcomes are influenced by contextual factors, i.e. personal factors such as personality, behaviour (diet, medication adherence and physical activity) and coping mechanisms, and environmental factors, such as social support, social stigma and financial aspects.
Commonly used questionnaires in the diabetes field measure different things. Some questionnaires focus on only one level, e.g. symptoms, while others measure outcomes at multiple levels of health, especially questionnaires that aim to measure 'QOL' or 'HRQL'. Some questionnaires classify different outcomes in different subscales, e.g. one subscale for symptoms and another subscale for physical function, but others (undesirably) combine outcomes from different levels into one scale. Many questionnaires include questions or subscales measuring PROs but also contextual factors [11]. These questionnaires are therefore not (entirely) PROMs. Lack of distinction between health and non-health outcomes, between health outcomes and contextual factors, and between PROMs and other questionnaires, results in confusion about what is being measured, lack of content validity of PROMs, difficulty selecting the best PROM for a given study or clinical application, and inability to study causal relationships between health outcomes or the relationship between contextual factors, health outcomes and overall QOL. Text box 2 provides illustrations of measurement issues we encountered in performing systematic reviews of PROMs in people with diabetes [11,24,25]. Researchers and clinicians should be aware of the differences between clinical outcomes, PROs, contextual factors and patient experiences, and the fact that all of these concepts are often included in questionnaires or subscales that aim to measure HRQL, QOL or PROs. An illustration is provided in Text box 3. This situation hampers clear interpretation of what is being measured and is a threat to the validity of diabetes research. We cannot, for example, study the influence of self-care behaviour on physical and psychological functioning in people with diabetes if these concepts are measured in one scale and summarised into one score. We cannot appropriately perform or interpret the results of meta-analyses of studies on the effects of certain medication on HRQL if the HRQL instruments measure all kinds of different concepts, some of them not even related to health. All the concepts shown in Fig. 1 can be important to measure, but it is confusing if they are all called PROs or HRQL, and they should not be combined into one scale score.

[Fig. 1: Wilson and Cleary model [27]. This figure is available as a downloadable slide.]

Text box 2: Measurement issues related to a lack of distinction between different health outcomes and contextual factors, derived from systematic reviews [11,24,25]

Confusion between health outcomes and contextual factors
Many authors consider everything reported by patients in a questionnaire as PROs, not only aspects of health. This leads to confusion about what PROs are. Many questionnaires include subscales measuring characteristics of the individual (e.g. positive attitude, self-care ability, diet and eating habits, treatment adherence [160,161]) or environment (e.g. social stigma, support, financial concerns and barriers [162][163][164]). While these are important things to measure in people with diabetes, these should be considered contextual factors, not PROs. According to the original definition of the FDA, PROs only refer to patient-reported aspects of health, including symptoms, functional status and general health perceptions [1]. To avoid confusion, we recommend that contextual factors are not called PROs and that subscales measuring contextual factors are not included in a PROM, but instead included in separate questionnaires.

Mixing different concepts in one scale
In some questionnaires, questions about contextual factors and PROs, or questions about different levels of health (e.g. symptoms and functional status), are combined into one subscale [165][166][167]. Mixing different concepts into one score results in insufficient structural validity, limits interpretability and usefulness of the score (what does a score mean?) and limits the ability to study the relationship between contextual factors and HRQL. For example, if questions on financial concerns and social function, or anxiety and self-rated health, are combined in one score, one cannot assess the effect of financial concerns on social function or the effect of anxiety on self-rated health.

No distinction between QOL and HRQL
For many questionnaires that are called 'quality of life questionnaire' it is not clear whether they indeed aim to measure (overall) QOL or rather HRQL. Some questionnaires that aim to measure HRQL also include aspects of overall QOL [168,169]. The concept of QOL is generally defined very broadly, including not only aspects of health but also aspects of wealth, education, religion, etc. Within the field of healthcare, the focus is often on aspects of health and therefore the concept of HRQL is generally considered more relevant [10]. We recommend distinguishing the concept of QOL from PROs. Although the concept of QOL includes PRO concepts, it is broader than health.

Unclear construct definition
The term HRQL is often used as an umbrella term. PROMs that aim to measure HRQL often lack a clear definition of the construct of interest and may measure only some aspects of HRQL (e.g. fatigue, cognitive function and social function [164]), rather than a comprehensive set of all relevant physical, mental, emotional and social symptoms and functions. Also, the content of PROMs that aim to measure HRQL is very different [11]. The title of a PROM should better reflect the construct of interest of the PROM.

Confusion between PREM and PROM
Many questionnaires include questions or subscales that measure patient experiences, as captured by patient-reported experience measures (PREMs; e.g. burden of treatment, treatment satisfaction and physician-patient relationship [80,170,171]), in addition to health aspects. PROMs and PREMs are two different types of instruments and should not be confused.

Text box 3: An illustration of a mix of health outcomes and contextual factors in a questionnaire
The Asian Diabetes Quality of Life questionnaire (AsianDQOL) was developed based on focus group discussions with people living with type 2 diabetes and consists of five subscales: financial aspects, interpersonal relationships, memory and cognition, diet and eating habits, and energy [164]. While all subscales measure things that are important to individuals living with diabetes, these subscales do not all measure PROs according to the FDA definition of a PRO. Some subscales measure PROs (e.g. 'memory and cognition' and 'energy') but other subscales measure characteristics of the individual ('diet and eating habits') or characteristics of the environment ('financial aspects'). The subscale 'energy' contains a question about the frequency of fatigue and two questions about the effect of diabetes on daily activities and activities you like. In the subscale score these items on symptoms and functional status are combined into one score, which makes the meaning of the score difficult to interpret. Moreover, the latter two questions measure the consequences of (lack of) energy on functional status, rather than the symptom (lack of) energy itself (Fig. 1). Distinguishing between different PRO concepts and how they are interrelated and related to contextual factors helps to select relevant outcomes, develop and validate questionnaires, better understand their scores and develop a better understanding of the health outcomes of disease and treatment.

Most relevant PROs to measure in people with diabetes
It is not clear which PROs are most relevant to measure in diabetes research and clinical practice. Qualitative studies revealed a large number of outcomes considered important by people with diabetes [28][29][30]. No explicit distinction was made in these studies between PROs, contextual factors and other outcomes, although Dodd et al classified outcomes using the Core Outcome Measures in Effectiveness Trials (COMET) taxonomy [31], where PROs are classified in the category 'life impact'.
Relevant international guidelines differ in PROs being recommended. Many recommendations state the importance of psychosocial problems. However, a distinction between psychosocial functioning (a PRO) and psychosocial well-being (broader than a PRO) is not made. Harman et al developed a COS to be measured and reported, as a minimum, in all clinical trials in people with type 2 diabetes [4]. A COS often contains PROs but also other relevant (clinical) outcomes. The COS was developed in a Delphi survey with healthcare professionals, people with type 2 diabetes, researchers in the field and healthcare policymakers. Recommended core outcomes to be reported by patients were 'global QOL' and 'activities of daily living'. Global QOL was defined as 'someone's overall quality of life, including physical, mental and social well-being' [4]. This is actually not a PRO because it is broader than health. Activities of daily living was defined as 'being able to complete usual everyday tasks and activities, including those related to personal care, household tasks or community-based tasks' [4]. This refers to a PRO.

The ICHOM consortium developed a standard set for people with diabetes types 1 and 2, to be used in clinical practice, also using a consensus approach among experts. It does not state whether people with diabetes were involved. Recommended outcomes are psychological well-being, diabetes distress and depression. Only the latter two are PROs [5]. This recommendation is in line with recommendations from the ADA and the EASD, which state that providers should consider diabetes distress, depression, anxiety, disordered eating (which is not a PRO), cognitive capacities and chronic pain [6,32,33,34,35].
Differences in recommendations are at least partly due to different aims, methodology, and (lack of) involvement of people with diabetes. For example, a COS includes only a minimum set of outcomes to be measured and reported in every clinical trial, while other guidelines might include outcomes that could be relevant to measure in addition in specific trials or in clinical practice.
In summary, there is consensus that the PRO 'activities of daily living' (which is conceptually similar to physical function) should be measured in all diabetes trials. There is less consensus on which PROs are additionally relevant to measure in specific trials and which PROs are relevant to measure routinely in clinical practice.
In the meantime, there is increasing evidence from several initiatives that some PROs are relevant for many people, irrespective of their disease (Text box 4) [36-38]. Symptoms such as pain, fatigue or depression are common across diseases. Furthermore, being able to carry out daily activities and social roles is important to most people. The Patient-Reported Outcomes Measurement Information System (PROMIS) domain framework was developed to capture commonly relevant PROs across three broad aspects of physical, mental and social health based on the WHO definition of health [37,39]. It was developed through literature reviews of well-established instruments, a consensus-building Delphi process among health outcomes experts and statistical analysis. Patients were not involved in the development of the conceptual model, although patient input was captured by reviewing instruments that were developed with patient input [40]. Five subdomains were selected as the initial areas for PROMIS item bank construction: fatigue, pain, emotional distress (later divided into depression, anxiety and anger), physical functioning and social role participation [37]. Kroenke et al developed a taxonomy of key pragmatic decisions related to PROM implementation based on literature review, but without patient input [38]. One of the pragmatic issues they address is the selection of generic vs disease-specific PROMs. They noted that some domains are cross-cutting, in that they occur frequently and often cluster across the majority of medical and mental health disorders; these include fatigue, pain, depression, anxiety, sleep and physical function [38].

Text box 4: Commonly relevant PROs for all people, irrespective of their disease, as identified by several large initiatives
Terwee et al extracted all PROs and recommended PROMs from 39 ICHOM Standard Sets [36]. Many of these sets were developed with patient input, but not all. More than 300 PROs were categorised into 22 unique PRO concepts. The most commonly included PROs were ability to participate in social roles, physical function, HRQL, pain, depression, general mental health, anxiety and fatigue [36]. The COMET initiative identified similar common PROs included in COS for trials (Text box 4).
In the Netherlands, a national consensus set of PROs and PROMs was recently developed for routine use in Dutch medical specialist care, based on the above mentioned initiative and others, as well as input from patients, healthcare providers and representatives of healthcare organisations. The selected PROs were fatigue, pain, depression, anxiety, physical functioning, social role participation and overall health [41].
Based on these initiatives, we recommend considering the commonly relevant PROs mentioned above to measure in people with diabetes (both type 1 and type 2). These PROs can be supplemented with relevant diabetes-specific symptoms. For example, the WHO report on diabetes lists frequent urination, thirst, feeling hungry (even though you are eating), blurry vision, weight loss (type 1) and tingling hands/feet (type 2) as relevant symptoms of diabetes [42]. In addition, other relevant PROs that are commonly measured could be considered, such as diabetes distress and fear of hypoglycaemia.

Best PROMs for use in people with diabetes
It is very challenging to identify the best PROMs to measure the above suggested PROs in people with diabetes. At least 16 systematic reviews have been published summarising the available PROMs and their measurement properties for people with diabetes [11][12][13][14][15][16][17][18][19][20][21][22][23][24][25][26]. These reviews vary in quality and completeness: some included only selected groups (e.g. only people with type 1 or type 2 diabetes, or people with amputations), some focused on only one PRO (e.g. depression), and some were conducted over 10 years ago. As a result, the identified PROMs, evaluation methods, conclusions and recommendations of these reviews vary.
Our systematic review by Langendoen-Gort et al provides the most recent overview of existing PROMs, published up to 31 December 2021, that aim to measure (aspects of) HRQL and that have been validated to at least some extent in people with type 2 diabetes [11]. We identified 116 questionnaires. Not all of these questionnaires actually measure PROs. About half (61) of the 116 questionnaires (also) include items or subscales measuring characteristics of the individual (e.g. aspects of personality and coping) or environment (e.g. social or financial support), or patient experiences and treatment satisfaction. Eight out of the 116 questionnaires measured no PRO at all, even though they claim to measure HRQL [11]. No recommendations were provided on the best PROMs because the measurement properties of the PROMs were not assessed in this review.
The international COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative developed consensus-based standards and criteria for assessing the quality of PROMs [43,44]. Nine measurement properties are considered important for PROMs: content validity, structural validity, internal consistency, construct validity, reliability, measurement error, cross-cultural validity, criterion validity (only for comparing different versions of the same PROM) and responsiveness (Table 1) [45]. According to COSMIN, the most important measurement property is content validity [46]. In a second review, we assessed the content validity of 54 of the above mentioned 116 PROMs, containing 150 subscales that were specifically developed for people with type 2 diabetes. Using COSMIN methodology [46], we assessed whether all PROM items measure relevant aspects of the construct the PROM (scale) aims to measure, whether no important aspects are missing and whether the items are interpreted by the person as intended. Most previous reviews did not evaluate content validity, or not in as much detail as the COSMIN methodology recommends. We showed that content validity was rated as sufficient for only 41 out of the 150 (27%) PROM subscales [25]. In Table 1 we provide a narrative summary of the relevant evidence on all measurement properties of these PROM subscales, excluding single items and scales developed for subgroups of people with diabetes (e.g. foot ulcers), classified according to the Wilson and Cleary model [27]. Evidence on the measurement properties other than content validity was extracted from the 16 reviews described above as well as from several main validation papers of the PROMs. We did not find such an evidence synthesis for type 1 diabetes. Table 1 shows that none of the existing diabetes-specific PROM scales have been sufficiently validated. 
Table 1 note: Data were extracted from 16 systematic reviews [11][12][13][14][15][16][17][18][19][20][21][22][23][24][25][26] and some additional validation studies. This is not a comprehensive systematic review, but provides the most relevant evidence on the measurement properties of these PROM scales. No information on criterion validity was found. COSMIN criteria for sufficient measurement properties were used [44].

COSMIN states that PROMs with evidence for sufficient content validity (any level) and at least low evidence for sufficient structural validity and internal consistency have the potential to be recommended for use [44]. In addition, evidence on reliability (small measurement error) is important, especially for PROMs used in clinical practice. All PROMs measuring disease-specific symptoms showed positive results for internal consistency, but these results cannot be interpreted properly if evidence that the scale is unidimensional is lacking [47]. Also, important information on test-retest reliability and responsiveness is lacking. The Diabetes Symptom Self-Care Inventory (DSSCI) is most promising for measuring diabetes-specific symptoms because it has the best evidence for content validity. For measuring diabetes distress, the Diabetes Distress Scale (DDS) and the Problem Areas in Diabetes (PAID) scale are most promising based on content validity. For the diabetes-specific PROMs or subscales measuring fatigue, anxiety, physical function, sexual function, emotional function, social function and overall health, evidence on structural validity, test-retest reliability and responsiveness is missing. Therefore, none of these PROMs can be recommended. Considering that these PROs are commonly relevant across medical conditions (Text box 4) and that the content of disease-specific PROMs and generic PROMs measuring the same PRO are often very similar, we recommend using generic PROMs for these PROs.
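As an illustration only, the COSMIN rule quoted at the start of this paragraph can be written as a small check. This is a minimal sketch, not COSMIN's own tooling: the function name, the dictionary layout and the two example evidence profiles are hypothetical, and the four evidence levels (very low, low, moderate, high) are assumed to follow the GRADE-style grading used in COSMIN reviews.

```python
# Evidence levels ordered from weakest to strongest (assumed GRADE-style grading).
LEVELS = ["very low", "low", "moderate", "high"]


def at_least(level, minimum):
    """Return True if `level` is at least as strong as `minimum`."""
    return LEVELS.index(level) >= LEVELS.index(minimum)


def potentially_recommendable(evidence):
    """Apply the rule as stated in the text above.

    `evidence` maps a measurement property to a (rating, level) pair, where
    rating is 'sufficient', 'insufficient' or 'indeterminate'.
    Missing properties are treated as indeterminate.
    """
    def ok(prop, minimum):
        rating, level = evidence.get(prop, ("indeterminate", "very low"))
        return rating == "sufficient" and at_least(level, minimum)

    return (
        ok("content validity", "very low")      # sufficient at any level
        and ok("structural validity", "low")    # at least low evidence
        and ok("internal consistency", "low")   # at least low evidence
    )


# Hypothetical evidence profiles, not taken from Table 1.
scale_a = {
    "content validity": ("sufficient", "very low"),
    "structural validity": ("sufficient", "moderate"),
    "internal consistency": ("sufficient", "low"),
}
scale_b = {
    "content validity": ("sufficient", "high"),
    "internal consistency": ("sufficient", "high"),  # structural validity missing
}

print(potentially_recommendable(scale_a))
print(potentially_recommendable(scale_b))
```

The second profile fails the check only because evidence on structural validity is missing, which mirrors the point made above that positive internal consistency results cannot be interpreted without evidence of unidimensionality.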
For these commonly relevant PROs, high-quality generic PROMs exist that are applicable across populations and diseases (Table 2). Not all of these generic PROMs have been validated in people with diabetes (Table 2), but since they showed good measurement properties in other chronic conditions, it may be reasonable to assume that they will also perform well in people with diabetes. We discuss three generic PROMs that are widely used and tested: the 36-Item Short Form Health Survey (SF-36) [48], the WHO Disability Assessment Schedule 2.0 (WHODAS 2.0) [49] and the PROMIS measurement system [39].
The SF-36, developed in 1992, is the most commonly used generic PROM in the world. It has been extensively validated across medical conditions, illustrated by more than 300 systematic reviews of measurement properties of instruments including this PROM [50]. The SF-36 contains 36 items, divided into eight subscales, measuring physical functioning, bodily pain, role limitations due to physical health problems, role limitations due to personal or emotional problems, emotional well-being, social functioning, energy/fatigue and general health perceptions. Although content validity, structural validity and internal consistency have not been assessed in people with diabetes, evidence for sufficient construct validity and responsiveness has been found in people with diabetes (e.g. Huang et al [51] and Ahroni and Boyko [52]).
The WHODAS 2.0 is a generic instrument covering several domains of function and participation, directly linked to the International Classification of Functioning, Disability and Health (ICF). The original WHO/DAS was published in 1988 and WHODAS 2.0 in 2010 [49]. It includes 36 items, divided into six subscales measuring cognition, mobility, self-care, getting along, life activities and participation [53]. WHODAS 2.0 has been used in large epidemiological studies in people with diabetes [54,55] but has not been validated in people with diabetes.
The development of PROMIS started in 2004. PROMIS consists of 'item banks' instead of fixed PROMs, which has many advantages. An item bank is a large set of items that all measure one PRO (e.g. physical function) and that are ordered on a metric using psychometric methods based on item response theory (IRT) [56]. For example, the item 'are you able to run 5 miles?' is considered more difficult than the item 'are you able to get in and out of bed?' and is therefore ordered higher on a physical function metric (if higher scores indicate better function). Individuals get a score on the same metric based on their answers. With item banks, it is not required to administer all items. Instead, a score can be obtained by administering only a subset of items as a short form. The ultimate advantage of item banks is the possibility of computerised adaptive testing (CAT), where after a starting question, the computer selects subsequent questions based on the answers to previous questions. This process continues until a predefined precision or a maximum number of items is reached. CAT reduces patient burden compared with fixed-item questionnaires [57]. The responsiveness of measures derived from item banks is generally higher than that of traditional generic PROMs [58][59][60]. This is important because generic PROMs such as the SF-36 and WHODAS 2.0 generally have limited responsiveness for measuring change over time because the questions are broadly formulated. Item banks and CAT are therefore considered by some to be the future of outcome measurement [56]. Item banks are also sustainable because items can be adapted, removed or added without changing the underlying metric.
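The CAT procedure described above can be sketched in a few lines of code. This is a simplified illustration under stated assumptions, not the PROMIS algorithm: it uses a dichotomous (yes/no) Rasch model rather than the polytomous models typically used for real item banks, the item difficulties and three of the five items are invented for the example, the score is estimated with a standard normal prior on a coarse grid, and the conversion to a T-score assumes the usual convention of mean 50 and SD 10 in the reference population.

```python
import math

# Hypothetical physical function item bank: item -> difficulty on the latent scale.
# Higher difficulty means better function is needed to answer 'yes'.
ITEM_BANK = {
    "get in and out of bed": -2.0,
    "walk one block": -1.0,
    "climb one flight of stairs": 0.0,
    "run 1 mile": 1.5,
    "run 5 miles": 2.5,
}

GRID = [g / 10 for g in range(-40, 41)]  # latent scores from -4.0 to 4.0


def p_yes(theta, difficulty):
    """Rasch model: probability of answering 'yes' given latent score theta."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def estimate(responses):
    """Posterior mean and SD of theta (standard normal prior) given answers so far."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalised standard normal prior
        for item, answer in responses.items():
            p = p_yes(t, ITEM_BANK[item])
            w *= p if answer else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(answer_fn, max_items=4, target_se=0.5):
    """Ask items adaptively until the target precision or max_items is reached."""
    responses = {}
    theta, se = estimate(responses)
    while len(responses) < max_items and se > target_se:
        remaining = [i for i in ITEM_BANK if i not in responses]
        if not remaining:
            break
        # Under the Rasch model the most informative item is the one whose
        # difficulty is closest to the current score estimate.
        item = min(remaining, key=lambda i: abs(ITEM_BANK[i] - theta))
        responses[item] = answer_fn(item)
        theta, se = estimate(responses)
    return theta, se, list(responses)


def t_score(theta):
    """Convert a latent score to a T-score (assumed mean 50, SD 10)."""
    return 50 + 10 * theta


# Simulated respondent who can do everything up to climbing stairs.
theta, se, asked = run_cat(lambda item: ITEM_BANK[item] <= 0.0)
print(asked, round(t_score(theta), 1), round(se, 2))
```

With only five items the precision target is not reached and the loop stops at the maximum number of items; a real item bank contains many more items, which is what allows a CAT to reach a preset precision with few questions.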
The PROMIS initiative developed a large variety of item banks for measuring key symptoms (fatigue, pain, sleep disturbance, anxiety and depression), functional status (physical function and the ability to perform social roles and activities) and general health perceptions (global health), which have been translated into more than 60 languages and can be administered as short forms or CAT across a wide range of chronic conditions, enabling efficient and interpretable clinical trial and clinical practice applications of PROs [61]. PROMIS uses a T-score metric, where a mean of 50 represents the average of a reference population (usually a general population). Although content validity, structural validity and internal consistency have not been assessed in people with diabetes, Groeneveld et al were the first to show sufficient construct validity, test-retest reliability and responsiveness of seven PROMIS CATs for measuring

Table 2 Commonly used, well-validated generic PROMs for measuring common PROs

Fatigue
The 13-item Functional Assessment of Chronic Illness Therapy-Fatigue Scale (FACIT-Fatigue) was originally developed for cancer patients. Most validation studies were therefore performed in cancer patients [100-102], but it has been validated and used across many other conditions. Evidence on structural validity (i.e. whether the scale measures one or two constructs) seems inconsistent. Some evidence for content validity, internal consistency and test-retest reliability was found in Turkish people with diabetes [103]. More information and available language versions can be found on the FACIT website. The PROMIS Fatigue item bank, short forms and CAT have been validated in several general and clinical populations, including people with kidney disease, and appear to be unidimensional [60, 104-107]. Evidence for construct validity, test-retest reliability and responsiveness of the PROMIS Fatigue CAT was found in Dutch people with diabetes (F. Rutters, unpublished results). The PROMIS Fatigue short forms are part of the commonly used PROMIS-29, PROMIS-43 and PROMIS-57 [108], which have been validated across general and clinical populations [108-114]. These measures have been used (but not validated) in people with diabetes in clinical practice and research [65, 115-117]. The FACIT-Fatigue has been adopted by the PROMIS initiative and is now also called the PROMIS SF v1.0 Fatigue 13a. Available language versions of PROMIS can be found on the HealthMeasures website (www.healthmeasures.net/explore-measurement-systems/promis/intro-to-promis/available-translations).

Pain
A single 11-point (i.e. 0-10) numerical rating scale (NRS) for measuring pain intensity was recommended by the Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT) as a core outcome measure in clinical trials of chronic pain treatments [118]. The NRS has been used (but not validated) in diabetes studies (e.g. Higgins et al [119]). The PROMIS Numeric Rating Scale v1.0-Pain Intensity 1a, for example, is an NRS that can be used as a standalone measure or as part of the commonly used PROMIS Global Health [69], PROMIS-29, PROMIS-43 and PROMIS-57 [108]. The PROMIS Global Health, PROMIS-29 and PROMIS-57 have been used (but not validated) in people with diabetes in clinical practice and research [65, 115-117, 120, 121]. The SF-36 is perhaps the most commonly used generic PROM in the world. It has been included in more than 300 systematic reviews of measurement properties of PROMs that are indexed in the COSMIN database [50]. The SF-36 subscale Bodily Pain asks about pain severity and the interference of pain with daily activities. Evidence for internal consistency, construct validity and responsiveness of the SF-36 has been found in people with diabetes (e.g. Huang et al [51], Ahroni and Boyko [52], and Martin et al [122]). Available language versions of the SF-36 can be found in the Patient-Reported Outcome and Quality of Life Instruments Database (PROQOLID) (https://www.qolid.org/instruments/sf_36_sup_r_sup_health_survey_and_sf_36v2_sup_tm_sup_health_survey_sf_36_sup_r_sup_sf_36v2_sup_tm_sup/).

Anxiety
The Generalized Anxiety Disorder-7 (GAD-7) is a brief screening tool developed to identify probable cases of generalised anxiety disorder and assess symptom severity [123]. It has been widely used and validated (e.g. Breedvelt et al [124] and Toussaint et al [125]). Findings regarding its structural validity are mixed, with most studies reporting it to be one scale (including a study in people with diabetes in India [126]), whereas others found two subscales [127]. Available language versions of the GAD-7 can be found on the website of Patient Health Questionnaire (PHQ) Screeners. The Hospital Anxiety and Depression Scale (HADS) was published in 1983 as a self-assessment scale for detecting states of depression and anxiety in a hospital setting [128]. The HADS is widely used and has been extensively validated in many different conditions [50] (some studies may have included people with diabetes, but we found no validation study conducted exclusively in people with diabetes), although evidence on structural validity is inconsistent. The HADS consists of two subscales, measuring anxiety and depression, respectively, although others have suggested that it can be used as one unidimensional scale [129]. More information and available language versions of the HADS can be found on the ePROVIDE website (https://eprovide.mapi-trust.org/instruments/hospital-anxiety-and-depression-scale). The SF-36 subscale Mental health is widely used, and has been validated in people with diabetes (e.g. Huang et al [51], Ahroni and Boyko [52], and Martin et al [122]). The more recently developed PROMIS Anxiety item bank and derivative short forms and CAT were found to be unidimensional and have been validated in several general and clinical populations [130-133]. Evidence for sufficient construct validity, test-retest reliability and responsiveness of the PROMIS Anxiety CAT was found in Dutch people with diabetes (F. Rutters, unpublished results).
The PROMIS Anxiety short forms are part of the commonly used PROMIS-29, PROMIS-43 and PROMIS-57 [108] (see above), which have been used (but not validated) in people with diabetes in clinical practice and research [65, 115-117].

Depression
The HADS depression subscale is described above. van Dijk et al concluded in a systematic review that the generic Center for Epidemiologic Studies Depression scale (CESD) was best supported for measuring depressive symptoms in people with diabetes [21]. However, evidence on structural validity is inconsistent. Although the CESD is used as a unidimensional scale, most studies found three or four underlying concepts [134]. The CESD was revised to CESD-R in 2004. More information and available language translations can be found on the CESD website (https://cesd-r.com/). The Patient Health Questionnaire (PHQ-9) [135] has been used in more than 5000 studies listed on PubMed. It was included in more than 30 systematic reviews of measurement properties of PROMs [50]. van Dijk et al found evidence for construct validity and criterion validity in people with diabetes, but evidence for structural validity was inconsistent [21]. Available language versions of the PHQ-9 can be found on the website of Patient Health Questionnaire (PHQ) Screeners (www.phqscreeners.com/select-screener). The SF-36 subscale Mental health (see above) is widely used, and has been validated in people with diabetes (e.g. Huang et al [51], Ahroni and Boyko [52], and Martin et al [122]). The more recently developed PROMIS Depression item bank and derivative short forms and CAT were found to be unidimensional and have been validated in several general and clinical populations [58, 132, 133, 136-138]. High internal consistency of the PROMIS Depression 8-item short form was found in people with diabetes [139]. Evidence for construct validity, test-retest reliability and responsiveness of the PROMIS Depression CAT was found in Dutch people with diabetes (F. Rutters, unpublished results).
The PROMIS Depression short forms are part of the commonly used PROMIS-29, PROMIS-43 and PROMIS-57 [108] (see above), which have been used (but not validated) in people with diabetes in clinical practice and research [65, 115-117].

Sleep disturbances
The Pittsburgh Sleep Quality Index (PSQI) is the most frequently used measure of sleep quality. However, evidence on structural validity was found to be inconsistent [140]. The PSQI has been used in more than 5000 studies (PubMed) and was included in 28 systematic reviews of measurement properties of PROMs (https://database.cosmin.nl). More information and available language versions can be found on the ePROVIDE website (https://eprovide.mapi-trust.org/instruments/pittsburgh-sleep-quality-index). The PROMIS Sleep Disturbance and Sleep-Related Impairment item banks and derivative short forms and CAT were found to be unidimensional and have been validated in several general and clinical populations [141-144]. Evidence for construct validity, test-retest reliability and responsiveness of the PROMIS Sleep Disturbance CAT was found in Dutch people with diabetes (F. Rutters, unpublished results). Sufficient responsiveness of the short forms of both PROMIS measures was found in people with type 2 diabetes and sleep apnoea [145]. The PROMIS Sleep Disturbance short forms are part of the commonly used PROMIS-29, PROMIS-43 and PROMIS-57 [108] (see above), which have been used (but not validated) in people with diabetes in clinical practice and research [65, 115-117].

Physical function
Elsman et al concluded in a systematic review that the Diabetic Foot Ulcer Scale short form (DFS-SF) subscale Dependence/Daily Life (developed for people with diabetes and foot ulcers) and the Impact of Weight on Activities of Daily Living questionnaire (IWADL) could best be used to measure physical functioning in people with type 2 diabetes in research or clinical practice, although both scales have some limitations [24]. More information and available language versions of the DFS and DFS-SF can be found on the ePROVIDE website (https://eprovide.mapi-trust.org/instruments/diabetic-foot-ulcer-scale). The SF-36 subscale Physical Functioning (see above) is probably the most commonly used generic unidimensional physical function subscale and has been validated in people with diabetes (e.g. Huang et al [51], Ahroni and Boyko [52], and Martin et al [122]). The unidimensional PROMIS Physical Function item bank and derivative short forms and CAT are the most commonly used and most often translated measures of the PROMIS system and have been validated in several general and clinical populations, most often in people with musculoskeletal disorders [146-149]. Evidence for construct validity, test-retest reliability and responsiveness of the PROMIS Physical Function CAT was found in Dutch people with diabetes (F. Rutters, unpublished results). The PROMIS Physical Function short forms are part of the commonly used PROMIS-29, PROMIS-43 and PROMIS-57 [108] (see above), which have been used (but not validated) in people with diabetes in clinical practice and research [65, 115-117].

Sexual function
The most widely used measures of sexual function are the Female Sexual Function Index (FSFI) for women and the International Index of Erectile Function (IIEF) for men. However, evidence for some of their measurement properties was conflicting or lacking [150,151]. More information and available language versions can be found on the ePROVIDE website for the FSFI (https://eprovide.mapi-trust.org/instruments/female-sexual-function-index) and the IIEF (https://eprovide.mapi-trust.org/instruments/international-index-of-erectile-function). The PROMIS Sexual Function and Satisfaction Profile measures for women and men were developed more recently and have been validated to at least some extent in cancer patients, but not yet in people with diabetes, and they have so far been used less often [152-154].

Cognitive function
The PROMIS Cognitive Function and Cognitive Function-Abilities item banks and derivative short forms and CAT have recently been developed as part of the PROMIS system and have been validated to some extent [155,156].

Participation in social roles and activities
The SF-36 subscales Physical role functioning and Emotional role functioning are widely used, and have been validated in people with diabetes (e.g. Huang et al [51], Ahroni and Boyko [52], and Martin et al [122]). The WHODAS 2.0 is a generic instrument covering several domains of function and participation. The subscale Participation measures joining in community activities. The WHODAS 2.0 is one of the most widely validated measures of participation [157] and has been used in several large population studies (e.g. Alonso et al [54] and Thorpe et al [55]). It has not been validated in people with diabetes. More information on the WHODAS 2.0 and available language versions can be found on the WHO website (www.who.int/standards/classifications/international-classification-of-functioning-disability-and-health/who-disability-assessment-schedule). The PROMIS Ability to Participate in Social Roles and Activities and PROMIS Satisfaction with Social Roles and Activities item banks and derivative short forms and CAT have been validated in large general population samples and were found to be unidimensional [158,159]. Evidence for construct validity, test-retest reliability and responsiveness of the PROMIS Ability to Participate in Social Roles and Activities CAT was found in Dutch people with diabetes (F. Rutters, unpublished results). The PROMIS Ability to Participate in Social Roles and Activities short forms are part of the commonly used PROMIS-29, PROMIS-43 and PROMIS-57 [108] (see above), which have been used (but not validated) in people with diabetes in clinical practice and research [65, 115-117].

Perceived overall health
The first item of the SF-36 (see above) refers to perceived overall health. This item was adopted by PROMIS (PROMIS Global01) as part of the PROMIS Global Health [69]. The PROMIS Global Health has been used (but not validated) in people with diabetes [115,120,121].
physical function, pain interference, fatigue, sleep disturbance, anxiety, depression and ability to participate in social roles and activities in 314 people with type 2 diabetes (F. Rutters, unpublished results). Finally, there are also high-quality generic PROMs available that measure only one PRO, such as the Functional Assessment of Chronic Illness Therapy (FACIT)-Fatigue Scale, or two PROs, such as the Hospital Anxiety and Depression Scale (HADS) for anxiety and depression, that could be considered. Table 2 describes the relevant SF-36 and WHODAS 2.0 subscales, PROMIS measures and some other commonly used generic PROMs that focus on only one or a few PROs and that we consider to have good content validity. It also gives a narrative summary of the evidence on their measurement properties in general, and of any evidence that we could find on their measurement properties in people with diabetes.
We recommend selecting a relevant PROM or a subscale of a PROM from Table 2 for each PRO that one aims to measure in a study or clinical application. The SF-36 and WHODAS 2.0 do not need to be administered in full, and scales from different PROMs can be combined based on preferences for a specific context of use. The PROMIS measures are attractive because they take advantage of the modern psychometric technique of IRT, which makes them precise, patient-friendly and short, and they allow for comparisons between disease groups, including people who do not have diabetes but have another chronic condition. Another advantage is that these scales are unidimensional, in contrast to several of the other measures mentioned in Table 2. Unidimensional scales measure only one construct, and their scores are therefore easier to interpret and the interpretation is more valid.
The generic PROM scales do not assess diabetes-specific constructs, and for many studies it can be important to add disease-specific PROMs that measure diabetes-specific symptoms and other relevant diabetes-specific PROs, such as diabetes distress. A combination of disease-specific PROMs for measuring disease-specific symptoms and generic PROMs for measuring general symptoms, functioning and perceived overall health seems most useful.

Future: where should we be going?
There is a need for further standardisation of PROs and PROMs in the field of diabetes. We recommend that researchers and clinicians consider measuring disease-specific symptoms, general symptoms, functional status and general health perceptions. We recommend further validation of diabetes-specific PROMs that have sufficient content validity for measuring diabetes-specific symptoms and diabetes distress. In addition, we recommend using generic PROMs for measuring commonly relevant PROs. In particular, the use of item banks and CAT, such as those of the PROMIS system, offers many potential benefits for measuring commonly relevant PROs. The main advantages are efficient measurement, with a minimal number of items that still provide reliable scores; flexible measurement, because items can be used interchangeably; and precise measurement, due to low measurement error. It is also possible to convert scores of many traditional PROMs to the corresponding PROMIS metric (see for example Bingham et al [62]). PROMIS is rapidly being adopted and used across diseases and countries [63]. Koh et al confirmed that PROMIS might provide a generic solution to measure PROs in the field of diabetes. PROMIS covered five of six themes, 15 of 30 subthemes and 19 of 35 codes that were identified by people living with diabetes as important [28].
PROMs are not yet routinely used in the field of diabetes. A systematic review showed sparse use of PROMs to assess depressive symptoms and distress during routine clinical care in adults with type 2 diabetes [64]. Scholle et al [65] were the first to study the effect of implementing the PROMIS-29 in routine care for people with diabetes. People with diabetes in that study reported some challenges understanding the PROMIS scales, but also saw the PROM process as an opportunity to increase their engagement in the treatment and management of their diabetes [65]. Preliminary qualitative data from our group showed that Dutch people living with type 2 diabetes found PROMIS CATs acceptable and indicated that they could be an efficient way to start the conversation with a healthcare provider as well as provide people with diabetes with more confidence (F. Rutters, unpublished results). However, participants all felt that 'questionnaires should never replace personal consultations with the physician' (F. Rutters, unpublished results). To support healthcare providers with the selection and implementation of PROs and PROMs in clinical practice, several practical guidelines exist (e.g. van der Wees et al [66] and Aaronson et al [67]).
The COS developed for clinical trials in people with type 2 diabetes recommended core outcomes but not yet core outcome measurement instruments. This paper suggests the Impact of Weight on Activities of Daily Living questionnaire (IWADL), the SF-36 subscale Physical Functioning and particularly the PROMIS Physical Function measures for measuring the core outcome 'activities of daily living'. The core outcome 'global QOL' is not considered a PRO, but is nevertheless relevant to measure, for example with the WHO well-being index (WHO-5, a short self-reported measure of current mental well-being [68]) or the PROMIS Global02 item (a single item addressing overall QOL, included in the PROMIS Global Health [69]). However, consensus among people with diabetes and healthcare providers is needed before making a final recommendation.
A first limitation of this study is that no people with diabetes were involved. Our recommendations are based on the literature and on our own experiences as clinicians and as researchers with different backgrounds. Second, this is not a systematic review of all disease-specific and generic PROs and PROMs that could be used in people with diabetes, and their measurement properties. Third, our review focusses predominantly on PROMs for adults with diabetes. Nevertheless, we hope this paper provides sufficient evidence and recommendations to improve the current state of PRO and PROM use in the field of diabetes, to improve healthcare and, ultimately, to improve the QOL of people living with diabetes.