
Introduction

Both the experience and the risk of hypoglycaemia can have a serious negative impact on the quality of life (QoL) of adults with diabetes [1,2,3,4,5,6,7,8]. Living a life of quality is perhaps the ultimate goal, so protecting QoL is a daily burden for people experiencing or at risk of hypoglycaemia, and one that can be contradictory to the goals of medical therapy [8]. This may particularly be the case in those who aim for very tight glucose targets. The extent of this impact on QoL can be assessed using patient-reported outcome measures (PROMs). PROMs are questionnaires that can be used in research and/or clinical care. PROMs complement objective data (e.g. actual blood glucose levels) by capturing the individual’s experiences in a quantifiable and standardised manner, across a range of concepts, e.g. health-related QoL, satisfaction with treatment or emotional well-being [9, 10]. When applied to the study of hypoglycaemia in diabetes, PROMs can facilitate an assessment of the psychological and economic burden of hypoglycaemia, which can be used to determine the value of therapeutic approaches to reducing hypoglycaemia frequency and severity.

Given the large number of PROMs available, it can be challenging to determine which PROM(s) to select for a given clinical or research purpose. Factors such as response burden (e.g. mode of administration, number of items [questions]), type of PROM (generic or condition-specific) and the purpose of the data collection will influence choice. However, a more fundamental issue is whether the PROM has been evaluated as ‘fit for purpose’. This evaluation should include assessment of three overall domains (validity, reliability and responsiveness), for which consensus-based standards (COnsensus-based Standards for the selection of health Measurement Instruments [COSMIN]) can be applied [11]. The COSMIN methodology and standards derive from widespread international expert consensus [11, 12] and have been applied to other PROM measures [13,14,15,16,17], but not yet to the assessment of the impact of hypoglycaemia on QoL.

QoL is highly subjective and has been defined in many ways and most people, intuitively, have an understanding of what it means to them [18]. Perhaps the simplest definition is that QoL is a personal evaluation of how good or bad one’s life is [19]. For the purpose of this review, and consistent with the general consensus [9], we operationalised QoL as: (1) a multidimensional construct including components such as physical well-being (e.g. pain/discomfort, mobility, fatigue), psychological well-being (e.g. mood, fear, confidence) and social well-being (e.g. stigma, participation) [20]; (2) a subjective construct based on feelings, values, experiences and priorities (therefore, we do not include objective measures, or purely functional performance or assessment instruments); and (3) a dynamic construct, which changes over time according to the person’s priorities, experiences and situation.

The objectives of this review were to: (1) identify PROMs used to assess the impact of hypoglycaemia on QoL in adults with diabetes; and (2) formally evaluate their content validity, structural validity and other measurement properties. Our intention was to provide researchers and clinicians with a robust evidence base to assist them when selecting PROMs for this purpose. The review was undertaken as part of the Hypoglycaemia REdefining SOLutions for better liVEs (Hypo-RESOLVE) project, an international collaboration of clinicians, scientists, industry partners and people with diabetes [21].

Methods

We used the updated COSMIN guidance [12, 22,23,24].

Data sources and searches

A protocol was developed and registered with PROSPERO [25]. A systematic literature search was conducted during 26–28 November 2018 to identify published evidence around the four concepts of: (diabetes) and (hypoglycaemia) and (psychosocial outcomes) and (measurement properties of measurement instruments). Databases searched included MEDLINE, EMBASE, PsycINFO, CINAHL and The Cochrane Library. Terms for psychosocial outcomes were chosen to include both generic, ‘umbrella’ terms for ‘quality of life’ and ‘well-being’ (sourced from published search filters) and specific psychosocial outcomes of diabetes known to the Hypo-RESOLVE team (e.g. fear of hypoglycaemia). In order to identify studies for the present systematic review, a validated search filter devised for retrieving studies on measurement properties of instruments in PubMed was used [26]. An example search strategy is shown in the electronic supplementary material (ESM) Methods.

Study selection

Inclusion criteria consisted of any study design that included the primary development and/or validation of a hypoglycaemia-specific PROM used to assess the impact of hypoglycaemia on QoL in adults diagnosed with any type of diabetes (e.g. type 1, type 2 and gestational) who have experienced hypoglycaemia. Studies of hypoglycaemia/hypoglycaemic episodes not associated with diabetes were excluded. Commentaries, reviews, opinion pieces and any other non-empirical work were also excluded. Studies were assessed for inclusion at title and abstract stage by one reviewer (JL). Full-text articles were scrutinised where considered as relevant or potentially relevant or where doubt existed. Twenty per cent of studies were assessed by a second reviewer (JC) to check for consistency. Disagreements were resolved through discussion.

Data extraction

Data extraction included study characteristics (e.g. language; participant characteristics; recall period; analysis model), a brief summary of results and measurement properties of the PROMs. Primary outcomes included measurement properties of identified PROMs, consistent with the COSMIN checklist: PROM development; content validity; structural validity; internal consistency; cross-cultural validity/measurement invariance; reliability; measurement error; criterion validity; hypothesis testing for construct validity; and responsiveness. Definitions of the measurement properties are detailed in Table 1. In accordance with COSMIN guidelines, all data relating to PROM measurement properties were extracted independently by two reviewers (JL and JC) against the respective COSMIN criteria. Discrepancies were resolved through discussion.

Table 1 Definitions of measurement properties

Content validity assessment

Content validity is the extent to which a PROM is deemed to reflect the construct of interest and, arguably, the most fundamental aspect of scale selection [27]. The methodological quality of the PROM development studies and other studies supplementing content validity were assessed using COSMIN standards [28]. The assessment involves three steps (see Fig. 1): (1) evaluation of the quality of the PROM development; (2) evaluation of the quality of any additional content validity studies on the PROM (if available); and (3) evaluation of the content validity of the PROM based on the quality and results of the available studies and the PROM itself. Steps 1 and 2 result in a rating of each COSMIN standard ranked on a four-point scale: ‘very good’, ‘adequate’, ‘doubtful’ and ‘inadequate’. Total ratings are then determined using the lowest rating for any item for that study (i.e. worst score counts) [22].
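The ‘worst score counts’ rule described above can be illustrated with a minimal sketch. The function name and the example ratings are hypothetical and are not taken from the review; only the four-point scale and the rule itself come from the text.

```python
# Four-point scale, ordered from best to worst as listed in the text.
RATING_ORDER = ["very good", "adequate", "doubtful", "inadequate"]


def overall_study_rating(standard_ratings):
    """Return the lowest (worst) rating among the per-standard ratings."""
    if not standard_ratings:
        raise ValueError("at least one rating is required")
    # The rating with the highest index in RATING_ORDER is the worst one.
    return max(standard_ratings, key=RATING_ORDER.index)


# Hypothetical ratings for the standards of a single study.
example = ["very good", "very good", "doubtful", "adequate"]
print(overall_study_rating(example))  # the single 'doubtful' determines the total
```

A single low rating therefore determines the total rating for the study, regardless of how the remaining standards were rated.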

Fig. 1 COSMIN assessment of content validity

Step 3 consists of three sub-stages. Step 3a incorporates reviewer ratings of the identified PROMs whereby reviewers consider relevance, comprehensiveness and comprehensibility. We sought ratings from three key stakeholder groups: (1) researchers (including those with expertise in systematic reviewing, QoL research and psychological aspects of diabetes) (n = 6); (2) clinicians (n = 6); and (3) adults with diabetes (n = 4), including two representatives of the Hypo-RESOLVE Patient Advisory Committee (PAC). All reviewers provided independent ratings of the PROMs based on several criteria: (1) the construct of interest (i.e. does the PROM include items that are relevant in measuring the impact of hypoglycaemia on QoL?); (2) the population of interest; (3) the context of use of interest (i.e. is the PROM suitable for use in research and/or clinical practice?); (4) the appropriateness of response options; (5) the appropriateness of the recall period; (6) the comprehensiveness (i.e. does the PROM assess the impact of hypoglycaemia on QoL as a whole, or only on select domains of QoL?); (7) the suitability/clarity of the PROM instructions; (8) whether PROM items and response options are understandable; (9) the appropriateness of PROM item wording; and (10) the extent to which response options are appropriate to the question being asked. A majority rating was determined for each group (researcher, clinician and PAC). The group ratings were then consolidated to produce an overall reviewer rating for each PROM. Table 2 details how relevance, comprehensiveness and comprehensibility were assessed.
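As an illustration of the majority-rating step only, the following sketch tallies independent ratings within each stakeholder group. The ratings shown are invented for illustration, the function name is hypothetical, and returning no rating on a tie is an assumption rather than a rule stated in the text. The rule used to consolidate the three group ratings into one overall rating is not specified here and is therefore not modelled.

```python
from collections import Counter


def majority_rating(ratings):
    """Return the most common rating in a group.

    Returns None when the group is empty or when two ratings tie for
    first place (an assumption; the text does not say how ties were handled).
    """
    counts = Counter(ratings).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


# Hypothetical ratings for one criterion of one PROM
# (group sizes follow the text: 6 researchers, 6 clinicians, 4 adults with diabetes).
groups = {
    "researchers": ["+", "+", "+", "+", "-", "?"],
    "clinicians": ["+", "-", "-", "-", "+", "-"],
    "adults with diabetes": ["+", "+", "-", "-"],
}
for name, ratings in groups.items():
    print(name, majority_rating(ratings))
```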

Table 2 COSMIN criteria and rating system for evaluating the content validity of the PROMs (adapted from Terwee et al [28]), with an example shown in italics

Step 3b involves summarising the results of all available studies to provide an overall rating of relevance, comprehensiveness and comprehensibility and an overall content validity rating. This results in an outcome of ‘sufficient’, ‘insufficient’, ‘inconsistent’ or ‘indeterminate’. Finally, in Step 3c, the overall ratings determined in Step 3b are accompanied by a grading of the quality of the evidence using a modified Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach [29]. Using the modified GRADE approach, the quality of evidence is graded as ‘high’, ‘moderate’, ‘low’ or ‘very low’. The GRADE approach uses five factors to consider the quality of the evidence: risk of bias, inconsistency, indirectness, imprecision and publication bias [29]. The resultant evaluation of content validity includes an overall rating of: + (‘satisfactory’); − (‘unsatisfactory’); ± (‘inconsistent’); or ? (‘indeterminate’), with a measure of the quality of the evidence to support the content validity rating (‘high’, ‘moderate’, ‘low’, ‘very low’). A worked example of content validity rating and scoring is shown in Table 2. Detailed information on the rating process and the COSMIN methodology applied is reported elsewhere [28].
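The grading logic can be sketched as follows, under the assumption (common to GRADE-style approaches, but not spelled out in this section) that evidence starts at ‘high’ and is lowered by one or more levels for each factor that raises concern. The function name and the size of each downgrade are illustrative only; the cited guidance [28, 29] defines the actual rules.

```python
GRADE_LEVELS = ["high", "moderate", "low", "very low"]
FACTORS = (
    "risk of bias",
    "inconsistency",
    "indirectness",
    "imprecision",
    "publication bias",
)


def grade_quality(downgrades):
    """Map per-factor downgrades (number of levels, >= 0) to a quality grade.

    Assumption: grading starts at 'high' and cannot fall below 'very low'.
    """
    unknown = set(downgrades) - set(FACTORS)
    if unknown:
        raise ValueError(f"unknown factor(s): {sorted(unknown)}")
    if any(steps < 0 for steps in downgrades.values()):
        raise ValueError("downgrades must be non-negative")
    total = sum(downgrades.values())
    return GRADE_LEVELS[min(total, len(GRADE_LEVELS) - 1)]


# Hypothetical example: one level for risk of bias, one for imprecision.
print(grade_quality({"risk of bias": 1, "imprecision": 1}))
```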

Assessment of other psychometric properties

Table 1 defines each of the psychometric properties assessed. As above, a COSMIN rating was determined by assessment across the criteria for measurement properties using the same rating scale (‘sufficient’, ‘insufficient’, ‘inconsistent’ or ‘indeterminate’). The assessment of the quality of the evidence was applied using the GRADE approach. This results in a rating of: + (‘satisfactory’); − (‘unsatisfactory’); ± (‘inconsistent’); or ? (‘indeterminate’), with a measure of the quality of the evidence to support the structural validity rating (‘high’, ‘moderate’, ‘low’, ‘very low’). Full information on the COSMIN methodology applied in this review is reported elsewhere [23].

Quality assurance of the review

The quality of this review was assessed against a COSMIN checklist that was designed to evaluate the quality of systematic reviews of PROMs [30] (ESM Table 1).

Results

The search returned a total of 3661 unique records, from which 214 PROMs were identified as used in studies to assess the impact of hypoglycaemia on QoL or subdomains of QoL (Fig. 2, Table 3). Of these, 17 PROMs were initially identified as hypoglycaemia-specific and for consideration in this review, and nine were subsequently excluded following further scrutiny of the instruments. PROMs were excluded if they were: hypoglycaemia symptom measures that assessed attitudes, awareness and/or attitudes to awareness of symptoms (n = 3); related to specific treatments (n = 2); only a subscale of an overall PROM (n = 2); or not available for full inspection (n = 2). Consequently, the current review includes eight hypoglycaemia-specific PROMs that have been used to assess the impact of hypoglycaemia on QoL or at least one aspect of QoL: the Fear of Hypoglycemia 15-item scale (FH-15); the Hypoglycemia Fear Survey (HFS); the Hypoglycemia Fear Survey version II (HFS-II); the HFS-II short-form; the Hypoglycemic Attitudes and Behavior Scale (HABS); the Hypoglycemic Confidence Scale (HCS); the QoLHYPO questionnaire and the Treatment-Related Impact Measure-Non-severe Hypoglycemic Events (TRIM-HYPO) (Table 4).
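The selection arithmetic reported above can be checked with a short tally. All counts and instrument names are taken from the preceding paragraph; the variable names are illustrative.

```python
initially_identified = 17  # hypoglycaemia-specific PROMs considered for the review

# Reasons for exclusion and the number of PROMs excluded for each reason.
exclusions = {
    "symptom measures (attitudes/awareness)": 3,
    "related to specific treatments": 2,
    "subscale of an overall PROM": 2,
    "not available for full inspection": 2,
}

included_proms = [
    "FH-15", "HFS", "HFS-II", "HFS-II short-form",
    "HABS", "HCS", "QoLHYPO", "TRIM-HYPO",
]

excluded = sum(exclusions.values())
included = initially_identified - excluded

# The reported figures are internally consistent: 17 - 9 = 8.
assert excluded == 9
assert included == len(included_proms) == 8
print(f"{excluded} excluded, {included} included")
```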

Fig. 2 PRISMA 2009 Flow Diagram: hypoglycaemia-specific PROMs used to assess the impact of hypoglycaemia on QoL

Table 3 Type and number of PROMs identified in title and abstract sift
Table 4 PROMs identified that have been used to assess the impact of hypoglycaemia on QoL (or its subdomains) in people with diabetes

Overall COSMIN assessment of PROMs

The overall results of the COSMIN assessment are shown in Table 5. There are considerable evidence gaps for the measurement properties of most of the PROMs. The HFS-II, QoLHYPO and TRIM-HYPO were the only instruments that could be rated across all the measurement properties.

Table 5 Summary of psychometric properties of hypoglycaemia-specific PROMs used to assess the impact of hypoglycaemia on QoL

Content validity

ESM Table 2 summarises the key characteristics and COSMIN quality assessment of the PROM development studies. For five of the seven PROMs, there was evidence that adults with diabetes were involved in item generation (HFS, HABS, HCS, QoLHYPO and TRIM-HYPO). COSMIN quality ratings ranged from ‘inadequate’ (HFS, HABS and QoLHYPO), to ‘doubtful’ (HFS-II, HCS and TRIM-HYPO), to ‘very good’ (FH-15). The developers of the HFS-II short-form do not report on content validity, due to the scale being developed based on existing items in the HFS-II [31].

ESM Table 3 details characteristics of the PROM development studies. The overall quality of the PROM development studies was classified as ‘very good’ (FH-15), ‘inadequate’ (HFS, HABS and QoLHYPO) or ‘doubtful’ (HFS-II, HCS and TRIM-HYPO). Only five of the PROMs provided evidence of concept elicitation (all of which were of ‘doubtful’ or ‘inadequate’ quality) (HFS, HABS, HCS, QoLHYPO and TRIM-HYPO). The COSMIN rating for the PROM design ranged from ‘inadequate’ (HFS, HABS and QoLHYPO), to ‘doubtful’ (HFS-II, HCS and TRIM-HYPO), to ‘very good’ (FH-15). Three of the PROMs (HFS, QoLHYPO and TRIM-HYPO) reported on content validity. During the development of the HFS, health professionals were asked about the relevance and comprehensiveness of the PROM (‘doubtful’ COSMIN quality rating) [32]. For the QoLHYPO, adults with diabetes were asked about the comprehensibility, but not relevance, of the PROM (‘doubtful’ COSMIN quality rating) [33]. During the development of the TRIM-HYPO, adults with diabetes were asked about the comprehensibility and relevance of the PROM, but were not asked about comprehensiveness of the PROM (‘doubtful’ COSMIN quality rating) [34]. Aside from the development studies, no further studies were identified that independently assessed the content validity of the PROMs.

ESM Table 4 details the consensus ratings for the three groups of reviewers (researchers, clinicians, people living with diabetes), and an overall reviewer consensus rating for each PROM. FH-15 had an overall reviewer rating of ‘sufficient’; HFS-II, HABS, HCS and TRIM-HYPO were rated as ‘inconsistent’. For two of the PROMs (HFS and QoLHYPO), relevance, comprehensiveness and comprehensibility ratings resulted in a combination whereby COSMIN guidance is not explicit, and, thus, an overall rating could not be applied [28].

Structural validity

Twelve studies assessed the structural validity of the PROMs, all of which were reported in the development papers (ESM Table 5). No independent assessments of the structural validity were identified. Four studies examined the structural validity of a cultural adaptation/language translation of the HFS [35,36,37,38]. A further study assessed the structural validity of the short-form of HFS-II [31]. COSMIN quality ratings of the HFS-Norwegian, HFS-Singapore and HFS-II short-form were ‘very good’ and ratings were ‘adequate’ for the remaining PROMs. The same principles as noted above were applied to assess the quality of the evidence for these instruments. The quality of evidence for the HFS-Norwegian, HFS-Singapore and HFS-II short-form instruments was assessed as ‘high’. The HFS-Spanish, HFS-Swedish and TRIM-HYPO instruments were assessed as ‘moderate’. Many of the studies reported exploratory factor analysis (EFA) (rather than the confirmatory factor analysis required to receive a ‘satisfactory’ rating). Those studies reporting confirmatory factor analysis (language versions of the HFS) did so to examine whether the expected two-factor structure (observed for the original HFS) fitted their dataset. However, they all rejected this a priori-defined structure, and therefore went on to explore the latent structure of the tool using EFA.

Internal consistency reliability

Thirteen studies were identified that reported evidence of the internal consistency of the PROMs [31,32,33,34, 36,37,38,39,40,41,42,43]. Some were undertaken by the instrument developers and some were independent assessments (ESM Table 6). Most studies [32, 33, 36, 39, 40, 42, 43] had an ‘adequate’ COSMIN quality rating. Five studies had a ‘very good’ COSMIN quality rating [31, 34, 37, 38, 41].
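For readers less familiar with this property, internal consistency is commonly summarised with Cronbach's alpha. This section does not state which statistic each included study reported, so the sketch below is a general illustration only; the function name and the response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    if len(item_scores) < 2:
        raise ValueError("at least two respondents are required")
    k = len(item_scores[0])
    if k < 2 or any(len(row) != k for row in item_scores):
        raise ValueError("each respondent needs the same number (>= 2) of items")
    item_columns = list(zip(*item_scores))  # one tuple of scores per item
    item_variance_sum = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: five respondents answering a four-item scale (0-4).
responses = [
    [3, 4, 3, 4],
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [4, 4, 3, 4],
    [0, 1, 1, 0],
]
print(round(cronbach_alpha(responses), 2))
```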

Reliability (test–retest)

Seven studies were identified that assessed the test–retest reliability of a PROM. Four of the studies were conducted by the instrument developers (FH-15, HFS-II, QoLHYPO and TRIM-HYPO). The remaining studies were assessments of the language versions of the HFS instrument (ESM Table 7). Four studies had an ‘adequate’ COSMIN quality rating [32, 33, 36, 40]. Two studies had a ‘very good’ COSMIN quality rating [37, 38]. One study had a ‘doubtful’ COSMIN quality rating [34].

Hypothesis testing for construct validity

Ten studies reported on hypothesis testing for construct validity (ESM Table 8) [31, 33,34,35,36, 38,39,40, 42, 43]. Of these, nine compared the PROM with other outcome measurement instruments (convergent validity) [31, 33,34,35,36, 38, 40, 42, 43]. These were the HFS-II, HFS-Spanish, HFS-Singapore, HFS-Swedish, HFS-II short-form, HABS, HCS, QoLHYPO and TRIM-HYPO. Six studies included comparisons between subgroups (discriminative or known-groups validity) [34, 38,39,40, 42, 43]. These were the FH-15, HFS-II, HFS-Singapore, HABS, HCS and TRIM-HYPO instruments.

Other psychometric properties

No studies were found to demonstrate evidence for cross-cultural validity, measurement error, criterion validity or responsiveness.

Discussion

This systematic review has summarised and critically evaluated published evidence on the psychometric characteristics of PROMs used to assess the impact of hypoglycaemia on QoL in adults with diabetes using COSMIN methodology. Our intention was to provide an evidence base that would help researchers and clinicians when selecting PROMs, based on the robust and comprehensive consensus-based COSMIN criteria. We identified eight PROMs that had been developed to assess the subjective impact of hypoglycaemia on QoL or a subdomain of QoL.

None of the PROMs included in this review had a ‘high’ rating for content validity (in relation to assessing the impact of hypoglycaemia on QoL), which is arguably the most important measurement property of a PROM [28, 44]. All had ‘inconsistent’ COSMIN ratings for content validity, but the quality of the evidence to support those ratings was greater for the HFS and QoLHYPO. To that end, there is some support to recommend the use of HFS and QoLHYPO instruments in research studies and/or clinical practice. However, it is important to acknowledge the conceptual framework from which these two instruments were developed, and how this diverges from our operationalisation of the concept of QoL (i.e. multidimensional, subjective and changing over time). The HFS was developed to measure fear of hypoglycaemia through two subscales: behaviour and worry. Fear is arguably a very specific aspect of the psychological subdomain of QoL. Furthermore, the developers were not explicit in describing the target population for the instrument (i.e. their sample included people with ‘insulin-dependent’ diabetes, but it is unclear whether this included people with type 1 and/or type 2 diabetes, and whether it is also applicable to people who manage their diabetes without insulin but experience hypoglycaemia). While the content of the QoLHYPO instrument includes items that assess various domains of QoL (e.g. social relationships, mood, daily activities), it was designed for use only by people with type 2 diabetes. Furthermore, there have been no translations beyond the original Spanish version. Consequently, the format and layout of the QoLHYPO is not clear for English-speaking researchers, and the developers provide no information on domains. Further investigation would be required to determine the suitability of the QoLHYPO instrument in measuring the impact of hypoglycaemia in people with type 1 diabetes and in other language groups.

We have included details of psychometric properties of the PROMs identified as part of the original literature search. However, it is plausible that additional papers have also reported psychometric properties for one or more of the included PROMs (particularly in intervention studies). To that end, the information on measurement properties reported here should not be considered exhaustive. We did not adopt the approach taken by (some of) the PROM authors to consider HbA1c as the ‘gold standard’ in the assessment of criterion validity and criterion approach to responsiveness. Studies have shown that HbA1c is not a reliable indicator of whether an individual experiences hypoglycaemia [45, 46], nor a surrogate for QoL [47] or for the impact or burden of hypoglycaemia. Advances in glucose monitoring technologies are continually changing our understanding of diabetes and are contributing to a better understanding of the lived experience of diabetes and hypoglycaemia. Consequently, it may be appropriate in future studies to consider ‘time in range’ or ‘time in hypoglycaemia’ as a marker for the impact of hypoglycaemia on QoL, but the extent to which this will reflect the subjective experience has yet to be elucidated. In the absence of an agreed ‘gold standard’, it is not possible to assess criterion validity or the criterion approach to responsiveness for any PROM.

In this systematic review, we followed the robust and comprehensive guidance developed by the COSMIN initiative [23, 28]. However, it is not without its limitations. The assessment of content validity and psychometric performance of PROMs is determined by taking the lowest rating of any standard in the criteria (i.e. the ‘worst score counts’ principle) [22, 28]. This means that a study could be rated as ‘very good’ or ‘adequate’ on all but one criterion; however, the overall rating could be affected by a ‘doubtful’ or ‘inadequate’ rating, thus reducing the overall score to ‘doubtful’ (or ‘inadequate’). The omission of one key component in reporting (such as whether interviews were recorded and transcribed verbatim) can result in a lower overall content validity rating, which could be argued as overly harsh and should be recognised as a limitation of the COSMIN approach. Where appropriate within this review, we consistently rated in favour of the PROM (rather than assuming the worst). Another limitation of the COSMIN approach was identified in the guidance for determining content validity ratings of studies. Here we noted that there was no information on how to determine overall content validity rating with the combinations achieved. We have documented our approach; however, if the review were to be replicated, others may opt to ‘down-grade’ the overall content validity rating. Furthermore, as part of the content validity assessment, we sought to include the opinion of stakeholders. The COSMIN guidance does not advise on how to ratify ratings should there be conflicting opinions between or within stakeholder groups.

It should be noted that some of the PROMs included within this review are legacy or ‘first generation’ measures; that is, they were developed at a time when there were no international standards for instrument development methods, so these were either not reported, or reported selectively or in little detail. Similarly, the way in which PROMs are developed has changed over time [27]. It is now more common to report the methodological steps undertaken during the instrument development phase. The COSMIN ratings should therefore be interpreted with a degree of caution, and do not provide evidence that the instrument development was not rigorous or that the instruments are not ‘fit for purpose’, but rather expose an absence of key evidence.

While there is published evidence of studies that report hypoglycaemia to negatively impact upon QoL [1,2,3,4,5,6,7,8], we have identified that those that utilise hypoglycaemia-specific PROMs have inadequate reliability and validity for this specific purpose. Thus, the current literature on the impact of hypoglycaemia on QoL is limited (if not flawed) and needs to be interpreted with caution. Given that the content validity of the instruments was lacking, it is plausible that hypoglycaemia impacts individuals in ways that are currently not being measured. It may be that the items within the instruments are no longer relevant (e.g. due to changes in diabetes treatments, monitoring, society, language use), or that the items are not comprehensive enough to fully capture the ways in which hypoglycaemia affects adults in the modern world.

In conclusion, none of the PROMs identified had sufficient evidence to demonstrate satisfactory content validity, i.e. they do not assess the impact of hypoglycaemia on QoL in adults living with diabetes. Furthermore, most were also limited in their published evidence of reliability, validity and responsiveness. There is an urgent need to follow contemporary guidance [27, 48,49,50] to develop new instruments that can assess the impact of hypoglycaemia on QoL.