Key Points for Decision Makers

Multiple quality-of-life instruments have been used in economic assessments of sleep interventions, highlighting the importance of understanding their content and covered concepts.

Of the 16 instruments identified in this review, the Sleep Apnea Quality of Life Index (SAQLI) and the 30-item Functional Outcomes of Sleep Questionnaire (FOSQ-30) along with five non-sleep instruments (15 dimensions [15D], Short Form 6-Dimensions [SF-6D], 12-item Short Form Survey [SF-12], 36-item Short Form Survey [SF-36] and the GRID Hamilton Rating Scale for Depression [GRID-HAMD]) had the broadest content coverage based on the International Classification of Functioning, Disability and Health framework.

Choosing the appropriate instrument should factor in both quality-of-life coverage and the specific sleep disorder under consideration. For evaluating body functions, the 15D and SAQLI (for obstructive sleep apnoea) or GRID-HAMD (for insomnia) are recommended (cost-effectiveness analysis and cost-utility analysis). When focusing on activities and participation, combinations such as 15D or SF-6D with 10-item FOSQ (FOSQ-10), Epworth Sleepiness Scale or GRID-HAMD are suggested (cost-effectiveness analysis and cost-utility analysis). For utility measurement, especially in guiding resource allocation across various healthcare settings or sleep disorders (cost-utility analysis only), the 15D and SF-6D are recommended choices.

1 Introduction

Sleep disorders are a major and under-recognised public health issue with a substantial clinical and economic burden on individuals and society [1,2,3,4,5,6,7,8,9,10]. The International Classification of Sleep Disorders version 3 records more than 50 clinically diagnosable sleep disorders [11], with obstructive sleep apnoea (OSA) and insomnia the two most common sleep disorders in the general population [12]. Obstructive sleep apnoea is a sleep-breathing disorder characterised by abnormal breathing reductions (hypopnoea) or cessation of airflow (apnoea) during sleep, caused by intermittent partial or complete upper airway obstruction. These events lead to blood gas disturbances, cardiovascular system stress, and frequent cortical arousals that fragment sleep. These physiological sequelae can cause pathological sleepiness and negatively impact daytime function, health, and safety [11, 13]. Insomnia is another complex sleep disorder characterised by self-reported difficulties initiating sleep, maintaining sleep, and/or undesired early morning awakenings from sleep with associated daytime impairment [11, 13]. Estimates of the regional and global prevalence of insomnia and OSA vary from 4 to 23% [12, 14,15,16,17] and 9 to 38% [18,19,20], respectively. Sleep disorder impacts vary according to the nature of the underlying sleep problems, but can include pathological daytime sleepiness that increases traffic and workplace accident risks, reduced mental and physical health, productivity, and well-being, and cardiovascular sequelae, including increased risks of hypertension, myocardial infarction, stroke, and premature mortality [20,21,22,23,24,25].

In 2013, the annual cost of sleep disorder impacts on communities was estimated at $680 billion across five Organisation for Economic Co-operation and Development countries (USA, Germany, UK, Japan and Canada) [26] and $5.1 billion in Australia [5]. The annual social and economic cost of sleep disorders in Australia was estimated at $35.4 billion in 2021 [27], albeit down from $45.21 billion in 2017 [28]. Sleep disorder interventions (e.g. sleep tests and treatments) are associated with significant healthcare costs, but societal costs of untreated sleep problems are also very high and negative quality-of-life (QoL) impacts are prominent [27].

Given the high prevalence and the significant societal burden potentially attributable to sleep disorders, it is essential to ascertain QoL impacts. Furthermore, healthcare systems worldwide are confronted with perpetually increasing healthcare expenditures, with health spending as a share of gross domestic product across Organisation for Economic Co-operation and Development countries rising from 8.8% in 2019 to 9.7% in 2020 and up by 6% in 2021 [29]. Therefore, robust and valid cost-effectiveness evidence is necessary to inform decision making around the allocation of limited healthcare resources among competing health interventions. Accordingly, a comprehensive approach to managing symptomatic or at-risk people with sleep disorders, and the evaluation of novel interventions and models of care, need to carefully consider how these interventions can improve QoL and clinical outcomes.

The QoL of an individual can be influenced by several factors, including but not limited to the individual’s perspective of the disease and their accompanying coping mechanisms, emotional and psychosocial well-being, independence, material welfare, and the external environment predisposing individuals’ activity and development [30]. To provide a reliable estimate of the cost effectiveness of an intervention that can improve QoL in people with sleep disorders, it is imperative to ascertain the best instrument to comprehensively estimate QoL, particularly in its application within economic evaluations. Instruments to measure QoL can be preference or non-preference based. The former are scored using preferences of a general population sample elicited with one or more valuation methods, for example, a visual analogue scale, time trade-off, discrete choice experiment or standard gamble [31, 32]. Preference-based instruments are widely used in a cost-utility analysis (CUA), a type of economic evaluation where the primary outcome measure is usually quality-adjusted life-years (QALYs) [32]. Non-preference-based instruments are inappropriate for a CUA because they lack the algorithm for calculating QALYs. However, non-preference-based instruments can still be used in a cost-effectiveness analysis (CEA), where the outcome of relevance can be natural units such as life-years gained, cases detected, events prevented, or indeed non-preference-based QoL [32].

Several instruments have been used to measure QoL within sleep disorders research, including generic instruments such as the EuroQol 5-Dimensions suite of measures (5-level or EQ-5D-5L and 3-level or EQ-5D-3L) and the Short Form surveys (6-Dimension or SF-6D and 36-item or SF-36), and sleep-specific instruments such as the Epworth Sleepiness Scale (ESS) to estimate perceived sleepiness in different daily living situations and the Insomnia Severity Index (ISI) to estimate the likelihood of clinical insomnia and its daytime impacts [33, 34]. However, there is still debate about the most appropriate instruments to measure QoL, specifically in sleep health research [33, 35]. Different QoL instruments can lead to varying conclusions about an intervention conducted in the same population [35]. Instruments such as the ESS and ISI are also predominantly used as diagnostic tools and not strictly QoL instruments. Further, sleep (and circadian) factors strongly influence many aspects of daily mental and physical performance and well-being, thus broad QoL impacts from sleep disorders should be anticipated [36,37,38,39,40,41]. Hence, it is vital to establish an appropriately sensitive, specific, reproducible, and standardised approach to measure QoL as an outcome of treatment that can be applied widely within the economic evaluation framework. To achieve this, it is crucial to clearly differentiate between two key concepts: sensitivity to changes in specific diseases and sensitivity to changes in overall QoL and health-related QoL (HRQoL). While the former is primarily essential for evaluating the clinical effectiveness of treatments for sleep disorders, the latter is required for economic analyses such as CEAs and CUAs. This distinction underscores the need for careful consideration when selecting QoL instruments for economic evaluations in sleep research. 
The chosen instruments must be sensitive to changes in both specific sleep disorder symptoms and overall QoL and HRQoL to provide valuable insights for both clinical and economic decision making. A preliminary search of PROSPERO, MEDLINE and the Cochrane Database of Systematic Reviews was performed, and no current or ongoing systematic reviews on the topic were identified.

Therefore, this paper sought to identify QoL instruments that have been used in economic evaluations of interventions used within sleep health studies in various contexts. It outlines the methods and results that identify instruments used to measure QoL in individuals suspected of having or suffering from sleep disorders within the economic evaluation framework. The paper also compares the domains and dimensions of these instruments in terms of their content coverage [42] and the conceptual overlap based on the International Classification of Functioning, Disability and Health (ICF) Core Set framework [43]. The ICF was selected as it is the most extensive attempt to classify health concepts within a biopsychosocial model of health, function and disability [43]. The findings will provide evidence-based information for researchers to determine the most suitable outcome measurement approach for the economic evaluation of sleep disorders.

1.1 Review Objectives

We aimed to (1) identify the contexts and populations in which QoL instruments have been used in the published economic evaluation literature in sleep health research and (2) compare the content of QoL instruments by linking them to meaningful concepts within the ICF framework [43].

2 Methods

The protocol for this review was registered with the International Prospective Register of Systematic Reviews (PROSPERO), registration number CRD42023399598, and with the International Platform of Registered Systematic Review and Meta-analysis Protocols (INPLASY), registration number INPLASY202350068. This review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines for systematic reviews [44, 45]. The PRISMA checklists for the main text and abstracts are provided in Appendices 1 and 2 of the Electronic Supplementary Material (ESM).

2.1 Search Strategy

An initial search was limited to MEDLINE, the National Health Service Economic Evaluation Database and the Cost-Effectiveness Analysis Registry to identify articles on the topic. The text words used in the titles and abstracts and the index terms used to describe the articles were used to develop a full search strategy in MEDLINE, PsycINFO, ProQuest, Cochrane, Scopus, CINAHL, Web of Science and Emcare (Appendix 3 of the ESM). The search strategy was adapted for each database and/or information source and was last run on 30 May, 2023.

2.2 Study Selection

Studies that met the following inclusion criteria were considered: (1) measured QoL and/or HRQoL as the primary or secondary measure of effectiveness in the economic evaluation, with HRQoL defined as any description of the physical, role function, social, and psychological aspects of well-being and function [46]; (2) used a preference-based generic and/or preference or non-preference-based sleep-specific QoL instrument; (3) study design was a full economic evaluation applied in sleep health research [32], i.e. a CUA, CEA, cost-benefit analysis, cost-minimisation analysis, or cost-consequence analysis; and (4) published in peer-reviewed journals in the English language from inception to 30 May, 2023.

Studies were excluded if: (1) they were not related to a common primary sleep disorder (e.g. insomnia, OSA, and restless legs syndrome); (2) QoL/HRQoL was measured using an instrument specifically designed for the study; or (3) they were published as dissertations, commentaries, conference papers or review articles, or their full-text article could not be obtained.

2.3 Article Screening

All citations identified during the search were imported into EndNote X9 (Clarivate Analytics, Philadelphia, PA, USA) and the Joanna Briggs Institute System for the Unified Management, Assessment and Review of Information (JBI SUMARI) [47]. The reference lists of all included sources of evidence were screened for additional studies. Titles and abstracts, followed by full texts of eligible studies, were screened by two independent reviewers for assessment against inclusion criteria. Reasons for excluding full-text papers that did not meet the inclusion criteria were recorded and reported. Any potential disagreement between reviewers at each stage was resolved through discussion or with a third reviewer. The search and inclusion process results were reported in full in the final systematic review and presented in a PRISMA flow diagram [48] (Fig. 1).

Fig. 1 Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram of search and study selection process

2.4 Assessment of Methodological Quality

Two independent reviewers assessed the quality of eligible studies against the JBI Critical Appraisal Checklist for Economic Evaluation [49], a standardised critical appraisal instrument (Appendix 4 of the ESM). As economic evaluation studies often employ various cost perspectives and report distinctive health economic measures in different contexts and regions, the European Network of Health Economic Evaluation Databases (EURONHEED) checklist was used to assess further generalisability and transferability of included studies [50] (Appendix 5 of the ESM). The critical appraisal results were reported in a narrative format and tabulated.

2.5 Data Extraction and Synthesis

Data were extracted from studies using a standardised data extraction tool. Extracted data included specific details about the intervention/s and comparator/s examined, study population/participants and context, study methods, results for resource use, and cost and cost-effectiveness measures. The findings were presented in a narrative format, including tables and figures where appropriate.

2.6 Instrument Conceptual Overlap and Content Coverage

The conceptual overlap between these instruments and their content coverage was assessed by comparison of their dimensions using the ICF Core Set framework [43] (Appendices 6 and 7 of the ESM). The ICF has been linked to many patient-reported outcome development efforts, for example, the Patient-Reported Outcomes Measurement Information System (PROMIS) [51, 52]. In this exercise, conducted by BK and TJW (and AN as the tie-breaker), instrument dimensions were divided into three ICF domains: ‘body functions and structures’ (measuring impairments to [i] physiological and psychological functions of body systems and [ii] anatomical parts of the body such as limbs), ‘activities and participation’ (referring to constructs that cover the full range of life areas such as execution of tasks or actions and involvement in life situations) and ‘environmental factors’ (referring to the physical, social, and attitudinal environment in which people live and conduct their lives, which can be either barriers or facilitators to their function) [43]. Each domain was also broken down into chapters and collapsed into categories. Content coverage was expressed as a percentage of the number of ICF chapters mapped onto by each QoL instrument divided by the potential total number of ICF chapters available.
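
As a compact restatement of the coverage calculation described above, the ratio can be written as follows. The symbols are introduced here for illustration only and are not the authors' notation: $n_i$ is the number of ICF chapters onto which at least one item or dimension of instrument $i$ was mapped, and $N$ is the total number of ICF chapters available.

```latex
\[
\text{Coverage}_i \;=\; \frac{n_i}{N} \times 100\%
\]
```

The same ratio can be computed within a single ICF domain by restricting both $n_i$ and $N$ to the chapters of that domain.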

3 Results

3.1 Study Inclusion

Figure 1 displays the study selection based on the PRISMA guidelines [48] and shows that from 7990 database citations and 15 additional references initially identified, 1900 duplicates and 5551 titles not meeting the criteria were excluded, leaving 554 (539 + 15) full-text articles for a further eligibility assessment. Of the 554, only 57 articles met the criteria as full economic evaluations, measuring QoL and/or HRQoL in sleep health research for the final analysis.
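
The counts in the study selection flow can be checked with simple arithmetic. The sketch below uses only the figures reported in this paragraph; the variable names are illustrative, and the number of full-text articles excluded is derived here rather than stated in the text.

```python
# Counts as reported in Section 3.1 (variable names are illustrative only).
database_citations = 7990
additional_references = 15
duplicates_removed = 1900
titles_excluded = 5551
included_studies = 57

# Records remaining from the database search after de-duplication and title screening.
after_screening = database_citations - duplicates_removed - titles_excluded

# Full-text articles assessed, including the additional references.
full_text_assessed = after_screening + additional_references

# Derived value (not stated in the text): full-text articles excluded.
full_text_excluded = full_text_assessed - included_studies

print(after_screening)     # 539
print(full_text_assessed)  # 554
print(full_text_excluded)  # 497
```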

3.2 Methodological Quality

The methodological quality for all included studies was considered good to excellent when assessed against the JBI Critical Appraisal Checklist for Economic Evaluation [49] and EURONHEED [50] (Appendices 4 and 5 of the ESM). The average score for all included studies was 89% against the EURONHEED [50] checklist.

3.3 Characteristics of Included Studies

A summary of the characteristics of the 57 included studies is presented in Table 1.

Table 1 Characteristics of included studies

3.3.1 Study Design

The study designs for OSA interventions varied; 30 were CUAs [53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82], four were CEAs [83,84,85,86] and five were cost-minimisation analyses [87,88,89,90,91]. In assessments of insomnia interventions, 16 studies were CUAs [35, 92,93,94,95,96,97,98,99,100,101,102], one reported a CEA [101], and another combined a CEA and a cost-benefit analysis [96]. Economic evaluations were most frequently conducted alongside a randomised controlled trial (n = 30) [35, 54, 55, 57,58,59,60,61,62, 66, 82,83,84,85,86,87,88,89,90,91,92,93, 95,96,97, 99,100,101,102,103]. One retrospective case-crossover [64] and one cohort study [63] ran economic evaluations concurrently. Twenty-five studies were model based, 14 using a Markov model [53, 56, 65, 67,68,69,70,71, 73, 75, 77, 78, 80, 98], five using a decision-tree model [74, 104,105,106,107], one using both Markov and decision-tree models [76], one using a semi-Markov model [79] and one using a decision analytic model [81]. Additionally, two studies used randomised controlled trial-based modelling [94, 108], and one used a case-control-based model [72].

3.3.2 Population

Participant numbers ranged from 37 [92] to 830 [94], with model-based studies simulating up to 100,000 participants [98]. Most studies had a mean participant age of 50 years, but some focused on distinct age groups: four on those aged 65 years and older [61, 62, 105, 106], one on adolescents aged 12–19 years [95] and one on premature infants [81]. Recruitment strategies varied: in insomnia studies, some involved clinically diagnosed patients, others included those with symptoms but no diagnosis, a few focused on self-referred patients for therapy workshops and one targeted undiagnosed individuals seeking treatment [35, 92, 93, 96,97,98,99,100,101,102,103,104,105,106,107,108]. Two insomnia studies involved populations with comorbid conditions, including depression and schizophrenia [92, 97]. In OSA studies, some included clinically confirmed OSA cases, while others recruited newly diagnosed or suspected cases and one focused on at-risk infants without a formal diagnosis [54,55,56,57, 59, 61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85, 87, 89,90,91, 109].

3.3.3 Geographical Location, Setting and Timeframe

The included studies were conducted mainly in the UK (n = 12) [35, 60,61,62, 65, 66, 70,71,72,73, 93, 97], USA (n = 11) [74, 75, 78, 79, 90, 94, 98, 100, 106, 108, 110] and Spain (n = 10) [53,54,55,56,57,58, 64, 82, 83, 89]. Japan [76, 92, 104, 105] and Canada [68, 77, 80, 101] each had four studies, The Netherlands had three studies [59, 95, 99], and Germany [96, 102] and France [69, 85] had two studies each. Single studies were conducted in Colombia [67], New Zealand [107], South Korea [103], Finland [63], Australia [84] and Hong Kong [91]. Most studies (n = 28) were conducted in clinical settings, 25 in community settings, and one in a workplace environment. The economic evaluations’ time horizons ranged from 1 day [81] to a lifetime [65]. Studies were published between 2003 [56] and 2023 [67], as shown in Table 1.

3.4 QoL Instruments

3.4.1 Frequency of Use

Of the 57 studies included (Table 1), 32 used one type of QoL instrument apiece: 27 generic [53, 56, 59, 63,64,65, 67,68,69, 71, 72, 74,75,76,77, 79,80,81, 93, 94, 97, 98, 103,104,105, 108, 110], four sleep specific [84, 87, 96, 101], and one depression specific [92]. Thirteen studies used a combination of one generic and one sleep-specific instrument [35, 54, 55, 57, 58, 82, 83, 85, 88, 90, 95, 99, 102], nine used two generic instruments each [60,61,62, 66, 70, 73, 78, 106, 107], two had two sleep-specific instruments [89, 91], one utilised one generic and one depression-specific instrument [110], and one used a combination of a generic, a sleep-specific and an osteoarthritis-specific tool [100]. Table 2 summarises the frequency of use of specific instruments to measure QoL.

Table 2 Frequency of quality-of-life instruments used in the identified studies

A total of 16 different QoL instruments were used in the 57 economic evaluations. The EQ-5D-3L (n = 24) and the ESS (n = 10) were the most common generic and sleep-specific QoL instruments used, respectively. In the OSA studies, 11 instruments were used. The EQ-5D-3L was the most frequently used (n = 21) followed by ESS (n = 10), SF-6D (n = 6) and EQ-5D-5L (n = 5). Unspecified EQ-5D and SF-36 were each used in three studies, while the FOSQ was employed in two studies. The Sleep Apnea Quality of Life Index (SAQLI), Quebec Sleep Questionnaire (QSQ), 15 dimensions quality of life (15D), Health Utilities Index mark II (HUI-2) and the Nottingham Health Profile (NHP) were each used in one study. Fewer instruments (n = 10) were utilised in insomnia studies: the ISI (n = 6), SF-36 and EQ-5D-5L (n = 5 each), EQ-5D-3L and SF-6D (n = 3 each), and GRID-HAMD (n = 2) were most common. The Holland Sleep Disorder Questionnaire (HSDQ), unspecified EQ-5D, the 12-item Short Form Survey (SF-12), and the Western Ontario and McMaster Universities Arthritis Index (WOMAC) were each used in single insomnia studies. Table 2 shows that the most frequently used instruments had between five and eight questions that took between 2 and 5 minutes to complete. The only exception was the SF-36, which has 36 questions and takes between 10 and 15 minutes to complete.

3.4.2 Descriptions of QoL Instruments

The EQ-5D-3L, EQ-5D-5L, SF-6D, 15D, and HUI3 are preference-based QoL instruments yielding utility scores (where higher scores denote better QoL), subsequently employed in calculating QALYs [31]. The EQ-5D-3L and EQ-5D-5L both measure five dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. They are widely used in economic evaluations and generate utility scores that range from −0.59 to 1 [31]. The SF-6D measures six dimensions: physical functioning, role limitations, social functioning, pain, mental health, and vitality. SF-6D utility scores range from 0.301 to 1 [31]. The 15D has 15 dimensions: mobility, vision, hearing, breathing, sleeping, eating, speech, elimination, usual activities, mental function, discomfort and symptoms, depression, distress, vitality, and sexual activity. Utility scores for the instrument range from 0.11 to 1 [31, 114]. The HUI3 comprises eight domains: vision, hearing, speech, ambulation, dexterity, emotion, cognition, and pain. Utility scores for the instrument range from −0.36 to 1 [31].
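
To illustrate how utility scores from a preference-based instrument feed into QALYs, the sketch below integrates utility over time with the trapezoidal rule, which is one common approach and not necessarily the method used in any of the included studies. The function name, follow-up times and utility values are hypothetical.

```python
def qalys(times_years, utilities):
    """Area under the utility curve (trapezoidal rule), in QALYs."""
    if len(times_years) != len(utilities) or len(times_years) < 2:
        raise ValueError("need at least two matching time/utility points")
    total = 0.0
    for i in range(1, len(times_years)):
        interval = times_years[i] - times_years[i - 1]
        if interval < 0:
            raise ValueError("times must be in ascending order")
        # Mean utility over the interval multiplied by its duration.
        total += interval * (utilities[i] + utilities[i - 1]) / 2
    return total


# Hypothetical utilities at baseline, 6 months and 12 months.
follow_up = [0.0, 0.5, 1.0]
treated = qalys(follow_up, [0.70, 0.80, 0.80])
control = qalys(follow_up, [0.70, 0.70, 0.70])

# Incremental QALYs of the hypothetical treatment over 1 year (about 0.075).
incremental_qalys = treated - control
print(round(incremental_qalys, 3))
```

Because the utility scale is anchored at 1 for full health, a person at a constant utility of 0.70 accrues about 0.70 QALYs over 1 year in this sketch.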

The ESS measures the propensity to fall asleep during eight daily activities (sitting and reading, watching television, sitting inactive in a public place, as a passenger in a car for an hour without a break, lying down to rest in the afternoon, sitting and talking to someone, sitting quietly after lunch without alcohol, and sitting in a car, while stopped for a few minutes in traffic) [115]. Summary scores can range from 0 to 24 [116, 117]. The ISI is a widely used 7-item tool for quantifying perceived insomnia severity and its potential daytime impacts relating to sleep-onset difficulty, sleep maintenance difficulty, early morning awakenings, sleep dissatisfaction and interference with work, social and mood functioning [118]. It produces summary scores that range from 0 to 28 [119]. The FOSQ-30 is a widely used instrument constructed to assess the impact of excessive somnolence on adult functional status. It examines five domains: activity levels, vigilance, intimacy and sexual relationships, productivity, and social outcomes. Summary scores ranging from 5 to 20 can be calculated [119]. The SAQLI assesses four QoL domains linked to sleep apnoea: daily functionality, social interactions, emotional well-being and symptoms, with an additional domain, treatment-related symptoms, specifically designed for individuals undergoing a therapeutic intervention [123]. Summary scores that range from 1 to 7 can be calculated [128]. The QSQ assesses HRQoL in patients with OSA and evaluates the impact of apnoea on five domains, namely hypersomnolence, daytime symptoms, night-time symptoms, emotions, and social interactions [116]. Summary scores that range from 1 to 7 can be calculated [119]. The HSDQ is a 32-item instrument used to screen for six potential sleep disorders: insomnia, parasomnia, circadian rhythm sleep disorder, hypersomnia, restless legs/periodic limb movement disorder, and sleep-related breathing disorder. Averaged scores that range from 1 to 5 can be calculated [129].
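
The ESS and ISI summary scores described above are simple sums of item responses. The following is a minimal sketch, assuming the standard per-item response ranges of 0 to 3 for the ESS and 0 to 4 for the ISI; these ranges are not stated in the text but are consistent with the reported totals of 0 to 24 over eight items and 0 to 28 over seven items. The function and dictionary names are hypothetical.

```python
def summary_score(item_responses, n_items, item_max):
    """Sum item responses after checking item count and response range."""
    if len(item_responses) != n_items:
        raise ValueError(f"expected {n_items} items, got {len(item_responses)}")
    for response in item_responses:
        if not isinstance(response, int) or not 0 <= response <= item_max:
            raise ValueError(f"response {response!r} outside 0..{item_max}")
    return sum(item_responses)


# Assumed item structures: 8 items x 0-3 (ESS), 7 items x 0-4 (ISI).
ESS = {"n_items": 8, "item_max": 3}
ISI = {"n_items": 7, "item_max": 4}

# Hypothetical respondent answers.
ess_total = summary_score([2, 1, 0, 3, 2, 0, 1, 1], **ESS)
isi_total = summary_score([3, 2, 2, 4, 3, 2, 1], **ISI)
print(ess_total)  # 10
print(isi_total)  # 17
```

For both instruments a higher total indicates a worse outcome, so the direction of scoring must be taken into account before comparing them with instruments such as the FOSQ-30 or SAQLI.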

The WOMAC is an instrument widely used in evaluating osteoarthritis [130]. Its 24 items can be divided into three subscales (pain, stiffness, and physical function) with total scores that range from 0 to 96 [131]. The 17-item GRID-HAMD is a depression rating scale, which also captures insomnia QoL constructs and enables a rater to measure the intensity and frequency of QoL constructs [132]. Summary scores that range from 0 to 52 can be calculated [132]. Higher scores for the FOSQ-30, SAQLI, and QSQ indicated better outcomes, whereas the converse was true for the ESS, ISI, HSDQ, WOMAC and GRID-HAMD.

3.5 Instrument Conceptual Overlap and Content Coverage

Table 3 shows the distribution of sleep-specific and generic instruments across the major ICF categories and level 2 chapters of the ICF, summarised further in Appendices 6 and 7 of the ESM. One hundred and eighty-seven instrument items/dimensions were compared and matched to 17 ICF chapters and 80 level-two categories. There was 94% agreement between the two linkers (BK and TJW), covering 176 items/dimensions (127 sleep specific and 49 generic). Linkages of the remaining items (11: 6 sleep specific and 5 generic) were determined through a structured discussion with a third expert (AN).
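
The reported agreement between linkers can be reproduced as a simple percent agreement from the counts in this paragraph. The sketch below assumes that agreement was calculated as agreed items over all items, which the text does not state explicitly; the variable names are illustrative.

```python
# Counts reported in the text.
total_items = 187
agreed_items = 176

# Simple percent agreement between the two linkers (assumed calculation).
agreement_percent = 100 * agreed_items / total_items

# Items left for resolution by the third expert.
unresolved_items = total_items - agreed_items

print(round(agreement_percent))  # 94
print(unresolved_items)          # 11
```

The remainder of 11 items equals the 6 sleep-specific plus 5 generic items that the text describes as resolved through discussion.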

Table 3 Classification of sleep-specific QoL instrument dimensions according to the ICF classifications

Table 3 shows there was a conceptual overlap between the sleep and generic QoL instruments in terms of their coverage of the ICF’s ‘Body Functions’ and ‘Activities and Participation’ domains. For the body functions domain, the most overlap was in the ‘b1—mental functions’ chapter, onto which at least one item/dimension from all instruments was mapped. However, more sleep items (71–100% of the total number of items in an instrument) than generic dimensions (20–50% of the total number of dimensions in an instrument) were linked to this chapter. The chapters with the least overlap were ‘b3—voice and speech functions’, ‘b4—functions of the cardiovascular, haematological, immunological and respiratory systems’, ‘b5—functions of the digestive, metabolic and endocrine systems’, and ‘d3—communication’, i.e. only covered by three sleep instruments and one generic instrument. Only the SAQLI was linked to all six ‘Body Functions’ chapters. All items from one sleep instrument (ESS) solely matched onto the ‘Mental Function’ chapter. All generic instruments were linked to up to three ‘Body Functions’ chapters (‘b1—mental functions’, ‘b2—sensory functions and pain’ and ‘b6—genitourinary and reproductive functions’) except for HUI-2, which was additionally linked to ‘b3—voice and speech functions’ and the 15D, which mapped onto all body function chapters.

There seemed to have been a more widespread overlap between the sleep and generic instruments in the ‘Activities and Participation’ domain. However, no single instrument covered all nine chapters of this domain. The chapters in this domain with the most overlap between the instruments were ‘d2—general tasks and demands’ and ‘d7—interpersonal interactions and relationships’ (each covered by seven and four of the sleep and generic instruments, respectively). The chapters with the least overlap were ‘d1—learning and applying knowledge’, which was covered by only one sleep instrument (ESS) and ‘d3—communication’, which was only covered by three sleep instruments. Amongst the sleep and depression-related instruments, the FOSQ-30, ESS, SAQLI, and GRID-HAMD covered the most chapters of the ‘activities and participation’ domain (i.e. eight for the FOSQ-30 and seven for the other instruments). Amongst the generic instruments, the SF-6D, 15D, and SF-36/SF-12 covered the most (seven) of this domain’s chapters. The EQ-5D and NHP, respectively, covered six and five chapters of the ‘activities and participation’ domain.

Considered separately, there was a more apparent overlap amongst generic instruments (100% overlap for six ICF chapters; two for the ‘body functions’ domain and four for ‘activities and participation’) than amongst the sleep instruments (100% overlap for only one ICF chapter from the ‘body functions’ domain). Consequently, there appeared to be more diversity in the concepts covered by the sleep-specific instruments than in the generic instruments.

In terms of the extent of concepts covered, the instruments with the broadest coverage were the SAQLI linked to 76% of all ICF chapters (100%, 67%, and 50% of all ‘body functions’, ‘activities and participation’, and ‘environmental factors’ chapters, respectively), 15D linked to 76% of all ICF chapters (100% and 78% of all ‘body functions’ and ‘activities and participation’ chapters, respectively) and the GRID-HAMD mapped to 71% of the ICF chapters (83% and 78% of all ‘body functions’ and ‘activities and participation’ chapters, respectively).
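
The coverage percentages above can be reproduced from chapter counts. In the sketch below, the domain sizes of six ‘body functions’ and nine ‘activities and participation’ chapters and the total of 17 chapters come from the text. The two ‘environmental factors’ chapters (17 minus 6 minus 9) and the per-instrument linked counts are back-calculated from the reported percentages and are therefore assumptions, as are all names in the code.

```python
# ICF chapters available per domain (environmental factors derived as 17 - 6 - 9).
DOMAIN_CHAPTERS = {
    "body functions": 6,
    "activities and participation": 9,
    "environmental factors": 2,
}

# Chapters linked per instrument, back-calculated from the reported percentages.
LINKED = {
    "SAQLI": {"body functions": 6, "activities and participation": 6,
              "environmental factors": 1},
    "15D": {"body functions": 6, "activities and participation": 7,
            "environmental factors": 0},
    "GRID-HAMD": {"body functions": 5, "activities and participation": 7,
                  "environmental factors": 0},
}


def coverage_percent(linked, available):
    """Linked chapters as a whole-number percentage of available chapters."""
    return round(100 * linked / available)


def overall_coverage(instrument):
    """Coverage across all ICF chapters for one instrument."""
    linked = sum(LINKED[instrument].values())
    return coverage_percent(linked, sum(DOMAIN_CHAPTERS.values()))


for name in LINKED:
    print(name, overall_coverage(name))
# SAQLI 76
# 15D 76
# GRID-HAMD 71
```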

4 Discussion

This review found an ample choice of instruments available to evaluate the various aspects of QoL among sleep disorder cohorts in economic evaluations. Whilst QoL is multi-dimensional, instruments used in sleep disorder cohorts must capture the domains important to these cohorts. Reimer and Flemons argue for using broad-based instruments in study cohorts to cover concepts that include physical, mental, and social function, the burden of symptoms and an overall sense of well-being [133]. Instruments with broader coverage of QoL concepts have also been recommended in the sleep literature, given that sleep and circadian factors are strongly correlated with broad concepts of daily mental and physical performance and well-being [36,37,38,39]. Our review has shown that nearly 45% of economic evaluations of sleep disorder interventions used a mix of generic and sleep-specific instruments, perhaps in recognition that neither type of instrument may be sufficiently comprehensive to cover the breadth of potential sleep-related QoL constructs. However, the instruments with the most comprehensive coverage of sleep-related constructs were the SAQLI and FOSQ-30 (amongst sleep instruments) and the 15D, SF-6D, SF-36/SF-12, and GRID-HAMD (amongst non-sleep instruments).

While the comprehensiveness of an instrument is a key consideration in instrument selection, attention must also be paid to other instrument attributes, including acceptable measurement properties (e.g. ceiling effects, specificity, sensitivity, validity, reliability, and responsiveness), parsimony, ease of completion and scoring, and the potential to provide helpful clinical data [134]. Overall, the most frequently used generic QoL instruments were EQ-5D-3L (n = 17) and SF-36 (n = 11). Several attributes may account for the popularity of the EQ-5D-3L, including its translations, scoring algorithms adapted to several cultures and countries, absence of license fees for non-commercial usage, and its recommended use by the UK National Institute for Health and Care Excellence in economic evaluations [135]. Further, the brevity and ease of administration of the EQ-5D-3L, which uses 3-level Likert scales, deliver a practical advantage with respect to its burden on respondents compared with other instruments with more dimensions. Nevertheless, it must be considered that the EQ-5D-3L has a higher ceiling effect than other generic instruments, such as the SF-36, SF-6D [111, 112], 15D [136], and NHP, with the 15D and SF-6D showing the lowest effect [137, 138]. However, it should be noted that the EQ-5D-5L, which uses a 5-level Likert scale and was used in three studies [54, 82, 97], has been shown to reduce this ceiling effect [139, 140]. In the context of sleep health, several studies suggest that the EQ-5D is less sensitive to intervention effects on health status when compared with alternative generic instruments such as the SF-36, SF-12, 15D, HUI-2, or SF-6D, which better detect improvements in line with those indicated by condition-specific clinical metrics [62, 114, 141]. The downside to the SF-36, SF-12, 15D, HUI-2, and SF-6D is that they are relatively long instruments and may, therefore, lead to a higher respondent burden.
Indeed, participant burden and related feasibility impacts are important considerations when selecting an evaluation tool, particularly where health economic outcomes are incorporated as secondary outcomes in research studies. We did not find evidence of the performance of non-sleep instruments in sleep cohorts regarding other measurement properties in the literature, such as sensitivity, validity, reliability, and responsiveness [142].
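
As an illustration only, the ceiling effect of a generic instrument is commonly quantified as the proportion of respondents who report the best possible health state (profile 11111 on the EQ-5D). The minimal Python sketch below shows this calculation. The function name and the response data are hypothetical and are not drawn from the studies included in this review.

```python
def ceiling_effect(profiles, best_state="11111"):
    """Return the proportion of respondents reporting the best possible state.

    `profiles` is a list of health-state profile strings, one per respondent
    (e.g. "11121" for an EQ-5D-3L response). `best_state` is the profile with
    no problems on any dimension.
    """
    if not profiles:
        raise ValueError("profiles must not be empty")
    at_ceiling = sum(1 for p in profiles if p == best_state)
    return at_ceiling / len(profiles)


# Hypothetical EQ-5D-3L responses from eight respondents (illustrative only).
sample = ["11111", "11121", "11111", "21222",
          "11111", "11112", "12111", "11111"]

share = ceiling_effect(sample)
print(f"Share of respondents at the ceiling: {share:.1%}")
```

A large share at the ceiling at baseline leaves little room for an instrument to register improvement after an intervention, which is one plausible reason for reduced sensitivity in cohorts with milder impairment.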

Multiple studies [35, 53,54,55, 57, 58, 60,61,62, 64, 66, 82,83,84,85, 87,88,89,90,91,92,93, 96, 98,99,100,101,102, 143] identified in this review used sleep-specific instruments. Of all sleep-specific instruments, the ESS (n = 9) was the tool most commonly reported in the economic analyses that we identified, followed by the ISI (n = 6) and FOSQ-30 (n = 2).

The popularity of the ESS is likely driven by its simplicity and brevity in measuring the propensity to fall asleep during daily activities [115]. It also forms part of the diagnostic criteria or assessment for further testing or treatment eligibility and may lead to floor effects when used in non-sleepy people with OSA. Poor test-retest reliability over short time intervals, and poor sensitivity and specificity in intervention studies in moderate-to-severe OSA cohorts, have been noted and must be considered when interpreting the ESS as a potential indicator of QoL [116, 117]. The ESS also had a lower coverage of QoL concepts than the FOSQ-30.

The ISI is a widely accepted and valid tool to quantify perceived insomnia severity and its potential daytime impacts by capturing QoL domains relevant to those with insomnia. This includes a domain that assesses distress caused by sleep disturbance, which impacts QoL [118]. The ISI has shown sensitivity to treatment response [120, 121]. However, it has the lowest coverage of QoL concepts among all sleep instruments.

The FOSQ is more strongly correlated with the ESS than the SAQLI is, making it more effective for evaluating the impact of sleepiness on QoL. It was also more responsive to continuous positive airway pressure therapy for OSA than the SAQLI [123]. The length of the FOSQ (30 items) brings into question its utility for clinical practice, and large-scale studies may be difficult when treatment progression needs to be monitored [124]. Alternatively, the more recently developed and validated 10-item FOSQ (FOSQ-10) [124] would be more convenient in practice than its predecessor. The FOSQ-10, however, has a lower coverage of QoL concepts than the SAQLI or the FOSQ-30.

The SAQLI is a sleep-specific instrument for patients with OSA that takes a broad view of QoL. Notably, its ‘emotional functioning’ domain effectively measures mental health-related aspects of QoL [123], a relevant consideration in patients with sleep disorders. Another domain, tailored specifically to treatment-related symptoms, was created for individuals undergoing therapeutic interventions. This addition enhances its utility in clinical settings, enabling the tracking of symptom improvements and the monitoring of treatment side effects. It has the most comprehensive coverage of QoL concepts of all sleep-specific instruments. The drawbacks of using the SAQLI are the requirement for a trained interviewer to administer the questionnaire and the complex scoring algorithm [119].

The 17-item GRID-HAMD had the second broadest coverage of sleep-associated QoL concepts of all sleep-related instruments and should therefore be considered in economic evaluations of insomnia interventions. However, its requirement for trained individuals familiar with mood assessments in depressed populations limits its widespread use [132]. The rest of the non-generic instruments (QSQ, HSDQ, and WOMAC) had a low coverage of QoL concepts and would, therefore, not be appropriate for economic evaluations.

Generic preference-based instruments should be considered for use in economic evaluations of sleep disorders, where it is essential to make standardised comparisons across disease areas within a CUA. However, given that some of the measurement properties of non-sleep QoL instruments, when used in sleep populations, have yet to be reported or demonstrated, we recommend a combination approach of generic (within CUAs) and sleep-specific instruments (within CEAs) in economic evaluations of sleep disorder interventions. The choice of instrument should consider QoL coverage of sleep-related constructs and the sleep disorder being addressed. If the primary interest is to evaluate concepts relating to body functions, the 15D and SAQLI (for OSA) or GRID-HAMD (for insomnia) should be considered (CEAs and CUAs). If the goal is to evaluate concepts relating to activities and participation, either the 15D or SF-6D could be paired with the FOSQ-10, ESS, or GRID-HAMD (CEAs and CUAs). For utility measurement, especially in guiding resource allocation across various healthcare settings or sleep disorders (CUAs only), the 15D and SF-6D are recommended choices. Given the prominence of certain instruments, such as the ESS and the EQ-5D suite of instruments, in sleep-related economic evaluations, transitioning to alternative instruments demands a comprehensive evidence-based approach. First, a stronger case for the superiority of alternative instruments in psychometric properties, adaptability to change and alignment with research objectives must be established through robust evidence. A shift to alternative instruments also requires training researchers and practitioners in applying, scoring and interpreting these tools. Accessibility plays a pivotal role; ensuring affordability and availability of new instruments through open-source models or cost-effective licensing options can widen their adoption. Finally, preference weights need to be developed for sleep-specific tools so that they can be used in CUAs, expanding their utility in healthcare assessments.

A key area for future research should be to investigate whether current QoL instruments employed in economic evaluations of sleep disorder interventions adequately capture all dimensions relevant to people with such conditions rather than dimensions presumed to be relevant based on expert opinion [144]. Patient-centred studies should also concurrently compare the measurement properties of both sleep and non-sleep instruments in the sleep populations most affected by the QoL effects of poor sleep. Future research should also be dedicated to developing a sleep-specific preference-based instrument enabling a QALY calculation to facilitate the economic evaluation of sleep health technologies.

A limitation of this review was that, because of heterogeneity and a lack of data from the studies included, a meta-analysis was not conducted on some studies that assessed OSA interventions and all studies that evaluated insomnia interventions. There were also no economic analysis data on interventions for sleep disorders other than insomnia and OSA (e.g. narcolepsy, restless legs syndrome, central sleep apnoea and circadian rhythm disturbance) or on comorbid insomnia and OSA.

5 Conclusions

Given the breadth and variability of tools used to evaluate QoL impacts in sleep disorders, there is a clear need for a preference-based ‘gold standard’ instrument to support economic evaluations of sleep health interventions that includes domains considered most important to people with sleep disorders. Inadequacies within existing generic and sleep-specific QoL instruments, when used alone, support the conclusion that QoL within sleep health economic evaluations is best captured using a combination of the two.