Measuring outcomes in adult spinal deformity surgery: a systematic review to identify current strengths, weaknesses and gaps in patient-reported outcome measures

Adult spinal deformity (ASD) causes severe disability, reduces overall quality of life, and results in a substantial societal burden of disease. As healthcare is becoming more value based, and to facilitate global benchmarking, it is critical to identify and standardize patient-reported outcome measures (PROMs). This study aims to identify the current strengths, weaknesses, and gaps in PROMs used for ASD. Studies published between 2000 and 2015 were identified through a systematic search of multiple bibliographic databases. PROMs were extracted and linked to the outcome domains of the WHO's International Classification of Functioning, Disability and Health (ICF) framework. Subsequently, the clinimetric quality of the identified PROMs was evaluated. The literature search identified 144 papers that met the inclusion criteria, and nine frequently used PROMs were identified. These covered 29 ICF outcome domains, which could be grouped into three of the four main ICF chapters: body function (n = 7), activity and participation (n = 19), and environmental factors (n = 3); no domains fell under body structure (n = 0). Only three papers were identified that studied the clinimetric quality of PROMs. The Scoliosis Research Society (SRS)-22 has the highest level of clinimetric quality for ASD. Outcome domains related to mobility and pain were well represented. We identified a gap in current outcome measures regarding neurological and pulmonary function. In addition, no outcome domains were measured in the ICF chapter body structure. These results will serve as a foundation for the process of seeking international consensus on a standard set of outcome domains, accompanying PROMs, and contributing factors to be used in future clinical trials and spine registries.


Introduction
Adult spinal deformity (ASD) refers to a broad spectrum of abnormal spinal curvatures seen in adulthood. In the aging population, ASD causes a very high level of functional disability due to severe back and leg pain, subsequently reducing overall quality of life [1]. The prevalence of such curvatures of the spine has been reported to be as high as 68% in healthy volunteers over the age of 60 [2][3][4]. It is expected that with the growing elderly population an increase in prevalence and incidence rates of symptomatic ASD will be seen [5,6]. In the United States, the societal burden of this disorder has been reflected by a 2.5-fold increase in hospital discharges for treatment of ASD over the past 10 years, leading to substantial healthcare expenditures [7,8]. Consequently, spine surgeons managing these spinal deformities are under increasing pressure to demonstrate the value of treatment (outcome per unit cost) provided.
Electronic supplementary material: The online version of this article (doi:10.1007/s00586-017-5125-4) contains supplementary material, which is available to authorized users.
In this era of value-based care, outcome measures covering the overall quality of life, functioning, and disability derived from patient-reported outcome measures (PROMs) will play an important role in future reimbursement and healthcare systems as patients are more actively involved in the management of their disease [9][10][11]. In order to measure outcome and evaluate the effectiveness of treatment in ASD, multicentre, regional and national spine registries have started [12]. Despite this, there is a lack of consensus on the choice of PROMs. This has resulted in inconsistent reporting, making it difficult to pool and compare outcomes between registries [12]. Moreover, it hinders the application of research findings to formulate clinical guidelines and inform policy makers regarding different treatment strategies [13][14][15]. This great diversity of PROMs in the field of spinal deformity surgery has been emphasized before [16,17]; however, no efforts have been made for international consensus and there is an increased awareness in the research community that this issue needs to be addressed [18,19].
In order to develop an international minimal standardized set of PROMs, it is first important that a universally accepted framework is adopted and used [19,20]. The International Classification of Functioning, Disability and Health (ICF) framework, adopted by the World Health Organisation, provides a necessary universal language for health outcome measures that has previously been used to identify whether currently used PROMs are adequate to portray all relevant outcomes in several health conditions [21][22][23][24][25]. This will subsequently provide insights and recommendations for future patient evaluation improvements and provide reliable and valid information for reaching consensus on a minimal standardized set of PROMs. When implemented in registries and future clinical trials, this will subsequently allow for data pooling, benchmarking, and comparison of results [25]. Before fair comparison can be made, correction for patient risk factors (i.e., risk stratification) is required. Without correcting for patient risk factors, hospitals that manage patients with more comorbidities would appear to have worse outcomes [26]. Therefore, in order to make fair comparisons between national, institutional, and multicentre spine registries, it is also essential to commit to measuring a minimum sufficient set of pretreatment risk factors necessary for adequate patient evaluation.
The objective of this paper is to identify PROMs currently used in clinical studies in ASD surgery through a critical systematic review of the literature. The focus will be patient centered; hence, the retrieved PROMs will be categorized using the outcome domains of the ICF framework and their clinimetric properties will be evaluated. This will subsequently highlight the current strengths, weaknesses, and gaps in PROMs used for assessment of ASD, provide recommendations for future improvements, and serve as a foundation for the process of seeking global consensus on standardizing PROMs and contributing risk factors in future clinical trials and spine registries.

Methods
This project was registered with the Core Outcome Measures in Effectiveness Trials (COMET) database (http://www.comet-initiative.org) and is supported by the AO Spine Knowledge Forum Deformity and the Scoliosis Research Society. Guidelines for Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [27] and recent publications on the development of a Core Outcome Set (COS) [28] were applied.

Search strategy and eligibility criteria
Relevant studies involving outcome measurement after ASD surgery that were published between 01.01.2000 and 01.01.2016 were identified by a systematic search conducted by an experienced medical information specialist and through backward citation tracking of the obtained papers. The search was conducted in the bibliographic databases PubMed, EMBASE.com, Cinahl (via EBSCO), and The Cochrane Library (eTable 1; Supplementary Material). Keywords used to identify a relevant design were as follows: randomized controlled trial, longitudinal observational study, retrospective study of prospectively collected data, case series, or cross-sectional in the title or in the abstract. The titles and abstracts identified by the literature search were independently screened by two reviewers (MvH and TH). Full-text articles were retrieved if the abstract passed the first eligibility screening or provided insufficient information. All obtained full-text studies were reviewed independently for inclusion by two reviewers (MvH and TH) according to the following inclusion criteria: (1) a diagnosis of ASD, which included diagnoses of (progressive) adult idiopathic scoliosis and de novo degenerative lumbar scoliosis; (2) N ≥ 20 ASD patients who underwent surgery; (3) age ≥ 25 years; and (4) at least one patient-reported outcome measure (PROM) reported.
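The inclusion criteria above can be read as a simple conjunction of four checks. The sketch below is illustrative only and is not the screening procedure the reviewers used: the `Study` record, its field names, and the diagnosis labels are hypothetical, and it assumes that the age criterion applies to the youngest patient in a study, which the text does not specify.

```python
from dataclasses import dataclass

# Diagnoses named in the inclusion criteria; the exact string labels are illustrative.
ASD_DIAGNOSES = {
    "adult idiopathic scoliosis",
    "de novo degenerative lumbar scoliosis",
}


@dataclass
class Study:
    """Hypothetical record of the fields needed for eligibility screening."""
    diagnosis: str
    n_surgical_patients: int
    min_age: int   # assumption: the age criterion is applied to the youngest patient
    proms: tuple   # names of the PROMs reported in the study


def is_eligible(study: Study) -> bool:
    """Return True only if all four inclusion criteria are met."""
    return (
        study.diagnosis in ASD_DIAGNOSES
        and study.n_surgical_patients >= 20
        and study.min_age >= 25
        and len(study.proms) >= 1
    )
```

A study with 12 surgical patients, or one reporting no PROM, would be rejected under this reading even if every other criterion is met.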
Differences in judgment regarding inclusion or exclusion of studies were resolved through discussion to achieve consensus. In case of persistent disagreement, a third independent reviewer (SF) made the final decision. As defined by the US Food and Drug Administration (FDA) [29], measures on different concepts of quality of life and functional status that come directly from the patient, without interpretation of the patient's response by a clinician, are considered as patient-reported outcomes (e.g., multi-item questionnaires, medication review, and single-item questions such as "Is there a loss of strength in your legs or arms?" and "Does your breathing ever sound wheezy?"). Concepts of function and quality of life that are evaluated by equipment (e.g., forced expiratory volume in one second) or classification systems (e.g., ASIA grade to evaluate neurological function) that need interpretation by a clinician or anyone other than the patient may be a patient-'related' outcome but should not be considered as a PROM.
Eur Spine J (2017) 26:2084-2093

Data extraction
From each included study, the following data were extracted: authors, region of origin, year of publication, study design, mean age of study population, diagnosis, patient-reported questionnaires (e.g., ODI, SRS-22), single-item questions on different concepts (e.g., satisfaction with overall treatment), and any additional non-validated questionnaires. Where the items of a questionnaire were inadequately specified, a reference check and an additional literature search were performed to retrieve the complete questionnaire.

Linking PROMs to the ICF framework
The ICF framework is intended to describe functional states associated with various health conditions and provides a universal language for health outcome measures according to a hierarchical classification system [30]. The most recent versions of PROMs used in at least three studies were linked to the online ICF framework according to linking rules developed by Cieza et al. [31,32]. Items of PROMs can be linked to one or more ICF domains depending on the number of meaningful concepts contained in that item (e.g., item 4 of the Oswestry Low Back Pain Disability Questionnaire, "pain does not prevent me from walking any distance", refers to the meaningful concepts "pain" and "walking distance" and can be linked to the domains 'b280 sensation of pain' and 'd450 walking,' respectively). First, items were linked to a third- or fourth-level ICF domain and subsequently aggregated to their related second-level component. If a concept behind an item was not sufficiently specified, it was assigned to the domain "not definable" or as "not covered" by the ICF.
One reviewer (SF) performed the linking process and the results were checked by a second reviewer (MvH). In case of disagreement, a third independent reviewer (TH) was consulted and a final decision was made. Finally, absolute frequencies and percentages of identified ICF domains and accompanying PROMs were calculated. To avoid frequency bias, if an ICF domain could be linked to more than one item of a PROM, it was counted only once.
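The counting rule described above (each ICF domain counted once per PROM, however many items link to it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, aggregation to the second level is assumed to be truncation of the ICF code to its component letter plus three digits, and only the ODI item 4 links ('b280', 'd450') come from the text; the other entries are made up for illustration.

```python
from collections import Counter


def to_second_level(code):
    """Truncate an ICF code to its second level: component letter plus three digits."""
    return code[:4]


def domains_by_prom(item_links):
    """Collapse item-level links to the set of second-level domains per PROM.

    Using a set means a domain linked to several items of the same PROM
    is counted only once for that PROM.
    """
    return {
        prom: {to_second_level(code) for codes in items.values() for code in codes}
        for prom, items in item_links.items()
    }


def domain_frequencies(item_links):
    """Return {domain: (number of PROMs covering it, percentage of PROMs)}."""
    per_prom = domains_by_prom(item_links)
    counts = Counter(domain for domains in per_prom.values() for domain in domains)
    n_proms = len(per_prom)
    return {domain: (count, 100.0 * count / n_proms) for domain, count in counts.items()}


# Illustrative links: ODI item 4 is taken from the text; everything else is hypothetical.
example_links = {
    "ODI": {4: ["b280", "d450"], 1: ["b280"]},
    "PROM_X": {1: ["b2801"]},  # a third-level code that aggregates to 'b280'
}
frequencies = domain_frequencies(example_links)
```

In this toy input, 'b280' is linked to two ODI items but contributes only one count for the ODI, which is the frequency-bias safeguard the text describes.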

Clinimetric properties of PROMs
PROMs identified in the included studies were subsequently subjected to a quality assessment based on the criteria developed by Terwee et al. [33]. For this purpose, an additional literature search was performed to find relevant clinimetric studies on these PROMs in the ASD population. The following clinimetric properties were evaluated: (1) content validity, (2) internal consistency, (3) construct validity, (4) reproducibility (4a agreement and 4b reliability), (5) responsiveness, (6) floor or ceiling effect, and (7) interpretability.

Search results
The systematic search generated a total of 2532 papers. After removing duplicates, 2120 papers remained of which the title and abstract were screened. Of these 2120 papers, we identified 335 potentially eligible papers that were sought for full-text screening. Finally, 144 papers were eligible for inclusion (Fig. 1).
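The screening flow implies the number of records dropped at each stage by simple subtraction. The sketch below only makes that arithmetic explicit; the function and key names are hypothetical, and the derived differences are not reported in the text itself (the last one may include papers that could not be retrieved as well as papers excluded on full-text review).

```python
def prisma_flow(identified, after_deduplication, full_text_sought, included):
    """Derive the number of records dropped between consecutive screening stages."""
    stages = [identified, after_deduplication, full_text_sought, included]
    if any(later > earlier for earlier, later in zip(stages, stages[1:])):
        raise ValueError("record counts must not increase between stages")
    return {
        "duplicates_removed": identified - after_deduplication,
        "excluded_on_title_abstract": after_deduplication - full_text_sought,
        "not_included_after_full_text": full_text_sought - included,
    }


# Stage counts as reported in the search results.
flow = prisma_flow(2532, 2120, 335, 144)
```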

Characteristics of included studies
Studies published between 2000 and 2015 were most frequently conducted in North America (65.9%). The majority (60.4%) of identified studies included a mixture of diagnosed ASD patients (e.g., progressive adult idiopathic scoliosis, de novo degenerative lumbar scoliosis). Notably, the number of publications increased over the study period. Table 1 provides the main characteristics of all the included studies.

Extracted data
PROMs used in one or two studies are presented in eTable 2; Supplementary Material. Nine PROMs were used in at least three papers and are presented in Table 2. The ODI was the most frequently used single PROM (62.3%), followed by the SRS-22 questionnaire (43.8%).

Outcomes linked to ICF domains
The individual items (questions) of retrieved PROMs were subsequently linked to the domains of the ICF framework. A total of 29 second-level domains were identified and aggregated to their corresponding major ICF chapter: body function (n = 7), body structure (n = 0), activity and participation (n = 19), and environmental factors (n = 3) (Table 3). The most frequently measured second-level outcome domains were found to be related to the first-level domains 'mobility' (n = 8) and 'mental functions' (n = 5). The linking results of the identified PROMs are included in eTable 3; Supplementary Material.
Eight of the nine identified PROMs in this literature review measured the outcome 'sensation of pain'. The SF-36 was the PROM that measured the largest number of second-level domains (n = 17) (eTable 3; Supplementary Material). No PROMs were identified that measured neurological or pulmonary outcome domains (Table 3).
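As a consistency check on the reported counts, second-level ICF codes can be grouped by their leading letter, which identifies the major chapter (b, s, d, and e in the ICF coding scheme). The sketch below is illustrative: the function names are hypothetical, the per-chapter counts are those reported above, and the code 'e310' is included only as an example input, not as a domain taken from Table 3.

```python
# Leading letter of an ICF code mapped to the chapter names used in this paper.
ICF_CHAPTERS = {
    "b": "body function",
    "s": "body structure",
    "d": "activity and participation",
    "e": "environmental factors",
}


def chapter_of(code):
    """Return the major ICF chapter for a code such as 'b280' or 'd450'."""
    return ICF_CHAPTERS[code[0]]


def tally_chapters(second_level_codes):
    """Count distinct second-level domains per chapter (duplicates ignored)."""
    tally = {name: 0 for name in ICF_CHAPTERS.values()}
    for code in set(second_level_codes):
        tally[chapter_of(code)] += 1
    return tally


# Per-chapter counts as reported in the results.
reported = {
    "body function": 7,
    "body structure": 0,
    "activity and participation": 19,
    "environmental factors": 3,
}
total_domains = sum(reported.values())
chapters_covered = sum(1 for n in reported.values() if n > 0)
```

The reported counts sum to the 29 second-level domains and cover three of the four chapters, in line with the text.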

Discussion
Adult spinal deformity (ASD) causes severe functional disability, reduces overall quality of life, and results in a substantial societal burden of disease. In light of the continuously expanding global societal problem of ASD related to aging populations and the emphasis on the value of treatment provided, a common language and approach to outcome measurement is needed. However, there is no consensus on how to measure outcome in ASD surgery using PROMs. This has resulted in inconsistent outcome measurement, making it difficult to pool and compare outcomes between studies and spine registries. The aim of this systematic review was to identify outcome domains measured in ASD surgery by linking PROMs currently used in clinical studies to the universally accepted ICF framework. Linking the question items of PROMs to the outcome domains of the ICF framework, according to its hierarchical classification system, highlights the current strengths, weaknesses, and gaps in these PROMs. This will subsequently provide insights and recommendations for future patient evaluation improvements and provide reliable and valid information for reaching consensus on measuring outcomes.
In total, nine PROMs were identified in a total of 144 papers and question items of the identified PROMs could be linked to a total of 29 outcome domains, covering three of four major chapters of the WHO ICF framework (Table 3). The results of this study will support the process of seeking international consensus on a minimum standard of assessing and reporting PROMs in future clinical trials and spine registries.

Outcome domains in ASD research
The clinical presentation of ASD and its influence on the quality of life vary greatly from minimal or no symptoms to severe back and leg pain with gait disturbance, with or without neurologic, pulmonary, bowel, or bladder dysfunctions [37]. For patients seeking care, the magnitude of the impact of this disorder on the overall quality of life is large and in part due to limitations in physical function (e.g., mobility) and bodily pain [37][38][39]. A recent study performed by Kleinstuck et al. demonstrated the implementation of a core set of outcome measures in adult degenerative scoliosis surgery using the Core Outcome Measures Index (COMI) questionnaire [40]. The COMI was developed for assessing the main outcomes of importance to patients with various spine conditions and back problems (pain, function, symptom-specific well-being, quality of life, disability) [41,42]. However, the wide range and specific symptoms seen in ASD (e.g., neurologic, pulmonary, bowel, or bladder dysfunctions) emphasize the need for developing a core set of outcome measures specifically for this group of patients. In the present study, which used the WHO ICF framework, outcome domains related to 'sensation of pain' and 'mobility' (e.g., walking, changing body position, moving around) were the most frequently reported in ASD research (Table 3). It is beyond the scope of this paper to discuss the appropriateness of the identified outcome domains, whether they should be included in a core outcome set, and whether the currently used PROMs adequately represent all relevant aspects of functioning and quality of life for patients with ASD. It is nevertheless remarkable that outcome domains related to 'neurological function' (e.g., muscle power function, bladder or bowel functions) and 'pulmonary function,' both observed to be affected before and after ASD surgery [43][44][45], are not evaluated by PROMs in the current literature (Table 3). Lenke et al. [44] and Lehman et al. [45] demonstrated that both these outcome domains are significantly affected in ASD patients. However, both these recent publications used clinician-reported outcome instruments (lower extremity motor function and the forced expiratory volume in one second, respectively) to evaluate outcome, and therefore do not meet the inclusion criteria for the present study. Still, it may be that both pulmonary and neurological functions should somehow be included in a future PROM core outcome set (patient self-reported, rather than evaluated by clinical tests), despite the fact that currently they are not frequently measured outcome domains.

PROMs in ASD research
Outcome domains can be measured with different measurement instruments, which can be categorized into clinician-based instruments (e.g., forced expiratory volume in one second) and patient self-reported instruments (i.e., PROMs). To date, there is large variability in PROMs used to assess outcome after ASD surgery, subsequently leading to the large variability in the measured outcome domains (Table 3). The ODI is a condition-specific instrument to evaluate the disturbance of functional status caused by low back pain [46]. However, despite common use, this PROM does not cover the full clinical presentation of ASD because it fails to evaluate neurologic and pulmonary dysfunction, noted before and after ASD surgery [44,45]. This can be explained by the fact that the ODI was introduced as a PROM to assess the functional status in low back pain patients, rather than for ASD patients [47]. Currently, the Scoliosis Research Society-22 (SRS-22) questionnaire is the only condition-specific outcome measurement for spine deformity patients. The SRS-22 was introduced as a condition-specific PROM for Adolescent Idiopathic Scoliosis (AIS) and consists of five domains: function, pain, self-image, mental health, and satisfaction [48]. Although it is one of the most easily accessible and widely validated and translated questionnaires in AIS [49][50][51] and ASD [34][35][36], it has limitations. The outcome domains measured with the SRS-22 can differ substantially in importance for an adolescent with AIS, compared to an adult patient with ASD. Whereas patients with AIS are relatively asymptomatic and mostly undergo surgery to halt curve progression and pulmonary deterioration, and to improve self-confidence and cosmesis, patients with ASD seeking surgery mostly want relief of symptoms and improvement of quality of life and employment rather than a cosmetically satisfying result [52][53][54].
Therefore, it could be that in each specific group of patients different PROMs should be used to measure the most relevant outcome domains. In addition, the most frequently used PROM questionnaires (ODI and SRS) have a substantial overlap in outcome domains, which highlights the need to use a core set of outcome measures specific for the ASD population. If, after reaching consensus on a core set of outcome domains, no PROMs are available to evaluate a certain core outcome domain, these will need to be developed.

Clinimetric properties of PROMs
Finally, we studied the clinimetric properties of identified PROMs, but the available evidence was very limited (Table 4). The small number of clinimetric studies (n = 3) conducted in the ASD population makes it difficult to evaluate the clinimetric properties of PROMs (Table 4) [34][35][36]. Overall, the ODI, SRS-22, SF-12, and SF-36 are widely used PROMs that have been translated and validated in more than 10 languages, making them suitable for global use. The SRS-22 appears to have the highest level of clinimetric quality compared to the ODI, SF-12, and SF-36 and seems most suitable in the ASD population (Table 4). The ODI has been demonstrated to be a reliable and valid tool to measure the functional status in low back pain patients [47]. More research is needed to demonstrate the specific clinimetric properties of the ODI, SF-36, and SF-12 in the ASD population.

Limitations
Studies published prior to 2000 and non-English studies were not included in order to obtain the most relevant PROMs that are used in current clinical research. Furthermore, relevant studies in databases that were not searched may have been missed. Therefore, the possibility of publication bias cannot be excluded. We found no published studies that evaluated the clinimetric properties of SRS-24, SRS-30, and VAS/NRS in the ASD population. Therefore, it was not possible to evaluate the clinimetric properties of these PROMs. The allocation of items from the PROM questionnaires to ICF outcome domains may have been influenced by the authors' interpretation of the defining features of the questions. Based on an arbitrary cut-off point, PROMs used in fewer than three studies were not included, in order to obtain the most prevalent and relevant outcome domains that are currently measured in clinical research. It is therefore possible that outcome domains have been missed. Finally, PROMs were included regardless of whether a license fee is required for their implementation in clinical trials or spine registries.

Future steps
In the next phase, using a Delphi method, the list of outcome domains (Table 3) derived from this systematic review will be used in an international consensus process of stakeholders to develop a set of core outcome domains, accompanying PROMs, and contributing (risk) factors that should be assessed when evaluating ASD patients. The need for such an international standard has become important given the expanding interest in patient-centered care and increasing treatment costs. This will subsequently allow for international data pooling and benchmarking of standardized risk-adjusted PROMs, and in turn highlight the value of provided treatments.

Conclusion
Great diversity exists in the outcome domains and PROMs used in the 144 included studies on ASD surgery. This hampers our current ability to compare different treatment strategies within and between care facilities, both nationally and globally. Overall, outcome domains related to 'mobility' and 'sensation of pain' were well represented, although several different PROMs are used to measure them. Outcome domains related to 'neurological function' and 'pulmonary function' were not reported. More research is needed to evaluate the methodological quality (i.e., clinimetric properties) of PROMs used in this specific population. The results of this study will support the process of seeking international consensus on a minimum set of core outcome domains, accompanying PROMs, and contributing risk factors. When universally applied, this will help improve outcome measurement and facilitate international comparisons and benchmarking, ultimately enhancing value-based healthcare.