Background

Adolescent idiopathic scoliosis (AIS) is the most common form of scoliosis, with a reported prevalence of 0.47% to 5.2% worldwide, and is associated with high health costs [1,2,3]. Scoliosis is a three-dimensional deformity of the spine characterized by lateral tilt and axial rotation of the vertebrae, and it is defined by a Cobb angle greater than 10° in the frontal plane. The literature shows a predominance in females, and progression is most marked during the pubertal growth spurt, between 11 and 14 years of age [4, 5].

Progression of the scoliotic curve during adolescence can be accompanied by changes in respiratory and functional status, pain intensity and aesthetic appearance. Concern about appearance is associated with worse health-related quality of life and is one of the main reasons for referral to health care professionals [6,7,8]. In addition, scoliosis can strongly affect self-image, mental health and activities of daily living [6, 9]. In recent years, as the understanding of health and disability has broadened, greater emphasis has been placed on quality-of-life-related assessment and treatment outcomes in this population [10, 11].

Treatment of AIS involves aligning the therapeutic plan with the patient's expectations and goals. One way to evaluate treatment is through patient-reported outcome measures (PROMs) [12]. Based on patients' self-report, these instruments provide data on, for example, how adolescents with idiopathic scoliosis perceive their appearance and what they expect regarding their self-image. In this context, the Spinal Appearance Questionnaire (SAQ) is a validated instrument that measures perceived spinal appearance and deformity, as well as the self-image expectations of patients with AIS [13,14,15].

The original English version of the SAQ was developed by Sanders et al. (2007) with seven illustrated items depicting spinal deformities of increasing severity. It was subsequently refined by Carreon et al. (2011), who showed that the SAQ was more sensitive to self-image than the Scoliosis Research Society-22 (SRS-22) [16, 17]. Clinical practice guidelines recommend that health professionals use instruments that have been cross-culturally adapted and validated with adequate methodological quality. However, no study has systematically compiled and evaluated the measurement properties of the SAQ and graded the evidence according to the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) [18,19,20]. Thus, the objective of this study was to systematically examine the clinimetric properties of the SAQ in its cross-cultural adaptations into different languages.

Methods

This systematic review was registered in the International Prospective Register of Systematic Reviews (PROSPERO; CRD42021250114). The search strategy, review process and reporting followed the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [21, 22].

Studies that used or examined the SAQ and reported data on its clinimetric properties in any language were eligible. No date restrictions were applied, and studies were included primarily when their main outcomes concerned the translation, cross-cultural adaptation or validation of the instrument in clinical or academic contexts. Incomplete studies, studies limited to adults, and studies that applied the SAQ to spinal deformities other than AIS were excluded.

The Medline (PubMed), CINAHL, EMBASE, Science Direct, PsycINFO and WorldWideScience.org databases were searched up to July 16, 2022. The search strategy combined three groups of search terms with the Boolean operator ‘AND’: [1] Scoliosis, [2] Adolescent and [3] Spinal Appearance Questionnaire measurement properties (e.g., reliability, validity, responsiveness). The complete search strategies adapted to each database are described in Appendix A. Searches were limited to English-language publications and human studies. Database searches and article selection were performed by two independent reviewers (MCMS and DYAM), and discrepancies were resolved by a third reviewer (LCM).

Titles and abstracts were first screened against the inclusion criteria, followed by full-text reading of the selected articles. Eligible articles were then assessed for methodological quality. The measurement properties were grouped into three domains: reliability (including internal consistency, reliability and measurement error), validity (including content validity, construct validity and criterion validity) and responsiveness. Construct validity is further subdivided into structural validity, hypothesis testing and cross-cultural validity. Methodological quality was assessed with the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN), following the COSMIN protocol for systematic reviews of measurement properties [20]. This assessment was performed by two independent authors, and consensus was reached in meetings with the third author when necessary.

The methodological quality rating began with the extraction of data on the characteristics of the PROM, the study populations, the results of the measurement properties and the scores of the PROM in each study. Next, the methodological quality of each study was rated as very good, adequate, doubtful or inadequate for each measurement property (cross-cultural adaptation, internal consistency, reliability, measurement error, responsiveness, content validity, structural validity, criterion validity and construct validity). The results of each study were then compared with the criteria for good measurement properties and rated as sufficient (+), insufficient (-) or indeterminate (?). Finally, the results were pooled per measurement property and given an overall rating of sufficient (+), insufficient (-), inconsistent (±) or indeterminate (?), and the quality of the evidence for each property was graded as high, moderate, low or very low using the modified Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach proposed in the COSMIN manual [20].
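The modified GRADE step can be sketched as a simple downgrading rule: evidence starts at "high" and loses one level for each "serious" concern and two for each "very serious" concern across risk of bias, inconsistency, imprecision and indirectness, with a three-level "extremely serious" downgrade reserved for risk of bias. This mapping reflects our reading of the COSMIN manual and should be treated as an assumption; the function name and example inputs are hypothetical and do not reproduce the review's recorded judgements.

```python
# Hypothetical sketch of the COSMIN modified GRADE downgrading rule (assumed mapping).
LEVELS = ["high", "moderate", "low", "very low"]

# Levels removed per judgement; "extremely serious" is assumed to apply to risk of bias only.
DOWNGRADE = {"no": 0, "serious": 1, "very serious": 2, "extremely serious": 3}


def modified_grade(risk_of_bias="no", inconsistency="no",
                   imprecision="no", indirectness="no"):
    """Return the quality of evidence after downgrading from 'high'."""
    if "extremely serious" in (inconsistency, imprecision, indirectness):
        raise ValueError("'extremely serious' is only used for risk of bias")
    steps = sum(DOWNGRADE[j] for j in (risk_of_bias, inconsistency,
                                      imprecision, indirectness))
    # The evidence cannot fall below the lowest level, 'very low'.
    return LEVELS[min(steps, len(LEVELS) - 1)]


# Illustrative inputs only.
print(modified_grade(indirectness="serious"))                               # moderate
print(modified_grade(risk_of_bias="serious", indirectness="serious"))       # low
print(modified_grade(risk_of_bias="very serious", indirectness="serious"))  # very low
```

Under these rules a single serious concern lowers the evidence by one level, and concerns in different domains accumulate until the floor of "very low" is reached.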

The SRS-22 was used as the comparator instrument because it is considered the most robust instrument specific to individuals with AIS, includes a self-image domain, and was used as the comparator in all studies included in this work. Accordingly, the hypothesis of this review was that the SAQ would correlate with the self-image domain of the SRS-22, a measure of a similar construct, with a correlation of low to fair magnitude (0.25–0.50) [20].
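Rating each study's correlation against this hypothesis, and then pooling the ratings, can be sketched as follows. The 75% threshold for a "sufficient" overall rating is our reading of the COSMIN guidance and is an assumption here; the sketch also collapses everything below the threshold to "insufficient", whereas COSMIN additionally allows an "inconsistent" (±) overall rating. Function names and correlation values are invented for illustration.

```python
# Hypothetical sketch: rating correlations against the review's hypothesis
# (|r| between 0.25 and 0.50) and pooling them with an assumed 75% rule.

def rate_result(r, low=0.25, high=0.50):
    """'+' if the correlation magnitude lies in the hypothesised range, else '-'."""
    # The sign is ignored because it depends on how each instrument is scored.
    return "+" if low <= abs(r) <= high else "-"


def pooled_rating(correlations, threshold=0.75):
    """'+' (sufficient) if at least `threshold` of the results match the hypothesis."""
    if not correlations:
        return "?"  # indeterminate: nothing to rate
    ratings = [rate_result(r) for r in correlations]
    share = ratings.count("+") / len(ratings)
    return "+" if share >= threshold else "-"


# Invented SAQ vs SRS-22 self-image correlations, one per hypothetical study.
example = [-0.41, -0.38, 0.62, -0.30]
print([rate_result(r) for r in example])  # ['+', '+', '-', '+']
print(pooled_rating(example))             # +
```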

Results

The search strategy identified 95 articles, which were screened by title and abstract. After removal of duplicates, 52 studies were assessed for inclusion, and 11 met the eligibility criteria. In addition, screening of reference lists identified two (n = 2) further articles (Fig. 1).

The reasons for exclusion were as follows: full article not published or unavailable (n = 2), no evaluation of cross-cultural adaptation or psychometric properties (n = 1) and use of a different version of the SAQ for kyphosis (n = 1). The review therefore included 13 articles that addressed the psychometric properties of the SAQ in 11 languages (Fig. 1).

Fig. 1 Flowchart of the search steps

The study population comprised 3,420 adolescents with AIS, most of whom were female (Table 1). Participants were grouped by severity, Cobb angle and type of treatment. Five articles reported the participants' follow-up period, with a mean of 14.6 ± 7.1 months [23,24,25,26,27,28].

Table 1 Sample characterization

Methodological quality

The properties evaluated in this review were internal consistency, reliability, content validity, cross-cultural validity, construct validity and structural validity (Table 2). Table 3 presents the number of participants, the methodological quality and the rating against the criteria for good measurement properties.

Table 2 Methodological quality in each study by measurement property (COSMIN Checklist)
Table 3 Grouped results of the SAQ measurement properties with overall classification (COSMIN Checklist)

The original version was developed in English by Sanders et al. (2007) [16] and later refined by Carreon et al. (2011) [17]. Each of the other language versions was described in a single publication. The SAQ has been translated into Polish [24], Canadian French [29], Simplified Chinese [25], European Spanish [30], Danish [31], Traditional Chinese [26], Brazilian Portuguese [32], Korean [33], German [27], Turkish [28] and Persian [9].

Internal consistency was tested in a population of 3,177 participants, and its methodological quality was rated “very good” in 8 articles, “doubtful” in 3 and “inadequate” in 2; the pooled rating was “sufficient”, with Cronbach’s alpha values of 0.70–0.94 in 11 studies [16, 17, 24, 26,27,28, 30,31,32,33].

Reliability was tested in a population of 2,933 participants, and its methodological quality was rated “adequate” in 1 article, “doubtful” in 3 and “inadequate” in 9. The pooled rating was “sufficient”, with intraclass correlation coefficients (ICCs) of 0.89–0.98 in 5 studies [9, 25, 28, 31, 33].

Content validity was tested in a population of 127 participants, only in the original article [16]; its methodological quality was “inadequate”, and the data were insufficient for further analysis.

Cross-cultural validity was tested in a population of 1,237 participants, and its methodological quality was rated “doubtful” in 3 articles [9, 26, 27] and “inadequate” in 8 articles [24, 25, 28,29,30,31,32,33]. The pooled rating was “insufficient”, because the studies did not use appropriate statistical tests to compare populations.

Structural validity was tested in a population of 3,177 participants, with “doubtful” methodological quality in 1 article [16] and “inadequate” quality in 12 articles. The pooled rating was “indeterminate”, because the authors of the only study that differed from the others [16] reported just one of the statistical analyses required to evaluate this property.

Construct validity was evaluated in 11 articles with a total of 2,494 participants; its methodological quality was rated “inadequate” in 1 article [17], “doubtful” in 5 articles [16, 27, 28, 30, 31], “adequate” in 3 articles [25, 29, 33] and “very good” in 2 articles [9, 26].

Discussion

The present study aimed to systematically review the clinimetric properties of the Spinal Appearance Questionnaire (SAQ) in adolescents with idiopathic scoliosis and to analyze its cross-cultural adaptations according to the COSMIN guidelines [34]. The main results indicate that the 13 included articles have methodological inconsistencies in the assessment of psychometric properties, particularly in the statistical tests used.

The evidence indicates that, for the patient version of the SAQ, the quality of evidence (modified GRADE) was moderate for construct validity, low for internal consistency and very low for reliability and cross-cultural validity, whereas content validity and structural validity lacked sufficient data for grading.

Most articles confirmed the unidimensional structure of the questionnaire in samples composed exclusively of adolescents [9, 15,16,17, 24,25,26, 28, 29, 31,32,33]. In the modified GRADE analysis, only two studies reported results for combined samples of adolescents and adults with scoliosis, which introduced a risk of bias due to indirectness, because part of each sample came from another population [27, 30].

To aid interpretation, the measurement properties are discussed under the domains defined in the Methods: reliability (including internal consistency, reliability and measurement error), validity (including content validity, construct validity and criterion validity) and responsiveness.

Reliability

Internal consistency is defined as the extent to which the items of a scale or subscale of a questionnaire are correlated, that is, measure the same construct. As a measurement property, it is an important requirement for unidimensional instruments, which aim to measure a single construct using several items, as in the case of the SAQ. It is usually evaluated with Cronbach’s alpha, a coefficient that reflects the degree of covariance between the items of a scale, with acceptable values between 0.70 and 0.95 [19, 35].
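Cronbach’s alpha can be computed from its usual formula, alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The Python sketch below is illustrative only: the function name and the response matrix are invented and are not SAQ data. It uses sample variances throughout, which leaves the ratio unchanged as long as the choice is consistent.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    item_var = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var / total_var)


# Invented responses: 5 respondents x 4 items scored 1-5 (not SAQ data).
scores = [
    [1, 2, 2, 1],
    [2, 1, 3, 2],
    [3, 3, 2, 4],
    [4, 3, 4, 3],
    [5, 4, 3, 5],
]
alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.3f}")                      # alpha = 0.862
print("within 0.70-0.95:", 0.70 <= alpha <= 0.95)  # within 0.70-0.95: True
```

Values above the upper bound of 0.95 would suggest redundant items rather than better consistency, which is why the check uses a range instead of a single minimum.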

In this review, 11 articles [9, 16, 17, 24, 26,27,28, 30,31,32,33] reported Cronbach’s alpha values between 0.70 and 0.94, with a pooled rating of “sufficient” according to COSMIN. The two other articles did not use adequate statistical tests to measure internal consistency [25, 29]. However, the quality of evidence (modified GRADE) was low, because the risk of bias was judged “serious” owing to indirectness of the study populations and the number of studies with inadequate or doubtful methodological quality [34].

Reliability expresses the extent to which an instrument yields similar, consistent and precise scores when measurements are repeated, for example on two occasions in the same stable patients (test-retest). Among reliability coefficients, the ICC is the most appropriate for continuous measures. Pearson’s correlation coefficient is inadequate for this property because it does not take systematic differences into account. For ordinal measures, the weighted Cohen’s kappa coefficient should be used. For both the ICC and weighted kappa, a minimum value of 0.70, obtained in a sample of at least 50 people, is recommended for good reliability [35, 36].
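Why Pearson’s coefficient is inadequate can be illustrated with a toy test-retest example in which every retest score is exactly 2 points higher than the test score. The sketch uses one common ICC form, the two-way model for absolute agreement with single measurements (often written ICC(A,1) or ICC(2,1)); choosing this form is our assumption, since the text does not specify the model. The helper names and data are hypothetical, and five subjects is far below the recommended 50, used here only for readability.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r: sensitive to association, blind to systematic shifts."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def icc_agreement(*occasions):
    """ICC(A,1): two-way model, absolute agreement, single measurement.

    Each argument is one measurement occasion (e.g. test, retest) holding
    the scores of the same n subjects in the same order.
    """
    k, n = len(occasions), len(occasions[0])
    subjects = list(zip(*occasions))  # one tuple of k scores per subject
    grand = sum(map(sum, subjects)) / (n * k)
    ss_total = sum((v - grand) ** 2 for s in subjects for v in s)
    ss_subj = k * sum((sum(s) / k - grand) ** 2 for s in subjects)
    ss_occ = n * sum((sum(o) / n - grand) ** 2 for o in occasions)
    ss_err = ss_total - ss_subj - ss_occ
    ms_subj = ss_subj / (n - 1)
    ms_occ = ss_occ / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    # The ms_occ term penalises systematic differences between occasions.
    return (ms_subj - ms_err) / (
        ms_subj + (k - 1) * ms_err + k / n * (ms_occ - ms_err))


# Toy test-retest data: every retest score is 2 points higher (systematic shift).
test = [10, 12, 14, 16, 18]
retest = [t + 2 for t in test]
print(f"Pearson r = {pearson(test, retest):.3f}")        # Pearson r = 1.000
print(f"ICC(A,1)  = {icc_agreement(test, retest):.3f}")  # ICC(A,1)  = 0.833
```

Both occasions rank the subjects identically, so Pearson’s r is 1, whereas the agreement ICC falls because the 2-point shift counts as disagreement between occasions.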

Reliability was evaluated in four studies [9, 25, 28, 33] that used appropriate coefficients. Pooled ICC values ranged from 0.92 to 0.98, with a “sufficient” rating according to COSMIN [34]. However, nine of the 13 articles analyzed had shortcomings, such as inappropriate statistical tests, failure to account for systematic differences and insufficient sample size. In addition, the quality of evidence (modified GRADE) was very low because of a “very serious” risk of bias, as only one study was of “adequate” quality [34].

Validity

Cross-cultural validity refers to the degree to which the performance of the items in a translated or culturally adapted instrument reflects that of the original version. Assessing cross-cultural validity requires measurement data from at least two groups that differ in characteristics such as sex or language [34]. However, none of the included studies met this fundamental COSMIN requirement, which made it impossible to estimate the quality of evidence.

Translation and cross-cultural adaptation of instruments address differences in customs, language and health perceptions between countries and cultures, enabling comparisons between populations and the exchange of information across linguistic and cultural barriers [37, 38]. All studies followed the international guidelines of Beaton et al. (2000) [37], producing versions of the SAQ in Polish, Canadian French, Simplified Chinese, European Spanish, Danish, Traditional Chinese, Brazilian Portuguese, Korean, German, Turkish and Persian [16, 17, 24, 26,27,28, 30,31,32,33, 39].

Structural validity refers to the degree to which PROM scores adequately reflect the dimensionality of the construct to be measured [40]. However, only one study evaluated structural validity [16], and its authors tested the property using standardized fit statistics (z scores), without the other standard analyses expected by COSMIN, such as item response theory. Owing to the inconsistency of the results for this property, no modified GRADE rating was assigned [34].

Criterion validity refers to the degree to which the scores of a PROM adequately reflect a gold standard [34]. The review team decided not to apply the COSMIN tables for criterion validity and responsiveness, because both properties are assessed by comparison with a gold standard for health status questionnaires in the target population. Such a comparison is not possible for the SAQ because of its illustrated design, a distinctive characteristic of the instrument. Instead, all validity evidence from the included articles was considered in the analysis of construct validity [41, 42].

Construct validity reflects the ability of an instrument to measure the theoretical dimensions of a construct. Because abstract constructs do not manifest directly as physical events, inferences about them must be drawn from observable behaviors and patient self-report. In COSMIN, construct validity is assessed through hypothesis testing, in which the consistency of PROM scores with predefined hypotheses, for example about their relationship with the scores of other instruments, is evaluated. The more specific the hypotheses and the more hypotheses tested, the more evidence is gathered for this measurement property [40].

In the included articles, construct validity was assessed as convergent validity. In this systematic review, the hypothesis was that correlations would be weak to moderate (r = 0.25 to 0.50) [23]. Eleven articles reported associations between the variables using appropriate correlation tests, such as Spearman’s and Pearson’s coefficients, yielding a pooled rating of “sufficient” for this measurement property of the SAQ. The quality of evidence (modified GRADE) was “moderate”, downgraded for indirectness of the study populations [34]. In all articles, the SAQ was compared with the self-image domain of the SRS-22, which is aimed at adolescents and whose psychometric properties have been extensively tested.
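The correlation step can be sketched as Spearman’s rho, computed as Pearson’s r on ranks (tied values receive the average rank), followed by a check against the hypothesised range. The function names are hypothetical and the paired scores are invented for illustration; they are not data from any included study, and in practice a validated statistics package would normally be used.

```python
from math import sqrt


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for p in range(i, j + 1):
            ranks[order[p]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented paired scores for six adolescents (illustrative only).
saq = [1.4, 1.8, 2.2, 2.9, 3.3, 4.0]             # SAQ scores
srs_self_image = [3.0, 4.2, 3.8, 2.6, 2.0, 3.4]  # SRS-22 self-image domain
rho = spearman(saq, srs_self_image)
print(f"rho = {rho:.3f}")                                   # rho = -0.371
print("within 0.25-0.50:", 0.25 <= abs(rho) <= 0.50)        # within 0.25-0.50: True
```

Spearman’s rho is often preferred here because questionnaire domain scores are ordinal or bounded, and ranking removes the influence of their unequal spacing.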

Content validity is the degree to which the content of an instrument adequately reflects the construct to be measured and is relevant to the target population. Its assessment should systematically involve patients and professionals in the field to establish the relevance, comprehensiveness and comprehensibility of the items. The definition of the target population and the context in which the instrument is used are important aspects of the evaluation of content validity, and it is recommended that this evaluation be performed on the original version only [21].

Accordingly, in the original article of the instrument [16], the measured construct and the context of use are well described; however, the target population is not clearly defined beyond a reference to adolescents with AIS. Furthermore, patients were not involved alongside experts in developing the questionnaire. The absence of comprehensiveness and comprehensibility testing, as recommended by COSMIN, resulted in the methodological quality being rated “inadequate”. This property was not evaluated in the other studies, so no GRADE rating was assigned to it.

Strengths and limitations

A strength of this study is that the analyses followed the most recent COSMIN guidelines for systematic reviews. The review was also based on a broad search for articles that addressed the SAQ in adolescents. To avoid data loss, all articles that analyzed the psychometric properties of the instrument were included, including its original version.

As for limitations, the data from the included articles showed discrepancies that could lead to misleading pooled estimates for some measurement properties; therefore, no meta-analysis was performed. In addition, database searches were restricted to English, and only articles available in English were obtained.

Conclusion

After extensive investigation of the clinimetric properties of the Spinal Appearance Questionnaire in its cross-cultural adaptations, the quality of the evidence for the questionnaire in adolescents with idiopathic scoliosis was low, owing to missing assessments of some clinimetric properties or doubtful methodological quality. Nevertheless, we recommend the use of the instrument in clinical practice and research to assess adolescents’ self-perception of their spine. Future translations and adaptations should follow the COSMIN guidelines.