Introduction

Adolescent idiopathic scoliosis (AIS) is a three-dimensional deformity of the spine and trunk, resulting in body asymmetry such as uneven shoulder level or rib hump. The deformity mostly arises in otherwise healthy, adolescent girls during the pubertal growth spurt [1]. The patients’ subjective perception of their appearance is often affected by their deformity and can influence the health-related quality of life [2].

In AIS patients, the revised Scoliosis Research Society 22-item (SRS-22r) questionnaire [3] is recommended as a condition-specific instrument to measure quality of life. The SRS-22r includes five questions regarding the patients’ appearance (self-image domain) and gives a broad overview of the general perception of appearance. The Spinal Appearance Questionnaire (SAQ) was developed to provide more detailed information about the patients’ perception of their trunk characteristics [4]. It uses illustrations of physical appearances based on the Walter Reed Visual Assessment Scale. After evaluating the measurement properties, Carreon et al. [5] recommended refining their version of the SAQ to a 14-item questionnaire based on two domains (appearance and expectations) with less burden for the AIS patient. This two-domain version has been translated and cross-culturally adapted into multiple languages, and the measurement properties of these versions have been evaluated [5,6,7,8,9]. Since no validated Dutch version of the SAQ exists, this study aimed to translate and cross-culturally adapt the recommended short English version of the SAQ into Dutch and to evaluate its measurement properties in AIS patients in The Netherlands.

Methods

The guidelines for cross-cultural adaptation of Beaton et al. [10] were used to translate and adapt the appearance and expectations domains of the SAQ (14 items; recommended short English version [5]) and the instructions into Dutch. The COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) Study Design checklist for PROMs [11] and the quality criteria of Terwee et al. [12] were used to assess the methodological quality and measurement properties of the Dutch SAQ. The study was approved by the institution’s internal investigational review board. Exemption from ethical approval was obtained from the medical ethical committee of the Radboud University Medical Center (file number: 2021–13280). Informed consent was obtained from all patients and/or parents or caregivers.

Translation, cross-cultural adaptation and pretest

The multistep approach of Beaton et al. [10], consisting of translation, synthesis, back translation, expert committee review and pretesting, was used (Appendix 1). The expert committee developed the prefinal version of the Dutch SAQ. This version was tested by 30 consecutive AIS patients in three hospitals in the Netherlands. The expert committee discussed the patients’ feedback on the prefinal version and made a few adjustments to the questionnaire (Appendix 1). The patients’ inability to evaluate their physical appearance from all sides, including from behind and while bending forward, could not be resolved during this phase. After the adjustments, the final version of the Dutch SAQ (Appendix 2) was determined and tested for its measurement properties.

Study plan to evaluate the measurement properties

From December 2021 to October 2022, a cross-sectional study was conducted involving 113 AIS patients from four hospitals in the Netherlands. The sample size was based on the COSMIN checklist (i.e., ≥ 100 participants and seven times the number of items [14 items]) [11]. AIS patients aged 10 to 21 years, with any treatment type (observation, brace treatment or surgery), who were able to read Dutch were included. Diagnosed psychiatric disorders were an exclusion criterion because of possible interference with perception of appearance. Study data were obtained through surveys (patient and physician) using Castor EDC (Electronic Data Capture) [13]. Baseline characteristics included age, gender, curve type (according to the Lenke classification), coronal Cobb angle, angle of trunk rotation (ATR; measured with a scoliometer during the forward bending test) and Risser stage. Measurements were performed on the most recent radiograph by physicians. Thirty-four patients completed the Dutch SAQ twice within a two-week interval to assess test–retest reliability.

Self-report measures

The patient survey consisted of Dutch versions of the SAQ, SRS-22r [14] and Numeric Pain Rating Scale (NPRS; back pain). Scoring was performed according to the guidelines of the questionnaires. All patients completed the questionnaires without missing items. The SAQ consists of two domains: appearance (10 pictorial items) and expectations (4 items). The five pictorial answer options of the appearance domain show varying severities of trunk and spinal deformities, scored from 1 (best) to 5 (worst). An exception is item 9 (position of the head), where score 1 shows a backward head position and score 5 shows a forward head position. The expectations domain is scored from ‘not true’ (1) to ‘very true’ (5). Scores for the appearance and expectations domains range from 10 to 50 and from 4 to 20, respectively. The total score is the sum of both domains.
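The SAQ scoring rule described above can be sketched in a few lines of Python. This is only an illustration, not an official scoring tool: the function name `score_saq` and the input layout (a list of 14 item scores in item order, with items 1 to 10 forming the appearance domain and items 11 to 14 the expectations domain) are assumptions made for the example.

```python
def score_saq(responses):
    """Return appearance, expectations and total scores for one patient.

    `responses` is a list of 14 item scores (1-5) in item order.
    Assumption: items 1-10 are the appearance domain and
    items 11-14 are the expectations domain.
    """
    if len(responses) != 14:
        raise ValueError("the short SAQ has 14 items")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each item is scored from 1 to 5")
    appearance = sum(responses[:10])    # possible range 10-50
    expectations = sum(responses[10:])  # possible range 4-20
    return {
        "appearance": appearance,
        "expectations": expectations,
        "total": appearance + expectations,  # sum of both domains
    }
```

Because the total is the sum of both domains, it can range from 14 to 70.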

The SRS-22r is a condition-specific quality of life measure consisting of 22 items divided into five domains: function, pain, self-image, mental health, and satisfaction with management [14]. Each domain consists of five items scored from 1 (worst) to 5 (best), except for satisfaction with management which has two items. The average domain scores (total domain score divided by the number of items) range from 1 to 5, with higher scores indicating better patient outcomes. The 11-point NPRS (0–10) was used to assess back pain intensity.

Statistical analysis

Study data are presented as percentages for categorical variables, as means with standard deviations for normally distributed continuous data, and as medians with ranges for non-normally distributed continuous data. Distribution of the study variables was assessed with the Kolmogorov–Smirnov test. Missing data of baseline characteristics were, if present, clearly described. All statistical analyses were performed in SPSS (IBM® SPSS® Statistics version 27) and a p-value < 0.05 was considered statistically significant.

Measurement properties

Floor and ceiling effects

Floor and ceiling effects of the SAQ were assessed for domain and total scores, and were considered present if > 15% of patients achieved the lowest or highest score, respectively [12].
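As a minimal sketch of this criterion, the share of patients at the lowest and highest possible score can be compared with the 15% threshold. The function name `floor_ceiling` and its arguments are hypothetical and only serve the example.

```python
def floor_ceiling(scores, lowest, highest, threshold=0.15):
    """Check floor and ceiling effects for one domain or the total score.

    `lowest` and `highest` are the lowest and highest possible scores.
    An effect is flagged when more than `threshold` (15%) of the
    patients obtained that score.
    """
    n = len(scores)
    floor_share = sum(s == lowest for s in scores) / n
    ceiling_share = sum(s == highest for s in scores) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }
```

For the appearance domain, for example, `lowest` would be 10 and `highest` 50.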

Internal consistency

For each domain and the total score of the SAQ, a Cronbach’s α was calculated. A Cronbach’s α between 0.70 and 0.95 is considered as an acceptable value [12].
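For readers who want to reproduce this step outside SPSS, Cronbach's α can be sketched with the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the sum score). The function names below are hypothetical; the acceptance range is the one stated above.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of per-patient item-score lists."""
    k = len(rows[0])                   # number of items
    items = list(zip(*rows))           # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])  # variance of sum scores
    return k / (k - 1) * (1 - item_var_sum / total_var)


def alpha_acceptable(alpha, low=0.70, high=0.95):
    """Apply the 0.70-0.95 acceptance range used in this study."""
    return low <= alpha <= high
```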

Reliability and measurement error

Reproducibility of the SAQ was assessed by test–retest reliability analysis of the first and second SAQ with intraclass correlation coefficients (ICC; two-way random effects, single measurement, absolute agreement). An ICC ≥ 0.70 demonstrates good test–retest reliability [12]. The standard error of measurement (SEM) and smallest detectable change (SDC) were calculated from the test–retest score differences; they complement the ICC by indicating the range within which the theoretical ‘true’ value lies. The standard deviation of this difference (SDdifference) was used to calculate the SEM (Formula 1) [15]. The individual SDC (SDCind) and the SDC for a group (SDCgroup) were calculated according to Formulas 2 and 3, respectively [15].

$$\mathrm{SEM}=\frac{SD_{difference}}{\sqrt{2}} \tag{1}$$
$$SDC_{ind}=1.96\cdot \sqrt{2}\cdot \mathrm{SEM} \tag{2}$$
$$SDC_{group}=\frac{SDC_{ind}}{\sqrt{n}} \tag{3}$$
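The reliability calculations can be sketched as follows. The ICC function uses the usual two-way ANOVA mean squares for a two-way random effects, single measurement, absolute agreement model. The second function applies Formulas 1 to 3, taking n as the number of patients in the test–retest sample. Both function names are hypothetical, and this is an illustration under those assumptions rather than the SPSS procedure used in the study.

```python
from math import sqrt
from statistics import mean, stdev


def icc_absolute_agreement(rows):
    """ICC (two-way random, single measurement, absolute agreement).

    `rows` holds one list per patient with k repeated scores (k = 2 here).
    """
    n, k = len(rows), len(rows[0])
    grand = mean([x for r in rows for x in r])
    row_means = [mean(r) for r in rows]
    col_means = [mean(c) for c in zip(*rows)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ms_rows = ss_rows / (n - 1)   # between-patient mean square
    ms_cols = ss_cols / (k - 1)   # between-occasion mean square
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k / n * (ms_cols - ms_err)
    )


def sem_sdc(test, retest):
    """Return (SEM, SDC_ind, SDC_group) from paired test and retest scores."""
    diffs = [b - a for a, b in zip(test, retest)]
    sem = stdev(diffs) / sqrt(2)            # Formula 1
    sdc_ind = 1.96 * sqrt(2) * sem          # Formula 2
    sdc_group = sdc_ind / sqrt(len(diffs))  # Formula 3
    return sem, sdc_ind, sdc_group
```

As a plausibility check, inserting the total-score SEM of 4.11 reported in the Results and n = 34 into Formulas 2 and 3 gives approximately 11.39 and 1.95, which agrees with the reported 11.40 and 1.96 up to rounding of the SEM.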

Hypotheses testing for construct validity

To analyze construct validity, nine hypotheses were predefined, based on results from previous literature where available and on expected differences between subgroups (Table 1). No usable correlations from previous studies were found for hypotheses 6 and 7. A moderate positive correlation was expected between the appearance domain (SAQ) and the angle of trunk rotation (hypothesis 6, Table 1), because of the 3D deformity of the spine and trunk in AIS. In general, AIS does not cause back pain, and thus a weak positive correlation was expected between the appearance domain (SAQ) and NPRS (back pain) scores (hypothesis 7, Table 1). Subgroups for hypothesis 6 and 7 were based on the median score of the radiological (Cobb angle) or clinical (ATR) outcomes to create subgroups with an adequate number of patients (> 50 patients [11]). Construct validity was considered adequate if ≥ 75% of the results corresponded with the predefined hypotheses (≥ 7 of 9; Table 1) [12]. The strength of the correlations was interpreted as “weak” (r = 0.10–0.30), “moderate” (r = 0.31–0.50), or “strong” (r = 0.51–1.00) [16].
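The two decision rules in this paragraph, the strength categories and the 75% criterion, can be expressed as a short sketch. The function names are hypothetical. The label "below weak" for |r| < 0.10 and the use of the absolute value for negative correlations are assumptions, since the cited categories do not cover those cases.

```python
def correlation_strength(r):
    """Classify a correlation coefficient using the cut-offs of this study."""
    a = round(abs(r), 2)  # categories are defined on two decimals
    if a >= 0.51:
        return "strong"
    if a >= 0.31:
        return "moderate"
    if a >= 0.10:
        return "weak"
    return "below weak"   # assumption: not named in the cited categories


def construct_validity_supported(confirmed, total, required_share=0.75):
    """True when at least 75% of the predefined hypotheses are confirmed."""
    return confirmed / total >= required_share
```

With nine hypotheses, six confirmed hypotheses (66.7%) would fall short of the criterion, whereas seven (77.8%) would meet it.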

Table 1 Nine predefined hypotheses for construct validity analysis and the calculated correlations based on the predefined hypotheses

Structural validity

Exploratory factor analysis (EFA) with principal axis factoring and oblique rotation (direct oblimin) was conducted to determine whether the items of the SAQ were covered by one or more factors [12]. In the absence of a clear factor structure, an additional EFA with two factors was performed, based on the two-factor (two-domain) structure previously found [5]. Data factorability was assessed by inspection of the inter-item correlations and by calculation of the significance level of the Bartlett test of sphericity and the Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy. A KMO value < 0.6 was considered inadequate [17]. Determination of the factors was based on eigenvalues (EV) > 1, factor interpretability, a scree plot, and unique variances of > 5%. Items with factor loadings < 0.4 and items with cross-loadings that differed by less than 0.2 from their primary loading were removed because of inadequate discrimination.
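The item-level rule at the end of this paragraph can be illustrated with a small sketch. It reflects one reading of the rule: an item is flagged when its highest absolute loading is below 0.4, or when its two highest absolute loadings differ by less than 0.2. The function name `flag_items` is hypothetical.

```python
def flag_items(loadings, min_loading=0.4, min_gap=0.2):
    """Flag items with a low primary loading or an ambiguous cross-loading.

    `loadings` maps an item label to its loadings, one value per factor.
    Returns a dict of flagged item labels with the reason for flagging.
    """
    flagged = {}
    for item, values in loadings.items():
        ranked = sorted((abs(v) for v in values), reverse=True)
        if ranked[0] < min_loading:
            flagged[item] = "low loading"
        elif len(ranked) > 1 and ranked[0] - ranked[1] < min_gap:
            flagged[item] = "cross-loading"
    return flagged


# Made-up loadings for illustration only (not study data).
example = {"item_a": [0.70, 0.10], "item_b": [0.35, 0.05], "item_c": [0.55, 0.45]}
print(flag_items(example))
```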

Results

Patient characteristics (Table 2)

Table 2 Patient characteristics (n = 113)

Of the 113 included patients, 24 (21.2%) were male. The mean age of the patients was 15.4 years (SD 2.2, range 10–21 years). The median major Cobb angle was 25.0° (range 10.0°–68.0°). Fifty-eight patients (51.3%) had a Lenke 1 curve type.

Measurement properties

Floor and ceiling effects (Table 3)

Table 3 Domain and total scores of the self-report measures and floor and ceiling effects and internal consistency of the Dutch SAQ domains and total score (n = 113)

The median scores (range) of the appearance domain, the expectations domain and the total score of the SAQ were 20.0 (11.0–33.0), 12.0 (4.0–20.0) and 35.0 (15.0–48.0), respectively. No floor or ceiling effects were found for either domain or for the total score (Table 3).

Internal consistency (Table 3)

Cronbach’s α was 0.84 (appearance), 0.89 (expectations) and 0.85 (total score).

Reliability and measurement error (Table 4)

Table 4 Reliability of the short Dutch SAQ (test–retest; n = 34)

The test–retest reliability ranged from 0.76 (95%CI 0.57–0.87) to 0.77 (95%CI 0.59–0.88). The SEM for the total score was 4.11 and for the domains 2.23 (appearance) and 2.57 (expectations). The total score SDCind and SDCgroup were 11.40 and 1.96, respectively. The SDCind and SDCgroup for the domains can be found in Table 4.

Construct validity (Table 1 and Fig. 1 A and B)

Fig. 1 Box-whisker plots of the Appearance domain scores of the SAQ (0–50). (A) Comparison between two subgroups based on the median Cobb angle of 25.0° (≤ 25.0°; > 25.0°). (B) Comparison between two subgroups based on the median ATR of 9.0° (≤ 9.0°; > 9.0°). Patients with a Cobb angle of > 25.0° had a significantly higher score on the Appearance domain than patients with a Cobb angle of ≤ 25.0° (Z = −4.837, P < 0.001). A significantly higher Appearance domain score was also found for patients with an ATR of > 9.0° compared with patients with an ATR of ≤ 9.0° (Z = −4.480, P < 0.001). ATR angle of trunk rotation, n number, Q1 first quartile, Q3 third quartile. *p < 0.05

Panel A, Cobb angle (degrees): Cobb ≤ 25.0 (n = 58): Q1 14.0, median 18.0, Q3 20.0; Cobb > 25.0 (n = 55): Q1 19.0, median 23.0, Q3 25.0
Panel B, angle of trunk rotation (degrees): ATR ≤ 9.0 (n = 57): Q1 15.0, median 18.0, Q3 20.0; ATR > 9.0 (n = 56): Q1 18.0, median 22.5, Q3 26.0

Eight of the nine (88.9%) predefined hypotheses were confirmed. Only the hypothesis on the correlation between the SAQ appearance domain and the NPRS for back pain was not confirmed: a weak positive correlation was expected, but a moderate positive correlation was found (r = 0.32).

Structural validity (Table 5)

Table 5 Characteristics of the Dutch SAQ, including exploratory factor analysis

EFA showed a KMO measure of sampling adequacy of 0.80 and a significant Bartlett test of sphericity (χ2 = 771.07, df = 91, P < 0.001). Floor effects were found for 11 of the 14 items (19.5% to 48.7%), while ceiling effects were present in the four items of the expectations domain (23.9%–28.3%). The correlations between item scores and the total score ranged from 0.34 (item 9) to 0.82 (item 11). Items 1 to 10 loaded on factor 1 (EV 5.13; 36.63% explained variance), except for item 9 (factor loading < 0.4). Items 11, 12, 13 and 14 loaded on factor 2 (EV 2.27; 16.24% explained variance). EFA showed two evident factors, explaining 52.87% of the total variance and corresponding to the two-factor structure of the original SAQ (appearance [factor 1] and expectations domain [factor 2]).
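Two of the reported figures can be checked with simple arithmetic. The Bartlett test of sphericity on p items has p(p − 1)/2 degrees of freedom. Assuming the explained-variance percentages are eigenvalues divided by the number of items (the trace of a 14-item correlation matrix), they follow directly from the eigenvalues. The function names below are hypothetical.

```python
def bartlett_df(n_items):
    """Degrees of freedom of the Bartlett test of sphericity: p(p - 1)/2."""
    return n_items * (n_items - 1) // 2


def explained_variance_pct(eigenvalue, n_items):
    """Percentage of total variance explained by one factor.

    Assumption: eigenvalues come from a correlation matrix, whose trace
    equals the number of items.
    """
    return 100 * eigenvalue / n_items


print(bartlett_df(14))
print(explained_variance_pct(5.13, 14), explained_variance_pct(2.27, 14))
```

For 14 items this gives 91 degrees of freedom, and the percentages computed from the rounded eigenvalues agree with the reported 36.63% and 16.24% to within rounding.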

Discussion

In this study, the recommended short English version of the SAQ was successfully translated and cross-culturally adapted into Dutch. Its measurement properties are adequate in terms of validity and reliability to assess the perceived appearance of AIS patients in The Netherlands. In this process, the guidelines by Beaton et al. [10] and Terwee et al. [12] and the COSMIN Study Design checklist [11] were followed.

No floor or ceiling effects were found for the domain or total scores of the Dutch SAQ (Table 3). This indicates that the questionnaire is able to discriminate among patients at the lowest (best) and highest (worst) scores of the domains and the total questionnaire. The SAQ showed acceptable internal consistency, with Cronbach’s α ranging from 0.84 to 0.89, indicating homogeneity of the items. Furthermore, good test–retest reliability was found, with ICCs between 0.76 and 0.77. This shows that the questionnaire is consistent over time in AIS patients. Unlike most previous SAQ translations and as recommended by the COSMIN checklist [11], this study used predefined hypotheses and factor analysis to assess the validity of this 14-item SAQ. Eighty-nine percent (8/9) of the associations (with SRS-22r and NPRS [back pain]) and of the differences between subgroups (based on Cobb angle and ATR) were as expected. Significantly higher scores for the appearance domain were found for both the radiologically based (Cobb > 25.0°) and the clinically based (ATR > 9.0°) subgroups. This suggests that the appearance domain can discriminate between patients with a Cobb angle ≤ 25.0° and > 25.0° and between patients with an ATR ≤ 9.0° and > 9.0°. EFA found a two-factor structure of this 14-item SAQ, supporting the previously described two-factor (two-domain: appearance and expectations) structure [5]. The two factors explained 53% of the variance, suggesting that other non-measured factors or domains are involved in the concept of patient-experienced appearance.

The internal consistencies of the Dutch SAQ are similar to those of previous translations of this version of the SAQ (Cronbach’s α for the appearance domain ranged from 0.89 to 0.94, for the expectations domain from 0.81 to 0.89, and for the total score from 0.88 to 0.91) [5,6,7,8,9]. The Danish, English, German, Spanish and Turkish versions reported ICCs between 0.84 and 0.98 (appearance), 0.67 and 0.97 (expectations), and 0.80 and 0.98 (total score) [5,6,7,8,9].

Although no floor or ceiling effects were found for the domain and total scores, both effects were present for almost all items (except for items 1 and 9; Table 5). The discriminative ability of these items might be limited, because they cannot distinguish among patients within the group scoring the lowest or highest score. In our study, factor analysis showed a factor loading < 0.4 for item 9 (position of the head), and removal of this item could be considered. The internal consistency of the appearance domain (factor 1) improved marginally after omission of item 9 (Cronbach’s α improved from 0.84 to 0.85). One possible reason might be the non-identical sequence of the illustrations of this item, which may be difficult for patients to recognize by themselves. Furthermore, the position of the head (sagittal view, item 9) might be more relevant for patients with, e.g., kyphosis than for patients with AIS. To our knowledge, no other publication on translations of the SAQ has described considerations of omitting item 9. Further research is needed to determine whether the effect of item 9 is unique to the Dutch version. Nevertheless, to maintain comparability with other versions of the questionnaire, all 14 items are included in the Dutch version.

When considering appearance in patients with AIS, the SRS-22r includes five questions about the patients’ appearance in general (self-image domain). The SAQ includes 10 pictorial questions about scoliosis-specific physical appearances. This study revealed that the SAQ appearance domain strongly correlates with the SRS-22r self-image domain (Table 1, hypothesis 1 [r = 0.55]). Both questionnaires complement each other and provide physicians comprehensive information about patients' appearance perception, which can affect quality of life and treatment outcomes. As such, the SAQ can provide additional detailed information about appearance perception in clinical practice.

Some limitations of this study should be mentioned. First, we used the appearance and expectations domains of the SAQ for cross-cultural adaptation into Dutch, as recommended by Carreon et al. [5], to reduce patient burden. The original SAQ [4] had 33 items and was subsequently reduced to 20 items with nine domains. Some translations [4,18,19,20,21,22,23] used these versions with a different scoring method, and thus comparisons with their results were not possible. A second limitation is related to the questionnaire itself, as it measures the patients’ perception of appearance although not every part of their deformity is directly visible to them. Sixty percent of the patients in the prefinal version testing phase were unable to see their own trunk and spine from all directions, indicating difficulties with the patients' perception of appearance. This feedback, along with the difficulties reported by Simony et al. [7] with two items about the shoulder and shoulder blade, can inform future research into whether and how this influences the self-reported values. The third limitation was the limited sample size for subgroup analysis based on Cobb angle and ATR (hypothesis 6 and 7; Table 1). To obtain an adequate number of patients in each subgroup (> 50 patients [11]), we created subgroups based on median scores. Future research should use larger samples to investigate clinically relevant threshold values for Cobb angle and ATR. Finally, longitudinal studies are recommended to determine measurement properties such as responsiveness, and clinically relevant cutoff and change scores to define the success of scoliosis treatment.

Conclusion

The Dutch short SAQ is an adequate, valid and reliable instrument to evaluate the perception of appearance and thereby the health-related quality of life in AIS patients. It is a valuable condition-specific PROM and we recommend it for use in future AIS research to evaluate the outcome of all types of scoliosis treatment in patients with AIS.