Background

Back pain and neck pain are leading causes of years lived with disability [1] and have a large impact on individuals, their families, employers and healthcare systems [2]. Spinal pain starts early in life and its prevalence increases with age, in particular around the age of 12 to 15 years [3, 4]. Back pain in childhood and adolescence is a significant risk factor for developing back pain in adulthood: the number of days in a given year with low back pain (LBP) in adolescence was shown to be associated with the risk of developing LBP in adulthood [5]. However, research on spinal pain in childhood and adolescence shows large heterogeneity in assessing pain prevalence, pain intensity and associated disability [3], and validated assessment tools are scarce [6].

In 2013, the Young Spine Questionnaire (YSQ) was developed in Denmark as an instrument to measure spinal pain in the young population [7], though it has so far only been tested for content validity in preliminary versions during the developmental process in a population of Danish schoolchildren in the age range of 9–11 years [7]. To allow for the practical implementation of the YSQ, it must be shown to provide accurate, valid and interpretable data. Therefore, the assessment of instrument validity and reliability is essential [8, 9]. Because the YSQ was originally not designed for capturing change, it is not known whether it is suitable to measure change in a longitudinal study design [7].

Thus, the first aim of this study was to translate the YSQ into German (G-YSQ) to obtain a standardized tool for assessing back and neck pain in children and adolescents that can be used in Switzerland (and other German-speaking countries). The second aim was to test the G-YSQ for validity, reliability and responsiveness in children and adolescents between 10 and 16 years.

Materials and methods

Translation

After authorization by the original authors of the YSQ, the questionnaire was translated into German in five steps according to the guidelines by Beaton et al. [10] (Fig. 1).

Fig. 1 The five steps in translating the English Young Spine Questionnaire into the German version according to procedures recommended by Beaton et al. [10]

Two native German speakers (two researchers) independently translated the questionnaire forward from English to German (stage I) and produced one German version (stage II). Two bilingual (English and German) individuals (two researchers) independently translated this German version back to English (stage III). From these two versions, one common prefinal version was produced by the four forward and back translators (stage IV), which was then pilot tested (stage V) with 17 children and adolescents between 9 and 15 years (mean age = 12.25 ± 1.70 years). They were asked to report any difficulties in completing the questionnaire on their own and to make suggestions for improvement if considered necessary.

During the translation process, two cultural adaptations were made to the G-YSQ compared to the YSQ: (i) the terms ‘chiropractor’ and ‘physical therapist’ were omitted, because it was questioned whether Swiss children are familiar with the chiropractic profession and because there is no direct access to physiotherapy in Switzerland; (ii) the terms ‘stepfather’ and ‘stepmother’ were omitted, because these terms are uncommon in contemporary Swiss German. During the pilot testing of the prefinal version of the G-YSQ, three adolescents (aged 13 and 15) reported that they would prefer text or scales instead of faces indicating pain intensity, but none of the children and adolescents reported any difficulties in completing the questionnaire. Nevertheless, the statements of several participants during the main study indicated the necessity for an answer option ‘I don’t know’ in the questions about parental back problems.

Study participants

Patients between 10 and 16 years of age with neck or back pain were recruited in seven private outpatient chiropractic clinics. The majority (76/100) came from one clinic specialized in the treatment of children and adolescents. Age- and gender-matched pupils (frequency matching) were recruited in four primary and secondary schools: After agreement of the teacher, two researchers from the Balgrist University Hospital, Zurich, Switzerland (FR, BW) personally informed the pupils about the study during a school lesson and distributed the information sheets. The study was approved by the ethics commission of the Canton of Zürich, Switzerland (BASEC-Nr 2016_00568) and was registered at ClinicalTrials.gov (NCT02955342). According to Swiss law, written informed consent of adolescents older than 14 years and their parents/legal guardians was required for participation. For participants between 10 and 14 years, oral informed consent of the children and adolescents and written informed consent of their parents/legal guardians was required.

Outcomes

There were three measurement time points: at enrolment (start), at two weeks, and at three months after start. The patients filled in the questionnaires at start in the chiropractic clinics and received the two-week and three-month questionnaires by mail from the coordinating researchers at the Balgrist (FR, BW). To acquire follow-up data of the control participants, the same researchers physically distributed the questionnaires at start, after two weeks and three months at the schools, mostly during a lesson in physical education. Both groups answered the questionnaires at all time points in paper form. The questionnaires consisted of the G-YSQ and the KIDSCREEN-10: the YSQ/G-YSQ assesses pain frequency, week pain prevalence, point pain prevalence and pain intensity (revised Faces Pain Scale (rFPS) [11]) in each of the three spinal regions (each visualized in a drawing) and asks about pain-related consequences (school absenteeism, activity restrictions in sports and care-seeking behavior) as well as parental back problems and related work absenteeism [7]. The KIDSCREEN-10 [12, 13], including ten questions on physical and emotional well-being and one question on self-rating of general health, was used to estimate the impact of spinal pain on health-related quality of life (HRQoL). After two weeks, the control participants additionally answered a question on whether the state of their back problem remained stable during the past two weeks, and the patients completed the Patients’ Global Impression of Change (PGIC), which evaluates the patients’ rating of overall improvement on a seven-point Likert scale [14]. For the present study, the PGIC was dichotomized (improved = ‘very much better’ or ‘much better’) [15, 16]. The assessment after three months included the G-YSQ, the KIDSCREEN-10 and the PGIC (patients only).

Data analysis

Several participants reported spinal pain in the question on pain frequency, but reported a pain intensity of zero. This was observed for all spinal regions (neck: N = 8; midback: N = 7; low back: N = 8). Incompletely filled in questionnaires were included in all analyses for which they provided values. Pupils who previously sought medical care for back or neck pain were analyzed in the patient group because their care-seeking due to back or neck pain was considered an indicator for a back or neck problem. The G-YSQ data was analyzed by calculating sum scores across the three spinal regions for pain frequency (0 = no pain; 1 = yes, once in a while; 2 = yes, once or twice; 3 = yes, often), pain prevalence (0 = no, 1 = yes) and pain intensity (first face/no pain = 0, sixth face/very much pain = 5), resulting in a pain frequency sum score (range 0 to 9), a week prevalence and point prevalence sum score (each ranging from 0 to 3) and a pain intensity sum score (range 0 to 15). The sum score of the KIDSCREEN-10 was calculated according to the KIDSCREEN handbook [12].
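The scoring rule described above can be illustrated with a minimal Python sketch. The function and field names (`sum_scores`, `frequency`, `week`, `point`, `intensity`) are hypothetical and not part of the YSQ; the numeric coding follows the text above, and the handling of missing answers is left out.

```python
# Hypothetical illustration of the G-YSQ sum scores; all names are assumptions.
REGIONS = ("neck", "midback", "low_back")

# Coding of the pain frequency answers as stated in the text above.
FREQUENCY_CODES = {
    "no pain": 0,
    "once in a while": 1,
    "once or twice": 2,
    "often": 3,
}


def sum_scores(answers):
    """Return the four sum scores for one participant.

    `answers` maps each region to a dict with the keys 'frequency'
    (answer text), 'week' and 'point' (0 = no, 1 = yes) and
    'intensity' (face index from 0 to 5).
    """
    totals = {"frequency": 0, "week": 0, "point": 0, "intensity": 0}
    for region in REGIONS:
        item = answers[region]
        if not 0 <= item["intensity"] <= 5:
            raise ValueError("intensity must be a face index from 0 to 5")
        totals["frequency"] += FREQUENCY_CODES[item["frequency"]]
        totals["week"] += item["week"]
        totals["point"] += item["point"]
        totals["intensity"] += item["intensity"]
    return totals


example = {
    "neck": {"frequency": "often", "week": 1, "point": 0, "intensity": 3},
    "midback": {"frequency": "no pain", "week": 0, "point": 0, "intensity": 0},
    "low_back": {"frequency": "once or twice", "week": 1, "point": 1, "intensity": 2},
}
print(sum_scores(example))
```

With three regions, the largest attainable values are 9 for pain frequency, 3 for each prevalence score and 15 for pain intensity, which matches the ranges given above.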

Validity

Construct validity was investigated using the start data by testing a priori hypotheses [17, 18]: (i) The sum scores of pain intensity, frequency and prevalence (week, point) are higher in patients than in controls; (ii) The sum scores of pain intensity, frequency and prevalence (week, point) are higher in older (> 12 [19,20,21]) than in younger controls; (iii) Self-rating of general health via the KIDSCREEN-10 significantly differs between patients and controls and strongly correlates (rS>0.6) with the sum scores of pain intensity, frequency and prevalence; (iv) The KIDSCREEN-10 total score moderately correlates (rS>0.4) with the sum scores of week and point prevalence (all measures refer to the last week); (v) Participants with severe pain report low general health and low total scores on the KIDSCREEN-10. To test this hypothesis, the participants were sub-grouped based on pain severity, defined by pain frequency and intensity, across spinal regions [22]. The resulting three severity levels ‘no pain’ (frequency: never, once or twice or once in a while; intensity: lowest two intensities on the rFPS), ‘severe pain’ (frequency: once in a while or often; intensity: highest three intensities on the rFPS) and ‘moderate pain’ (in between) were combined to an ‘overall spinal pain composite variable’ [22] with five levels: ‘multiple severe pain’ (‘severe pain’ in two or three spinal regions), ‘one-sited severe pain’ (‘severe pain’ in one spinal region), ‘multiple moderate pain’ (‘moderate pain’ in two or three spinal regions), ‘one-sited moderate pain’ (‘moderate pain’ in one spinal region), and ‘no pain’ (‘no pain’ in all spinal regions). If pain severity differed between the three spinal regions, the region with the most severe pain was used for the overall spinal composite variable. 
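One possible reading of this grouping is sketched below in Python. The function names are hypothetical, and the sketch assumes that the frequency and intensity criteria of a level must both be met (the text lists them without stating how they are combined); every other combination is treated as ‘moderate pain’.

```python
# Sketch of the severity grouping; combining the frequency and intensity
# criteria with a logical AND is an assumption, not stated in the text.
NO_PAIN_FREQ = {"never", "once or twice", "once in a while"}
SEVERE_FREQ = {"once in a while", "often"}


def region_severity(frequency, intensity):
    """Classify one spinal region; intensity is the rFPS face index (0 to 5)."""
    if frequency in SEVERE_FREQ and intensity >= 3:   # highest three faces
        return "severe"
    if frequency in NO_PAIN_FREQ and intensity <= 1:  # lowest two faces
        return "none"
    return "moderate"                                 # everything in between


def composite(regions):
    """Combine the regional levels into the five-level composite variable.

    `regions` is a list of (frequency, intensity) pairs, one per region.
    The most severe level present decides the category.
    """
    levels = [region_severity(f, i) for f, i in regions]
    n_severe = levels.count("severe")
    n_moderate = levels.count("moderate")
    if n_severe >= 2:
        return "multiple severe pain"
    if n_severe == 1:
        return "one-sited severe pain"
    if n_moderate >= 2:
        return "multiple moderate pain"
    if n_moderate == 1:
        return "one-sited moderate pain"
    return "no pain"


print(composite([("often", 4), ("never", 0), ("once or twice", 2)]))
```

In this example the neck is ‘severe’, the midback ‘none’ and the low back ‘moderate’, so the participant falls into ‘one-sited severe pain’ because the most severe region determines the category.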
Hypotheses (i) and (ii) were tested using Mann-Whitney U-tests, hypothesis (iii) using a Mann-Whitney U-test and Spearman correlation, and hypothesis (iv) using Spearman correlation. Spearman’s coefficient values (rS) were interpreted as excellent (> 0.9), good (0.7–0.9), moderate (0.5–0.69), fair (0.2–0.49), or minimal to absent (0.0–0.19) [23]. Hypothesis (v) was tested using a one-way ANOVA (KIDSCREEN-10 total score) and a Kruskal-Wallis test (self-rated general health), followed by post-hoc tests in case of significance, between the five levels of the ‘overall spinal pain composite variable’.
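For illustration, Spearman’s coefficient and the verbal categories above can be sketched in plain Python. This is not the code used in the study (the analyses were run in R and SPSS); the function names are hypothetical, the categories are applied to the absolute value of rS, and values falling between two listed bands (for example 0.695) are assigned to the lower band, both of which are assumptions.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's coefficient as the Pearson correlation of the ranks.

    Raises ZeroDivisionError if one of the inputs is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def interpret(r_s):
    """Map |rS| to the verbal categories listed in the text."""
    size = abs(r_s)  # assumption: the sign is ignored for the label
    if size > 0.9:
        return "excellent"
    if size >= 0.7:
        return "good"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "fair"
    return "minimal to absent"


# Made-up data: general health rating (1 = excellent) and pain frequency sum score.
health = [1, 2, 2, 3, 4, 5]
frequency = [0, 2, 1, 4, 3, 6]
r = spearman(health, frequency)
print(round(r, 2), interpret(r))
```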

Reliability

Test–retest reliability was assessed based on the agreement between start data and data after two weeks. For the reliability of week and point prevalence, the data of those children and adolescents who reported no or stable spinal pain were used. Unweighted Cohen’s kappa and the intraclass correlation coefficient, ICC(3,1), were used to analyze categorical and ordinal data, respectively [17]. Kappa values were interpreted as 0.01–0.20 none to slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1.00 almost perfect [24, 25]. ICC values > 0.90 were considered excellent, 0.75–0.90 good, 0.50–0.74 moderate and < 0.50 poor [26].
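As a worked illustration of the agreement statistic, the sketch below computes unweighted Cohen’s kappa for paired yes/no answers and applies the interpretation bands given above. The function names and the example data are made up; labelling kappa values of zero or below as ‘no agreement’ is an assumption, since the text only lists bands from 0.01 upwards, and the ICC computation itself is not shown.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa for two paired lists of categorical answers."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty lists of equal length")
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    count_1, count_2 = Counter(first), Counter(second)
    # Chance agreement from the marginal frequencies of both time points.
    expected = sum(count_1[c] * count_2[c] for c in count_1) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa):
    """Verbal label for kappa using the bands listed in the text."""
    if kappa <= 0:
        return "no agreement"  # assumption: not specified in the text
    if kappa <= 0.20:
        return "none to slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


def interpret_icc(icc):
    """Verbal label for an ICC value using the bands listed in the text."""
    if icc > 0.90:
        return "excellent"
    if icc >= 0.75:
        return "good"
    if icc >= 0.50:
        return "moderate"
    return "poor"


# Made-up week prevalence answers (1 = yes, 0 = no) at start and after two weeks.
start = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
retest = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
kappa = cohen_kappa(start, retest)
print(round(kappa, 2), interpret_kappa(kappa))
```

In the example, eight of ten answers agree (observed agreement 0.80), chance agreement is 0.52, and kappa is therefore 0.28/0.48, which falls into the ‘moderate’ band.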

Responsiveness

To test responsiveness, the area under the curve (AUC) of the receiver operating characteristic (ROC) curve was calculated as a measure of the ability to discriminate between two groups according to an external gold standard [17]. Each point on the ROC curve represents the sensitivity plotted against 1 − specificity at one possible cut-off point. The AUC is the integral under the ROC curve fitted through these points and serves as a measure of discrimination. A value of 0.5, represented by the diagonal, indicates that the measurement instrument has no capacity to distinguish between the two groups, and an AUC greater than 0.70 is recommended for sufficient responsiveness [17]. In the present study, perceived recovery after three months (PGIC dichotomized) was used as the external measure, and data of patients at start and after three months were analyzed.
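The AUC can be illustrated with a short Python sketch. It uses the fact that the area under the empirical ROC curve equals the probability that a randomly chosen member of one group scores higher than a randomly chosen member of the other group, with ties counted as one half. The function name and the data are made up, and treating the change score as the start value minus the three-month value is an assumption for illustration only.

```python
def auc(improved, not_improved):
    """Area under the empirical ROC curve for two groups of change scores.

    Computed as the share of (improved, not improved) pairs in which the
    improved patient has the larger change score; ties count one half.
    """
    if not improved or not not_improved:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for a in improved:
        for b in not_improved:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(improved) * len(not_improved))


# Made-up change in the week prevalence sum score (start minus three months).
# Grouping by dichotomized PGIC: improved vs. not improved.
improved = [2, 1, 3, 0, 2]
not_improved = [0, 1, 0, -1]

value = auc(improved, not_improved)
print(value, "sufficient" if value > 0.70 else "insufficient")
```

Here 17.5 of the 20 possible pairs favour the improved group, giving an AUC of 0.875; two groups with identical score distributions would give 0.5.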

The statistical analyses were conducted using R (version 3.5.0) for the validation measures and IBM SPSS (version 25) for the remainder. The significance level alpha was set at 0.05.

Results

Data were collected between January 2017 and February 2019. In total, 240 participants (166 females and 74 males; mean age = 13.05 ± 1.70 years) were included, namely 100 patients recruited in chiropractic practices and 140 pupils in schools. Of the 140 pupils, 109 had never sought medical care because of spinal pain and served as control participants. The remaining 31 pupils had previously sought care for spinal pain and were included in the patient group alongside the 100 patients from the chiropractic practices, resulting in 131 individuals in the patient group. At start, all 240 participants filled in the questionnaires. After two weeks, 106 patients (response rate = 81 %; sum score of pain frequency at baseline: responders median = 6, non-responders median = 4) and 85 control participants (response rate = 78 %) answered the questionnaires. After three months, 95 patients (response rate = 73 %; sum score of pain frequency at baseline: responders median = 6, non-responders median = 5) and 101 control participants (response rate = 93 %) could be reached. Detailed information about the study population is presented in Table 1.
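The group sizes and response rates reported above follow from simple arithmetic, which the following Python sketch retraces. The variable and function names are hypothetical; all counts are taken from the text.

```python
# Counts taken from the text; names are illustrative only.
clinic_patients = 100
pupils = 140
pupils_with_prior_care = 31

controls = pupils - pupils_with_prior_care                # pupils without prior care
patient_group = clinic_patients + pupils_with_prior_care  # analyzed as patients


def response_rate(responders, group_size):
    """Response rate in percent, rounded to the nearest integer."""
    return round(100 * responders / group_size)


rates = {
    "patients, two weeks": response_rate(106, patient_group),
    "controls, two weeks": response_rate(85, controls),
    "patients, three months": response_rate(95, patient_group),
    "controls, three months": response_rate(101, controls),
}
print(controls, patient_group, rates)
```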

Table 1 Sociodemographic characteristics of study participants at start, after two weeks and three months

Scores per spinal region and the calculated sum scores are shown in Table 2.

Table 2 Scores per spinal region and sum scores of the German version of the Young Spine Questionnaire at start, after 2 weeks and 3 months

Validity

Patients reported significantly higher sum scores of pain frequency (p < 0.001), week prevalence (p < 0.001), point prevalence (p < 0.001) and pain intensity (p < 0.001) compared to the control participants (Table 2). Control participants older than 12 years had significantly higher sum scores of pain frequency, week prevalence and pain intensity compared to the younger controls, indicating that the G-YSQ captures the natural age-related increase in spinal pain prevalence [3, 4]. In contrast, older and younger control participants did not significantly differ in point prevalence (Table 3).

Table 3 Comparison of sum scores of the G-YSQ between control participants older than 12 years and control participants aged 12 or younger

Significantly fewer patients than controls reported excellent general health (p < 0.001) (Table 4).

Table 4 Self-rated general health in the two cohorts

Lower self-ratings of general health were associated with higher sum scores in the G-YSQ (Fig. 2): the correlations between self-rated general health and the pain frequency sum score (rS = 0.46), the pain prevalence sum scores (week prevalence: rS = 0.42; point prevalence: rS = 0.28) and the pain intensity sum score (rS = 0.39) were moderate to fair.

Fig. 2 Pain frequency sum score (A) and pain intensity sum score (B) in the five categories of self-rated general health

The two groups showed comparable KIDSCREEN-10 total scores [control participants 50.56 ± 9.27; patients 50.24 ± 8.33; t(233) = 0.28, p = 0.778], and the KIDSCREEN-10 total scores were fairly correlated with the sum scores of week prevalence (rS = -0.30) and point prevalence (rS = -0.20) (Fig. 3).

Fig. 3 Scatter plots of pain frequency sum score (A) and pain intensity sum score (B) against KIDSCREEN-10 total score

The pain severity subgroups significantly differed in the KIDSCREEN-10 sum score [F(4,230) = 7.26, p < 0.001] and in the self-rating of general health [H(4) = 51.94, p < 0.001]. Post-hoc tests indicated that the ‘no pain’ and the ‘one-sited moderate pain’ groups reported significantly higher KIDSCREEN-10 sum scores and higher ratings of general health than the ‘one-sited severe pain’ and the ‘multiple severe pain’ groups (p-values for sum score = 0.001 and 0.005; p-values for general health < 0.001). The ‘multiple moderate pain’ subgroup did not significantly differ from any of the other pain groups in terms of the KIDSCREEN-10 sum score, but rated general health significantly better than the ‘multiple severe pain’ subgroup (p = 0.021): e.g. 28 % of the participants in the ‘multiple moderate pain’ subgroup, compared to 5 % in the ‘multiple severe pain’ subgroup, rated their health as excellent (Table 5).

Table 5 Health-related quality of life in the five pain severity subgroups

Thus, although the correlations with the KIDSCREEN-10 sum score and the self-rated general health were lower than expected, all hypotheses could be confirmed, which indicates good construct validity of the G-YSQ.

Reliability

The G-YSQ showed good reliability in terms of pain intensity and pain frequency (Table 6). No pain or stable pain during the past two weeks was reported by 83 control participants, and the reliability of week prevalence in those data was moderate for all spinal regions (the neck, the middle and the lower back) as well as for the sum score. Reliability of point prevalence was fair to moderate for the neck, the middle and the lower back, and moderate for the sum score. The reliability of the questions on the consequences of spinal pain was good for school absence and doctor visits and moderate for restriction in sports. All questions on parental spinal pain showed moderate reliability (Table 6).

Table 6 Reliability of all domains of the German version of the Young Spine Questionnaire

Responsiveness

Of the 68 patients who answered the PGIC after three months (27 patients answered the G-YSQ, but did not fill in the PGIC), 61.8 % reported clinically significant improvement. The AUC was 0.69 (95 % CI = 0.57–0.82) for the sum of week prevalence and 0.67 (95 % CI = 0.54–0.80) for the sum of point prevalence (Fig. 4), indicating insufficient responsiveness.

Fig. 4 Receiver operating characteristic (ROC) curve and corresponding area under the curve (AUC) for week prevalence (A) and point prevalence (B). Confidence interval of AUC is shown in brackets

Discussion

The aims of this study were to translate the YSQ into German according to scientific guidelines and to determine construct validity, test-retest reliability, and responsiveness of the G-YSQ. The translation process of the original YSQ into German was straightforward apart from two minor issues in terms of cross-cultural adaptation. Except for the question on point prevalence, the G-YSQ was shown to possess construct validity and sufficient test-retest reliability, but its responsiveness needs to be improved.

Although the study population of the present study was older than the population for which the original questionnaire was designed, the G-YSQ discriminated well between patients and controls and between older and younger children and adolescents, depicting the natural age-related increase in spinal pain prevalence [3, 4]. The G-YSQ (week and point prevalence) correlated fairly with the KIDSCREEN-10 sum score. This finding might reflect a weaker association between back pain in childhood and adolescence and psychosocial issues than commonly believed [27,28,29,30,31,32] because eight of the ten questions of the KIDSCREEN-10 are on psychosocial topics and only two on physical health. Self-rated general health was better in the control group. However, the correlation with pain frequency, pain intensity and week prevalence was fair and weaker than hypothesized, which might reflect that the majority of children and adolescents, approximately 9 out of 10 [33], are not seriously impacted by their back pain [33, 34]. Nevertheless, a minority is, and identifying this minority, i.e. differentiating between trivial and consequential back pain, is one of the biggest challenges with respect to back pain in childhood and adolescence [35]. Although most spinal pain in childhood and adolescence is self-limiting [36] or ‘trivial’ [35], pain problems in children and adolescents in general [37], and back pain [38] and LBP [33] in particular, are associated with lower health-related quality of life. A decline in quality of life might be an early indicator of potential vulnerability during development [39] and might be a symptom of a multidimensional process [33], possibly leading to consequences in adulthood (‘consequential’ pain [35]). To avoid medicalization of the problem and to target treatment, consequential back pain should be differentiated from trivial back pain in childhood and adolescence [35].
In the present study, the cut-off for potentially consequential spinal pain, as defined by a significant impact on health-related quality of life, was having severe pain (one-sited or multiple), and 32 % of the controls belonged to these subgroups. The present study shows that both pain frequency and pain intensity need to be assessed in each spinal region to capture all children and adolescents whose spinal pain impacts their quality of life and might therefore be consequential. Whether these individuals will indeed develop a back problem in adulthood needs to be verified in longitudinal studies.

The G-YSQ also showed test-retest reliability, particularly for pain intensity and pain frequency. The lower values for validity and reliability of the question on point prevalence might reflect the natural clinical course of spinal pain in childhood and adolescence, which appears to fluctuate [40], rather than a deficiency of the assessment tool. Nevertheless, given its moderate reliability, deficiencies in construct validity and insufficient responsiveness, the benefit of asking for point prevalence is questionable, and omitting this question is recommended. In its present form, the YSQ/G-YSQ cannot be recommended for measuring change. This is not surprising given that the answers to the only questions possibly reflecting change (presence or absence of low back pain, mid back pain or neck pain during the last week) are binary (yes/no). Refining this dichotomous structure similarly to the question on pain frequency (‘often’, ‘once in a while’, ‘once or twice’, ‘never’) might enhance the responsiveness of the YSQ/G-YSQ. Similarly, it might be worth attempting to address the problem of inconsistent answers to the question on pain frequency and corresponding pain intensity by rearranging these items (rFPS for pain intensity directly following the question on pain frequency).

The age range of 10 to 16 years was selected as it appears to be the crucial period for developing spinal pain [3, 4]. Nevertheless, this implies that the findings cannot be generalized to younger children. Pupils who indicated that they had seen a doctor because of back or neck pain (N = 31) were analyzed in the group of patients because we considered them as having a back or neck problem. However, these participants did not necessarily have back pain at the time of the survey. We therefore repeated the analysis without these pupils. Results were similar in terms of group differences, ICC and AUC values. Group differences became even more pronounced without these pupils. Three adolescents (aged 13 and 15) of the pilot phase reported that they preferred indicating pain intensity on a scale rather than on the rFPS. There is no optimal pain assessment to be used throughout development from childhood to adolescence [41, 42]: for school-aged children (from 8 to 12 years) the rFPS is recommended [42]. Adolescents prefer visual analogue scales (VAS) or numeric rating scales (NRS) [42], although faces scales have been reported to also be well accepted [41, 42]. Supplementing the rFPS with a VAS or NRS might be useful to make the YSQ/G-YSQ applicable throughout childhood and adolescence. Not all participants could be reached for the follow-up after two weeks and three months. Because data of pupils were collected at schools, these missing values were most likely missing at random. A closer look at differences between responding and non-responding patients revealed that they were comparable in terms of gender and age, but there was a tendency for the responders to have more frequent and more intense back or neck pain at start compared to the non-responders.
Because only the start data were used for validity assessment and the assessment of reliability and responsiveness involved only intra-subject comparisons, it seems unlikely that data from the non-responders would have changed the findings. Another study limitation might have been that the first re-assessment took place after two weeks, although the question on week prevalence in the YSQ and the questions in the KIDSCREEN-10 refer to the last week. However, there is no standard for an ideal time period for testing reliability, and a time interval of two weeks is commonly used [17].

Conclusions

The G-YSQ was shown to possess construct validity and sufficient test-retest reliability to assess back pain in children and adolescents between 10 and 16 years of age. To measure change, the YSQ/G-YSQ has to be adapted, possibly by asking for pain frequency instead of (dichotomous) pain prevalence during the last week. Severe pain, either in one or multiple spinal sites, was associated with reduced health-related quality of life and might be consequential, which needs to be verified in prospective studies from adolescence into adulthood.