Introduction

Scoliosis is characterised by a three-dimensional structural misalignment of the spine [1]. Scoliosis is defined as idiopathic (IS) after ruling out specific causes; IS accounts for 80% of cases [2]. Adolescent idiopathic scoliosis (AIS) is the most common type of IS, affecting 2–3% of the population [1, 3, 4]. IS curves progress faster during puberty [5]. Although curve progression slows at the end of growth, curves over 30° can progress by 0.5°/year in adults [6] and relate to back pain [7]. Curves over 50° in adults can progress by 1°/year [8]. However, there is variability in curve progression during growth, which relates to numerous factors [9].

Natural history is the progression of a disease over time when untreated [10]. Understanding natural history helps inform treatment selection or avoid overtreatment [11]. For IS, scoliosis-specific exercise is recommended in small curves in skeletally immature patients, exercises and progressively more aggressive brace treatments are recommended for moderate and severe curves in 10% of growing children, and invasive corrective surgery is recommended in severe curves at risk of continued progression in adulthood for 0.1–0.3% of cases [1].

Natural history studies of IS report variable curve progression between individuals [12, 13, 14, 15]. A recent meta-analysis showed that 42–49% of juvenile and adolescent IS show curve progression [16]. Many predictors were identified, such as age, Risser sign, and baseline curve severity, but they were not consistently studied. Noshchenko et al. also meta-analysed studies predicting curve progression. While they identified some predictors, they concluded that no prediction method could be recommended for clinical use [17]. Low-grade evidence supported age, curve pattern, initial Cobb angle, and skeletal immaturity as predictors of progression, along with some advanced laboratory tests not available in many clinics [17]. These authors [16, 17] reported high heterogeneity of populations, follow-up durations, and outcomes, leaving important uncertainty about predicting progression. Further, many prediction studies included patients receiving treatments, and few included both sexes.

Interestingly, the observation group data from the recent BrAIST trial on bracing offer modern insight into natural history [18]. It showed that 52% of untreated patients, more than anticipated based on previous studies [14], reached the surgical threshold by skeletal maturity. The BrAIST results, by demonstrating that bracing reduced progression to surgery to 28%, may prevent future attempts at studying patients without treatments until maturity. It would now be unethical to withhold this effective treatment. Nevertheless, more research is needed to predict which patients are at risk of curve progression over different intervals based on personal characteristics.

Many patients reach our Institute with X-rays that predate their initial clinical examination with a specialist and were collected while under observation (untreated). This allows monitoring curve progression under natural history conditions. While this may present limitations, we must remember it is now unethical to withhold treatments. Thus, we aimed to develop two models to predict IS curve progression using data systematically collected from patients with multiple radiographs while previously untreated. Our first model aimed to predict curve progression at a future timepoint of interest selected by the clinician from clinical and radiographic data collected at an initial examination without prior data available. Our second model aimed to predict curve progression from data collected at both an initial encounter with a scoliosis specialist and at a prior examination while untreated. We hypothesised that simple clinical and radiographic predictors could predict future curve angles with good precision.

Material and methods

Study design and ethics

This is a retrospective study of natural history over an untreated interval, using data from an initial consultation visit at specialized scoliosis clinics and from records predating that visit. Ethics approval was granted by the Health Research Ethics Board in Milan and at the University of Alberta.

The clinical records predating the initial assessment are prospectively collected by clinicians during routine clinical assessments of scoliosis at our tertiary care Institute in Italy. Patients were also asked to bring their previous radiographs when visiting their specialist. At the time of case selection, records from 22,387 patients were available. During consultation, the specialist prescribes treatment or refers to surgery as indicated by the SOSORT guidelines [1]. Patients from the Institute are a representative sample of patients referred to specialized scoliosis clinics. The Institute’s clinics are one of the main referral destinations in Italy for conservative scoliosis treatment. In many regions, a similar clinic is not available through the National Health System. Discounted fees available for low-income families ensure full representation. Patients are referred to the Institute by their family physician (10%), a medical specialist (18%), other professionals (12%, physiotherapists, orthotists, etc.), or friends or family (35%), and 26% accessed the clinic directly.

Participants

We included records from children and adolescents with idiopathic scoliosis from 6 to 25 years old, previously untreated at the time of their first consultation at our Institute, and with at least one prior X-ray available. Some patients with a first consult after skeletal maturity were included for predicting curve change during the full growth spectrum. Nevertheless, we set an upper age limit to avoid including those with curve progression long after bone maturity. When scheduling their initial consultation, patients were asked to bring all documentation related to their scoliosis clinical history including copies of any prior radiograph, if available. Patients without a recent X-ray (within three months) were prescribed a new radiograph.

We used the following inclusion criteria:

  • With a radiograph taken at the initial specialist consultation and with at least one measurable frontal full-spine radiograph obtained before this consultation.

  • Untreated (under observation) prior to their initial specialist consultation or treated with general exercises.

  • Aged 6–18 years old at the first available prior radiograph.

  • Diagnosed with idiopathic scoliosis with a Cobb angle over 10° and evidence of a rib hump or a lumbar prominence at the initial specialist consultation.

The following exclusion criteria were used:

  • Previous treatment using scoliosis-specific exercise or bracing.

  • Previous spine, thoracic, pelvic, abdominal, or lower extremity surgery or dysfunctions unrelated to scoliosis.

  • Any disease known to cause secondary scoliosis or history of trauma affecting the spine and lower extremities.

Scoliosis-specific exercise was operationally defined as per SOSORT recommendations as a program including autocorrection in 3D, tailored individually to the patient’s curve characteristics, focused on stabilizing this corrected posture, offering education on how the specific patient’s scoliosis is affecting their posture/activities, and teaching integration of this posture correction in activities of daily living [1]. A Cochrane systematic review [19], the report of the US Preventive Services Task Force on screening [20], a recent overview of systematic reviews [21], and studies on the effect of the Scientific Exercise Approach for Scoliosis (the scoliosis-specific exercise approach used in our clinics) [22, 23] found that scoliosis-specific exercises have significant effects on curve severity and would represent a confounding factor in a prediction study. In contrast, the effect of general exercises (no individualised posture corrections) on curve severity does not differ from observation [24].

Image acquisition

Participants were asked to bring copies of their available radiographs as a full-size film, high-resolution paper print, or digital radiograph. At initial consultation, the specialist measured these images and entered results in the clinical record and/or took high-resolution digital pictures of these images hung on the view box while maximally zooming in. These digital photographs or the digital radiographs were included in the clinical records.

Extraction of candidate predictor variables

The medical history captured at specialist consultations included: age, treatments received for scoliosis (if any), and prior torso and lower extremity surgeries. A physical assessment ruled out other causes of scoliosis. Clinicians from the Institute complete training by conducting about 300 consultations with an expert colleague. During training, their clinical information is reviewed comparing current guidelines, treatments recommended, and the patient’s characteristics.

The radiographic measurements were extracted by the specialist using MicroDICOM Viewer or Surgimap software. For 17% of radiographs predating the initial consultation, measurements had not been extracted. These radiographs were retrieved and measured by a specialist blinded to other radiographs, measurements, and to the aim of study. Most radiographic measurements had been done by the clinician during the consultation visit. To verify the reliability of these recorded radiographic measures, a random sample of 297 images was extracted and measured by a blinded expert assessor. The major curve Cobb angle of this subsample was 27.4 ± 7.8° for the clinicians and did not differ significantly from 26.7 ± 7.9° for the independent assessor. The Pearson correlation was 0.87.
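The agreement check described above can be sketched in a few lines. This is only an illustration: the function names `pearson` and `mean_difference` and the five paired readings are hypothetical, and the study itself reported a correlation of 0.87 on 297 images.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long lists of Cobb angles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mean_difference(x, y):
    """Average signed difference (first rater minus second rater), in degrees."""
    return sum(a - b for a, b in zip(x, y)) / len(x)

# Made-up paired readings (degrees), for illustration only.
clinician = [22.0, 31.0, 18.0, 40.0, 27.0]
assessor = [21.0, 33.0, 17.0, 38.0, 28.0]

print(pearson(clinician, assessor))
print(mean_difference(clinician, assessor))
```

A high correlation alone does not rule out a systematic offset between raters, which is why the mean difference is shown alongside it.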

A careful check of the routinely collected data was done by two expert clinicians as follows: 1) retrieval of all images to be measured, 2) completion of the missing radiographic measures, and 3) checking all clinical records against the eligibility criteria, particularly to exclude those previously treated before accessing our Institute. We asked for information first from the treating physician and then from the patient if in doubt. We excluded all patients with uncertain prior treatments.

The following variables were used as candidate predictors:

Sex coded as 1 for females and 0 for males.

Maximum Cobb angle at the first available radiograph was the largest of any curve angle measured on the first radiograph available while untreated prior to the specialist’s consult. The Cobb angle was measured between the upper endplate of the upper end vertebra and the lower endplate of the lower end vertebra [25].

Maximum Cobb angle was the largest Cobb angle of any curve measured on a radiograph while untreated at the first consultation (occurring after the first available radiograph).

Time was measured in half-years and was defined during modelling as the time elapsed between the first available radiograph and the time of the outcome determination (i.e. last radiograph while untreated). For future users of the model, this should be defined as the time in the future at which a clinician wishes to predict the curve angle from the visit at which the predictor data are collected. The square (Time²) and the cubic form of time (Time³) were also investigated to explore nonlinear associations with future curve severity. This was explored because of the description by Duval-Beaupère of slow progression before puberty, rapid progression during puberty, and slower progression after maturity [26, 27].

Age at the initial specialist consultation was recorded in years.

The European Risser grade was recorded for each radiograph. A 0 indicates no ossification of the iliac crest, 1 reflects appearance of the ossification nucleus, 2 indicates its expansion to start covering the iliac crest, and 3 indicates full coverage. A 4 indicates the beginning of fusion of the growth plate, and 5 indicates full fusion (skeletal maturity) [4, 28].

Dummy variables defined 5 curve types using single thoracolumbar or lumbar curves (TLL = apex < T12) as reference. The apex location for curves over 10° defined curve types following the Scoliosis Research Society description [4] including: double curves (thoracic and TLL or L apex), no curves (< 10°), single thoracic (T5-T11/12 apex), and OTHER curves (combining single upper thoracic [apex above T5], double thoracic [only 2 apices above T11/12], or triple or quadruple curves [3 or 4 apices]).

We defined interaction terms between time and some candidate predictors because their effect may change over time. Thus, interactions of the first available maximum Cobb angle with time, quadratic time, and cubic time, of time with sex, and of time with Risser grade were tested in the model.
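As a rough sketch of how the candidate predictors above could be assembled for one patient, the following builds the time terms in half-years, the curve-type dummy variables (single TLL as reference), and the interaction terms. The function name `build_predictors`, the dictionary keys, and the curve-type labels are hypothetical and chosen for illustration only.

```python
# Curve types coded as dummy variables; a single thoracolumbar/lumbar
# curve ("tll") is the reference category, so all dummies are 0 for it.
DUMMY_CURVE_TYPES = ("double", "no_curve", "single_thoracic", "other")

def build_predictors(cobb, months_ahead, age, risser, female, curve_type):
    """Return one row of candidate predictors as a dictionary."""
    if curve_type != "tll" and curve_type not in DUMMY_CURVE_TYPES:
        raise ValueError(f"unknown curve type: {curve_type}")
    t = months_ahead / 6.0  # one time unit is half a year
    sex = 1 if female else 0  # 1 = female, 0 = male
    row = {
        "cobb": cobb, "age": age, "risser": risser, "female": sex,
        "time": t, "time2": t ** 2, "time3": t ** 3,
    }
    for name in DUMMY_CURVE_TYPES:
        row[name] = 1 if curve_type == name else 0
    # Interactions of the Cobb angle with linear, quadratic, and cubic time,
    # and of time with sex and with Risser grade.
    row["cobb_x_time"] = cobb * t
    row["cobb_x_time2"] = cobb * t ** 2
    row["cobb_x_time3"] = cobb * t ** 3
    row["time_x_female"] = t * sex
    row["time_x_risser"] = t * risser
    return row

row = build_predictors(25, 54, 12, 1, True, "single_thoracic")
print(row["time"], row["single_thoracic"], row["time_x_risser"])
```

Keeping the reference category implicit (all dummies equal to 0) mirrors the dummy coding described in the text.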

Statistical analysis

Two linear mixed-effect models (an extension of simple linear models) with random effects (SAS procedure MIXED) and maximum likelihood estimation were used to predict the dependent variable, the scoliosis Cobb angle, if untreated at a given time. The independent variables included in the model equations were different for each model. The first model corresponds to when a clinician encounters a patient for the first time and can only use information from this initial consultation to predict future curve severity (only uses the radiograph from this specialist consultation). This linear mixed-effect model examined the effect of the following independent variables: age at this first specialist consultation, sex, maximum Cobb angle at this first specialist consult, time (between the first specialist consult and the prediction target timepoint [using the last untreated Institute visit for modelling]), Risser grade, and curve type while accounting for repeated measures from the same patient. Interactions of the maximum Cobb angle at the first specialist consult with time, quadratic time, and cubic time, of time with sex, and of time with Risser grade were also tested.

The second model predicts future curve severity when, at the initial encounter with the specialist making the prediction, a patient has prior radiographs available while untreated. This model examined the effect of the following independent variables: sex, age, and maximum Cobb angle at the first available visit, maximum Cobb angle at the consultation when the prediction is made, time (between the first available radiograph and the outcome visit), Risser grade, and curve type while accounting for repeated measures from the same patient. The same interactions as for model 1 were also tested. For both models, the variance components structure was used for the covariance matrix. Each model’s goodness of fit was evaluated by the smallest Akaike information criterion (AIC) and Bayesian information criterion (BIC).
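In generic notation, the structure of both models can be sketched as below. The random effects are not itemised in the text, so the patient-level random intercept $b_i$ is an assumption made for illustration; the AIC and BIC expressions are the standard definitions.

$$\begin{aligned}
y_{ij} &= \beta_0 + \sum_{k} \beta_k x_{kij} + b_i + \varepsilon_{ij}, \qquad b_i \sim N(0, \sigma_b^{2}), \quad \varepsilon_{ij} \sim N(0, \sigma^{2}) \\
\mathrm{AIC} &= 2p - 2\ln\hat{L}, \qquad \mathrm{BIC} = p\ln n - 2\ln\hat{L}
\end{aligned}$$

Here $y_{ij}$ is the maximum Cobb angle of patient $i$ at observation $j$, $x_{kij}$ are the fixed-effect predictors and interactions listed above, $p$ is the number of estimated parameters, $\hat{L}$ is the maximised likelihood, and $n$ is the sample size entering the criterion (software conventions differ on whether this counts observations or subjects). Lower AIC and BIC values indicate a better trade-off between fit and complexity.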

Internal validation

Datasets for each model were divided into 10 groups of 232 participants for tenfold cross-validation. Each round, a model was fit on the subset of non-selected participants (2000+) and tested on the subset of participants selected for this round (≈232). Each participant was selected exactly once for testing. Therefore, each patient contributed to an understanding of how the model performs predicting some new data. To test prediction accuracy during the validation, we estimated the precision of the standard prediction intervals. We also estimated the proportion of observed values within an interval of a specified width centred at the predicted values obtained from the model. We estimated the proportions within the recognized radiographic measurement error (± 5°) threshold [29], as well as within 10° and 15°. Statistical analyses were performed using SAS Ver.9.4 (SAS Institute Inc., Cary, NC, USA).
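The validation logic can be illustrated with a minimal sketch, assuming folds are formed at the patient level so that each participant is tested exactly once. The helper names `patient_folds` and `proportion_within` and the four observed/predicted pairs are hypothetical; the model fitting step itself is omitted.

```python
import random

def patient_folds(patient_ids, k=10, seed=0):
    """Assign each unique patient to exactly one of k test folds."""
    unique = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Striding through the shuffled list yields k folds of near-equal size.
    return [unique[i::k] for i in range(k)]

def proportion_within(observed, predicted, tolerance):
    """Share of observed angles lying within +/- tolerance of the prediction."""
    errors = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(e <= tolerance for e in errors) / len(errors)

folds = patient_folds(range(2317))
print([len(fold) for fold in folds])  # fold sizes of 231 or 232 patients

# Made-up observed and predicted Cobb angles (degrees), for illustration only.
observed = [30.0, 42.0, 18.0, 55.0]
predicted = [27.0, 48.0, 19.0, 39.0]
for tolerance in (5, 10, 15):
    print(tolerance, proportion_within(observed, predicted, tolerance))
```

Splitting by patient rather than by radiograph keeps all repeated measures from one patient in the same fold, which avoids testing a model on a patient it has already seen.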

Results

Sample characteristics

We included records from 2317 patients, of whom 83% were females. In this group, 71% of cases had only 1 prior radiograph, 21.1% had 2, 5.6% had 3, and 1.9% had 4 or more (maximum 8). Their mean age was 13.9 ± 2.2 years (median 13), ranging from 6.9 to 24.8 years; 81.4% had an AIS diagnosis, with the rest presenting juvenile idiopathic scoliosis.

Curve type at the time the outcome was recorded was: 49.8% double, 25.8% thoracolumbar-lumbar, 16.2% thoracic, and 8.1% other. Curve types on all the 3255 prior radiographs combined were as follows: 40.4% double, 26.8% thoracolumbar-lumbar, 17.8% single thoracic, 7.2% with no curve over 10°, and 7.9% other types. Cobb angle at the first available X-ray was 20 ± 10° (median 18°, range 0–80°) vs 29 ± 13° (median 26°, range 6–122°) at the predicted outcome visit, with a mean change over this interval of 9.6 ± 9.7° (median 8°, range -10 to 72°). Time between the first X-ray and the outcome determination was 27.6 ± 22.2 months (Table 1).

Table 1 Description of the sample characteristics

Model 1 predicting future curve angle using clinical data from only an initial consult (only uses the radiograph from this specialist consultation)

A larger Cobb angle at the initial consult, a longer time to the desired prediction timepoint, and a curve type other than a single thoracolumbar or lumbar curve all predicted larger future curve angles (Table 2). Specifically, the curve types associated with the largest future Cobb angles were single thoracic curves, followed by double curves; similar, smaller effects were observed in those with curves below 10° and in the category combining all the other curve types. The effect of time² combined with the other time variables (time and time³) such that larger relative increases in curve angles per time unit were predicted over short compared to longer-term intervals. In contrast, an older age at the initial consultation, larger values for the interaction between the time to the prediction and the initial Risser grade, and larger values for the interaction between this time and being a female rather than a male predicted lower future Cobb angle values. Sex or Risser grade alone and time³ were not statistically significant predictors of future curve angles in the model.

Table 2 Linear mixed-effects model to predict maximum curve angle at a future time of interest using clinical data from only an initial consultation (without prior radiographs while untreated)

Tenfold cross-validation found a median error of 5.5° (worst interquartile range limits 2.7–9.9°). The prediction accuracy, described as the percentage of observed values falling within 5°, 10°, and 15° of the predicted values, was 47%, 80%, and 91%, respectively.

Model 2 predicting future curve angle using clinical data from both an initial consultation and from a prior radiograph while untreated

In the best model, larger values of the following variables predicted larger future curves: maximum Cobb angle at the first available prior visit, maximum Cobb angle (at initial consult when prediction is made), and combined effect of time to the target prediction from the first available visit (time, time², and time³ in half-years). Larger values on the following variables predicted a smaller future maximum Cobb angle: age (in years) and Risser at the first available visit, time*Risser interaction, and time*female sex interaction. Sex alone was not a statistically significant predictor of future curve angles in this second model (Table 3).

Table 3 Linear mixed-effects model to predict maximum curve angle at a future time of interest using clinical data from both an initial consultation and from when prior radiographs were available while untreated

Tenfold cross-validation found a median prediction error of 4.5° (worst interquartile range limits 1.8–8.9°). A proportion of 54.9% of the predicted values was within 5° of the true values, 84% were within 10°, and 94% within 15°.

Scenarios describing predictions of scoliosis severity according to Model 1 using only data from the initial specialist consultation

Figure 1 demonstrates the predicted curve severity after different follow-up durations using only data from the initial consultation (Model 1). Assuming a female with AIS was 12 years old at the initial consult, the figure shows that predicted curves get progressively more severe over time. Varying whether a single thoracic or thoracolumbar curve was small (15°), moderate (25°), or severe (35°), or if Risser was 0 or 4 at presentation affects predictions (Fig. 1A). For small curves, by age 16, only the single thoracic curve, skeletally immature at presentation, is predicted to exceed 30° at the end of the usable predicted time range (defined as the mean follow-up duration plus one standard deviation in our development sample). For 25° curves at initial presentation, none of the scenarios predicted progression above the surgical threshold of 45°. In 35° single thoracic curves at presentation, both Risser 0 and 4 were predicted to exceed the surgical threshold within 1 year, while only the immature thoracolumbar curve was predicted to reach this threshold and only after 3 years.

Fig. 1

Predicted curve severity after different follow-up durations using only data from the initial clinical consultation (using Model 1) A in a female with AIS aged 12 years old at the initial encounter showing progressively more severe future curve if presenting with a small (15°), moderate (25°), or severe curve (35°) or with Risser 0 rather than 4, and when presenting a single thoracic (Th = highest risk) rather than a single thoracolumbar curve type (TLL = lower risk). B Lower future curve severity when females with AIS and a moderate single thoracic curve (25°) are older (12–15 years) and more skeletally mature (Risser 0, 1 or 4) at presentation to the initial consultation

Figure 1B outlines predictions for a female with AIS with a moderate single thoracic curve (25°) aged 12, 13, 14, or 15 years old and Risser 0 at the initial encounter, as well as aged 15 years old with Risser 1 and 4 at initial presentation. It demonstrates that curve severity by a given age is predicted to be larger if this moderate curve was detected at a younger age while immature. Figure 1B also shows how progression is reduced by reaching higher skeletal maturity (Risser 4 vs 0 or 1) with differences becoming more marked over time.

Scenarios describing predictions of scoliosis severity according to Model 2 using data from both a first available visit while untreated and from the initial specialist consultation

Figure 2A demonstrates the difference between curves with a history of progression or not, and different severity at presentation to specialist consult in a common AIS patient: a 12-year-old female with Risser 0 at the first available visit. A small curve with a progression history is predicted to exceed 30° by age 17 (the end of the usable follow-up period). A moderate curve (25°) with a progression history is predicted to exceed the surgical threshold by the end of this period but not one without a history of progression before the first consultation. Yet, both a progressive and a stable large curve would exceed the surgical threshold without treatment in such a patient.

Fig. 2

Predicted curve severity after different follow-up durations using clinical data from a prior visit while untreated and at an initial consultation with a specialist. A A more severe future curve, in a 12-year-old female, with Risser 0 on the first available radiograph, with progression detected at the initial specialist consult (by 5° vs no change) 6 months later, contrasting presenting a small 15°, moderate 25°, or large 35° at the first available visit. B A similar pattern but lower curve severity if Risser 1 at the first available visit. C More severe future curve angles in a female with progression at initial specialist consult (by 5°) 6 months after the first available visit, depending on whether the first available visit occurred at age 12, 13, 14, or 15, illustrating an important impact of skeletal maturity with minimal progression after Risser 4 compared to Risser 0 and 1

Figure 2B also illustrates the effect of curve severity and whether progressive by the initial consult but in a patient with Risser 1. Again, only the progressive small curve is predicted to exceed 30° but only by a few degrees and later in the follow-up. Neither the stable nor progressive moderate curve is predicted to reach the surgical threshold, while both the stable and progressive larger curves are still expected to exceed this threshold if untreated, albeit 6 months to 1 year later when presenting at Risser 1 (Fig. 2B) rather than 0 (Fig. 2A).

Figure 2C outlines predictions for a female with AIS with a prior 5° progression of a moderate single thoracic curve (25°) aged 12, 13, 14, or 15 years old and Risser 0 at the initial consultation, as well as at age 15 with Risser 1 or 4. It demonstrates how predicted curve severity of a progressive curve by a given age is larger if this moderate curve was detected at a younger age while immature. Figure 2C shows how progression is reduced by reaching higher skeletal maturity (Risser 4 vs 0 or 1) before the initial consultation with only Risser 0 predicted to exceed the surgery threshold by the end of the 4.5-year usable prediction window.

Discussion

As hypothesised, we developed models using simple clinical variables collected from previously untreated patients, either both at and before the initial specialist consultation or only at that consultation, to predict future curve severity at timepoints chosen by the clinician. Cross-validation of the proposed models showed good prediction accuracy, with 80% or more of the true curve severity values falling within 10° of the predicted values. While the variables retained in the prediction models have previously been shown to predict curve progression in scoliosis, our models introduced a time variable allowing clinicians to determine the time at which they were interested in predicting the severity. This has many advantages, including allowing predictions of whether and when a patient may reach a treatment threshold. This may allow planning when to introduce more aggressive treatments.

Further, introducing nonlinear effects of time allowed modelling the nonlinear increases of curve severity observed in growing patients with idiopathic scoliosis. With this approach, a single simple predictive model is sufficient to make predictions over the full growth period. Duval-Beaupère described, in patients with idiopathic and neuromuscular scoliosis, the slower progression of scoliosis before puberty, its rapid progression during puberty, and the slowed progression after maturity [26, 27]. Our models are among the few that also account for the fact that curve progression occurs earlier in females than in males and that it slows down with greater skeletal maturity, by introducing interaction terms relating the target prediction time to these two factors [17]. Our large sample also allowed the effect of curve types on future curve severity to be included in model 2.

Clinical application

First, we present an example of the use of Model 1 where only clinical data from the initial specialist consult are available (Table 2). Let’s assume the common case where a 12-year-old female presents with a single thoracic curve measured at 25 degrees and showing a Risser 1 sign for skeletal maturity, and we want to predict her curve severity at age 16.5 (time to prediction of 4.5 years or 9 half-years). The prediction equation is as follows:

$$\begin{aligned}
\textbf{Future curve angle using model 1} ={}& 16.03 + 1.06 \times (25^{\circ}\ \text{Cobb angle}) + 1.56 \times (9\ \text{half-years time}) \\
&- 0.04 \times (9^{2}\ \text{half-years time}^{2}) - 1.17 \times (\text{age 12 years}) \\
&+ 3.64 \times (0\ \text{not a double curve}) + 1.63 \times (0\ \text{did not present without a measurable curve}) \\
&+ 4.33 \times (1\ \text{has a single thoracic curve}) + 1.64 \times (0\ \text{does not have Other curve type}) \\
&- 0.10 \times [(9\ \text{half-years time}) \times (1\ \text{Risser})] \\
&- 0.26 \times [(9\ \text{half-years time}) \times (1\ \text{female})]
\end{aligned}$$
$$\textbf{Future curve angle} = 16.03 + 26.50 + 14.04 - 3.24 - 14.04 + 0 + 0 + 4.33 + 0 - 0.90 - 2.34 = 40.38^{\circ}$$
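The Model 1 calculation can be reproduced with a short sketch. The coefficients are copied from the worked equation above (not from Table 2, which is not reproduced here), and the function name `predict_model1`, its argument names, and the curve-type labels are hypothetical.

```python
# Curve-type offsets relative to the reference category (single TLL curve).
CURVE_OFFSETS = {
    "tll": 0.0,
    "double": 3.64,
    "no_curve": 1.63,
    "single_thoracic": 4.33,
    "other": 1.64,
}

def predict_model1(cobb, half_years, age, risser, female, curve_type="tll"):
    """Predicted future maximum Cobb angle (degrees) from initial-consult data."""
    sex = 1 if female else 0
    return (
        16.03
        + 1.06 * cobb
        + 1.56 * half_years
        - 0.04 * half_years ** 2
        - 1.17 * age
        + CURVE_OFFSETS[curve_type]
        - 0.10 * half_years * risser
        - 0.26 * half_years * sex
    )

# 12-year-old female, 25 degree single thoracic curve, Risser 1, 4.5 years ahead.
angle = predict_model1(25, 9, 12, 1, True, "single_thoracic")
print(round(angle, 2))  # 40.38
```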

This second example uses Model 2 for the case where the patient described above also has a prior radiograph available, taken at age 11.5, when the curve measured 20°. The target prediction age, measured from the time of this first available radiograph, was 16 years (a similar time to prediction of 9 half-years later).

$$\begin{aligned}
\textbf{Future curve angle using model 2} ={}& 3.50 + 0.10 \times (20^{\circ}\ \text{Cobb angle on first available radiograph}) \\
&+ 1.04 \times (25^{\circ}\ \text{Cobb angle at specialist visit}) + 2.49 \times (9\ \text{half-years time}) \\
&- 0.12 \times (9^{2}\ \text{half-years time}^{2}) + 0.002 \times (9^{3}\ \text{half-years time}^{3}) \\
&- 1.13 \times (1\ \text{Risser}) - 0.25 \times (12\ \text{years old age}) \\
&- 0.29 \times [(9\ \text{half-years time}) \times (1\ \text{Risser})] \\
&- 0.18 \times [(9\ \text{half-years time}) \times (1\ \text{female})]
\end{aligned}$$
$$\textbf{Future curve angle} = 3.50 + 2.00 + 26.00 + 22.41 - 9.72 + 1.46 - 1.13 - 3.00 - 2.61 - 1.62 = 37.29^{\circ}$$
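The Model 2 arithmetic can be checked the same way. Coefficients are copied from the worked equation above, and the function name `predict_model2` and its argument names are hypothetical. The example passes age 12, as the worked equation does.

```python
def predict_model2(first_cobb, consult_cobb, half_years, age, risser, female):
    """Predicted future maximum Cobb angle (degrees) using a prior radiograph."""
    sex = 1 if female else 0
    t = half_years
    return (
        3.50
        + 0.10 * first_cobb      # Cobb angle on the first available radiograph
        + 1.04 * consult_cobb    # Cobb angle at the specialist consultation
        + 2.49 * t
        - 0.12 * t ** 2
        + 0.002 * t ** 3
        - 1.13 * risser
        - 0.25 * age
        - 0.29 * t * risser
        - 0.18 * t * sex
    )

# 20 degrees on the prior radiograph, 25 degrees at consult, 9 half-years ahead.
angle = predict_model2(20, 25, 9, 12, 1, True)
print(round(angle, 2))  # 37.29
```

The cubic term contributes 0.002 × 729 = 1.458, so the unrounded sum is 37.288.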

By changing the target prediction time, a clinician could try to predict curve severity at different timepoints to examine when a patient may reach a critical threshold to inform treatment decision. For example, SOSORT guidelines recommend implementing progressively more intensive treatment, progressing from simple observation to only scoliosis-specific exercise to different intensities of bracing and eventually on to surgery depending on whether a curve is small, moderate, or severe and judging the risk of progression based on the skeletal maturity [1]. Our models can assist judging the risk for progression.
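Scanning the target time can be sketched as a simple loop that returns the first half-year step at which the prediction reaches a chosen threshold. The sketch reuses the Model 1 coefficients from the worked example; the names `predict_model1` and `first_time_reaching`, the 45° threshold taken from the scenarios above, and the 9 half-year search window are illustrative assumptions.

```python
CURVE_OFFSETS = {"tll": 0.0, "double": 3.64, "no_curve": 1.63,
                 "single_thoracic": 4.33, "other": 1.64}

def predict_model1(cobb, half_years, age, risser, female, curve_type="tll"):
    """Model 1 prediction (degrees) using the worked-example coefficients."""
    sex = 1 if female else 0
    return (16.03 + 1.06 * cobb + 1.56 * half_years - 0.04 * half_years ** 2
            - 1.17 * age + CURVE_OFFSETS[curve_type]
            - 0.10 * half_years * risser - 0.26 * half_years * sex)

def first_time_reaching(threshold, max_half_years=9, **patient):
    """First half-year step at which the prediction reaches the threshold.

    Returns None if the threshold is not reached within the search window.
    """
    for t in range(1, max_half_years + 1):
        if predict_model1(half_years=t, **patient) >= threshold:
            return t
    return None

severe = dict(cobb=35, age=12, risser=0, female=True, curve_type="single_thoracic")
moderate = dict(cobb=25, age=12, risser=0, female=True, curve_type="single_thoracic")
print(first_time_reaching(45.0, **severe))    # 2 half-years, i.e. 1 year
print(first_time_reaching(45.0, **moderate))  # None within 4.5 years
```

Any such output should be read alongside the reported prediction error, since a predicted value close to a threshold may fall on either side of it.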

Nevertheless, clinicians are reminded that predictions are characterised by a degree of error. Our validations demonstrated that 55% of the true values at follow-up were within 5° of the predicted values, and 80% were within 10°. A 5° error is commonly accepted as measurement error in monitoring scoliosis progression [29, 30]. While this works well over follow-up every 6 months [31], the reported accuracy within 10° seems reasonable in the context of long-term predictions when additional factors may affect the risk of progression.

It is difficult to compare the prediction accuracy of our model because published models are heterogeneous in terms of participants studied, predictors considered, outcomes predicted, and lengths of follow-up [17, 32]. Cross-validation involves reporting the number of observed values falling within the prediction interval. However, prediction interval width depends on the sample size and varies between studies [33]. Therefore, we also reported the proportion of observed values falling within clinically relevant interval widths of 5°, 10°, and 15°.

Comparison with studies predicting curve progression or severity

Previous meta-analyses [16, 17] on predicting scoliosis curve progression also identified the predictors retained in our models. Lenz et al. [32] recently reviewed 28 articles. Like our results, they identified the following predictors of curve progression: age, skeletal maturity (Risser < 1, Sanders digital maturity scale < 5), initial Cobb angle, and thoracic single or double curve patterns. Predictors they identified that were not available in the present study included family history, bone mineral status, and height velocity. Our study was unique in including a time variable that allows predictions at a time chosen by the model user, in including interaction effects, and in using data from consultations prior to the specialist visit.

Noshchenko et al. also meta-analysed 25 studies predicting curve progression using a variety of dichotomous outcome definitions in AIS over 10 years of age [17]. However, 11 studies included treated patients or did not specify whether patients were treated. Only 13 studies examined both sexes. Follow-up durations varied widely from 3 months to 22 years. While they identified low-to-moderate levels of evidence for some predictors, they concluded that no method could be recommended for clinical use. Eighty percent of the included studies presented risk of selection bias, 100% of detection bias, 24% of performance bias, 60% of reporting bias, and 80% of attrition bias. Consistent with our results, these authors found low-grade evidence from 3 to 8 studies per predictor to support age, curve pattern, initial Cobb angle, and skeletal immaturity as predictors [17]. Other predictors identified required special laboratory tests (genetic markers, platelet calmodulin, melatonin signalling, and Gi protein functional status) [17]. In addition, the following simple clinical variables, unavailable in our study, showed promise: osteopenia, brain stem dysfunction, pre-menarche status, and rib–vertebral angle. Menarche was not consistently available in our database and is irrelevant in males, so this variable was not used in our models, which were built to be applicable to both sexes.

Noshchenko et al. [17] found 7 prediction models for a dichotomized curve progression outcome with between 2 and 6 predictors including: skeletal maturity, various curve patterns, initial Cobb angle, imbalance, spine growth velocity, osteopenia, age, gender, menarche, growth index, and electromyography asymmetry in paraspinal muscles. Many of these predictors were retained in our models, but interaction terms or time to the prediction were not previously tested. Because different outcomes were predicted, we cannot compare prediction accuracy.

Di Felice et al. [16] also noted heterogeneity in their meta-analysis in terms of curve patterns, duration of follow-up, initial Cobb angle values, Risser sign, setting, and the criteria used to define curve progression. None of the reviewed studies predicted specific Cobb angles at follow-up. While the relation between curve progression and curve pattern varied among studies, lumbar curves consistently presented a lower risk of progression, as in our study [16].

Notably, our model included the predictors from the widely used Lonstein and Carlson prediction equation [(baseline Cobb angle − Risser sign)/chronological age] for the probability of curve progression [14]. This classic retrospective study followed 727 patients under the age of 19 for a similar duration of 25 months (12–88 months), but only with initial curves under 30°, while our sample includes a wide range of curve severity at baseline. They also noted a lower risk of progression for lumbar/thoracolumbar curves and, similar to our findings, more progression in double and single thoracic curves [14]. Another classic study by Peterson and Nachemson [15] developed a prediction model of curve progression by maturity in 159 females (10–15 years) with curves at baseline between 25 and 35°. Similar to our results, they found a Risser sign of 0 or 1 and an apex located above T12 to be predictive, along with imbalances larger than 1 cm.

Interestingly, there is controversy among studies with regard to the effect of sex/gender on the risk of progression; this may be because studies included too few males to obtain stable estimates [32, 34]. In our model, sex only played a significant prediction role as part of its interaction with time to the prediction, but interactions have rarely been studied.

In a systematic review, Jalalabadi found only 4 studies on predicting a change in Cobb angle in °/year over short-term intervals and 3 over longer terms (> 1 year), with no studies specifically predicting a future Cobb angle [9]. While Nault et al. obtained good accuracy in predicting the future Cobb angle, their model requires time-consuming extraction of variables from the 3D stereoradiographic reconstruction [35]. Most of the 88 studies Jalalabadi reviewed focused on predicting a dichotomized outcome at a specific interval. She reported limited evidence that curve pattern (thoracic, double, and triple curves), large Cobb angle, and low age predicted progression over short-term follow-up intervals. Further, there is conflicting evidence about whether Risser sign predicts progression over short-term intervals. Jalalabadi’s review [9] identified eight other predictors, which were not available in the present study, each in a single study with moderate or high risk of bias providing unclear evidence.

Strengths and limitations

We addressed common limitations seen in previous studies. Our study included the full spectrum of growth and of baseline severities. Our large sample from a national Institute specialized in scoliosis care was expected to be representative of the target population. Our average follow-up duration allowed developing prediction models for a wide range of prediction timepoints and not only for the short or long term. Aware that relying on a clinical record could affect data quality, we implemented thorough data checks to ensure the selection of only previously untreated cases, and we re-measured radiographs while maintaining evaluator blinding, confirming the accuracy of existing clinical record measurements.

Our study used a unique design to study natural history in today’s context where high-quality evidence on the good effectiveness of scoliosis treatments [1, 18, 19, 36] makes it unethical to withhold these treatments until skeletal maturity. Instead, we used all available radiographs while untreated from our Institute until implementation of a treatment or discharge without treatment. With this approach, however, risk of progression may be underestimated because the cases with the highest risk of progression stop contributing to documenting the natural history after they undertake treatment. On the other hand, there is also a risk of overestimating the risk of progression by relying on data from patients who sought more than one radiographic assessment of their scoliosis; truly stable patients may not seek additional follow-ups. Despite having a long average follow-up time, this approach also limits the number of follow-up intervals ending after skeletal maturity.

While simple and commonly used clinical predictor variables showed good ability to predict future curve severity, other predictor variables could not be studied. Further, while Risser sign scoring of maturity was an important prediction factor, novel methods such as scoring the skeletal maturity using proximal phalanges [37,38,39] or the ulna [40] may have better predictive ability. Finally, ours is one of the rare studies reporting validation, and we showed good prediction accuracy using tenfold internal cross-validation. Future studies, using independent samples, are needed to complete external validation of our models.
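For readers unfamiliar with tenfold internal cross-validation, the general procedure can be sketched as follows: the cases are dealt into ten folds, and each fold is predicted by a model fitted on the other nine. The helper names, the row-level split (rather than, for example, a split by patient), and the toy mean-only model `fit_mean` are assumptions for illustration; the sketch does not reproduce the models or the exact validation procedure of this study.

```python
import random
from typing import Callable, List, Sequence


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_errors(
    x: Sequence[float],
    y: Sequence[float],
    fit: Callable[[Sequence[float], Sequence[float]], Callable[[float], float]],
    k: int = 10,
) -> List[float]:
    """Absolute held-out error for every case, each predicted by a model
    fitted without that case's fold."""
    errors = [0.0] * len(y)
    for fold in k_fold_indices(len(y), k):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in fold:
            errors[i] = abs(y[i] - model(x[i]))
    return errors


def fit_mean(x_train: Sequence[float], y_train: Sequence[float]) -> Callable[[float], float]:
    """Toy model (assumption): always predicts the mean of the training outcomes."""
    mean = sum(y_train) / len(y_train)
    return lambda _x: mean
```

The held-out errors returned by `cross_validated_errors` are the quantities that would then be summarised, for instance as the share of cases predicted within 5°, 10°, or 15°. External validation differs in that the model is applied unchanged to an independent sample.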

Conclusion

Prediction models were proposed, which can help clinicians predict the future curve severity expected in patients not receiving treatment. Our models offer the flexibility to predict at future timepoints over the full growth period. One model allows such predictions using only simple clinical and radiographic data from an initial specialist consultation, and the other model takes advantage of clinical data from prior visits while untreated to improve prediction accuracy. The prediction accuracy of these validated models was very good, with 80% or more of predictions within 10°. Important predictors varied between models and included: curve severity, documented progression, curve types, skeletal maturity, age at consultation, time to the target prediction, and interactions between time and maturity and between time and sex. The nonlinear effects of time in both models account for the rapid increase in curve angle at the beginning of growth and the slowed progression after maturity. Improved prediction ability may help clinicians inform treatment prescription or show families why no treatment is recommended.