Identifying Scoliosis in Population-Based Cohorts: Automation of a Validated Method Based on Total Body Dual Energy X-ray Absorptiometry Scans
Scoliosis is a 3D-torsional rotation of the spine, but risk factors for initiation and progression are little understood. Research is hampered by lack of population-based research since radiographs cannot be performed on entire populations due to the relatively high levels of ionising radiation. Hence we have developed and validated a manual method for identifying scoliosis from total body dual energy X-ray absorptiometry (DXA) scans for research purposes. However, to allow full utilisation of population-based research cohorts, this needs to be automated. The purpose of this study was therefore to automate the identification of spinal curvature from total body DXA scans using machine learning techniques. To validate the automation, we assessed: (1) sensitivity, specificity and area under the receiver operator curve value (AUC) by comparison with 12,000 manually annotated images; (2) reliability by rerunning the automation on a subset of DXA scans repeated 2–6 weeks apart and calculating the kappa statistic; (3) validity by applying the automation to 5000 non-annotated images to assess associations with epidemiological variables. The final automated model had a sensitivity of 86.5%, specificity of 96.9% and an AUC of 0.80 (95%CI 0.74–0.87). There was almost perfect agreement of identification of those with scoliosis (kappa 0.90). Those with scoliosis identified by the automated model showed similar associations with gender, ethnicity, socioeconomic status, BMI and lean mass to previous literature. In conclusion, we have developed an accurate and valid automated method for identifying and quantifying spinal curvature from total body DXA scans.
KeywordsScoliosis ALSPAC Bristol DXA scoliosis method Machine learning
Scoliosis is defined as lateral curvature of the spine ≥ 10°, as measured using the Cobb method on a standing spinal radiograph . The most common form is adolescent-onset idiopathic scoliosis (AIS), defined as occurring between age 10 years and skeletal maturity . It is not always a benign structural abnormality, although the mortality rate for individuals with AIS is comparable to that of the general population . Severe AIS may result in early degenerative joint disease , negative body image  and psychosocial disturbances . Even small spinal curves in adolescents, which may not have presented to spinal units, are associated with an increased risk of future back pain and time off school .
However, our understanding of the causes of curve initiation and progression is hampered by lack of prospective population-based studies, driven mainly by the serious ethical concerns over performing spinal radiographs in healthy populations because of the radiation exposure, equivalent to an entire year’s background radiation .
To address this, we have validated a manual method for measuring spinal curvature using total body dual energy X-ray absorptiometry (DXA) scans for research purposes : the DXA Scoliosis Method (DSM). As previously published , the manual DSM is reliable (substantial agreement was seen with a kappa of 0.75), repeatable (95% of repeat measures were within 5°, and there was no change in interobserver variability as curve size increased) and accurate (comparison with the gold standard of using the Cobb method on standing spinal radiographs was as expected). The manual DSM also produced valid estimates of prevalence of scoliosis, with expected gender ratio .
This has allowed us to start to identify predictors of AIS onset utilising population-based cohorts that have already performed DXA scans for previous research into determinants of bone density. The DSM has been applied to participants in the Avon Longitudinal Study of Parents and Children (ALSPAC) at age 9 years (n = 7000) and age 15 years (n = 5000) and results have shown we can identify altered body composition  and reduced physical activity  in children, prior to onset of their spinal curve. Interestingly, reduced physical ability is seen as early as age 18 months in those who go on to develop AIS between ages 9 and 15 . This suggests that clinical features other than characteristics of the spinal deformity itself may indeed be useful prognostic indicators.
The main goal of further epidemiological analysis of scoliosis is to identify predictors of spinal curve progression. This would allow generation of a clinical prediction tool to identify people at low risk of curve progression, for example, who would then require less rigorous monitoring. However, even though the prevalence of AIS is relatively common (5.9% at age 15 ), ALSPAC is not large enough on its own, and to carry out appropriately powered epidemiological studies we need to combine data from multiple research cohorts. We have identified additional research cohorts that already have total body DXA scans already performed (approximately 84,000 scans). However, application of our manual DSM on all relevant DXA images is unfeasible in terms of time and cost. For example, the original annotation 12,000 ALSPAC images required 200 staff days of analysis time.
Therefore the aim of this work was to develop and validate a fully automated version of the manual DSM method using a machine learning approach. The intended purpose of this automated method is to exploit population-based cohorts for research purposes. The availability of large datasets and increasingly powerful computational resources has made the development of such techniques feasible with applications ranging from fibrotic lung disease  to ophthalmology . The scoliosis automation proposed here is based on the ideas developed in the SpineNet software , a deep-learning based automated tool for quantitative assessment of spinal degeneration on lumbar MRI imaging studies.
ALSPAC is a geographically-based UK cohort that recruited pregnant women residing in Avon (South-west England), with an expected date of delivery between 1 April 1991 and 31 December 1992 [15, 16]. A total of 14541 pregnancies were enrolled, with 14062 children born; see www.alspac.bris.ac.uk for more information. The study website contains details of all the data that are available through a fully searchable data dictionary and variable search tool available at https://www.bris.ac.uk/alspac/researchers/our-data/. This study is based on 7298 children who had DXA scans at the aged 9 research clinic, 5122 who had DXA scans at the aged 15 research clinic, and 4969 who had DXA scans at the aged 17 research clinic.
Overall Study Design
The DXA images were performed by trained technicians using a Lunar Prodigy (GE Healthcare, Madison, WI) and were obtained in a standard supine manner. The scans from age 9 and age 15 were combined, then randomly split into a training, a validation and a test set. A similar automated system for three-dimension images of the spine based on magnetic resonance imaging (MRI) scans has already been developed . We planned to modify this system to allow automatic collection of data on spinal scoliosis from total body DXA scans for future research purposes. This modification process was planned to have two stages: (1) development of a new software algorithm to extract the required features and classify spinal images based on a subset of anonymised DXA images from ALSPAC; and (2) validation of the software on a further dataset of anonymised images from ALSPAC.
Development of the New Software Algorithm
Validation of the Automated Model
Identification of Cut-Off Point for Binary Classification
Scans with evidence of body positioning error based on a score of > 0.5 were excluded. The scoliosis suspiciousness score was reduced to a binary output of scoliosis (yes/no). Different cut-offs of the suspiciousness score were assessed for validity by comparison with the manual DSM annotation on the age 9 and age 15 data using values for sensitivity, specificity and area under the receiver operator curve (AUC). In addition, to estimate the positive predictive value (PPV) and negative predictive value (NPV) of a specific cut-off, the relevant sensitivity and specificity were projected onto a hypothetical population of 10,000 using the known prevalence of 5.9% . The hypothetical population was also used to calculate the percentage of individuals who would be identified (that is, predicted) as having scoliosis according to a given cut-off. A final cut-off (the final model) was chosen by consensus of the study authors based on a high AUC, and optimisation of the sensitivity and PPV.
Reliability of Final Automated Model
Since identification of spinal curves is influenced by positioning of people on the DXA scanner bed, evaluation of reliability requires assessment of repeat scans in the same individual, with repositioning in-between. Therefore, the automation was also run on a subset of repeat DXA scans carried out 2–6 weeks later in 284 participants from the aged 15 research clinic. To assess reliability, an unweighted Cohen’s kappa statistic was calculated, with standard definitions of categories of agreement .
Assessment of Discrepancies
Using the final cut-off model, where discrepancies between the automated model and manual annotations (DSM) occurred, a random selection of images were reviewed by clinicians (JF and IH who had not previously reviewed these images, and EC who developed the original manual DSM) to identify if the manual annotations were correct or incorrect. All were unaware of the automated prediction, and all read the scans without knowledge of each other’s interpretation. In addition, the automated model was run on the age 17 scans for the discrepancies to identify if the images were still classified as scoliosis by the model, and these age 17 scans were also reviewed by the clinicians.
Validity of Final Automated Model
The final model was then run on the non-annotated age 17 scans. Percentages were calculated for prevalence of scoliosis. χ2 tests were used to assess associations between gender and the presence of scoliosis identified by the automated model. Previously, independent associations were identified in ALSPAC between body composition  and scoliosis identified by the manual DSM. To assess clinical validity of the automated model, χ2 tests were used to assess if similar associations were seen between potential predictors (body composition and physical activity) and scoliosis identified by the automated method. For continuous variables such as lean mass unpaired t-tests were used to assess associations with scoliosis.
The heatmaps of those images with high suspiciousness score for scoliosis consistently highlight specific regions of the spine, indicating these regions contribute the most to the suspiciousness score.
Identification of the Final Cut-Off Model
Identification of the final cut-off point of the continuous suspiciousness score for scoliosis based on the age 15 data after exclusion of those scans with evidence of body positioning error
Various cut-off levels of the scoliosis suspiciousness score for scoliosis produced by the automation
Using the validation set from ALSPAC
Applied to a hypothetical population of 10,000
Calculated prevalence (%)
Final model: Automated prediction of scoliosis (suspiciousness score cut off of 0.999) excluding those with body positioning error (suspiciousness score cut-off of 0.5) (A) compared to manual prediction (DSM) based on a test set from within ALSPAC age 9 and age 15 total body DXA scans; and (B) applying the 5.9% prevalence , the identified specificity of 96.9% and the identified sensitivity of 86.5% to a hypothetical population of 10,000
(A) Compared to manual prediction in test set
(B) Applied to a hypothetical population of 10,000
Reliability of Final Automated Model
There was almost perfect agreement of identification of those with scoliosis on repeated DXA scans taken 2–6 weeks apart (kappa of 0.90, 95%CI 0.72–1.00).
Assessment of Discrepancies: Re-assessment of Images by Clinicians
A random sample of 20 of the scans where the manual method and the automated method did not agree were reviewed by three clinicians. Of the scans where the manual method identified no scoliosis, but the automated method did identify scoliosis, 55.6% were re-classified as having scoliosis (in agreement with the automated model) by all three clinicians, suggesting the manual annotation was incorrect in these cases. Similarly, of the scans where the manual method identified scoliosis, but the automated method did not, 60.0% were re-classified as not having scoliosis (in agreement with the automated model) by all three clinicians, suggesting the manual annotation was incorrect in these cases. There was therefore no clear pattern or direction of judgement by the automation. There was no agreement for the remaining discrepant scans as to whether scoliosis was present or not due to the small size of spinal abnormality.
Assessment of Discrepancies: Comparison with Automated Model Prediction on Age 17 Data
The automated model was run on the age 17 images for those randomly selected discrepant scans described above. For 82.0% of participants, the automated model classified their spines the same at age 15 and age 17, thereby increasing the confidence that the model output is valid.
Description of Scoliosis Identified by the Automated Model in ALSPAC at Age 17
Descriptive statistics of those participants from ALSPAC identified by the final automated model with and without scoliosis at age 17, with comparisons by Chi-squared statistics or unpaired t-tests as appropriate
N = 3235
N = 449
P value for difference
Level 1 (none or CSE only)
Level 2 (vocational)
Level 3 (O levels)
Level 4 (A levels)
Level 5 (°)
BMI categories at age 17
P value for difference
Total body lean mass at age 15 (kg)
We have developed a fully automated method of identification of scoliosis from total body DXA scans for research purposes. The final model has good reliability, accuracy, sensitivity, specificity and AUC. Those identified with scoliosis using this method have similar associations with gender, socio-economic status, ethnicity, BMI and lean mass as the known epidemiology of this condition [9, 10]. Disagreement between the automated model and the manual annotation is likely to be explained by errors with the original manual annotation in at least half the cases. Now we are confident the automated model is valid, we are working on training the model to measure size of spinal curve, to allow future research into the predictors of curve size progression.
The benefits of our fully automated model compared to manual annotation of DXA scans is the vast reduction in time required to look at each spinal image, with the consequent large reduction in financial costs. To run the automation on all 12,000 DXA scans from ALSPAC took approximately 5 min. This has resulted in the first feasible and low-radiation process for identification of spinal curves in large populations for research purposes. Other no-radiation techniques are available such as EOS machines, but their use is limited by lack of availability. It is increasingly difficult to justify regular conventional spinal radiography because of the radiation risks, especially to adolescent females who may have an increased risk of breast and uterine carcinoma with increased radiation exposure .
The model is not perfect. The sensitivity, specificity and NPV are high, but PPV is low. This, combined with the estimated percentage with scoliosis identified by the model of 8.0%, suggests the model identifies more spinal curves than traditional manual methods. However, it is increasingly recognised that spinal curvature in humans is a continuum, and it is possible our automated method identified more of the small curves than manual methods. Most previous population-based studies of prevalence of scoliosis are based on the Adams forward bending test, and it is recognised that this clinical assessment will miss small curves. It is therefore possible our automated method is correctly identifying a higher prevalence of spinal curves. This could be important, as the current paradigm of using a cut-off of spinal curvature of ≥ 10 ° on standing radiographs  carries the implication that lesser curves are not pathological and are ‘normal variants’ . However, previous work by our group has shown that small curves are associated with future back pain and time off school/work .
Alternatively, our automated method may be identifying false-positives, but we think this is less likely given that our results are similar to the known epidemiology of scoliosis. The intended purpose of this automated method of scoliosis identification from total body DXA scans is for exploitation of large research datasets. In UK Biobank for example, there will be 100,000 total body DXA scans which will not be able to be analysed for spinal curvature manually because of the enormous time commitment. Our automated method, despite the potential for a proportion of false positives, will allow exploitation of this unique resource, sacrificing some precision for a vast reduction in time required for analysis. Another limitation of this study is that we were unable to confirm that those identified with scoliosis by the automated method were true cases, due to ethical issues regarding over-exposing otherwise normal individuals from ALSPAC to substantial levels of ionising radiation. As previously discussed in the paper describing the validation of the manual method , DXA scans are performed in the supine position, which unsurprisingly results in an under-estimation of curve size by approximately 10° in the ALSPAC cohort, similar to other authors . Also as previously published , analysis of the supine DXA imaging identifies a higher prevalence of double or triple curves, perhaps explained that without clinical examination we are unable to distinguish compensatory curves that are correctable. However, using a binary cut-off to categorise scans into scoliosis or no scoliosis reduced the impact of this potential limitation.
A final limitation is that both the manual method and the automation described in this paper have been developed on DXA scans performed on a Lunar Prodigy machine produced by GE Healthcare. Other DXA manufacturers are available, (machines produced by GE Healthcare and Hologic comprise the majority), and it is currently unknown how our automation will perform on such images, although we are currently in the process of testing it on Hologic images and outputs are encouraging [un-published data]. However, the intended use of our automation is for research purposes in population-based cohort studies where the serial images are taken on the same machines. We do not intend to use our automation on repeat scans in individuals taken on machines by different manufacturers.
We are now in a position to use this fully automated method to insert the scoliosis phenotype into population-based research cohorts with total body DXA scans around the globe. This will facilitate well-powered studies of the risk factors for initiation of spinal curves, and is likely to produce a step-change in our understanding of this little-researched disease. With future automation development we will also be in the position to study the risk factors for curve progression.
We are extremely grateful to all the families who took part in this study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists, and nurses. The UK Medical Research Council and Wellcome Trust (Grant Ref: 102215/2/13/2) and the University of Bristol provide core support for ALSPAC. A comprehensive list of grants funding is available on the ALSPAC website https://www.bristol.ac.uk/alspac/external/documents/grant-acknowledgements.pdf. This research was specifically funded by the British Scoliosis Research Foundation. This publication is the work of the authors, and EC will serve as guarantor for the contents of this paper, which do not reflect the views of the ALSPAC executive.
EC and JF planned the overall objective, whilst AJ, TK and AZ planned and conducted the automation, and TP provided statistical advice on the analyses. AJ and EC wrote the first draft. EC, JF and IH provided clinical input. EC is responsible for the overall content as the guarantor of the paper. All authors revised the paper critically for intellectual content and approved the final version. All authors agree to be accountable for the work and to ensure that any questions relating to the accuracy and integrity of the paper are investigated and properly resolved.
Compliance with Ethical Standards
Conflicts of interest
Amir Jamaludin, Jeremy Fairbank, Ian Harding, Timor Kadir, Tim J Peters, Andrew Zisserman and Emma M Clark declare they have no conflicts of interest.
Human and Animal Rights
Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. Parental consent and child’s assent were obtained for all measurements made.
Informed consent for the use of data collected via questionnaires and clinics was obtained from participants following the recommendations of the ALSPAC Ethics and Law Committee at the time.
- 1.Kane WJ (1977) Scoliosis prevalence: a call for statement of terms. Clin Orthop. 126:43–46Google Scholar
- 6.Fowles JV, Drummond DS, L'Ecuyer DS (1978) Untreated scoliosis in the adult. Clin Orthop 134:212–217Google Scholar
- 8.Watson SJ, Jones AL, Oatway WB, Hughes JS (2005) Ionising radiation exposure of the UK population. 2005 Review. HPA-RPD-001.Google Scholar
- 10.Clark EM, Taylor HJ, Harding I, Hutchinson J, Nelson I, Deanfield JE, Ness AR, Tobias JH (2014) Association between components of body composition and scoliosis: a prospective cohort study reporting differences identifiable before the onset of scoliosis. J Bone Miner Res 29(8):1729–1736CrossRefGoogle Scholar
- 13.Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanaswamy A, Venugopalan S, Widner K, Madams T, Cuadros J, Kim R, Raman R, Nelson PC, Mega JL, Webster DR (2016) Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316:2402–2411CrossRefGoogle Scholar
- 18.Jamaludin A, Kadir T, Clark EM, Zisserman A (2018) Predicting scoliosis in DXA scans using intermediate representations. In: 5th international workshop and challenge on computational methods and clinical applications for spine imaging post-proceedings, Springer's Lecture Notes in Computer Science. 11397(15–27). https://doi.org/10.1007/978-3-030-13736-6.
- 20.Brenner A, Preston AD, Sakata R, Sugiyamam H, Berrington de Gonzales A, French B, Utada M, Cahoon E, Sadakane A, Ozasa K, Grant E, Mabuchi K (2019) Radiation risks of uterine cancer in atomic bomb survivors 1958–2009. JNCI Cancer Spectrum. 2:pky081.Google Scholar
- 21.Cobb J (1948) Outline for the study of scoliosis. Am Acad Orthop Surg Instr Course Lect. 5:261–275Google Scholar
Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.