Introduction and objective
Although it is the standard for scoliosis curve size estimation, COBB angle measurement is well known to be inaccurate because of high interobserver variance in end vertebra selection and end plate contour delineation. We propose a stepwise improvement: a spline constructed from vertebra centroids that resembles the characteristics of the spinal curve more closely. To enhance precision even further, a neural net was trained to detect the centroids automatically.
Materials & Methods
Vertebra centroids in AP spinal X-ray images of varying quality from 551 scoliosis patients were manually labeled by 4 investigators. With these inputs, splines were generated and the computed curve sizes were compared to the manually measured COBB angles and to the curve estimation obtained from the neural net.
Splines achieved a higher interobserver correlation (0.92–0.95) than manual COBB measurements (0.83–0.92) and showed 1.5 to 2 times lower variance, depending on the anatomic region. This translates into an average interobserver measurement deviation of 1° for spline-based curve estimation, compared with 3°–8° for COBB measurements. The neural net was even more precise, achieving mean deviations below 0.5°.
In conclusion, our data suggest an advantage of spline-based automated measurement systems; further investigations are warranted with a view to abandoning manual COBB measurement.
Drawbacks of traditional COBB method
COBB angle measurement on coronal whole-spine standing X-rays was adopted by the Scoliosis Research Society (SRS) in 1966 as the gold standard for quantifying scoliosis curve severity. The method itself suffers from high inter- and intraobserver errors, ranging from 3° to 10° (95% CI), which can mainly be attributed to the manual nature of end vertebra selection and delineation. In addition to this inaccuracy, the COBB method neglects important aspects of the curve characteristics, e.g., apical vertebra translation.
Whole-spine X-ray images are currently acquired at widely varying radiation exposure (ranging from conventional X-ray to modern low-dose imaging systems) and with variable results. Treatment decisions are therefore based on highly variable image acquisition and only moderately reliable measurements. However, the error attributable to the imaging modality itself has been shown to range from none to less than 2° [2, 3].
Approaches to mitigate COBB method-associated inaccuracy
To level out observer-based errors, some degree of automation (e.g., for the selection of end vertebrae) has been proposed. However, the original COBB method has proven unsuitable for automation. To overcome this problem, it would be necessary to focus on vertebra centroids, which, following the original idea of Ferguson, generate spinal curves that resemble the global scoliotic deformity more closely. In addition, curves through these centroids can be smoothed by cubic interpolation, which makes them less prone to measurement errors. So far, the first semiautomated computerized measurement projects have reported mixed results in terms of reliability; however, all of them showed good correlation between the COBB angle and the investigated method [4, 5, 6].
With the advent of modern neural network (NN)-driven image processing, automatic segmentation, curvature estimation and even 3D reconstruction from 2D data have come within reach. Wu et al. reduced the standard deviation of COBB angle estimation to 1° (circular mean absolute error of 4°) by iteratively training a convolutional NN on AP and lateral views from 154 patients. However, as their method requires a rather standardized approach with biplanar imaging, it might not be applicable to the scenario described above, with varying image quality and only AP imaging information. Nevertheless, we used Wu's results as a reference, as they represent the most precise measurement modality known to us.
The extent to which X-ray image quality can be neglected in proper scoliosis curve estimation was discussed in depth by Bonanni. He proposed an angle function over a structural curve drawn on either side of the spine to quantify the scoliotic deformation from AP X-ray or MRI images. He thus dropped unnecessary radiographic details (such as endplates) and instead focused on extracting global spinal information (e.g., curve shape) [8, 9]. The group showed that the standard deviation of measurements could be brought below 5°, even in very low-quality images (54 patients). However, the measurements had to follow a fixed scheme with preselected landmarks and be carried out by trained scoliosis surgeons.
Automatic vertebrae detection
In contrast to CT and MRI images, the automatic detection of vertebrae in X-ray images is a very challenging computer vision task because of multiple overlapping shadows of the ribs and pelvis, relatively weak contrast, and the need to identify all thoracic and lumbar vertebrae individually [10, 11]. Lecron, Xu and coworkers were the first to demonstrate vertebra detection in X-ray images using classical machine learning techniques [12, 13]. With the advance of convolutional NN (CNN) architectures, the possibilities have improved dramatically. De Carvalho, Al Arif and coworkers reported groundbreaking results with very high accuracy in predicting vertebrae in X-ray images [2, 14]. However, these methods were limited to a small part of the spine (lumbar or cervical) and were trained on high-quality images.
Aiming at higher precision at no additional cost
The aim of this project was to determine whether there could be a solution for quantitative radiographic scoliosis assessment that:
offers high precision measurement,
is easy to use in an everyday clinical setting,
does not depend on special hard- or software,
works even in difficult situations (e.g., low-quality X-rays), thus reducing the need for repeated radiation exposure,
incorporates modern machine learning techniques (e.g., deep learning),
incurs no additional cost.
Material and methods
Image material and labeling
This study has been approved by our institutional review board (#EK339092018). Standing coronal X-rays displaying the whole vertebral column, or at least the most scoliotic part of the thoracic or lumbar spine, were obtained from different individuals with any form of idiopathic scoliosis who were treated with bracing and had received no prior treatment. Image quality ranged from digitally acquired images (the majority) to scanned films or even scanned paper printouts. Images were stored under a random 8-digit alias.
The 551 images were split into batches of 30 that were labeled successively with a purpose-built labeling tool by 4 analyzers who had received appropriate training but differed in educational level: a medical student (M), two orthopedic residents (C and W) and an experienced scoliosis surgeon (P).
Labeling included identifying the centroid of each vertebra, marking the start and end vertebrae and drawing the angle lines according to COBB's method (see Fig. 1 and the supplemental material for the workflow). A magnifying glass in a second window supported labeling precision. The selected centroids were then used to generate a spline automatically by cubic interpolation. Lines perpendicular to this spline were generated at each vertebral level, and the most tilted lines (relative to the horizontal) were used to calculate the corresponding COBB angle. Only values > 10° were processed further.
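The spline step described above can be illustrated with a minimal Python sketch. It assumes that centroids are given as (x, y) pixel coordinates ordered from cranial to caudal with strictly increasing y, fits a natural cubic spline x(y) (the natural boundary condition is an assumption; the text states only cubic interpolation), and takes the tilt of each perpendicular line relative to the horizontal as atan(dx/dy) at the corresponding knot. The angle between the two most oppositely tilted perpendiculars is reported as the curve angle, and values of 10° or less are discarded. The function names `natural_cubic_slopes` and `spline_cobb_angle` are hypothetical and do not describe the authors' labeling tool.

```python
import math


def natural_cubic_slopes(t, x):
    """Return dx/dt of the natural cubic spline through (t[i], x[i]) at each knot.

    t: strictly increasing vertical centroid positions (cranial to caudal)
    x: horizontal centroid positions
    """
    n = len(t) - 1
    if n < 1:
        raise ValueError("need at least two centroids")
    h = [t[i + 1] - t[i] for i in range(n)]
    if any(hi <= 0 for hi in h):
        raise ValueError("vertical positions must be strictly increasing")
    # Second derivatives m[i]; natural boundary conditions (assumed): m[0] = m[n] = 0.
    m = [0.0] * (n + 1)
    if n > 1:
        # Tridiagonal system for m[1..n-1], solved with the Thomas algorithm.
        a = [h[i - 1] for i in range(1, n)]                 # sub-diagonal
        b = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]  # diagonal
        c = [h[i] for i in range(1, n)]                     # super-diagonal
        d = [6.0 * ((x[i + 1] - x[i]) / h[i] - (x[i] - x[i - 1]) / h[i - 1])
             for i in range(1, n)]
        for k in range(1, n - 1):  # forward elimination
            w = a[k] / b[k - 1]
            b[k] -= w * c[k - 1]
            d[k] -= w * d[k - 1]
        sol = [0.0] * (n - 1)
        sol[-1] = d[-1] / b[-1]
        for k in range(n - 3, -1, -1):  # back substitution
            sol[k] = (d[k] - c[k] * sol[k + 1]) / b[k]
        m[1:n] = sol
    # Derivative at the left end of each interval, then at the last knot.
    slopes = [(x[i + 1] - x[i]) / h[i] - h[i] * (2 * m[i] + m[i + 1]) / 6
              for i in range(n)]
    slopes.append((x[n] - x[n - 1]) / h[n - 1] + h[n - 1] * (m[n - 1] + 2 * m[n]) / 6)
    return slopes


def spline_cobb_angle(centroids, min_angle=10.0):
    """Curve angle from the most tilted spline perpendiculars (hypothetical helper).

    Returns (angle_deg, upper_index, lower_index), or None if angle <= min_angle.
    """
    ts = [c[1] for c in centroids]
    xs = [c[0] for c in centroids]
    slopes = natural_cubic_slopes(ts, xs)
    # A perpendicular to the spline tilts from the horizontal by the same angle
    # as the spline tangent tilts from the vertical: atan(dx/dy).
    tilts = [math.degrees(math.atan(s)) for s in slopes]
    upper = max(range(len(tilts)), key=tilts.__getitem__)
    lower = min(range(len(tilts)), key=tilts.__getitem__)
    angle = tilts[upper] - tilts[lower]
    if angle <= min_angle:
        return None
    return angle, min(upper, lower), max(upper, lower)
```

This sketch returns only the single largest curve; a double curve would instead be measured between successive local maxima and minima of the tilt sequence, which is not shown here.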
Measurements were allocated to three distinct spinal regions, labeled “high thoracic,” “thoracic” and “lumbar,” according to whether the curve was located above Th5, between Th5 and L2, or at L2 and below, respectively.
Neural net training
The manually labeled vertebrae of all 4 analyzers were used to train an NN via deep learning, and splines were generated from the vertebral centroids detected by the NN. The NN architecture is based on U-Net, a standard approach in medical image segmentation. It comprises an encoder/decoder structure with the special feature of skip connections between encoder and decoder, which retain important information across several layers. To obtain better results than the original U-Net, we implemented masked loss functions that penalize areas outside the vertebrae. The whole dataset consisted of 571 X-ray images. For training, we randomly split the data into 457 images for training and 114 for validation. X-ray images were resized to 200 × 500 pixels (width × height), and the labeled vertebra centroids were normalized as percentages of the scaled image dimensions.
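The data split and centroid normalization described above, together with one possible form of masked loss, might look as follows in Python. The split sizes (457/114) come from the text; the random seed, the per-pixel weighting scheme and the `outside_weight` value are assumptions, since the exact form of the authors' masked loss is not specified. All function names are hypothetical.

```python
import random


def split_dataset(image_ids, n_train=457, seed=0):
    """Randomly split image IDs into training and validation sets."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # seed is an assumption, for reproducibility
    return ids[:n_train], ids[n_train:]


def normalize_centroids(centroids, width, height):
    """Express pixel centroids (x, y) as percentages of image width and height."""
    return [(100.0 * x / width, 100.0 * y / height) for x, y in centroids]


def masked_mse(pred, target, mask, outside_weight=2.0):
    """Pixelwise squared error in which pixels outside the vertebra mask
    (mask value 0) are weighted more heavily. Assumed form, for illustration only."""
    total, count = 0.0, 0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            w = 1.0 if m else outside_weight
            total += w * (p - t) ** 2
            count += 1
    return total / count
```

Because the centroids are expressed as percentages, they remain valid after the images are resized to 200 × 500 pixels.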
For the statistical analysis, we used standard methods such as Pearson's correlation coefficient and Student's t test. Where values were missing, the respective row was deleted from that observer's dataset.
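A minimal pure-Python sketch of these statistics, including row-wise deletion of missing values (represented here as `None`), is shown below. Whether a paired or an unpaired t test was used is not stated; the paired form is assumed here, and the helper names are hypothetical.

```python
import math
import statistics


def complete_pairs(a, b):
    """Drop any row in which either observer has a missing value (None)."""
    return [(x, y) for x, y in zip(a, b) if x is not None and y is not None]


def pearson_r(a, b):
    """Pearson correlation between two observers' angle measurements."""
    pairs = complete_pairs(a, b)
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete rows")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def paired_t_statistic(a, b):
    """Student's t statistic for paired measurements (df = n - 1); assumed form."""
    diffs = [x - y for x, y in complete_pairs(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))
```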
Results and discussion
End vertebra selection
Selection of end vertebrae revealed the typical double-curve scoliotic pattern, with peaks at Th5/6, Th11 and L4, for every investigator as well as for the NN (Fig. 2).
Across all measurements, the interobserver difference in manual end vertebra selection was 0.56 vertebrae. When the selection was left to the spline algorithm, it was only slightly more uniform (0.45 vertebrae). Notably, this deviation was not zero, which shows that even the simple task of placing centroid points on vertebrae yields observer-dependent results.
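One plausible way to compute an interobserver difference in end vertebra selection is the mean absolute level difference over all observer pairs, sketched below. The exact definition behind the reported 0.56 and 0.45 vertebrae is not given, so this formula, the integer level encoding (e.g., 0 = T1) and the function name are assumptions.

```python
from itertools import combinations


def mean_pairwise_level_difference(levels_by_observer):
    """levels_by_observer: dict observer -> list of end vertebra levels
    (integers, one per image, None if missing). Hypothetical helper."""
    diffs = []
    for a, b in combinations(levels_by_observer.values(), 2):
        for la, lb in zip(a, b):
            if la is not None and lb is not None:
                diffs.append(abs(la - lb))
    return sum(diffs) / len(diffs)
```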
Another source of inconsistency is that observers were required to indicate the range of vertebrae visible on each X-ray (e.g., from T2 to L5). Because vertebral identification is not always easy, owing to the variable formation of the 12th rib, this requirement might explain the observed variability. It appears to confirm Vrtovec's view that end vertebra selection is one of the main sources of error in COBB angle determination.
We therefore investigated more closely whether imprecise end vertebra determination affects measurement variance for the spline and NN approaches, both of which estimate the global curve characteristics.
In our analysis, spline-based COBB measurement showed higher correlations than manual COBB measurement (Fig. 3). This effect was especially pronounced in the border regions (high thoracic and lumbar). Mean interobserver intraclass correlation coefficients for spline-based measurements ranged from 0.92 (high thoracic) to 0.95 (main thoracic and lumbar), whereas traditional COBB measurements yielded coefficients of 0.83 (high thoracic), 0.89 (lumbar) and 0.92 (main thoracic). De Carvalho et al. achieved an interobserver correlation of 0.88–0.90 using high-quality images with predefined end vertebrae. However, neither they nor any other authors reported whether the same regional dependence was observed. We hypothesize that some of the variability, especially in manual COBB measurement, might result from kyphosis-dependent obscuring of vertebral structures. This uncertainty translates directly into poor correlation. Consequently, the interclass (spline versus COBB) correlation was rather poor, ranging from 0.84 (high thoracic) to 0.92 (main thoracic).
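The text does not state which intraclass correlation form was used. As an illustration, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single rater, after Shrout and Fleiss) from a complete images × observers matrix; the choice of this form and the function name are assumptions.

```python
def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    data[i][j] is the angle measured on image i by observer j (complete rows only).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between images
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between observers
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
```

For two observers whose readings differ by a constant offset (e.g., [1, 2, 3] versus [2, 3, 4]), Pearson's r is 1.0, whereas this ICC form gives 2/3, because absolute agreement also penalizes systematic under- or overestimation.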
Notably, the correlations between each observer (manual or spline) and the NN were always above 0.9, which further illustrates the NN's potential to mitigate interobserver variance, even in difficult anatomic regions. The next question is how these correlations translate into angle measurement error.
Distribution of differences between investigators
Our dataset was derived from X-ray images of patients undergoing brace treatment, with curves typically ranging from 20° to 40°. In this range in particular, detecting small angle changes is crucial for optimizing conservative treatment results. Manual COBB angle measurement is still considered the gold standard for scoliosis curve estimation, despite its known inaccuracy of 3°–10°. Deviations of this magnitude can lead to erroneous treatment decisions, including surgical ones.
As expected, our data confirmed the inaccuracy of manual COBB angle measurement. Median interobserver deviations were region dependent, spanning 8° (high thoracic), 4° (thoracic) and 3° (lumbar) (Fig. 4). Using standardized measurement setups, other authors were able to reduce the average angle difference to 2°–3° [2, 21].
To determine whether there was a training effect on measurement precision, we evaluated a rolling mean of the above-mentioned deviations with a window size of 16 datasets (Fig. 5). A trend toward decreasing deviation over the dataset was observed for only one observer (W). All investigators retained an error range of 4°–8° across all spinal regions. We therefore conclude that manual COBB measurement cannot be improved by training; this disadvantage is inherent to the method.
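The rolling-mean evaluation can be sketched as a trailing window of 16 consecutive deviations per observer. Whether trailing or centered windows were used, and how incomplete windows at the start were handled, is not stated; this sketch assumes a trailing window, returns only complete windows, and uses a hypothetical function name.

```python
def rolling_mean(values, window=16):
    """Trailing rolling mean over consecutive datasets; only complete windows are returned."""
    if window < 1 or window > len(values):
        return []
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        # Slide the window by one: add the newest value, drop the oldest.
        running += values[i] - values[i - window]
        out.append(running / window)
    return out
```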
Spline-based curve size estimation, by contrast, showed a consistently small median interobserver deviation of no more than 1° in any spinal region (Fig. 4).
Stability of measurements
To demonstrate the stability of measurements over the angle range of the dataset, Bland–Altman plots were generated (Fig. 6a and additional Table 1). The less experienced analyzers (C, W, M) showed a significant tendency to underestimate angles manually (positive values), whereas the experienced analyzer (P) overestimated only the high thoracic curve (negative value) and stayed close to the spline values in the remaining spine regions, despite some considerable high outliers (blue dots). No systematic bias toward lower or higher angles was found, demonstrating the robustness of the spline method.
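The quantities behind a Bland–Altman plot can be computed as below: per-image means and differences, the mean difference (bias) and the 95% limits of agreement (bias ± 1.96 SD). The sign convention (reference minus other, so that manual underestimation relative to the spline yields positive values) is inferred from the text and is an assumption; the function name is hypothetical and plotting is omitted.

```python
import statistics


def bland_altman(reference, other, z=1.96):
    """Bland–Altman statistics for paired angle measurements.

    Differences are reference - other, so positive values mean that `other`
    (e.g., a manual measurement) underestimates relative to `reference`
    (e.g., the spline measurement).
    """
    diffs = [r - o for r, o in zip(reference, other)]
    means = [(r + o) / 2 for r, o in zip(reference, other)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return means, diffs, bias, (bias - z * sd, bias + z * sd)
```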
The data above demonstrate that the NN achieves high correlations with all observers. The stability of its measurements is further shown in the Bland–Altman plot in Fig. 6b, where the NN's measurements are plotted against all human-obtained values (manual and spline measurements). The confidence interval for every region is narrower, spanning only one-half to two-thirds of the confidence intervals of spline versus manual COBB measurements. The reasons for this stability are explained in the next section.
The strength of NN-based measurements
Our NN achieves a sensitivity of 0.87 and a relative pixel error of 0.011 for detecting the correct vertebrae in an AP X-ray of the spine (Fig. 7, Table 3). This error is more than three times lower than the 0.0398 reported for the NN of Wu et al. It translates into an angle deviation of < 1° for every region, with high correlations in the main thoracic and lumbar spine (Table 2), which is also better than Wu's error of 1°. Again, we note that Wu et al. used standardized biplanar imaging, whereas we used only AP images of heterogeneous quality (Table 3).
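The text does not define the matching rule behind the sensitivity or the exact formula of the relative pixel error. A minimal sketch, assuming centroids in normalized coordinates (fractions of image width and height), a hypothetical match radius `max_dist`, and nearest-neighbor matching without one-to-one assignment, could look like this; the function name is also hypothetical.

```python
import math


def detection_metrics(true_pts, pred_pts, max_dist=0.02):
    """Sensitivity and mean relative error of predicted vertebra centroids.

    A labeled centroid counts as detected if the nearest prediction lies within
    max_dist (assumed threshold). One prediction may match several labels here,
    a simplification of a proper one-to-one assignment.
    """
    if not true_pts:
        raise ValueError("need at least one labeled centroid")
    errors = []
    for tx, ty in true_pts:
        d = min((math.hypot(px - tx, py - ty) for px, py in pred_pts), default=math.inf)
        if d <= max_dist:
            errors.append(d)
    sensitivity = len(errors) / len(true_pts)
    mean_error = sum(errors) / len(errors) if errors else math.nan
    return sensitivity, mean_error
```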
In conclusion, we have shown that, even with poor image quality, the reliability of COBB angle measurement can be substantially increased by
analyzing the spinal curve more globally, using a spline generated from vertebral centroids, and
automating vertebra detection with a neural network trained on labels from several observers.
Measurement error can be brought down below 0.5°, which is less than one-tenth of traditional COBB measurement error.
Limitations of our current approach include the uncertainty in the high thoracic spine, which we will address in further investigations. We will also need to include various other types of spinal X-ray images to improve vertebra recognition even further. In addition, both the angle range and the scoliosis spectrum (e.g., malformations) need to be explored further, as severely rotated curves might exhibit different geometric properties.
Based on the data of this work, and taking into account the criticism voiced over the years, manual COBB measurement can no longer be recommended for scoliosis curve estimation. It is flawed by considerable interobserver deviations that can be largely eliminated by machine-based diagnosis, even if the image material is of poor quality.
Implementing this knowledge in clinical practice is now imperative. However, further steps remain to be accomplished. The current version of NN-based spline generation worked well with a limited set of images. Future model training has to include a broader image database, e.g., operated spines and early-onset and adult scoliosis, to mention just a few. We plan to make the model openly accessible to the scientific and medical community.
Availability of data and material
Raw data are published at Mendeley data repository: https://doi.org/10.17632/ssfmrw2fg9.2.
References
1. Vrtovec T, Pernuš F, Likar B (2009) A review of methods for quantitative evaluation of spinal curvature. Eur Spine J 18:593–607
2. De Carvalho A, Vialle R, Thomsen L et al (2007) Reliability analysis for manual measurement of coronal plane deformity in adolescent scoliosis. Are 30 × 90 cm plain films better than digitized small films? Eur Spine J. https://doi.org/10.1007/s00586-007-0437-4
3. Wills BPD, Auerbach JD, Zhu X et al (2007) Comparison of Cobb angle measurement of scoliosis radiographs with preselected end vertebrae: Traditional versus digital acquisition. Spine (Phila Pa 1976). https://doi.org/10.1097/01.brs.0000251086.84420.d1
4. Dutton KE, Jones TJ, Slinger BS et al (1989) Reliability of the Cobb angle index derived from traditional and computer-assisted methods. Australas Phys Eng Sci Med 12:16
5. Tanure MC, Pinheiro AP, Oliveira AS (2010) Reliability assessment of Cobb angle measurements using manual and digital methods. Spine J. https://doi.org/10.1016/j.spinee.2010.02.020
6. Shea KG, Stevens PM, Nelson M et al (1998) A comparison of manual versus computer-assisted radiographic measurement: Intraobserver measurement variability for Cobb angles. Spine (Phila Pa 1976). https://doi.org/10.1097/00007632-199803010-00007
7. Wu H, Bailey C, Rasoulinejad P, Li S (2018) Automated comprehensive Adolescent Idiopathic Scoliosis assessment using MVC-Net. Med Image Anal. https://doi.org/10.1016/j.media.2018.05.005
8. Bonanni PG (2017) Contour and angle-function based scoliosis monitoring: relaxing the requirement on image quality in the measurement of spinal curvature. Int J Spine Surg. https://doi.org/10.14444/4022
9. Papaliodis DN, Bonanni PG, Roberts TT et al (2017) Computer assisted Cobb angle measurements: a novel algorithm. Int J Spine Surg. https://doi.org/10.14444/4021
10. Huang SH, Chu YH, Lai SH, Novak CL (2009) Learning-based vertebra detection and iterative normalized-cut segmentation for spinal MRI. IEEE Trans Med Imaging. https://doi.org/10.1109/TMI.2009.2023362
11. Glocker B, Zikic D, Konukoglu E et al (2013) Vertebrae localization in pathological spine CT via dense classification from sparse annotations. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics)
12. Lecron F, Benjelloun M, Mahmoudi S (2012) Fully automatic vertebra detection in x-ray images based on multi-class SVM. In: Medical imaging 2012: image processing
13. Xu X, Hao HW, Yin XC et al (2012) Automatic segmentation of cervical vertebrae in X-ray images. In: Proceedings of the international joint conference on neural networks
14. Al Arif SMMR, Knapp K, Slabaugh G (2018) Fully automatic cervical vertebrae segmentation framework for X-ray images. Comput Methods Programs Biomed. https://doi.org/10.1016/j.cmpb.2018.01.006
15. Brink RC, Wijdicks SPJ, Tromp IN et al (2018) A reliability and validity study for different coronal angles using ultrasound imaging in adolescent idiopathic scoliosis. Spine J. https://doi.org/10.1016/j.spinee.2017.10.012
16. Takács M, Orlovits Z, Jáger B, Kiss RM (2018) Comparison of spinal curvature parameters as determined by the ZEBRIS spine examination method and the Cobb method in children with scoliosis. PLoS ONE. https://doi.org/10.1371/journal.pone.0200245
17. Knott P, Pappo E, Cameron M et al (2014) SOSORT 2012 consensus paper: reducing x-ray exposure in pediatric patients with scoliosis. Scoliosis
18. Lenke LG, Edwards CC 2nd, Bridwell KH (2003) The Lenke classification of adolescent idiopathic scoliosis: how it organizes curve patterns as a template to perform selective fusions of the spine. Spine (Phila Pa 1976) 28:S199–S207. https://doi.org/10.1097/01.BRS.0000092216.16155.33
19. Ronneberger O, Fischer P, Brox T (2015) U-net: Convolutional networks for biomedical image segmentation. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics)
20. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition
21. Stokes IAF, Aronsson DD (2006) Computer-assisted algorithms improve reliability of king classification and cobb angle measurement of scoliosis. Spine (Phila Pa 1976). https://doi.org/10.1097/01.brs.0000203708.49972.ab
The authors thank Thomas Neumann, who shared his experience in graphical neural network construction.
This research did not receive any specific grant from funding agencies in the public, commercial or not-for-profit sectors. Open Access funding provided by Projekt DEAL.
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
This study has been approved by the institutional review board of the University Hospital (#EK339092018).
Consent for publication
Only X-ray images were used, in a blinded fashion. Because the images could not be traced back to individual participants, obtaining informed consent was not applicable.
Cite this article
Bernstein, P., Metzler, J., Weinzierl, M. et al. Radiographic scoliosis angle estimation: spline-based measurement reveals superior reliability compared to traditional COBB method. Eur Spine J 30, 676–685 (2021). https://doi.org/10.1007/s00586-020-06577-3
- Scoliosis curve
- Automatic measurement
- COBB angle
- Low image quality
- Deep learning