Introduction

Adolescent idiopathic scoliosis (AIS) is a three-dimensional spinal disorder that affects 3% of adolescents. To diagnose AIS, posteroanterior (PA) and lateral (LAT) spinal radiographs are usually required at the initial clinic visit [1]. The Cobb angle (CA), axial vertebral rotation (AVR), kyphotic angle (KA), and lordotic angle (LA) are then measured from the PA and LAT radiographs to determine the severity of scoliosis and assist clinicians in making treatment decisions. The PA and LAT radiographs can be taken simultaneously in a center with the EOS X-ray system (EOS Imaging Inc, France). If needed, a 3D spine can be reconstructed from the PA and LAT radiographs, which may provide better visualization and assessment. According to the literature, the CA, AVR, KA, and LA measured from EOS systems are comparable to those measured from plain 2D radiography [2, 3]. The reproducibility of the CA, AVR, KA, and LA from EOS images was reported to be 4°, 6.5°, 4°, and 7°, respectively [4]. The intra-rater and inter-rater reliability of the CA, AVR, KA, and LA were also reported to be excellent [5, 6]. However, to accurately reconstruct the 3D spine and measure curvature parameters manually, the operator needs a good knowledge of spinal anatomy and adequate training. Even for an experienced operator, manual reconstruction and measurement extraction are time-consuming. Currently, the most widely used reconstruction method relies on the commercially available software SterEOS (EOS Imaging Inc, Paris, France). Humbert et al. [7] reported that reconstructing a 3D spine with SterEOS required operators to digitize the spinous process, the vertebral body, and the transverse processes at every vertebral level. Another study reported that, even for an experienced user, it took approximately 13 min to perform all the digitization and generate a deformed spine [4].
Although the 3D spine and curvature measurements were available after the reconstruction, the results had poor interpretability. Figure 1 illustrates the reconstruction procedures of the SterEOS software for PA and LAT radiographs.

Fig. 1 The EOS PA and LAT radiographs (left) and the manual 3D reconstruction procedure in the SterEOS software (right)

Compared to human reconstruction and measurements, fully automated reconstruction with artificial intelligence (AI) can be much faster and removes human variance. AI is currently an active research topic in many applications, especially health-related ones. Machine learning (ML) is a subset of AI that uses data and algorithms to mimic human learning. The most common ML technique for medical imaging is the convolutional neural network (CNN), which is primarily used for image recognition, classification, and object detection. CNNs have also been applied to the 3D reconstruction of the spine from radiographs. Aubert et al. [8] applied CNNs to regress the 2D coordinates of vertebral body centers (VBCs) and to register them to 3D statistical spine models. Extending this method, Bakhous et al. [9] detected not only the VBCs of the C7 to T12 vertebrae but also the pedicles, using another CNN-based regression model. Later, Aubert et al. [10] boosted the reconstruction speed by registering detected 2D landmarks, including the VBCs, the centers of the upper and lower endplates, and the pedicles, to a simplified generic vertebral model. The average reconstruction time was reduced to around 30 s for 68 testing datasets, which consisted of asymptomatic, moderate, and severe scoliosis cases. Among the 44 scoliotic cases of the 68, the CA measurements achieved a mean absolute difference (MAD) between automatic and manual measurements of 2.8° for moderate cases and 3.6° for severe cases.

In summary, existing 3D automated reconstruction approaches for the spine achieved relatively accurate measurements for the curvature assessment. Nevertheless, spending 30 to 50 s to reconstruct a 3D radiographical spine in a busy clinic is still considered slow. Therefore, this study aimed to validate a new and fast AI-based automatic method, which reconstructed the 3D spine and reported curvature measurements, and to evaluate the accuracy, reliability, and speed of reconstruction.

Methods

Dataset

The local health ethics board granted chart review ethics approval (Pro0010244). Three hundred and eighty biplanar low-dose PA and LAT radiographs, acquired from the EOS X-ray system, were randomly exported from the local scoliosis clinic. The inclusion criteria for radiograph selection were a) diagnosed with AIS, b) non-surgical cases, and c) out-of-brace radiographs. Among those 380 paired images, 304 (80%) were used for ML algorithm training, and 76 (20%) were used for testing.
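
The 80/20 partition described above can be illustrated with a short sketch. The text gives only the counts (304 training and 76 testing pairs), so the shuffling procedure, the seed, and the function name `split_cases` are assumptions made for illustration.

```python
import random

def split_cases(case_ids, train_fraction=0.8, seed=0):
    """Shuffle case IDs reproducibly and split them into train and test lists."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)  # hypothetical seed; not stated in the study
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# 380 paired PA/LAT radiographs, identified here by placeholder integers
train_ids, test_ids = split_cases(range(380))
print(len(train_ids), len(test_ids))
```

Splitting by case ID keeps each PA and LAT pair together in the same partition.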

AI-based automated reconstruction methods

To reconstruct the 3D spine, several CNNs were developed to identify and extract the entire spinal column (T1-L5), individual vertebrae, and pedicles at each vertebral level. Then, a 3D/2D registration process was applied to register the silhouette of 3D generic vertebral models to the segmented 2D vertebral boundaries. The overall procedure of the AI algorithm is shown in Fig. 2.

Fig. 2 Procedure of the automated 3D spine reconstruction performed by the developed AI algorithm

The 3D/2D registration began with the initialization of preliminary 3D generic vertebral models. Each 3D vertebral model was designed as a deformable cylinder with typical shapes for the spinous and transverse processes. The location, dimensions, and rotation angles of each 3D vertebra were initially determined from the 2D vertebrae in the PA and LAT views. Then, the registration process minimized the disagreement between the silhouettes of the 3D vertebrae (T1 to L5) projected onto the PA and LAT planes and the segmented 2D vertebral boundaries. The boundaries of the 2D vertebrae were restricted based on the information from the PA and LAT views. The dimensions of each vertebral body were adjusted during the registration process.
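
The idea of adjusting vertebral dimensions until the projections agree with the 2D segmentations can be sketched in a deliberately simplified form. The sketch below is not the study's implementation: it replaces the deformable cylinder with an axis-aligned box, ignores rotation, measures disagreement as 1 minus the intersection over union (IoU) of bounding boxes, and uses a plain coordinate search. All names, the tolerance, the step size, and the example numbers are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (center_u, center_v, size_u, size_v)."""
    def overlap(c1, s1, c2, s2):
        lo = max(c1 - s1 / 2, c2 - s2 / 2)
        hi = min(c1 + s1 / 2, c2 + s2 / 2)
        return max(0.0, hi - lo)
    inter = overlap(a[0], a[2], b[0], b[2]) * overlap(a[1], a[3], b[1], b[3])
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def project(v):
    """Project a box-like vertebra onto the PA (x-z) and LAT (y-z) planes."""
    pa = (v["x"], v["z"], v["width"], v["height"])
    lat = (v["y"], v["z"], v["depth"], v["height"])
    return pa, lat

def disagreement(v, pa_box, lat_box):
    """Sum of (1 - IoU) over the two views; 0 means a perfect match."""
    pa, lat = project(v)
    return (1 - iou(pa, pa_box)) + (1 - iou(lat, lat_box))

def register(v, pa_box, lat_box, tol=1e-3, max_steps=200, step=2.0):
    """Coordinate search over the dimensions only; stops at tol or max_steps."""
    v = dict(v)
    best = disagreement(v, pa_box, lat_box)
    for _ in range(max_steps):
        if best <= tol:
            break
        improved = False
        for key in ("width", "depth", "height"):
            for delta in (step, -step):
                trial = dict(v)
                trial[key] = max(1e-6, v[key] + delta)
                score = disagreement(trial, pa_box, lat_box)
                if score < best:
                    v, best, improved = trial, score, True
        if not improved:
            step /= 2  # refine the search once no move helps
    return v, best

# Illustrative 2D boxes (mm) standing in for the segmented vertebral boundaries
pa_box = (10.0, 100.0, 30.0, 20.0)   # x, z, width, height
lat_box = (-5.0, 100.0, 26.0, 20.0)  # y, z, depth, height
# Location taken from the 2D views; dimensions start from a generic model
init = {"x": 10.0, "y": -5.0, "z": 100.0,
        "width": 24.0, "depth": 24.0, "height": 16.0}
fitted, score = register(init, pa_box, lat_box)
print(fitted["width"], fitted["depth"], fitted["height"], score)
```

The two stopping conditions mirror the text: the loop ends when the disagreement falls below a tolerance or when the maximum number of steps is reached.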

Once the disagreement was minimized or the maximum number of registration steps was reached, the registration ended and yielded the reconstructed 3D spine. The CA, AVR, KA, and LA of the reconstructed 3D spine were computed from the projected silhouettes of the 3D vertebrae. The AI algorithm output the 3D spinal image, the measurement results, and the corresponding measurement lines. Displaying the measurement lines gave clinicians confidence in how the measurements were made.
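
As an illustration of how an angle such as the CA can be derived from two reference lines, the following sketch computes the angle between an upper and a lower endplate line. It assumes each endplate is available as a pair of 2D points; how the study extracts these lines from the projected silhouettes is not described, and the function names are hypothetical.

```python
import math

def line_angle_deg(p1, p2):
    """Inclination of the line through p1 and p2, in degrees, folded to (-90, 90]."""
    angle = math.degrees(math.atan2(p2[1] - p1[1], p2[0] - p1[0]))
    if angle > 90:
        angle -= 180
    elif angle <= -90:
        angle += 180
    return angle

def cobb_angle_deg(upper_endplate, lower_endplate):
    """Angle between two endplate lines, each given as a pair of (x, y) points."""
    diff = abs(line_angle_deg(*upper_endplate) - line_angle_deg(*lower_endplate))
    return min(diff, 180 - diff)

# Illustrative endplates tilted +10 degrees and -15 degrees from horizontal
upper = ((0.0, 0.0), (math.cos(math.radians(10)), math.sin(math.radians(10))))
lower = ((0.0, 0.0), (math.cos(math.radians(-15)), math.sin(math.radians(-15))))
cobb = cobb_angle_deg(upper, lower)
print(round(cobb, 1))
```

Because only the angle between the two lines matters, the result does not depend on the order of the points or on the direction of the image y-axis.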

To boost the segmentation and reconstruction speed, the trained models were exported to ONNX format for inference in a C++ implementation. The 3D/2D registration process was also implemented in C++ to improve the processing speed.

Validation analysis

To evaluate the accuracy and reliability, the manual and automatic measurements were compared on 4 clinical parameters: the Cobb angle (CA), the axial vertebral rotation (AVR), the kyphotic angle (KA) between T1 and T12, and the lordotic angle (LA) between L1 and L5. The manual measurements were performed by a rater who had more than 8 years of experience in scoliosis research and was blinded to the automatic measurements. The CA, KA, and LA were measured using the Cobb method. The AVR was measured using Stokes' method [11]. All curves identified by both methods were used for analysis. The accuracy was evaluated based on the mean absolute difference and standard deviation (MAD ± SD). In addition, the percentages of automatic measurements within the clinical acceptance errors for the CA and AVR (5°) and for the KA and LA (9°) were calculated. The inter-method intraclass correlation coefficient ICC [1, 2] with 95% confidence interval and the standard error of measurement (SEM) were calculated to assess the reliability. Based on Koo's definitions, ICC [1, 2] indicated levels of reliability as poor (< 0.5), moderate (0.5–0.75), good (0.75–0.90), and excellent (≥ 0.90) [12].
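
The accuracy statistics can be sketched as follows. This is a minimal illustration under stated assumptions: the SD is taken as the sample standard deviation of the absolute differences, and a measurement counts as acceptable when its absolute difference is less than or equal to the tolerance. The function name and the example values are made up for illustration and are not data from the study.

```python
import statistics

def agreement_summary(auto, manual, tolerance_deg):
    """Return (MAD, SD of absolute differences, percent within tolerance)."""
    abs_diff = [abs(a - m) for a, m in zip(auto, manual)]
    mad = statistics.mean(abs_diff)
    sd = statistics.stdev(abs_diff)  # sample SD; an assumption
    within = 100.0 * sum(d <= tolerance_deg for d in abs_diff) / len(abs_diff)
    return mad, sd, within

# Made-up Cobb angles in degrees, for illustration only
auto = [22.0, 31.5, 18.0, 44.0]
manual = [20.0, 30.0, 24.0, 43.0]
mad, sd, within = agreement_summary(auto, manual, tolerance_deg=5.0)
print(mad, within)
```

The same function applies to the KA and LA with `tolerance_deg=9.0`.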

A Bland–Altman analysis was also performed to evaluate the bias and agreement between the manual and automated methods for the CA only, because the CA is the primary measurement of spinal curvature used to diagnose scoliosis [13]. Besides the overall accuracy and reliability, results grouped by curve region and curve severity were also analyzed. Curves were grouped by region according to the location of the apex, where upper thoracic (UT) corresponded to above T4, main thoracic (MT) to T5-T11, thoracolumbar (TL) to T12-L1, and lumbar (L) to L2-L5. Since the range of the manually measured Cobb angle was 10°–50°, and the number of curves between 45° and 50° was small, the curve severity analysis was divided into 2 groups only: mild (< 25°) and moderate (25°–50°).
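
The grouping rules can be written as two small lookup functions. The text does not say which group an apex at T4 itself belongs to, so this sketch assumes T1-T4 map to UT; that choice, the function names, and the error raised above 50° are assumptions.

```python
# Vertebral levels in cranio-caudal order: T1..T12, then L1..L5
LEVELS = [f"T{i}" for i in range(1, 13)] + [f"L{i}" for i in range(1, 6)]

def curve_region(apex):
    """Map an apex level such as 'T8' to UT, MT, TL, or L."""
    idx = LEVELS.index(apex)  # 0 = T1, 16 = L5
    if idx <= 3:
        return "UT"  # T1-T4 (assumes T4 is included)
    if idx <= 10:
        return "MT"  # T5-T11
    if idx <= 12:
        return "TL"  # T12-L1
    return "L"       # L2-L5

def curve_severity(cobb_deg):
    """Two-group severity split used in the analysis."""
    if cobb_deg < 25:
        return "mild"
    if cobb_deg <= 50:
        return "moderate"
    raise ValueError("Cobb angle above 50 degrees is outside the studied range")

print(curve_region("T8"), curve_severity(32.0))
```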

To report the speed of 3D reconstruction and measurement, the time was averaged over the 76 test cases and counted from inputting the biplanar radiographs to outputting the automatic measurements. The inference and reconstruction ran on a Windows computer with an Intel i7-12700 CPU and 16 GB RAM.

Results

The machine learning method was successfully developed to reconstruct 3D spinal images from biplanar radiographs. Table 1 presents the accuracy and reliability of the CA overall, by curve region, and by curve severity. In total, 134 CAs were exported automatically and 128 CAs were measured manually. The automatic method identified all the manually measured curves (100%). Among the 128 curves, the overall accuracy and reliability were excellent, with MAD ≤ 3.3°, ICC [1, 2] > 0.95, SEM = 0.27°, and > 98% of measurements within the clinical acceptance errors.

Table 1 AI-based versus manual measurements of Cobb Angle

Table 2 shows the analysis of AVR at the apical level overall, for neutral vertebrae (|AVR| ≤ 5°), and for rotated vertebrae (|AVR| > 5°). The range of vertebral rotation was between −28.5° and 18.4°. Among the 128 measurements, the results also showed excellent accuracy and reliability, with MAD ≤ 1.5°, ICC [1, 2] > 0.98, SEM = 0.21°, and > 99% of measurements within the clinical acceptance errors.

Table 2 AI-based vs. Manual Measurements of AVR at the apical vertebra

Table 3 shows the results for the KA and LA on 76 measurements from both the AI-based and manual methods. The accuracy and reliability were excellent as well: for the KA, the MAD was ≤ 3.3°, ICC [1, 2] > 0.94, SEM = 0.64°, and > 98% of measurements were within the clinical acceptance errors. For the LA, the MAD was ≤ 3.5°, ICC [1, 2] > 0.95, SEM = 0.56°, and 100% of measurements were within the clinical acceptance errors.

Table 3 Automatic versus manual measurements of KA and LA for Sagittal Balance

Figure 3 shows the Bland–Altman plot of the CA between the AI-based and manual measurements. There was almost no bias (0.6°) between the AI-based and manual measurements, and all measurements were within the 95% limits of agreement. The figure also shows that the distribution of the differences between manual and automatic measurements is relatively uniform throughout the range of curve severity.
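
The quantities behind a Bland–Altman plot can be sketched briefly. The sketch uses the conventional definitions, with the bias as the mean of the paired differences (AI minus manual, following the sign convention of Fig. 3) and the limits as bias ± 1.96 SD of the differences. The function name and the example values are made up for illustration and are not the study's data.

```python
import statistics

def bland_altman(auto, manual):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - m for a, m in zip(auto, manual)]  # AI minus manual
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Made-up Cobb angles in degrees, for illustration only
auto = [21.0, 33.0, 18.0, 41.0]
manual = [20.0, 30.0, 19.0, 42.0]
bias, lower, upper = bland_altman(auto, manual)
print(bias, round(lower, 2), round(upper, 2))
```

Each point on the plot would then pair the mean of the two measurements on the horizontal axis with their difference on the vertical axis.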

Fig. 3 A Bland–Altman plot of automatic vs. manual CA measurements, with bias (black line) and overall average of measurements (green line). The mean of Cobb is the average of the AI and manual CA, and the Cobb angle difference is the AI CA minus the manual CA

Regarding the speed, the average time to generate 3D spinal images was 5.15 ± 1.19 s when using a PC with an Intel® I7 CPU and 16 GB RAM. Figure 4 shows the visualization result of the 3D reconstruction, with reference lines and curvature measurements.

Fig. 4 Visualization results with measurements from the automated 3D reconstruction method

Discussion

This study reported a fast AI-based method (approximately 5 s) to reconstruct 3D spinal images, and > 98% of the automatic curvature measurements (CA, AVR, KA, LA) were within the clinical acceptance errors. In the curve severity analysis, the Cobb angle measurements showed no statistically significant difference between the mild and moderate cases. In the curve region analysis, the upper thoracic region showed slightly lower accuracy and reliability. However, the upper thoracic region had the fewest curves (15 only), which may have affected the accuracy and reliability of the measurements. Hence, more upper thoracic curves are needed to fully validate the model. In the analysis of AVR, the neutral (non-rotated) vertebrae showed a small MAD of 1.2° and 100% agreement within the clinical acceptance error. However, their reliability and SEM were relatively worse than those of the rotated vertebrae. This agrees with the Stokes method, which states that neutral vertebrae have more measurement variation than rotated vertebrae. Nevertheless, the AVR demonstrated excellent results over a large range of axial rotation, −28.5° to 18.4°. Regarding the sagittal measurements, both the KA and LA showed excellent results, even though the segmentation of T1–T12 is more challenging than that of L1–L5. In the upper thoracic region between T1 and T5 especially, the boundary is more difficult to identify because of the overlapping bone structures around the arm area. With sufficient training datasets, the developed CNN algorithms could still report excellent results.

As the visualization results in Fig. 4 show, this automated method can identify both left and right curves in different regions and with different curve severities. The interpretable output shows the apex vertebrae in brown and the curvature measurements with reference lines.

Overall, the high accuracy, excellent reliability, and quick measurement time, combined with interpretable on-screen outputs, outperformed other studies reported in the literature. Table 4 compares 3 other studies [8,9,10] with the developed method. With our method, (a) the difference between the manual and automatic measurements is the smallest, (b) the number of testing cases is the largest, (c) the displayed result is interpretable, and (d) the reconstruction is the fastest.

Table 4 Comparison of 3D reconstruction performance based on validation sets for various automatic algorithms

One limitation is that the developed model was trained only on radiographs with Cobb angles < 50°. In more severe scoliosis cases, the vertebral bodies may deviate more from the generic vertebral model. Hence, there may be more disagreement with the generic model that we used, and the processing may take longer because more iterations are needed. Although CNNs can generalize to unseen cases, their accuracy, reliability, and speed need to be validated again in severe cases. Also, the biplanar radiographs used for training were all from a single center. Even though the EOS X-ray system provides a standard acquisition protocol, each center may still slightly adjust the energy dosage, so the image quality may differ. Hence, radiographs from different centers should be tested to investigate how scalable the AI algorithm is.

Another limitation lies in the missing details of the reconstructed 3D vertebrae. To boost the reconstruction speed, the deformations of the 3D generic model were applied only to the dimensions of the vertebral bodies, which were treated as cylinders during the 3D/2D registration process. Hence, the rotation angles in three dimensions rely solely on the 2D segmentation results. This follows from the assumption that the vertebral body does not have severely wedged endplates and that its surfaces are regular. However, this assumption may not apply to severe cases [14], meaning that the reconstruction method should be extended to wedged vertebrae. At the same time, the silhouettes projected from the generic models might not perfectly match the actual 3D shapes of the vertebrae, which might also lead to inaccuracy in the automatic measurements. Further work should explore more realistic vertebral shapes for reconstructing the 3D spine.

Conclusion

This study reported a fast AI-based reconstruction method, which takes only 5.2 ± 1.2 s to display 3D spinal images with interpretable visualization results. The CA, AVR, LA, and KA measurements showed excellent accuracy, high reliability, and a high percentage of measurements within clinical acceptance errors. Future work involves extending the model to severe scoliosis so that it can be applied to a broader spectrum of cases.