Automated Clustering Technique (ACT) for Early Onset Scoliosis: A preliminary report

Purpose While the C-EOS system helps organize and classify Early Onset Scoliosis (EOS) pathology, it is not data-driven and does not help achieve consensus for surgical treatment. The current study aims to create an automated method to cluster EOS patients based on pre-operative clinical indices.
Methods A total of 1114 EOS patients were used for the study, with the following distribution by etiology: congenital (240), idiopathic (217), neuromuscular (417), syndromic (240). Pre-operative clinical indices used for clustering were age, major curve (Cobb) angle, kyphosis, number of levels involved in a major curve (Cobb angle) and kyphosis along with deformity index (defined as the ratio of major Cobb angle and kyphosis). Fuzzy C-means clustering was performed for each etiology individually, with one-way ANOVA performed to assess statistical significance (p < 0.05).
Results The automated clustering method resulted in three clusters per etiology as the optimal number based on the highest average membership values. Statistical analyses showed that the clusters were significantly different for all the clinical indices within and between etiologies. Link to the ACT-EOS web application: https://biomed.drexel.edu/labs/obl/toolkits/act-eos-application.
Conclusion An automated method to cluster EOS patients based on pre-operative clinical indices was developed identifying three unique, data-driven subgroups for each C-EOS etiology category. Adoption of such an automated clustering framework can help improve the standardization of clinical decision-making for EOS.


Introduction
Early onset scoliosis (EOS) is a progressive spine deformity defined by an onset of spinal curvature greater than 10 degrees in children under 10 years of age [1,2]. EOS is believed to account for nearly 10% of all pediatric scoliosis cases, with the true prevalence unknown [3,4]. While EOS encompasses different etiologies with a variable natural history, the resulting progressive spine and rib cage deformity is very likely to lead to thoracic insufficiency syndrome, which is characterized by reduced lung development and growth [4][5][6][7]. To help better understand clinical presentations in the heterogeneous EOS patient population, the Classification of EOS (C-EOS) was developed by an expert committee of spine surgeons to group patients using preoperative clinical indices such as age, etiology (congenital, idiopathic, neuromuscular, syndromic), major curve (Cobb) angle (< 20°, 20-50°, > 50-< 90°, > 90°) and kyphosis (< 20°, 20-50°, > 50°) [8]. The C-EOS system assigns a patient to 1 of 48 subgroups based on the aforementioned clinical indices. In 12 of these subgroups, patients have a major curve (Cobb) angle of < 20° and rarely need surgical intervention due to the low extent of the deformity. C-EOS is the only available classification system to group EOS patients based on pre-operative clinical indices. While the system is both reliable and accurate, the cut-offs for major curve (Cobb) angle and kyphosis are not based on a data-driven approach [9]. In addition, the 48 subgroups in C-EOS limit meaningful analysis due to the small number of patients in each subgroup, thereby making it difficult to correlate interventions with outcomes [10]. While C-EOS has helped establish a standardized method to communicate different aspects of EOS deformity, it is not as widely used to guide surgical treatment as the Lenke classification system is for Adolescent Idiopathic Scoliosis (AIS) [11,12].
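The subgroup counts quoted above can be reproduced with a short enumeration. This is only an illustrative sketch: it assumes that the subgroups are the cross-product of etiology, major curve bin, and kyphosis bin, which is consistent with the totals of 48 and 12 but is not spelled out in the text. The variable names are hypothetical.

```python
from itertools import product

# Categories as listed in the text; the cross-product structure is an assumption.
ETIOLOGIES = ["congenital", "idiopathic", "neuromuscular", "syndromic"]
COBB_BINS = ["<20", "20-50", ">50-<90", ">90"]
KYPHOSIS_BINS = ["<20", "20-50", ">50"]

# Every combination of etiology, major curve bin, and kyphosis bin.
subgroups = list(product(ETIOLOGIES, COBB_BINS, KYPHOSIS_BINS))

# Subgroups whose major curve (Cobb) angle bin is below 20 degrees.
low_cobb_subgroups = [s for s in subgroups if s[1] == "<20"]

print(len(subgroups))           # 48
print(len(low_cobb_subgroups))  # 12
```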
Currently, there is limited consensus among surgeons on treatment selection, with variations in the timing of surgery, selection of treatment modality, and levels involved in the instrumentation [13][14][15][16]. Management decisions are guided more by a clinician's experience and training, complicating outcome comparisons between institutions. This is in part due to the C-EOS system not providing guidance on treatment decision-making. Clustering methods have been used to detect clinically relevant curve patterns for the AIS population using pre-operative clinical indices, and a few studies have even used a data-driven approach to cluster such patients [17][18][19][20][21]. Such automated clustering methods and analysis have yet to be attempted for EOS.
Due to the need for data-driven grouping of the heterogeneous EOS patient population, the objective of this study is to use machine learning-based clustering methods to generate a limited number of automated, meaningful subgroups based on pre-operative clinical indices of EOS patients. Such a framework can be used along with the existing C-EOS system as the foundation to build a novel automated classification system informed by patient data to provide standardized guidance for surgical interventions for EOS.

Patient data
After institutional review board approval, EOS patients were retrospectively selected from the prospectively maintained database of the Pediatric Spine Study Group (PSSG) over a 20-year period (1995-2015). For all subjects, pre-operative posteroanterior (PA) and lateral radiographs along with measurements of pre-operative clinical indices were obtained. For patients with multiple clinical visits, only the clinical indices from the first visit were considered. Subjects with missing clinical indices were excluded, which resulted in a total of 1170 subjects (congenital (n = 259), idiopathic (n = 224), neuromuscular (n = 440), and syndromic (n = 247)) being included for the automated clustering analyses.
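The selection rules above (first visit only, complete indices only) could be expressed roughly as follows. This is a sketch under stated assumptions: the record layout and field names are hypothetical and do not reflect the PSSG database schema, and it assumes that a patient is dropped when the first visit is incomplete rather than falling back to a later visit.

```python
from datetime import date

# Hypothetical field names for the measured pre-operative indices.
REQUIRED = ("age", "cobb", "kyphosis", "cobb_levels", "kyphosis_levels")


def first_complete_visits(visits):
    """Keep each patient's earliest visit, then drop patients whose
    earliest visit is missing any required index."""
    earliest = {}
    for visit in visits:
        pid = visit["patient_id"]
        if pid not in earliest or visit["visit_date"] < earliest[pid]["visit_date"]:
            earliest[pid] = visit
    return [v for v in earliest.values()
            if all(v.get(key) is not None for key in REQUIRED)]


# Made-up records for illustration only.
visits = [
    {"patient_id": "A", "visit_date": date(2001, 5, 1), "age": 4.0,
     "cobb": 55, "kyphosis": 30, "cobb_levels": 7, "kyphosis_levels": 8},
    {"patient_id": "A", "visit_date": date(2000, 3, 1), "age": 3.0,
     "cobb": 48, "kyphosis": 28, "cobb_levels": 7, "kyphosis_levels": 8},
    {"patient_id": "B", "visit_date": date(2005, 1, 10), "age": 6.0,
     "cobb": 70, "kyphosis": None, "cobb_levels": 9, "kyphosis_levels": None},
]

kept = first_complete_visits(visits)
print([(v["patient_id"], v["cobb"]) for v in kept])  # [('A', 48)]
```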

Clinical indices measurement and automated clustering method
Pre-operative clinical indices used for clustering were age, major curve (Cobb) angle, kyphosis, number of levels involved in the major curve (Cobb angle), number of levels involved in kyphosis, and deformity index (defined as the ratio of major Cobb angle to kyphosis). The fuzzy C-means clustering algorithm is an automated method to group data into a specific number of clusters provided by the user. The algorithm works by initially finding cluster centroids based on the data, after which every datapoint is assigned a membership value that denotes how close it is to each cluster centroid. Based on these membership values, datapoints are clustered together [22]. In the current study, the fuzzy C-means clustering algorithm was used to cluster EOS patients within each etiology based on the six indices listed above. Hence, four cluster analyses were performed, one for each of the four EOS etiologies, namely congenital, idiopathic, neuromuscular, and syndromic. One of the advantages of the fuzzy C-means clustering algorithm is that it considers the heterogeneity of the dataset and is able to identify outliers using the membership values.
For this study, the optimal number of clusters for each etiology was decided based on the average membership values of the clusters. The clustering analysis was repeated with different numbers of clusters, and the number that resulted in the highest average membership value was selected as optimal.
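The centroid and membership updates described above can be sketched in a few lines. This is a minimal illustration of the standard fuzzy C-means iteration, not the implementation used in the study: the fuzzifier m = 2, the Euclidean distance, the absence of feature scaling, and the initialisation from data points are all assumptions, and the two-feature demo values are made up.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, max_iter=100, tol=1e-6,
                  init=None, seed=0):
    """Return (centroids, memberships). memberships[k][i] is the degree to
    which point k belongs to cluster i; each row sums to 1."""
    rng = random.Random(seed)
    # Start from the supplied centroids, or from randomly chosen data points.
    centroids = [list(c) for c in (init or rng.sample(points, n_clusters))]
    n_dims = len(points[0])
    memberships = []
    for _ in range(max_iter):
        # Membership update: closer centroids receive higher membership.
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centroids]
            if min(dists) == 0.0:
                # The point sits exactly on one or more centroids.
                hits = [d == 0.0 for d in dists]
                row = [1.0 / sum(hits) if h else 0.0 for h in hits]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                row = [v / sum(inv) for v in inv]
            memberships.append(row)
        # Centroid update: mean of the points weighted by membership ** m.
        new_centroids = []
        for i in range(n_clusters):
            weights = [row[i] ** m for row in memberships]
            new_centroids.append([
                sum(w * p[d] for w, p in zip(weights, points)) / sum(weights)
                for d in range(n_dims)
            ])
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, memberships


# Made-up (major Cobb angle, kyphosis) pairs in degrees, for illustration only.
points = [(30, 20), (32, 22), (28, 18),
          (70, 40), (72, 42), (68, 38),
          (50, 90), (52, 88), (48, 92)]
centroids, memberships = fuzzy_c_means(
    points, 3, init=[(30, 20), (70, 40), (50, 90)])
labels = [row.index(max(row)) for row in memberships]
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Because the indices are on different scales (years, degrees, vertebral levels, a unitless ratio), a real analysis would probably need to decide whether to normalise the features first; the text does not say whether this was done.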
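The selection rule can be sketched as below. The text does not define "average membership value" precisely; this sketch assumes it is the mean, over subjects, of each subject's highest membership value. The membership matrices are made up and stand in for the output of separate clustering runs.

```python
def average_top_membership(memberships):
    """Mean of each subject's highest membership value (assumed definition)."""
    return sum(max(row) for row in memberships) / len(memberships)


def pick_cluster_count(memberships_by_k):
    """Return the candidate cluster count with the highest average."""
    return max(memberships_by_k,
               key=lambda k: average_top_membership(memberships_by_k[k]))


# Hypothetical membership matrices from runs with 2, 3, and 4 clusters.
memberships_by_k = {
    2: [[0.7, 0.3], [0.6, 0.4], [0.8, 0.2]],
    3: [[0.9, 0.05, 0.05], [0.1, 0.85, 0.05], [0.05, 0.05, 0.9]],
    4: [[0.5, 0.3, 0.1, 0.1], [0.2, 0.6, 0.1, 0.1], [0.1, 0.1, 0.4, 0.4]],
}

best_k = pick_cluster_count(memberships_by_k)
print(best_k)  # 3
```

With a single cluster every membership equals 1, so under this definition the candidate counts would need to start at two for the comparison to be informative.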

Clustering evaluation metric
The membership values calculated during the fuzzy C-means analyses describe how close a datapoint is to the calculated cluster centroid, hence offering the basis to detect outliers. These membership values help quantify the certainty (i.e. accuracy) of a data point belonging to a particular cluster.
Because these values quantify how confidently a data point is assigned to its cluster center, they can also be used as a metric to identify outliers by defining a lower bound. For the current study, a membership value of 0.6 was used as the lower bound, with values below this deemed outliers. Hence, subjects whose clinical indices yielded membership values lower than 0.6 were considered as not belonging to any cluster and were excluded from the clustering analyses.
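The lower-bound rule can be illustrated as follows. This sketch assumes that the value compared against 0.6 is each subject's highest membership across clusters, and that a value of exactly 0.6 is retained; the function name and membership values are hypothetical.

```python
def split_by_membership(memberships, threshold=0.6):
    """Assign each subject to its highest-membership cluster, or flag the
    subject as an outlier when that membership is below the threshold."""
    assigned, outliers = {}, []
    for idx, row in enumerate(memberships):
        top = max(row)
        if top < threshold:
            outliers.append(idx)
        else:
            assigned[idx] = row.index(top)
    return assigned, outliers


# Made-up membership rows for four subjects and three clusters.
memberships = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.10, 0.30, 0.60],
    [0.55, 0.40, 0.05],
]

assigned, outliers = split_by_membership(memberships)
print(assigned)  # {0: 0, 2: 2}
print(outliers)  # [1, 3]
```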

Statistical analysis
MATLAB (v2020b, The MathWorks Inc, Natick, MA) was used to perform statistical analyses. Average and standard deviation values for each cluster within an etiology were computed and compared with each other using a one-way ANOVA (p < 0.01) to detect differences between the clusters. Furthermore, Tukey's post-hoc analysis was performed to detect differences between the clusters of all the etiologies.
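For readers unfamiliar with the test, the one-way ANOVA F statistic compares between-cluster variance with within-cluster variance for a single clinical index. The sketch below uses Python rather than the MATLAB routines used in the study, computes only the F statistic and its degrees of freedom, and uses made-up Cobb angles for three hypothetical clusters.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mu - grand_mean) ** 2
                     for g, mu in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mu) ** 2
                    for g, mu in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical major Cobb angles (degrees) for three clusters.
clusters = [[28, 30, 32], [48, 50, 52], [68, 70, 72]]
f_stat, df_b, df_w = one_way_anova_f(clusters)
print(f_stat, df_b, df_w)  # 300.0 2 6
```

Converting F to a p-value requires the F distribution; in Python, `scipy.stats.f_oneway` returns both the statistic and the p-value, and recent SciPy versions also provide `scipy.stats.tukey_hsd` for the post-hoc comparison.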

Results
For fuzzy C-means, three clusters per etiology resulted in the highest average membership values. Out of 1170 eligible subjects, a total of 56 subjects (congenital (n = 19 out of 259), idiopathic (n = 7 out of 224), neuromuscular (n = 23 out of 440), and syndromic (n = 7 out of 247)) had membership values below 0.6 and were excluded as outliers. The clusters for the congenital, idiopathic, neuromuscular, and syndromic etiologies are shown in Figs. 1, 2, 3, 4, respectively. For each etiology, one-way ANOVA showed that the clinical indices for any cluster were significantly different (p < 0.01) from other clusters. Link to the ACT-EOS web application: https://biomed.drexel.edu/labs/obl/toolkits/act-eos-application.
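As a consistency check, the eligible counts from the Patient data section and the exclusions reported here can be reconciled with the per-etiology totals given in the abstract. The snippet is plain arithmetic on the reported numbers; only the variable names are introduced for illustration.

```python
# Eligible subjects per etiology (Patient data section).
eligible = {"congenital": 259, "idiopathic": 224,
            "neuromuscular": 440, "syndromic": 247}
# Subjects excluded for low membership values (Results).
excluded = {"congenital": 19, "idiopathic": 7,
            "neuromuscular": 23, "syndromic": 7}

retained = {name: eligible[name] - excluded[name] for name in eligible}

print(sum(eligible.values()), sum(excluded.values()), sum(retained.values()))
# 1170 56 1114
print(retained)
# {'congenital': 240, 'idiopathic': 217, 'neuromuscular': 417, 'syndromic': 240}
```

The retained counts match the 1114 patients and the per-etiology distribution stated in the abstract.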

Discussion
This is the first study to perform automated fuzzy C-means clustering of EOS subjects based on pre-operative clinical indices, complementing the existing C-EOS classification system. While the C-EOS system uses Cobb angle and kyphosis, which were based on expert review informed by normative values derived from the literature, the current clustering method uses a data-driven approach, by using the highest average membership values to determine subject

[Figure: exemplar images (sagittal)]
in each cluster would be classified under multiple C-EOS groups which would not be conducive to clinical decision making [23]. This is supported by previous studies that have reported limitations of large EOS subgroups to aid in clinical outcome determination [10].
In this study, we identified unique clusters within each EOS etiology that were significantly different from each other based on all the clinical indices. Although one cluster in each of the idiopathic and congenital groups had severe kyphotic deformities, these clusters still had a deformity index greater than one (i.e. a greater major curve Cobb angle as compared to kyphosis). In contrast, the neuromuscular and syndromic groups each had one cluster with a deformity index less than one, indicative of greater kyphosis as compared to the major curve Cobb angle. Although further validation is required for these methods to be widely adopted, the novel preliminary findings reported in the current study may point to differences in general deformity patterns observed between EOS etiologies, which are not immediately discernible with the C-EOS system. Additionally, while post-hoc testing showed significant differences across all clusters for all etiologies, the clusters may not fully capture the heterogeneity of EOS because they are based solely on radiographic measurements.
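The interpretation of the deformity index can be made concrete with a small helper. The ratio follows the definition given in the Methods; the handling of zero kyphosis, the category labels, and the example angles are assumptions added for illustration.

```python
def deformity_index(major_cobb_deg, kyphosis_deg):
    """Ratio of the major curve (Cobb) angle to kyphosis, both in degrees."""
    if kyphosis_deg == 0:
        # Assumption: the ratio is treated as undefined for zero kyphosis.
        raise ValueError("deformity index is undefined for 0 degrees of kyphosis")
    return major_cobb_deg / kyphosis_deg


def dominant_component(index):
    """Label which angle is larger for a given deformity index."""
    if index > 1:
        return "major curve dominant"
    if index < 1:
        return "kyphosis dominant"
    return "equal"


# Made-up angle pairs for illustration only.
print(deformity_index(60, 40), dominant_component(deformity_index(60, 40)))
# 1.5 major curve dominant
print(deformity_index(45, 75), dominant_component(deformity_index(45, 75)))
# 0.6 kyphosis dominant
```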
The few previous studies on automated clustering of patients with spine deformity have used techniques such as ISOData and K-means++ [20,24]. While these methods were able to create clusters based on patient deformity characteristics, they do not provide a quantitative basis to assign a patient to a particular cluster, and also lack the ability to assess whether or not a patient is an outlier. On the other hand, fuzzy C-means, which is an automated unsupervised clustering algorithm that performs well on heterogeneous datasets, uses membership values to provide a probabilistic estimate of a patient belonging to a particular cluster. Higher membership values (> 0.6) would indicate a greater probability of a patient's assignment to a cluster, and thereby also help identify and exclude any outliers.
The current study is not without limitations. First, to better address the heterogeneity of EOS subjects, the current clustering method can be further improved. For example, a small number of subjects (n = 56 out of 1170) were excluded from all clusters due to low membership values (i.e. < 0.6) of the limited radiographic measurements. Inclusion of additional structural (such as thoracic deformity parameters), functional (such as pulmonary functional parameters) and time-based measures (Annual Progression Modifier (APM)) may help better address the variable presentations of EOS deformity. Secondly, we used a multi-institutional and multi-modal dataset with clinical indices measured by multiple observers. Variations associated with such datasets would impact any classification method equally. However, the fuzzy C-means algorithm would be less affected by such variations due to the automated selection that is unbiased by preset cut-offs for clinical indices.

Conclusion
An automated framework to cluster EOS patients based on pre-operative clinical indices was developed identifying three unique, data-driven subgroups for each C-EOS etiology category. Adoption of this clustering framework with further validation may assist with clinical applications such as surgical planning, optimization of intervention type and timing to improve clinical outcomes and reduce complications.
Acknowledgements This project was partly supported by funding from Drexel University, and the Wyss-Campbell Center for Thoracic Insufficiency Syndrome at the Children's Hospital of Philadelphia.
Author contributions GV: Conception, data collection, data curation and analysis, methodology, data visualization, data analysis and interpretation, writing-original draft and editing, agree to be accountable for the work, approval of the final version of the manuscript. PJC: Methodology, data curation and analysis, resources, formal analysis, writing-critical review and editing, approval of the final version of the manuscript, agree to be accountable for the work. MGV: Methodology, data curation and analysis, resources, formal analysis, writing-critical review and editing, approval of the final version of the manuscript, agree to be accountable for the work. BAW: Formal analysis, data curation and analysis, writing-critical review and editing, approval of the final version of the manuscript, agree to be accountable for the work. Pediatric Spine Study Group: Data curation and analysis, formal analysis, writing-critical review and editing, approval of the final version of the manuscript, agree to be accountable for the work. SB: Conceptualization, methodology, formal analysis, resources, data collection, data curation and analysis, writing-original draft, visualization, supervision, project administration, funding acquisition, approval of the final version of the manuscript, agree to be accountable for the work.
Funding Wyss-Campbell Center for Thoracic Insufficiency Syndrome.
Data availability All data are available with the Pediatric Spine Study Group.

Conflict of interest The authors declare no conflicts of interest.
Ethical approval This study was given Ethical Approval by the Institutional Review Board prior to the conduct of this study. Informed consent and/or a waiver of consent was obtained.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.