Key Summary Points

We consecutively screened 348 patients with AS through complete blood routine examinations, liver function tests, and kidney function tests.

We used three ML methods [LASSO, random forest, and support vector machine recursive feature elimination (SVM-RFE)] to screen feature variables and then took the intersection to obtain the prediction model. We then verified the prediction model on the validation cohort.

Our diagnostic models can help orthopedic surgeons devise more personalized and rational clinical strategies.

Introduction

Ankylosing spondylitis (AS) is a chronic progressive inflammatory disease of the spine and its affiliated tissues [1]. AS mainly affects the axial bone, sacroiliac joint, hip joint, spinal facet, and adjacent ligaments [2]. The main clinical manifestations are pain in the waist, sacroiliac joints, and hip and progressive joint stiffness, resulting in joint mobility limitation and joint deformity [3].

Millions of people are diagnosed with AS each year, but the cause of AS remains unknown. The ratio of male to female patients with AS is approximately 3:1 [4], and the incidence rate among relatives of patients is 20 times higher than that in the general population [5]. The heritability of AS is strongly influenced by variation in the major histocompatibility complex (MHC) region, specifically positivity for HLA-B27, which accounts for 40–50% of the total genetic risk of developing the disease [3].

Machine learning (ML) is a scientific discipline focusing on how computers learn from data. It lies at the intersection of statistics, which learns relationships from data, and computer science, which emphasizes efficient computational algorithms. ML is now widely used in the study of clinically relevant data [6, 7]. Liang et al. used LASSO regression to find that the platelet-to-lymphocyte ratio could be an independent factor in diagnosing AS [8]. Zhang et al. used ML to predict volume responsiveness in patients with acute kidney injury [9].

AS starts gradually, and its early symptoms are mild. Patients with AS often have nephropathy and cardiovascular disease [10, 11]. Some hospitals lack HLA-B27 testing and the imaging equipment needed to assist in the diagnosis of AS. There are relatively few studies on liver function and kidney function in patients with AS. We used ML methods to construct diagnostic models based on the blood routine examination, liver function tests, and kidney function tests of patients with AS to help clinicians enhance diagnostic efficiency and allow patients to receive systematic treatment as soon as possible.

Methods

Patients

Subjects volunteering for the study signed informed consent forms. The Ethics Committee of the First Affiliated Hospital of Guangxi Medical University approved this study, which adhered to the tenets of the Helsinki Declaration of 1964.

From 2012 to 2021, we consecutively screened 348 patients with AS through complete blood routine examination, liver function tests, and kidney function tests at the First Affiliated Hospital of Guangxi Medical University according to the modified New York criteria (diagnostic criteria for AS) [12]. Inclusion criteria: (1) patients with AS who met the modified New York criteria; (2) patients who had good compliance and no serious cardiovascular or cerebrovascular diseases; (3) patients who voluntarily accepted blood routine examinations, liver function tests, and kidney function tests. Exclusion criteria: (1) patients who could not tolerate blood drawing or had coagulation dysfunction; (2) patients with a temperature > 37.3 °C at admission; (3) patients with liver or kidney disease.

A total of 360 patients without AS were recruited from among inpatients diagnosed with other diseases, all of whom completed the blood tests. Inclusion criteria: (1) patients clearly diagnosed with a disease other than AS; (2) patients who had good compliance and no serious cardiovascular or cerebrovascular diseases; (3) patients who voluntarily accepted blood routine examinations, liver function tests, and kidney function tests. Exclusion criteria: (1) patients who could not tolerate blood drawing or had coagulation dysfunction; (2) patients with a temperature > 37.3 °C at admission; (3) patients with liver or kidney disease.

The patients were randomly divided into training and validation cohorts. The training cohort included 258 patients with AS and 247 patients without AS, whereas the validation cohort included 90 patients with AS and 113 patients without AS (Fig. 1).
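
The random division into cohorts can be sketched as follows. This is only an illustration, assuming a simple shuffle of patient identifiers; the identifiers, the seed, and the function name `split_cohorts` are hypothetical, and the study does not state how its random sampling was implemented.

```python
import random

def split_cohorts(patient_ids, n_validation, seed=0):
    """Shuffle patient IDs and return (training, validation) lists."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    return ids[n_validation:], ids[:n_validation]

# Hypothetical IDs: 348 patients with AS and 360 without AS, as in the text.
all_ids = [f"AS_{i}" for i in range(348)] + [f"nonAS_{i}" for i in range(360)]

# 203 patients go to validation, leaving 505 for training (cohort sizes from the text).
training, validation = split_cohorts(all_ids, n_validation=203)
print(len(training), len(validation))
```

The number of patients with and without AS in each cohort depends on the shuffle, so this sketch reproduces the cohort sizes but not necessarily the 258/247 and 90/113 breakdown reported above.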

Fig. 1 Recruitment and screening

All clinical data were obtained from the Information System of the First Affiliated Hospital of Guangxi Medical University. Patients were identified by their ID number. Age, diagnosis, erythrocyte sedimentation rate (ESR), high-sensitivity C-reactive protein (hs-CRP), blood routine examination, liver function examination, and kidney function examination of all the patients were collected and statistically analyzed. Blood routine examination included white blood cell count (WBC), red blood cell count (RBC), hemoglobin (HGB), hematocrit (HCT), mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), blood platelet count (BPC), mean platelet volume (MPV), platelet distribution width (PDW), absolute neutrophil count (NEUT#), percentage of neutrophils (NEUT%), absolute lymphocyte count (LYM#), percentage of lymphocytes (LYM%), absolute monocyte count (MONO#), percentage of monocytes (MONO%), absolute eosinophil count (ESO#), percentage of eosinophils (ESO%), absolute basophil count (BASO#), percentage of basophils (BASO%), red cell distribution width (RDW), and thrombocytocrit (PCT). The liver function examination included total bilirubin (TBil), direct bilirubin (DBil), indirect bilirubin (IBil), DBil/IBil, total protein (TP), albumin (ALB), globulin (GLB), ALB/GLB ratio, gamma-glutamyl transpeptidase (GGT), total bile acid (TBA), aspartate aminotransferase (AST), alanine aminotransferase (ALT), AST/ALT, alkaline phosphatase (ALP), prealbumin (PAB), and cholinesterase (ChE). The kidney function examination included blood urea nitrogen (BUN), creatinine (Cr), uric acid (UA), bicarbonate (HCO), creatinine clearance rate (Ccr), and cystatin C (Cys-C). We tried to keep the data as complete as possible, and we excluded the very few patients with missing data before the statistical analysis.

Statistical Analysis

We used IBM SPSS Statistics 23 and R software (version 4.1.3; https://www.R-project.org) for data analysis. Student’s t-test was used to compare the means of continuous variables between the two groups (i.e., patients with AS and patients without AS). The data analyzed with the t-test were normally distributed and had homogeneous variances. We verified the calculations several times. hs-CRP was analyzed using the chi-square test. A two-sided P value of less than 0.05 was considered statistically significant for all analyses.
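
For readers who want to see the two test statistics spelled out, the following is a minimal sketch of the pooled-variance Student t statistic and the Pearson chi-square statistic. It is not the SPSS or R computation used in the study; the function names and the toy numbers are illustrative assumptions, and P values are omitted because they require the t and chi-square distribution functions that statistical software supplies.

```python
import math
from statistics import mean, variance

def student_t(sample_a, sample_b):
    """Pooled-variance (Student) t statistic and its degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    pooled = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / (na + nb - 2)
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(table[0]) - 1)

# Toy inputs, not study data.
t_stat, t_df = student_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
chi_stat, chi_df = chi_square([[10, 20], [20, 10]])
```

A categorical variable such as hs-CRP would enter `chi_square` as a table with one row per group and one column per category.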

The nomogram model for predicting AS was constructed using the “rms” package [13]. The “rms” package was also used to fit the multivariable logistic regression and to calculate the C value [14, 15] to verify the predictive ability of the nomogram. The area under the receiver operating characteristic (ROC) curve (AUC) and Harrell’s concordance index were used to evaluate the performance of the nomogram. Harrell’s concordance index was calculated to assess nomogram discrimination by using a bootstrap method with 1000 resamples [16]. The “corrplot” package was used to analyze the correlations among the independent variables [17]. Decision curve analysis was conducted to determine the clinical usefulness of the nomogram by quantifying the net benefit at different threshold probabilities [18]. The net benefit was calculated by subtracting the proportion of all patients who were false-positive cases from the proportion who were true-positive cases, with the false-positive proportion weighted by the relative harm of forgoing an intervention compared with the negative consequences of an unnecessary intervention [19]. In this study, the “rms” and “rmda” packages were used to obtain the thresholds and visualize them.
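
The net benefit described above can be written as a short function. This is a sketch of the standard decision-curve formula, in which the false-positive proportion is weighted by the odds of the threshold probability; it is not the “rmda” implementation, and the function names and toy data are assumptions made for illustration.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening when the predicted probability >= threshold (0 < threshold < 1)."""
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    weight = threshold / (1 - threshold)  # relative harm of an unnecessary intervention
    return true_pos / n - false_pos / n * weight

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the intervention-in-all-patients scheme."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy labels (1 = AS) and predicted probabilities, not study data.
toy_labels = [1, 1, 0, 0]
toy_probs = [0.9, 0.6, 0.7, 0.2]
nb_model = net_benefit(toy_labels, toy_probs, threshold=0.5)
nb_all = net_benefit_treat_all(toy_labels, threshold=0.5)
```

The intervention-in-none scheme has a net benefit of zero at every threshold, so a model is considered useful at thresholds where its curve lies above both zero and the treat-all curve.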

Including every variable would make the ML computations unwieldy. Therefore, for the training cohort, we first screened all variables for P < 0.05 using SPSS and then continued screening with the three ML methods.

Random Forest

The random forest model was fitted with the “randomForest” package in R to screen variables and to calculate and visualize their relative importance [20]. “%IncMSE” is the percentage increase in the mean squared error: the values of each predictor are randomly permuted in turn, and the more important the predictor, the more the prediction error of the model increases after its values are permuted [21]. Therefore, the greater the value, the greater the importance of the variable. “IncNodePurity” denotes the increase in node purity, which is measured by the residual sum of squares and represents the impact of each variable on the heterogeneity of the observations at each node of the tree. The greater the value, the greater the importance of the variable [22]. Either “%IncMSE” or “IncNodePurity” was used as the indicator of predictor importance. The number of most important variables to retain was determined through five repetitions of tenfold cross-validation.
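
The idea behind “%IncMSE” can be illustrated with a simplified permutation-importance calculation. This sketch is not the “randomForest” computation, which, as far as we understand, works on out-of-bag samples and scales the result; here a fixed toy prediction function stands in for a fitted model, and all names and data are hypothetical.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, rows, y, column, seed=0):
    """Increase in MSE after randomly shuffling one predictor column."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(r) for r in rows])
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    permuted_rows = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
    return mse(y, [predict(r) for r in permuted_rows]) - baseline

def toy_predict(row):
    """Toy stand-in for a fitted model: the outcome depends only on column 0."""
    return 2.0 * row[0]

# Toy data: column 0 drives the outcome, column 1 is irrelevant noise.
rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [toy_predict(r) for r in rows]

importance_col0 = permutation_importance(toy_predict, rows, y, column=0)
importance_col1 = permutation_importance(toy_predict, rows, y, column=1)
```

Shuffling the informative column raises the error, whereas shuffling the irrelevant column leaves it unchanged, which is why a larger increase marks a more important predictor.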

LASSO Regression

The LASSO regression model is a shrinkage method that selects from a large, potentially multicollinear set of candidate variables in order to screen out risk factors and optimal predictive features from the data of patients with AS. The variables with P < 0.05 on Student’s t-test were entered, and the “glmnet” package in R was used for LASSO regression analysis and visualization [23, 24].
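
Why a shrinkage method also selects variables can be seen from the soft-thresholding operator, which sets small coefficients exactly to zero. The sketch below is a conceptual illustration only: it applies the operator once to hypothetical coefficients, whereas a LASSO fit such as the one produced by “glmnet” applies updates of this kind repeatedly to a penalized likelihood. The variable names and values are invented.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def selected(coefs, lam):
    """Names of variables whose shrunken coefficient is non-zero at penalty lam."""
    return [name for name, z in coefs.items() if soft_threshold(z, lam) != 0.0]

# Hypothetical unpenalized coefficients on standardized predictors (not study data).
toy_coefs = {"x1": 0.9, "x2": -0.4, "x3": 0.05}

kept_small_penalty = selected(toy_coefs, lam=0.1)
kept_large_penalty = selected(toy_coefs, lam=0.5)
```

As the penalty grows, more coefficients reach zero and drop out, which is the behavior that cross-validation over lambda exploits to choose a compact set of predictors.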

Support Vector Machine Recursive Feature Elimination

Support vector machine recursive feature elimination (SVM-RFE) is an ML feature-selection method that repeatedly fits a support vector machine (SVM) and removes the least useful features. The SVM-RFE model was constructed to predict AS by using the “rms” package. In this study, tenfold cross-validation was performed on the data, a feature ranking was obtained for each fold, and the variables were sorted from “most useful” to “least useful.” The smaller the AvgRank value, the greater the influence of the independent variable on the dependent variable. After sorting, we estimated the generalization error on the entire dataset and selected the set of variables with the lowest error rate [25, 26].
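
The AvgRank value can be understood as the mean position of a feature across the per-fold rankings. The following sketch shows only that averaging step, under the assumption that each fold yields an ordering from most to least useful; it does not fit any SVM, and the feature names and fold rankings are hypothetical.

```python
def average_rank(fold_rankings):
    """Average rank of each feature across folds, sorted from most to least useful.

    fold_rankings: list of lists, each ordering features from most to least useful.
    """
    totals = {}
    for ranking in fold_rankings:
        for position, feature in enumerate(ranking, start=1):
            totals[feature] = totals.get(feature, 0) + position
    n_folds = len(fold_rankings)
    averages = {feature: total / n_folds for feature, total in totals.items()}
    return sorted(averages.items(), key=lambda item: item[1])

# Hypothetical rankings from three folds (the study used ten).
toy_folds = [
    ["feature_a", "feature_b", "feature_c"],
    ["feature_b", "feature_a", "feature_c"],
    ["feature_a", "feature_c", "feature_b"],
]
avg_ranks = average_rank(toy_folds)
```

A feature that is ranked near the top in most folds ends up with a small AvgRank, which is why smaller values indicate greater importance.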

Results

Data Features

Tables 1, 2, 3 and 4 show the differences in sex, age, ESR, blood routine examination, liver function, kidney function, and high-sensitivity C-reactive protein (hs-CRP) between patients with AS and patients without AS in the training and validation cohorts. In the training cohort, the majority of patients with AS were male (Tables 1 and 2). The mean age of patients with AS was lower than that of patients without AS, but the difference was not statistically significant. ESR and the proportion of patients with hs-CRP > 10 were higher in patients with AS than in patients without AS, and the proportion with hs-CRP < 0.8 was significantly lower. WBC, RBC, HGB, HCT, BPC, PDW, NEUT#, NEUT%, and MONO# were higher in patients with AS than in patients without AS, whereas MCHC, MPV, LYM%, and ESO# were lower (Tables 1 and 4). There was no significant difference in MCV, MCH, LYM#, MONO%, ESO%, BASO#, BASO%, RDW, or PCT on blood routine examination. Cr and Ccr of patients with AS were higher than those of patients without AS, whereas BUN and Cys-C were lower (Table 3). There was no statistically significant difference in UA or HCO between patients with AS and those without AS. The liver function tests showed no significant differences in IBil, GGT, ALT, or PAB (Table 2), whereas DBil/TBil, TP, ALB, GLB, ALB/GLB, and ChE were higher in patients with AS and TBil, DBil, TBA, AST, AST/ALT, and ALP were lower.

Table 1 Differences on blood routine examination
Table 2 Differences on liver function examination
Table 3 Differences on kidney function examination
Table 4 Differences in high-sensitivity C-reactive protein

The correlation heat map (Fig. 2) shows the correlations between all the variables. Positive correlations were noted between HGB and HCT, TBil and DBil, TBil and IBil, MCV and MCH, WBC and NEUT#, BPC and PCT, TP and GLB, ESO% and ESO#, and BASO% and BASO#. In contrast, negative correlations were noted between LYM% and NEUT% and between Ccr and Cys-C (Fig. 2).

Fig. 2 Heat map of the correlations between all the variables

Fig. 3 LASSO coefficient profiles of the factors, using cross-validation to select the optimal penalty parameter lambda. A The results of the LASSO regression analysis of the candidate variables. B The 20 factors that exhibited significant differences between the patients with AS and those without AS

Machine Learning

In the training cohort, the 30 factors with P < 0.05 after t-test screening were entered into the ML analysis.

Random Forest

Figure 4A shows the 23 most important factors ranked by the two random forest importance measures, “%IncMSE” and “IncNodePurity.” Figure 4B shows that the ideal regression effect was obtained by retaining the ten most important factors after tenfold cross-validation. Table 6 lists the ten factors finally selected by random forest regression.

Fig. 4 Random forest screening variables. A The 23 most important factors ranked by the two random forest importance measures, “%IncMSE” and “IncNodePurity.” B The ideal regression effect can be obtained by retaining the ten most important factors after tenfold cross-validation

LASSO Regression

Figure 3A shows the results of the LASSO regression analysis of the candidate variables. Figure 3B shows the 20 factors that exhibited significant differences between the patients with AS and those without AS. Table 5 presents the factors screened by LASSO regression.

Table 5 LASSO regression screened variables
Table 6 The final selection of random forest regression

SVM-RFE

Figure 5 shows that the error rate was lowest when all 30 factors were retained in the diagnostic model after SVM-RFE calculation, indicating that all the included factors were meaningful for diagnosis. Table 7 shows the order of importance of the 30 factors in SVM-RFE. The smaller the AvgRank value, the greater the influence of the independent variable on the dependent variable.

Fig. 5 The 30 factors selected for the diagnostic model after SVM-RFE calculation

Table 7 The order of importance of the 30 factors in SVM-RFE

Figure 6 shows the intersection of variables screened using LASSO, random forest, and SVM-RFE; nine variables were finally obtained: sex, ESR, RBC, HGB, MPV, TP, ALB, AST, and Cr. The AUC values for the nine variables are shown in Fig. 7.
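
Taking the intersection of the three selections is a plain set operation, sketched below. The nine shared variables are those named in the text; the remaining members of each set are not reproduced in this section, so the `placeholder_*` entries are hypothetical stand-ins, as is the function name.

```python
def intersect_selected(*selections):
    """Variables chosen by every screening method."""
    common = set(selections[0])
    for selection in selections[1:]:
        common &= set(selection)
    return common

# The nine variables reported as common to all three methods.
nine = {"sex", "ESR", "RBC", "HGB", "MPV", "TP", "ALB", "AST", "Cr"}

# Hypothetical selections: each method keeps the nine plus placeholder extras.
lasso_selected = nine | {"placeholder_1", "placeholder_2"}
rf_selected = nine | {"placeholder_3"}
svm_rfe_selected = nine | {"placeholder_1", "placeholder_3", "placeholder_4"}

common_variables = intersect_selected(lasso_selected, rf_selected, svm_rfe_selected)
```

A variable survives only if all three methods retain it, so the final model can never be larger than the smallest of the three selections.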

Fig. 6 The intersection of variables screened using LASSO, random forest, and SVM-RFE

Fig. 7 AUCs of the intersection of variables screened

Diagnostic Model

As shown in Fig. 8A, nine variables were included in the nomogram model. The optimal cutoff value of this nine-variable nomogram model was 179.459, with a sensitivity of 0.857 (95% CI 0.814–0.899), specificity of 0.806 (95% CI 0.756–0.855), positive predictive value (PPV) of 0.822 (95% CI 0.766–0.867), and negative predictive value (NPV) of 0.843 (95% CI 0.797–0.890). The C-index for the prediction nomogram was 0.878. The AUC of the nomogram was 0.878 (95% CI 0.847–0.908) (Fig. 8C). In addition, the calibration curves exhibited satisfactory agreement between nomogram predictions and actual probabilities (Fig. 8B). The decision curve (Fig. 8D) showed that when the threshold probability is between 1% and 92%, using this nomogram to predict AS is more beneficial than either the intervention-in-all-patients scheme or the intervention-in-none scheme [27].
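
The four accuracy measures above all derive from a 2 × 2 confusion table, as the following sketch shows. The counts are not reported in the text; they are back-calculated here, as an assumption, from the training cohort sizes (258 patients with AS, 247 without) and the reported sensitivity and specificity. The function name is illustrative.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV, and NPV from confusion-table counts."""
    return {
        "sensitivity": tp / (tp + fn),  # AS patients correctly flagged
        "specificity": tn / (tn + fp),  # non-AS patients correctly cleared
        "ppv": tp / (tp + fp),          # flagged patients who have AS
        "npv": tn / (tn + fn),          # cleared patients who do not have AS
    }

# Assumed counts consistent with the training cohort (258 AS, 247 non-AS).
metrics = diagnostic_metrics(tp=221, fn=37, tn=199, fp=48)
rounded = {name: round(value, 3) for name, value in metrics.items()}
# rounded -> sensitivity 0.857, specificity 0.806, ppv 0.822, npv 0.843
```

With these assumed counts, all four rounded values agree with the point estimates reported for the nine-variable model, which suggests the reported measures are mutually consistent.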

Fig. 8 The nine factors establish a nomogram for AS. A Nomogram for predicting AS probability. B Calibration curves for predicting AS probability. C AUC of the nomogram based on the nine characteristics. D Decision curve analysis for the nine-characteristic AS prediction nomogram

Simplified Diagnostic Model

On the basis of the importance of the variables screened by ML and combined with clinical practicability, we attempted to simplify the diagnostic model while maintaining high diagnostic efficiency. RBC and HGB, and likewise ALB and TP, carry partially overlapping information, so deleting HGB and TP does not seriously affect the final diagnostic model and makes it more concise. Finally, we selected seven variables, namely sex, ESR, RBC, MPV, ALB, AST, and Cr, for the simplified diagnostic model. The optimal cutoff value of this seven-variable nomogram model was 173.139, with a sensitivity of 0.860 (95% CI 0.818–0.903), specificity of 0.798 (95% CI 0.747–0.848), positive predictive value (PPV) of 0.816 (95% CI 0.770–0.862), and negative predictive value (NPV) of 0.845 (95% CI 0.799–0.892). The C-index for the prediction nomogram was 0.878. The AUC of the nomogram was 0.878 (95% CI 0.847–0.909) (Fig. 9C). In addition, the calibration curves exhibited satisfactory agreement between nomogram predictions and actual probabilities (Fig. 9B). The decision curve (Fig. 9D) revealed that when the threshold probability was in the range of 1–100%, the decision curve lay above the NONE and ALL lines, indicating that the model has clinical usefulness in this range.

Fig. 9 The seven factors establish a nomogram for AS. A Nomogram for predicting AS probability. B Calibration curves for predicting AS probability. C AUC of the nomogram based on the seven characteristics. D Decision curve analysis for the seven-characteristic AS prediction nomogram

Validation Cohort

The two diagnostic models were applied to the validation cohort for verification. The C values of the nomograms were 0.827 (nine-factor diagnostic model) and 0.823 (seven-factor diagnostic model). The calibration curves exhibited satisfactory agreement between nomogram predictions and actual probabilities (Fig. 10A, B). Figure 10C, D shows that the AUC values of the two diagnostic models were 0.827 (nine-factor diagnostic model) and 0.823 (seven-factor diagnostic model). We selected the seven-factor diagnostic model for subsequent analyses.
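
The C values and AUC values coincide because, for a binary outcome, the concordance index equals the area under the ROC curve: both are the probability that a randomly chosen patient with AS receives a higher score than a randomly chosen patient without AS. The sketch below computes this pairwise quantity directly; it is not the “rms” routine, and the function name and toy scores are assumptions for illustration.

```python
def concordance_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    concordant = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                concordant += 1.0
            elif pos_score == neg_score:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))

# Toy labels (1 = AS) and model scores, not study data.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = concordance_auc(toy_labels, toy_scores)
```

In the toy data, three of the four patient pairs are ordered correctly, giving a value of 0.75; a value of 0.5 would indicate no discrimination and 1.0 perfect discrimination.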

Fig. 10 Validation cohort. A In validation cohort, calibration curves for nine characteristics predicting AS probability. B In validation cohort, calibration curves for seven characteristics predicting AS probability. C In validation cohort, AUC of the nomogram based on the nine characteristics. D In validation cohort, AUC of the nomogram based on the seven characteristics

Discussion

Using clinically relevant data and ML algorithms, we established a prediction model (Fig. 8A) for AS. The prediction model is based on a set of routinely measured predictors. Three ML methods were used to filter the variables, and the resulting model was then verified on a validation cohort. This artificial-intelligence-based strategy can help clinicians choose easier diagnostic methods [28].

ML has contributed to a paradigm shift in health care wherein computers learn from patient data without being explicitly programmed [29]. ML offers the advantages of extensive applicability, objectivity, and repeatability when dealing with large datasets and reliable data [30, 31]. Moreover, it can help improve the quality of early diagnosis, identify disease progression, and increase the likelihood of predicting specific patient outcomes in orthopedic procedures, such as outcome scores, risk of complications, and implant survival [32, 33]. These benefits facilitate decision-making and information sharing between clinicians and patients and support effective planning and rational use of healthcare services [34].

AS is a chronic progressive inflammatory disease of the spine and its affiliated tissues. Through ML screening, we identified a total of nine variables that can be used to predict AS: sex, ESR, RBC, HGB, MPV, TP, ALB, AST, and Cr. Most patients with AS are male [35], but the prevalence of AS in women is gradually increasing [36]. The proportion of male patients with AS receiving medical treatment is much higher than that of female patients [8]. Male patients with AS are more likely to develop hip and spinal mobility disorders than female patients [36, 37]. Males score higher in our diagnostic model.

ESR is the distance that erythrocytes settle by the end of the first hour, which represents the rate of erythrocyte sedimentation. An increase in ESR is considered a sign of an inflammatory reaction or hyperglobulinemia in clinicopathology [38]. AS is a chronic inflammatory disease that can lead to an accelerated ESR [39]. Studies have shown that ESR is associated with poor physical activity in patients with AS [40]. In our diagnostic model, ESR is positively correlated with the final score: the higher the ESR, the more likely the patient is to be diagnosed with AS.

The results of the current study revealed that RBC and HGB were higher in patients with AS [41]. Ninety percent of the RBC is composed of HGB, which is mainly responsible for the transport of oxygen and carbon dioxide in the body. In addition, RBC can clear circulating immune complexes, reduce T-cell proliferation, and promote phagocytosis [42, 43]. The proportion of CD4+ T-cells in patients with AS is reduced, which may be the reason for the increase in RBC and HGB in patients with AS [44]. The higher the RBC, the higher the nomogram score. In clinical practice, MPV is often used to determine the risk of bleeding and changes in bone marrow hematopoietic function. The decrease in MPV in patients with AS may result from bone marrow suppression caused by the chronic inflammatory reaction of AS [45]. Accordingly, the lower the MPV, the higher the nomogram score. Further research on the mechanism is required.

TP and ALB are often used in the clinical monitoring of a patient’s nutritional status. Our study showed that TP and ALB of patients with AS were higher than those of patients without AS. Elevated AST is often used for the diagnosis of liver diseases; AST was significantly lower in patients with AS, a finding that has no special clinical significance [46]. Cr is a product of muscle metabolism in the human body, and Cr was significantly higher in patients with AS, which may be caused by impaired immune function in these patients. The average UA in patients with AS was also higher than that in patients without AS, although the difference was not statistically significant. These findings provide a new direction for further research on AS. In our diagnostic model, ALB and Cr values correlated positively with the final nomogram score, whereas AST correlated inversely.

The proportion of patients with hs-CRP > 10 (Table 4) was significantly higher among patients with AS than among patients without AS, and the proportion with hs-CRP < 0.8 was lower. In a study by Seulkee et al., CRP was higher in patients with symptoms of AS than in patients without symptoms [6]. WBC and NEUT were elevated in patients with AS, consistent with chronic inflammation. However, ML screening did not include them in the predictive models.

ML is widely used in diagnosing, treating, preventing, and managing AS. Riel et al. used computed tomography (CT) to construct an early diagnosis model using ML methods [47]. Samuel et al. used single-cell transcriptome and surface epitope analysis of AS to classify diseases using ML methods [48]. Liang et al. used LASSO regression to find that the platelet-to-lymphocyte ratio was related to the severity of AS, which is helpful to physicians in diagnosis and treatment [8].

This study used a dataset of 708 patients to select the best ML model. Our work has several advantages. First, few studies on AS have used age, ESR, blood routine examination, liver function, and kidney function, and we did not find similar studies. Second, we used three ML methods to filter the data and used the validation cohort for verification. Finally, upon comparison, our model exhibited superior predictive power and ease of use for clinicians diagnosing AS.

However, this study has some limitations. First, its retrospective nature may have led to subjective bias and selection bias. Second, the ML model we developed is based on data from one hospital, which may limit its use in other areas, and it requires further validation. Third, our study lacks imaging data, the addition of which might improve diagnostic efficacy. Fourth, the predictive performance is average and can be improved further.

Conclusion

We established two prediction models that offer the advantages of good performance, high accuracy, and simplicity of use. These predictive models can help doctors diagnose AS and thereby serve patients with AS effectively. Of course, clinicians always have the final word in interpretation, based on their domain expertise. In future studies, we will attempt to cover a wider range of clinical variables so that our diagnostic model can be used more accurately in a wider population.