Key Summary Points

Why carry out this study?

Advanced technologies are certainly prompting precision medicine in rheumatoid arthritis (RA). However, their complexity and cost may limit their use to a few advanced facilities.

We propose a concept of low-cost precision medicine using readily available information.

What was learned from the study?

A machine learning approach incorporating 48 clinical parameters effectively predicted Clinical Disease Activity Index (CDAI) remission at 6 months after initiation of tumor necrosis factor inhibitors (TNFi) in addition to methotrexate (MTX).

The generalizability of the models generated in this study needs to be confirmed. However, this study suggests that regional/institutional precision medicine is feasible.

Introduction

Advances in molecularly targeted anti-rheumatic therapies have revolutionized the management of rheumatoid arthritis (RA) over the past decades. The development of biologic disease-modifying anti-rheumatic drugs (bDMARDs) and targeted synthetic (ts)DMARDs has significantly improved disease control, making clinical remission the primary treatment objective for RA [1]. In addition, the concept of "beyond remission" has gained recognition.

Current treatment guidelines recommend initiating therapy with methotrexate (MTX) in the initial phase (phase I) of RA, unless contraindicated, followed by bDMARDs or Janus kinase inhibitors (JAKis) in phase II if initial treatment proves ineffective [1, 2]. However, based on the findings of ORAL Surveillance [3] and subsequent developments, JAKis are becoming less favored as first-line therapy in phase II. As a result, current practice is leaning toward the use of MTX and subsequent bDMARDs. According to the recommendations, interleukin-6 receptor inhibitors (IL-6Ri) are preferred for patients not receiving MTX, while both tumor necrosis factor inhibitors (TNFi) and non-TNFi bDMARDs are considered for patients receiving MTX [1]. Although TNFi are considered the standard among bDMARDs [3,4,5,6,7], only about 70% of patients show a favorable response [8].

With this background, numerous efforts have been reported to predict the efficacy of TNFi in advance. These efforts have included serum proteins, single-nucleotide polymorphisms (SNPs), peripheral blood gene expressions, peripheral blood epigenomic status, peripheral blood immune cell populations (analyzed by flow cytometry), synovial histology, and comprehensive synovial gene analysis [8,9,10,11,12,13,14,15,16,17,18,19]. However, access to these advanced technologies may be limited to certain countries and advanced facilities due to cost, labor requirements, and the need for process standardization. Conversely, predictive models based on routine clinical data are more accessible. Vastesaeger et al. used clinical parameters (such as sex, age, health assessment questionnaire [HAQ], tender joint count [TJC] 28, erythrocyte sedimentation rate [ESR], and presence of complications) [20]. Their study, published in 2016, introduced a tool that predicts disease activity score 28 (DAS28) remission in response to TNFi treatment, although it is not consistent with current recommendations that advocate more stringent remission criteria, such as the Clinical Disease Activity Index (CDAI) [21, 22]. Notably, most predictive models have focused on DAS improvement, with some addressing CDAI and simplified disease activity index (SDAI) improvement [18,19,20]. However, to our knowledge, no model has been presented that predicts CDAI remission with TNFi.

The primary objective of this study was to develop a method for predicting CDAI remission in patients considering TNFi as phase II therapy after an inadequate response to MTX. To achieve our objective, our approach involved establishing a predictive model using our cohort of patients with RA. With a focus on low cost and accessibility, we exclusively used factors that are available in routine clinical practice. However, it is important to note that these models may not be universally applicable to other cohorts. Therefore, the concept of establishing regionally optimized models within each cohort becomes valuable, although this may involve potentially smaller sample sizes, leading to concerns about overtraining or overfitting. To address this concern, we planned to use Lasso, a machine learning approach known for its sparse modeling capabilities.

Methods

Data Source

Subjects for this study were selected from the FIRST registry, a database of patients with RA treated with b/tsDMARDs at the University Hospital of the University of Occupational and Environmental Health, Japan (UOEH) and 28 medical institutions in nine municipalities. Patients who met the ACR/EULAR 2010 RA classification criteria and did not respond to phase I therapy were referred to UOEH for further evaluation of eligibility for b/tsDMARDs. This comprehensive evaluation included assessment of contraindications and consideration of comorbidities, such as screening for infectious diseases and malignancies [23]. Subsequently, the administration of the b/tsDMARDs was decided by the patients and the treating physician under the guidance of recommendations/guidelines [1, 2, 24]. Patients eligible for b/tsDMARDs underwent initiation of b/tsDMARDs treatment, provided informed consent, and were subsequently enrolled in the FIRST registry. In Japan, due to the healthcare system, patients typically visit medical facilities at approximately 4-week intervals. Under the FIRST registry framework, patients who start b/tsDMARDs at UOEH subsequently visit nearby medical facilities for ongoing care. They return to UOEH at 6 months, 1 year, and then annually for professional assessments. During these visits, clinical information is documented, including RA disease activity, comorbidities, and treatment status. The term b/tsDMARDs refers to TNFi, IL-6Ri, and cytotoxic T-lymphocyte-associated protein 4 immunoglobulin (CTLA4-Ig). Notably, rituximab and anakinra are not approved for the treatment of RA in Japan and are therefore not included in the FIRST registry.

Selection of b/tsDMARDs for Patients on MTX in FIRST Registry

Treatment strategies for patients eligible for b/tsDMARDs after screening involve shared decision making, where options are presented considering efficacy, safety, routes of administration, and dosing intervals. Physician recommendations include appropriate vaccination and the choice of b/tsDMARDs, tailored to the patient's risk factors [24,25,26]. Specifically, the physician recommends pneumococcal, influenza, and herpes zoster vaccines. Since the launch of abatacept (ABT) in November 2010, it is often recommended for patients at a higher risk of infection due to factors such as advanced age, coexisting lung disease, or concomitant glucocorticoid use.

For other patients, the options typically include TNFi (infliximab [IFX]/IFX-biosimilar [BS], etanercept [ETN]/ETN-BS, adalimumab [ADA]/ADA-BS, golimumab [GLM], certolizumab pegol [CZP]), CTLA4-Ig (ABT), IL-6Ri (tocilizumab, sarilumab) and JAKi (tofacitinib, baricitinib, peficitinib, upadacitinib, filgotinib), according to the latest recommendations [1]. Currently, JAKis are less commonly used as first-line b/tsDMARDs. However, prior to the initial results of the ORAL Surveillance trial [3], JAKis were also considered as first b/tsDMARDs, especially in highly active patients.

Handling Censoring in the Study

The FIRST registry includes various medical institutions from multiple municipalities, with UOEH as the central hub. Some patients opt out of regular evaluations at UOEH due to distance or accessibility limitations (e.g., residing in nursing facilities), resulting in censoring. Although multiple imputation was a possible approach, we chose to exclude censored cases from this study as their occurrence appeared to be random and unrelated to the patients' medical conditions. Indeed, an analysis comparing background factors between censored and non-censored cases showed no statistically significant differences (all p values > 0.05 after Benjamini–Hochberg false discovery rate correction, data not shown).
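
The Benjamini–Hochberg correction mentioned above can be illustrated with a minimal Python sketch. The p values below are hypothetical and are not taken from the registry; the function name is ours, and the study's actual software implementation of this step is not described.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four baseline comparisons (illustration only).
raw_p = [0.01, 0.04, 0.03, 0.20]
adjusted_p = benjamini_hochberg(raw_p)
print(adjusted_p)
```

A comparison would be reported as non-significant when its adjusted p value exceeds the chosen threshold (0.05 in this study).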

Analysis Procedure (Fig. 1)

Fig. 1

Study diagram. b/tsDMARDs biologic/targeted synthetic disease-modifying antirheumatic drugs, CTLA4Ig cytotoxic T-lymphocyte-associated protein 4 immunoglobulin, IL-6Ri interleukin-6 receptor inhibitor, JAKi Janus kinase inhibitor, MTX methotrexate, TNFi tumor necrosis factor inhibitor

The primary objective of this study was to develop a predictive model for achieving CDAI remission 6 months after initiation of TNFi as the first b/tsDMARD in addition to MTX after unsuccessful MTX therapy. We included patients enrolled in the FIRST registry between August 2003 (introduction of IFX) and October 2022. Baseline characteristics at initiation of first b/tsDMARD and CDAI measurements at 6 months were collected from the FIRST registry. Patients who discontinued the first b/tsDMARD within 6 months were classified as non-remission cases (Fig. 1: Study population).

The analysis consisted of two parts. The first part assessed the generalizability of a previously reported remission prediction model. We used the DAS28 remission prediction model developed by Vastesaeger et al. (Vastesaeger's model) from a European cohort of patients treated with GLM [20]. Using the study population, we evaluated how well Vastesaeger's model predicted DAS28-ESR remission (to assess its applicability beyond the original cohort) and CDAI remission (to assess its generalizability across different composite measures) 6 months after TNFi initiation.

In the second part, we performed a retrospective analysis of our cohort to build a model based only on factors available in routine clinical practice.

(i) Primary analysis (Fig. 1, left): We compared multivariate logistic regression with stepwise variable selection, decision tree, support vector machine (SVM), and Lasso logistic regression, a machine learning technique for sparse modeling. We assessed their robustness using k-fold cross-validation (k = 10). Receiver operating characteristic (ROC) curve analysis was used to calculate the cutoff values used to assess model performance.

(ii) Calendar cohort analysis (Fig. 1, middle): Given the increase in RA treatment options and the change in treatment strategies over time, we performed a sensitivity analysis. The cohort was divided by the cutoff date of October 2019, resulting in a training to validation cohort ratio of 9:1 (Calendar cohort).

(iii) Sub-cohort analysis (Fig. 1, right): The introduction of ABT has influenced the approach to selecting bDMARDs. Since its launch, ABT tends to be prescribed to patients at high risk of infection [25]. Therefore, we performed an analysis similar to (i) on the sub-cohort of patients who started TNFi as their first b/tsDMARD after the launch of ABT.
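
The 10-fold partitioning used in the primary analysis can be sketched as follows. This is a minimal illustration in Python, assuming a simple shuffled split; the function name and the random seed are hypothetical, and the study itself used Stata and JMP Pro rather than this code.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Split sample indices into k (training, validation) pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    # Fold i takes every k-th shuffled index starting at position i.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((training, validation))
    return splits


# 1630 patients (the MTX + TNFi study population) split into 10 folds,
# giving a 9:1 training-to-validation ratio in every split.
splits = k_fold_indices(1630, k=10)
```

Each patient appears in exactly one validation cohort, so the ten models are each evaluated on data they were not trained on.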

In the second part, all continuous variables were standardized by z-score normalization. Stata/SE 17 (for logistic regression with stepwise variable selection and Lasso logistic regression) and JMP Pro 16 (for decision tree and SVM) were used for statistical analysis.
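
As an illustration of the standardization step, the following Python sketch applies z-score normalization to a single continuous variable. The ages are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not specify which form was used.

```python
import statistics


def z_score(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1); an assumption
    if sd == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - mu) / sd for v in values]


# Hypothetical ages in years (illustration only).
ages = [45.0, 59.0, 62.0, 71.0, 38.0]
z_ages = z_score(ages)
```

Standardization matters for Lasso in particular, because the penalty acts on coefficient size and would otherwise depend on each variable's unit of measurement.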

Ethical Approval

This study followed the Declaration of Helsinki and was approved by the Ethics Committee of the University of Occupational and Environmental Health School of Medicine (#UOEHCRB21-068). Informed consent was obtained from all participants based on the Ethical Guidelines for Medical and Health Research Involving Human Subjects issued by the Japanese Ministry of Health, Labor and Welfare. Written informed consent was obtained from participants enrolled after April 2015, and written or verbal consent was obtained from others.

Results

Study Diagram (Fig. 1)

Of the 5066 individuals registered in the FIRST registry, 4706 initiated their b/tsDMARDs treatment within the study period. Of these, 2223 MTX users initiated their first b/tsDMARDs, with the distribution among TNFi (N = 1630), IL-6Ri (252), CTLA4-Ig (262), and JAKi (79). The characteristics of the b/tsDMARDs naïve MTX users including the study population (MTX + TNFi) are detailed in Table 1. The entire cohort consisted of Asian individuals, with an average age of 59.2 ± 14.4 years, a body mass index (BMI) of 22.1 ± 3.7, and a mean disease duration of 75.7 ± 106.1 months. Approximately 24.0% of the patients were on glucocorticoids (GC). All patients received MTX at a mean dose of 11.3 ± 3.7 mg/week. The mean CDAI score was 26.1 ± 13.0.

Table 1 Baseline characteristics of the b/tsDMARDs-naïve MTX users in the FIRST registry

Clinically significant interstitial lung disease (ILD) was observed in only 4.3% of cases, although coexisting lung disease (such as interstitial pneumonitis, pleuritis, diffuse panbronchiolitis, bronchiectasis, old tuberculosis or inflammatory nodules detected on chest X-ray/CT) was observed in 20.8% of cases. These patients were treated with b/tsDMARDs. Of these, 90% completed the 6-month follow-up (81% continued treatment and 9% discontinued for various reasons), and 10% were censored. At 6 months, the CDAI remission rates for TNFi/IL-6Ri/CTLA4-Ig/JAKi were 33.0/33.7/27.2/48.5%. The rate of DAS28-ESR remission (overall 40.9%) was higher than that of CDAI remission (overall 32.9%), and the difference was particularly pronounced for IL-6Ri (CDAI 33.7% vs. DAS28-ESR 67.3%, Table 2).

Table 2 Treatment outcome of study population at 6 months

Vastesaeger's Model Predicts DAS28 Remission, Yet Shows Inferior Predictive Performance for CDAI Remission in External Cohort

Vastesaeger et al. proposed a predictive model for TNFi-induced DAS28 remission based on six baseline factors: sex, age, HAQ, TJC28, ESR, and comorbidities [20]. This model was validated in an external European cohort of 674 cases [27]. We simulated whether this model predicts DAS28-ESR remission and CDAI remission in an Asian cohort (Supplementary Materials Table S1).

In the analysis of the study population, participants with missing variables or censoring were excluded. The results showed that 40.0% achieved DAS28-ESR remission and 35.5% achieved CDAI remission at 6 months (Supplementary Materials Table S1, far left). Remission probability scores [20] were calculated for each patient using Vastesaeger's model, and its performance in predicting remission was evaluated using the specified cutoffs (20%, 30%, 40%, 50%, 67% as in Ganhao et al. [27]), along with cutoffs determined via ROC analysis in the study population.

Vastesaeger's model showed significant predictive ability for DAS28-ESR remission within the study population. For example, using a 20% cutoff, 59.2% of b/tsDMARDs naïve MTX users were identified as predicted TNFi responders, with 50.4% of them achieving DAS28-ESR remission at 6 months (= positive predictive value [PPV]). Conversely, only 25.0% of those classified as predicted TNFi non-responders achieved remission (= 1- negative predictive value [NPV]). The performance of the model across different cutoffs included: subjects classified as predicted TNFi responders 1.8–59.2%; PPV 50.4–69.9%; 1-NPV 25.0–39.5%. The risk ratio, which represents the ratio of remission rates in predicted TNFi responders to predicted TNFi non-responders, ranged from 1.73 to 2.03 (Supplementary Materials Table S1, top half).
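
The quantities reported here (share of predicted responders, PPV, 1-NPV, and risk ratio) all follow from a 2 x 2 table of predicted response against observed remission. A minimal Python sketch, using hypothetical counts rather than the study data:

```python
def remission_summary(tp, fp, fn, tn):
    """Summarize a 2 x 2 table of predicted response vs. observed remission.

    tp: predicted responders who achieved remission
    fp: predicted responders who did not achieve remission
    fn: predicted non-responders who achieved remission
    tn: predicted non-responders who did not achieve remission
    """
    total = tp + fp + fn + tn
    ppv = tp / (tp + fp)            # remission rate in predicted responders
    one_minus_npv = fn / (fn + tn)  # remission rate in predicted non-responders
    return {
        "predicted_responder_share": (tp + fp) / total,
        "ppv": ppv,
        "one_minus_npv": one_minus_npv,
        "risk_ratio": ppv / one_minus_npv,
    }


# Hypothetical counts (illustration only, not taken from Table S1).
summary = remission_summary(tp=50, fp=50, fn=25, tn=75)
print(summary)
```

With these counts, half of the patients are predicted responders, and remission is twice as frequent among them as among predicted non-responders.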

Next, we evaluated the predictive ability of Vastesaeger's model for CDAI remission (Supplementary Materials Table S1, bottom half). It also showed some predictive ability: those classified as predicted TNFi responders 1.8–59.2%; PPV 40.7–54.5%; 1-NPV 27.8–35.2%. Notably, however, the risk ratios ranged from 1.38 to 1.55, indicating that Vastesaeger's model was less effective in distinguishing TNFi responders from non-responders in advance when using CDAI.

Vastesaeger's Model Fails to Predict CDAI Remission for Certain TNF Inhibitors

Vastesaeger's model, developed using a cohort of patients treated with GLM, predicted overall CDAI remission in the study population, but may have variable predictive performance across different TNFis. Hence, the study population was stratified by TNFi to assess the performance of Vastesaeger's model (Supplementary Materials Table S2). In the FIRST registry, GLM was commonly prescribed to MTX non-users, resulting in a small sample size for this analysis (N = 29). Therefore, GLM-treated patients were excluded.

The predictive ability of Vastesaeger's model for CDAI remission in the study population was assessed using a cutoff value of 21% (Supplementary Materials Table S1, bottom: ROC cutoff). It successfully distinguished TNFi responders from non-responders in patients treated with ETN (PPV 36.1%, 1-NPV 18.2%, risk ratio 1.98 [95% confidence interval (CI): 1.21–3.25]) or ADA (PPV 47.0%, 1-NPV 29.2%, risk ratio 1.61 [1.19–2.17]). However, its predictive ability was not observed for IFX (risk ratio 1.36 [0.99–1.85]) or CZP (1.14 [0.85–1.53]) (Supplementary Materials Table S2).
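
A risk ratio is considered to separate responders from non-responders when its 95% CI excludes 1. The text does not state how the intervals were computed; the Python sketch below assumes the common log-transformation (Katz) method and uses hypothetical counts.

```python
import math


def risk_ratio_ci(a, n1, c, n2, z=1.96):
    """Risk ratio with an approximate 95% CI (log method; an assumption).

    a:  remissions among the n1 predicted responders
    c:  remissions among the n2 predicted non-responders
    """
    rr = (a / n1) / (c / n2)
    # Standard error of log(RR) for two independent proportions.
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts (illustration only).
rr, lower, upper = risk_ratio_ci(a=40, n1=100, c=20, n2=100)
separates_groups = lower > 1.0  # the CI excludes 1
```

Smaller subgroups widen the interval, which is one reason a drug-specific analysis can lose significance even when the point estimate remains above 1.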

Development of a CDAI Remission Predictive Model

Vastesaeger's model, based solely on clinical information, predicted DAS28-ESR remission in our Asian cohort. However, its predictive performance for CDAI remission was limited and showed instability in predictive ability across different TNFis.

Therefore, we aimed to develop a CDAI remission predictive model using the FIRST registry, which comprehensively collects clinical items related to RA. Table S3 in the Supplementary Materials shows 48 such items along with their missing rates. Before the formulation and publication of CDAI, evaluator's global assessment (EGA) was not routinely documented, and anti-citrullinated protein antibody (ACPA) testing was not available before 2007. As a result, the missing rates for EGA and ACPA are particularly high.

First, we compared four machine learning approaches (logistic regression with stepwise variable selection, decision tree, SVM, and Lasso logistic regression) using the 48 factors and performed k-fold cross-validation, dividing the cohort into training and validation cohorts at a 9:1 ratio. The models built using the training cohorts were evaluated for performance using the corresponding validation cohorts. Figure 2 summarizes the performance of the tested approaches. Stepwise and Lasso predicted response in 43.9% and 42.1% of b/tsDMARDs-naïve MTX users, whereas decision tree and SVM predicted response in fewer patients (22.8% and 26.1%). Stepwise and Lasso had higher sensitivities, whereas decision tree and SVM had higher specificities. PPV and 1-NPV reflect the CDAI remission rates in predicted responders and non-responders, respectively, and therefore bear directly on the clinical decision of agent selection. All approaches showed comparable PPV, 1-NPV, and risk ratios (calculated as PPV/[1-NPV]). Notably, decision tree and SVM models showed more variation. Figure S1 in the Supplementary Materials shows the risk ratios and 95% CIs of models generated using different approaches with k-fold cross-validation. For some models, the 95% CI of the risk ratio extended below 1, indicating potential inadequacy in separating responders from non-responders. Such models were more likely to be obtained from decision tree (40%) and SVM (30%), whereas Stepwise and Lasso were stable. Model accuracy was comparable among the four approaches, with Stepwise and Lasso showing higher F1 scores (Fig. 2). In summary, these analyses revealed favorable aspects of model performance, particularly with Stepwise and Lasso.
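
The remaining performance measures compared across approaches (sensitivity, specificity, accuracy, and F1 score) are derived from the same 2 x 2 table of predicted response against observed remission. A minimal Python sketch with hypothetical counts:

```python
def classification_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, accuracy, and F1 score from a 2 x 2 table."""
    sensitivity = tp / (tp + fn)  # remission cases predicted as responders
    specificity = tn / (tn + fp)  # non-remission cases predicted as non-responders
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)    # identical to PPV
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for one validation cohort (illustration only).
metrics = classification_metrics(tp=30, fp=20, fn=10, tn=40)
```

Because the F1 score is the harmonic mean of PPV and sensitivity, an approach that labels few patients as responders can keep a high specificity and still score a low F1.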

Fig. 2

Comparative analysis of different machine learning models. Multivariate logistic regression with stepwise variable selection (Stepwise), decision tree, support vector machine (SVM), and lasso logistic regression (Lasso) were used to generate Clinical Disease Activity Index (CDAI) remission prediction models. Each approach used k-fold cross validation (k = 10). NPV negative predictive value, PPV positive predictive value, risk ratio PPV/(1-NPV), SVM support vector machine, TNFi tumor necrosis factor inhibitor

Although the Stepwise and Lasso models appear able to support clinical decisions and help avoid agent selection on a trial-and-error basis, their performance may be specific to our cohort, which raises concerns about their effectiveness in external cohorts. In this scenario, establishing CDAI remission prediction models tailored to each institution (regional/in-house precision medicine) becomes a viable option. However, limited sample size is a potential barrier to this approach, which prompted an in-depth evaluation of sparse modeling. Table 3 details the factors selected by Lasso and Fig. 3 shows the ROC curves. Among the 48 factors, patient global assessment, patient pain assessment, HAQ, concomitant GC use, history of malignancy, and serum IgG were consistently included in all models and showed notable coefficients. The performance of the Lasso-generated CDAI remission predictive model, based on the ROC analysis cutoff for predicted TNFi responders/non-responders, was as follows: AUC 0.676 [95% CI 0.642–0.711], PPV (remission in predicted TNFi responders) 53.2% [49.4–57.0%], 1-NPV (remission in predicted TNFi non-responders) 26.4% [24.0–28.9%], risk ratio 2.01 [1.75–2.26] (Table 4, bottom).
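
The AUC and the ROC-derived cutoff can be illustrated with a short Python sketch. The scores and outcomes are hypothetical, and choosing the cutoff by maximizing the Youden index (sensitivity + specificity - 1) is an assumption, since the text does not name the criterion used in the ROC analysis.

```python
def auc(scores, labels):
    """AUC as the probability that a remission case outranks a non-remission case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Return the cutoff maximizing sensitivity + specificity - 1, and that value."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        j = tp / n_pos - fp / n_neg  # sensitivity - (1 - specificity)
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical predicted scores and observed CDAI remission (1 = remission).
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.80]
labels = [0, 0, 1, 0, 0, 1, 1, 1]
model_auc = auc(scores, labels)
cutoff, j_value = youden_cutoff(scores, labels)
```

Patients scoring at or above the returned cutoff would be labeled predicted TNFi responders, and PPV, 1-NPV, and the risk ratio are then computed from that classification.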

Table 3 Variables selected in the Lasso-generated CDAI remission predictive models
Fig. 3

Areas under the curve of the Lasso-generated CDAI remission predictive models. Predictive models for Clinical Disease Activity Index (CDAI) remission were generated using training cohorts (A) and their performance was evaluated using validation cohorts (B). Figures show receiver operating characteristic (ROC) curves calculated in A the training cohorts and B the validation cohorts. The ten curves represent the ten k-fold cross-validation cohorts

Table 4 Performance of Lasso-generated CDAI remission predictive models

Because Vastesaeger's model showed variable stability in predicting CDAI remission across different TNFis (Supplementary Materials Tables S1 and S2), we evaluated the performance of our Lasso-generated CDAI remission predictive model for each TNFi. Across all drugs, the Lasso-generated model consistently demonstrated an AUC of 0.667–0.736, a PPV of 48.2–56.8%, a 1-NPV of 16.3–30.8%, and a risk ratio greater than 1 (1.85–3.01) (Supplementary Materials Fig. S2). This suggests that it can predict CDAI remission regardless of the specific drug used. However, it is important to note that GLM was not included in this analysis.

Sensitivity Analysis: Calendar Cohort

Given the expanding range of RA treatment options over time, predictive models developed using older data, which may have limited coverage of current treatments, could be less effective in the current era. To address this, we performed a sensitivity analysis on the Calendar cohort (Fig. 1, middle). As shown in Table S4 in the Supplementary Materials, the Lasso-generated CDAI remission predictive model from the training-calendar cohort showed a relatively lower yet statistically significant ability to predict CDAI remission during TNFi use (PPV 58.8%, 1-NPV 36.5%, risk ratio 1.61 [95% CI: 1.05–2.46]).

Sensitivity Analysis: Performance of Lasso-Generated CDAI Remission Predictive Model in the Post-Abatacept Setting

ABT is recommended for patients at higher risk of infection [25]. In the FIRST registry, ABT is often selected for such patients, which may bias the predictive performance of the Lasso-generated CDAI remission prediction model. Therefore, we performed Lasso logistic regression using k-fold cross-validation in a cohort consisting only of patients enrolled after the launch of ABT. As shown in Figure S3 in the Supplementary Materials, the validation cohort demonstrated a robust predictive performance with a PPV of 54.5% [95% CI: 49.2–59.8%], a 1-NPV of 29.5% [25.0–34.0%], and a risk ratio of 1.92 [1.62–2.23].

This sensitivity analysis also resulted in a reduced sample size (from N = 1630 to N = 930). When comparing model performance between Stepwise and Lasso in the 930 patients, Lasso showed higher AUC, sensitivity, specificity, PPV, risk ratio, accuracy, and F1 score, and lower 1-NPV (remission rate in predicted non-responders). These results highlight the benefits of sparse modeling (Fig. 4).
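
To illustrate why Lasso yields sparse models, the following Python sketch fits an L1-penalized logistic regression by proximal gradient descent on a small synthetic data set. It is not the algorithm used in this study (the models were fitted in Stata), and the data, the fixed penalty `lam`, the step size, and all function names are assumptions chosen for illustration.

```python
import math


def soft_threshold(value, threshold):
    """Shrink a coefficient toward zero; small values become exactly zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(X, y, lam, step=1.0, n_iter=500):
    """L1-penalized logistic regression via proximal gradient descent."""
    n, p = len(X), len(X[0])
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad0, grad = 0.0, [0.0] * p
        for row, target in zip(X, y):
            z = intercept + sum(c * v for c, v in zip(coefs, row))
            err = 1.0 / (1.0 + math.exp(-z)) - target
            grad0 += err / n
            for j in range(p):
                grad[j] += err * row[j] / n
        intercept -= step * grad0  # the intercept is not penalized
        coefs = [soft_threshold(c - step * g, step * lam)
                 for c, g in zip(coefs, grad)]
    return intercept, coefs


# Synthetic data: feature 0 drives remission (80% vs. 20%); feature 1 is noise.
X, y = [], []
for x0 in (-1.0, 1.0):
    for x1 in (-1.0, 1.0):
        n_remission = 8 if x0 > 0 else 2
        for i in range(10):
            X.append([x0, x1])
            y.append(1 if i < n_remission else 0)

intercept, coefs = fit_lasso_logistic(X, y, lam=0.05)
print(intercept, coefs)
```

In this toy example the uninformative feature receives a coefficient of exactly zero, while the informative one is retained but shrunk; this selection behavior is what limits overfitting when many candidate factors are tested in a small cohort.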

Fig. 4

Stepwise- vs. Lasso-generated CDAI remission predictive models in sparse cohort. The box plots and corresponding values show the mean and 95% confidence intervals (CI) of performance for Clinical Disease Activity Index (CDAI) remission predictive models generated by multivariate logistic regression with stepwise variable selection (Stepwise) or lasso logistic regression (Lasso). Both approaches include ten models generated using the k-fold cross-validation method. The cohort includes 930 subjects who were enrolled after November 2010. Cutoff values were determined by receiver operating characteristic curve analysis. AUC area under the curve, NPV negative predictive value, PPV positive predictive value, Risk ratio PPV/(1-NPV)

Clinical Application of the Lasso-Generated CDAI Remission Predictive Model

For clinical implementation, we considered the cutoff values of the Lasso-generated CDAI remission predictive models. Table 5 shows the performance validation of the models which are shown in Table 4. Using a log-odds cutoff of 0.400 (close to the cutoffs calculated by ROC analysis: 0.408–0.467), 41% (37–43%) were classified as predicted TNFi responders, of which 55% (53–56%) achieved CDAI remission 6 months after TNFi initiation, while 26% (24–27%) of the predicted TNFi non-responders achieved remission. In this case, it may be a decision-making option to start TNFi in 41% of MTX users who are naïve to b/tsDMARDs and to consider non-TNFi/JAKi for the remaining 59%. However, TNFi is considered the standard of care based on its extensive history and evidence and is therefore expected to be used in a larger patient population. Using log-odds of 0.300, 61% were classified as predicted TNFi responders, with CDAI remission rates of 47% in predicted TNFi responders and 23% in predicted TNFi non-responders. Other cutoffs may be considered based on the specific circumstances of the institution (i.e., remission rates with non-TNFis and JAKis).
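
The trade-off between cutoffs described above can be explored with a simple sweep. The Python sketch below uses hypothetical model scores and outcomes, not the values underlying Table 5, and the function name is ours.

```python
def cutoff_table(scores, remission, cutoffs):
    """For each cutoff, return (responder share, PPV, 1-NPV)."""
    table = {}
    for cutoff in cutoffs:
        responders = [r for s, r in zip(scores, remission) if s >= cutoff]
        non_responders = [r for s, r in zip(scores, remission) if s < cutoff]
        share = len(responders) / len(scores)
        # Remission rates are undefined (None) when a group is empty.
        ppv = sum(responders) / len(responders) if responders else None
        one_minus_npv = (sum(non_responders) / len(non_responders)
                         if non_responders else None)
        table[cutoff] = (share, ppv, one_minus_npv)
    return table


# Hypothetical model scores and observed CDAI remission (1 = remission).
scores = [0.10, 0.25, 0.28, 0.32, 0.35, 0.42, 0.45, 0.50, 0.62, 0.70]
remission = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
table = cutoff_table(scores, remission, cutoffs=[0.30, 0.40])
for cutoff, row in table.items():
    print(cutoff, row)
```

As in Table 5, lowering the cutoff labels more patients as predicted TNFi responders at the price of a lower remission rate among them, so each institution can pick the cutoff that suits its own alternatives.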

Table 5 Optimization of the cutoff value predicting CDAI remission

Discussion

While advances in b/tsDMARD treatments have significantly improved the management of RA, no drug is universally effective for all patients. A model that predicts an individual patient's response to treatment can reduce trial-and-error treatment decisions and will certainly aid in the selection of treatment options. However, despite numerous attempts, no established method has been developed [28]. The aim of this research was to create a simple and affordable model using machine learning techniques that could predict CDAI remission 6 months after starting first b/tsDMARDs treatment with TNFi, in addition to MTX. The model was designed for practical use in standard clinical settings and relied solely on clinical information available in routine care. In our initial comparison, we assessed multivariate logistic regression analysis, decision tree, SVM, and Lasso logistic regression using K-fold cross-validation. Of the four approaches tested, Stepwise and Lasso showed advantages in terms of stability (lower deviation) and F1 score (Fig. 2). Furthermore, Lasso demonstrated robustness in the context of the sparse cohort, as shown in Fig. 4. These results suggest a potential strategy of selecting an alternative b/tsDMARD class (e.g., IL-6Ri, CTLA4-Ig, JAKi) for Lasso-predicted TNFi non-responders (Supplementary Materials Figure S4).

Predicting remission with TNFi treatment has been studied extensively. Initially, researchers tried to use various markers such as age, gender, disease duration, serum biomarkers (rheumatoid factor: RF, ACPA, inflammatory mediators), genetic information (SNP), blood gene patterns, and immune cell types from blood samples (examined by flow cytometry) [8, 9, 11, 13, 15]. However, these methods have not been widely accepted due to issues with accuracy and reproducibility [8, 9, 11, 13]. Recent efforts have focused on comprehensive gene expression analysis in peripheral blood [10,11,12] and synovial tissue [14] to predict treatment response. Several studies have introduced a multi-omics approach, combining factors such as peripheral blood gene expression and epigenetic status [17]. Other attempts have used synovial biopsies, clustering synovial tissue types based on histological/genetic multi-omics to predict treatment response [16, 18]. However, while these advanced methods appear promising, they require further testing for reproducibility. In addition, these methods may not be widely accessible due to cost and complexity of analysis, limiting their use to specific healthcare settings. Thus, the potential use of predictive models that integrate readily available clinical data seems noteworthy, especially in scenarios with ethical or economic constraints on access to advanced technology.

In this context, a notable study emerged in 2012 (published in 2019) that presented a machine learning-based method that integrated clinical and SNP data to predict responsiveness to TNFi treatment (assessed by change in DAS28 at 24 months) [29]. This analysis, conducted across international cohorts, showed minimal SNP contribution due to significant interethnic variation. It highlighted the sufficiency of four clinical parameters (sex, age, MTX use, and baseline DAS28) for accurate prediction [29]. These findings offer encouraging prospects for achieving high predictive efficacy even in resource-constrained healthcare settings. However, concerns have been raised about the translation of these findings into daily practice due to its focus on DAS28 remission at 24 months and the potential increased computational burden associated with Gaussian process regression.

Vastesaeger et al. reported in 2016 a simple matrix derived from clinical information (sex, age, HAQ, TJC28, ESR, presence of comorbidities) to predict treatment response in csDMARD non-responders augmented with GLM [20]. This predictive model has demonstrated efficacy in an external European cohort including multiple TNFis [27]. However, this tool also defines DAS28 remission as the outcome, which is not consistent with current recommendations for more stringent remission criteria such as CDAI [21, 22]. In fact, while Vastesaeger's model accurately predicted DAS28-ESR remission in our Asian cohort, its predictive performance for CDAI remission was inferior (Supplementary Materials Table S1). Furthermore, the predictive performance of Vastesaeger's model for CDAI remission did not show stability across different TNFis (Supplementary Materials Table S2).

This study demonstrated the development of a predictive model using a machine learning technique, Lasso, capable of predicting CDAI remission 6 months after TNFi initiation (Table 4). Notably, this model stably predicted remission across IFX, ETN, ADA, and CZP (Supplementary Materials Figure S2).

Reports by Vastesaeger [20] and Guan [29] demonstrated the potential of predictive tools based solely on clinical information to be applicable to external cohorts. Interestingly, all models generated by lasso regression with k-fold cross-validation in our study selected only patient-reported outcomes (PROs) and quantitative data, excluding subjective physician input. K-fold cross-validation generated 10 CDAI remission predictive models by lasso logistic regression, showing general consistency in variable selection from a pool of 48 factors relevant to daily RA practice: sex, RA duration, PROs, RF, ACPA, concomitant GC use, and history of malignancy. These factors are known contributors to refractory disease [8, 9, 30]. This suggests the potential for extrapolation of our model; however, it should be noted that our models also selected certain factors, such as RF and antinuclear antibody (ANA) titers (Table 3), for which measurements lack international standardization. Therefore, external validation in other cohorts remains crucial to confirm the reliability of our study results.

Should our findings prove to be less generalizable, it may be worthwhile to explore tailored prediction models for different regions or institutions (regional/in-house precision medicine). Our study relied on data at the initiation of b/tsDMARDs and subsequent CDAI at month 6, highlighting the feasibility of establishing observational cohorts and creating tailored predictive models for different regions or institutions. The conventional multivariate logistic model with stepwise variable selection was also explored for model building. This approach yielded positive results in our cohort of 1630 subjects. However, the simulation of sample size reduction showed that the Stepwise model became less effective, whereas the Lasso model remained effective. We therefore advocate the use of Lasso logistic regression because it can mitigate overtraining, a major concern for tailored models built on limited sample sizes.

This study has certain limitations. First, the accuracy of the Lasso-generated CDAI remission predictive model decreased in the Calendar cohort (Supplementary Materials Table S4). In the Calendar cohort, which was split in a 9:1 ratio with a cutoff of October 2019, increased censoring was observed in the validation cohort (32 vs. 5% in the training cohort). This increase was undoubtedly influenced by the COVID-19 situation and may have contributed to the reduced accuracy observed.

Second, this study had fewer subjects treated with GLM. Thus, the proposed predictive model and the concept of regional/in-house precision medicine may not be feasible in cohorts with higher rates of GLM use.

Third, our selection strategy among b/tsDMARDs classes (TNFi, IL-6Ri, CTLA4-Ig, JAKi) might influence the study results. For example, ABT was prescribed to individuals at high risk of infection in accordance with guideline recommendations [25]. However, this strategy is likely to be widely adopted across regions and institutions, and therefore may not diminish the impact of our study results.

The limited predictive power of the machine learning models in this study may raise questions about the relevance of the study results. While the integration of advanced technologies such as proteomics or transcriptome analysis of blood or synovium has the potential to improve predictive power, it comes with higher costs and feasibility challenges. Likewise, unexplored statistical approaches such as deep neural networks or Gaussian process regression could improve model effectiveness, although these algorithms have high computational costs. It is important to recognize that our study, based on a single cohort, currently has limitations in terms of generalizability. Therefore, the objective of this study was not to determine the most effective model for our cohort, but rather to propose the feasibility of the concept: establishing a predictive model within each cohort. Each cohort can explore different statistical approaches and select the most appropriate among those generated.

Conclusions

This study suggested the potential of a predictive model for CDAI remission following TNFi treatment using Lasso-based machine learning. While validation in external cohorts is required to confirm its generalizability, promising indications emerge from two previous studies.

This study also suggests the feasibility of creating cost-effective models customized for individual regions/institutes and implementing regional/in-house precision medicine approaches.