The Fragility of Statistical Findings in Cervical Disc Arthroplasty: a Systematic Review of Randomized Controlled Trials

Purpose This study employs both the fragility index (FI) and fragility quotient (FQ) to assess the level of robustness in the cervical disc arthroplasty (CDA) literature. We hypothesize that dichotomous outcomes involving CDA would exhibit statistical vulnerability. Methods A PubMed search was conducted to evaluate dichotomous data for randomized controlled trials (RCTs) in CDA literature from 2000 to 2023. The FI of each outcome was calculated through the reversal of a single outcome event until significance was reversed. The FQ was calculated by dividing each fragility index by the study sample size. The interquartile range (IQR) was also calculated for the FI and FQ. Results Of the 1561 articles screened, 111 met the search criteria, with 35 RCTs evaluating CDA included for analysis. Six hundred and ninety-three outcome events with 130 significant (P < 0.05) outcomes and 563 nonsignificant (P ≥ 0.05) outcomes were identified. The overall FI and FQ for all 693 outcomes were 5 (IQR 3–7) and 0.019 (IQR 0.011–0.043). Fragility analysis of statistically significant outcomes and nonsignificant outcomes both revealed an FI of 5. All of the studies reported loss to follow-up (LTF) data where 65.7% (23) did not report or reported an LTF greater or equal to 5. Conclusions The literature regarding CDA RCTs lacks statistical robustness and may misrepresent the conclusions with the sole use of the P value. By implementing the FI and FQ along with the P value, we believe the interpretation and contextualization of the clinical data surrounding CDA will be better understood.


Introduction
Cervical disc arthroplasty (CDA) is an evolving surgical approach for addressing degenerative disc diseases in the cervical spine and has gained attention as an alternative to anterior cervical discectomy and fusion (ACDF).This procedure involves replacing degenerative discs with artificial the knowledge gaps in the field of cervical disc arthroplasty [8][9][10].
Considering all possible complications of cervical disc diseases, applications of evidence-based medical research become essential when choosing the best treatment method.Statistical approaches that utilize the P value reveal the significance of potential divergences between clinical interventions [11].In the case of CDA, evidence-based medical practice can apply the foundational implications of said statistical results to determine treatment outcomes for damaged cervical discs.Various clinical literature has used the P value as a conventional evaluation method for interpreting the marginal significance of statistical results.In recent research findings, novel methodologies have been adopted with the utilization of the fragility index (FI) and fragility quotient (FQ) to assess the robustness of clinical trial results.When evaluating the fragility of randomized controlled trials (RCTs), the FI is to reveal the necessary number required to reverse a trial from high degrees of significance to low or none [12].In addition, as a measurement of fragility, the FQ further assesses the FI concerning the sample size by dividing the FI by sample size [13].Combined, the FI and FQ complement the P value and promote confidence that low degrees of fragility (higher fragility index) indicate more robust clinical trial data.
Evidence-based medical research will enable physicians to practice evidence-based patient care when treating cervical disc diseases.Despite the debate between conservative and nonconservative management of cervical disc diseases, available literary evidence continues to explore the overall patient outcomes of CDA [14][15][16].With these considerations, it becomes even more essential to analyze the robustness of the results in the literature.This study evaluates the degree of statistical fragility in the cervical disc arthroplasty (CDA) literature.We hypothesize that the dichotomous outcomes within the CDA literature are statistically fragile and will be vulnerable to a small number of outcome event reversals.

Methods and materials
A systematic review was conducted to compare statistical robustness for cervical disc injuries.Since this was a systematic review, no ethical consent or IRB consultation was needed to continue this study.The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were used for article identification and consequent selection.The PubMed database was queried from 2000 to 2023, for all RCTs relating to cervical disc arthroplasty.The language was restricted to English.Search criteria involved articles containing: "cervical disc arthroplasty" OR "cervical disc replacement".Study outcomes with levels of statistical significance displayed a P value of less than 0.05.Conversely, statistically nonsignificant data had a set P value greater than or equal to 0.05.A cross-tabulation was created to compare and categorize the FI of each dichotomous outcome event.The number of outcome events was modified to reverse significance.The FI is defined as the designated number that was required to reverse the significance of patient outcomes (Table 1).The FI was calculated and recorded for each outcome, event, and nonevent.The median FI that incorporated all outcomes was marked as the overall FI for this present study.The FQ was calculated by dividing the FI by the sample size.Characteristics of RCTs were split into subgroups for comparison.FI and FQ subgroups consisted of primary versus secondary outcomes, initial significance (P < 0.05 vs. P ≥ 0.05), complications, comparing outcomes FI to LTF (FI < LTF vs. FI > LTF), and year (Table 2).Overall FI and FQ were measured by adding all outcome events.FI outcomes were calculated using the two-tailed Fisher exact test.Characteristics organized within each subgroup were calculated via interquartile ranges (IQRs) for FI and FQ.The IQR measures the difference between the upper quartile (Q3, 75th percentile) and lower quartile (Q1, 75th percentile).A riskof-bias assessment was also performed (Table 3).

Results
Of the 1561 studies screened, 111 met the search criteria with 35 RCTs included in the analysis (Fig. 1).A total of 693 events with 130 significant (P < 0.05) outcomes and 563 with nonsignificant (P > 0.05) outcomes were identified.The overall FI, incorporating all 693 outcome events from the 35 RCTs was 5 (IQR 3-7).The overall FQ was 0.019 (IQR 0.011-0.043),indicating that the reversal of 13 of 100 outcomes may change the study significance of the included RCTs.Of the 9 included RCTs, 3 studies reported loss to follow-up (LTF) data greater than the overall FI of 5. Therefore, 33.3% of studies reported an LTF value that was greater than the overall FI.For the 130 outcomes that were reported as significant, the median FI of events required to change significance was 5 (IQR 2-11) (Table 2).The FQ for significant outcomes was 0.017 (IQR 0.006-0.044).For the 563 outcomes that were reported as nonsignificant, the number of events required to change significance was 5 (IQR, 4-7).The FQ for nonsignificant outcomes was 0.019 (IQR 0.011-0.043).For the outcomes where FI ≤ LTF (n = 565), the median FI was found to be 5 (IQR 3-7).For the outcomes where FI > LTF (n = 128), the median FI was found to be 5 (IQR 4-7).The associated median FQs were 0.016 (IQR 0.009-0.033)and 0.055 (IQR 0.025-0.092),respectively.Fragility subanalysis per year of publication identified an FI of 5 (IQR 4-5) from 2000 to 2007, an FI of 5 (IQR 3-7) from 2008 to 2015, and an FI of 5 (IQR 3-10) from 2016 to 2023, thus demonstrating consistent statistical fragility over the 23 years.

Discussion
Our fragility analysis exploring the cervical disc arthroplasty revealed an overall median FI of 5 (IQR 3-7) and a median FQ of 0.019 (IQR 0.011-0.043).A fragility index of 5 indicates that 5 events were needed to alter the statistical significance from significant to nonsignificant or vice versa.
In correlation with sample size, the FQ of 0.019 demonstrates that 2 out of 100 patients is necessary for the median number to reverse significance across the total outcomes.In the RCTs included in this paper, 65.7% (23) of studies did not report or reported an LTF greater or equal to an overall  literature related to statistical fragility.Evaniew et al. were the first to apply a fragility index to the spine literature and reported an overall median FI of 2 [17].In a recent study, Muthu et al. reexamined the spine literature and reported an overall median FI of 2 [18].However, these studies focus on the overall spine literature and demonstrate significant fragility while our study highlights CDA literature to be more robust compared to the previous analyses.Gupta et al. examined the use of ketamine infusion during scoliosis surgery and reported a FI of 2. Cordero et al. calculated the statistical fragility of tourniquet use in total knee arthroplasty (TKA) and obtained an FI of 4 [19].The anterior cruciate ligament reconstruction (ACLR) literature comparing bone-patellar tendon-bone grafts and hamstring tendon FI of 5. Through this finding, LTF maintenance is an important tool that can help increase the overall robustness of a given study.The low median FI and FQ depict the statistical fragility within the CDA literature as only a small number of events is necessary to alter statistical significance.The application of a fragility analysis gives us an objective measure to compare clinical robustness where a higher FI would increase the confidence in the effectiveness of the data depicted in that study.This metric simply allows us to assess the quality of the studies that guide management algorithms in evidence-based medicine and accurately interpret clinical data that guide clinical care.
In this systematic review, the overall FI of 5 reported was similar to many of the previous spine and orthopedic

Conclusion
The literature regarding CDA RCTs lacks statistical robustness and may misrepresent the conclusions with the sole use of the P value.By implementing of the FI and FQ along with the P value, we believe the interpretation and contextualization of the clinical data surrounding CDA will be better understood.
autografts revealed a FI of 5 [20].When looking at autograft vs. allograft comparison in ACLR, the authors found a FI of 6 [21].A recent fragility analysis looking at calcaneus fractures demonstrated an FI of 6 as well [22].In an examination of the distal biceps repair literature, the authors demonstrated an FI of 6.5 [23].Megafu and Megafu in examining the distal radius fracture literature reported an FI of 9, the highest FI within all orthopedic and spine literature to date [24].Other trauma literature looking at fibula fractures, distal femur fractures, and orbital fractures reported a similar FI of 5 across all studies [25][26][27].By the spine and orthopedic literature, the FI consistently demonstrates statistical fragility, thus confirming our hypothesis and continual evidence of fragile results within the spine and orthopedic literature.This review paper analyzing the spine literature regarding cervical disc arthroplasty has many strengths and limitations.An inherent strength in this review is the unbiased, comprehensive nature of having a systematic review of RCTs with the utilization of the PRISMA methodology.Also, the sole use of RCTs is a notable strength as management protocols and evidence-based medicine guidelines derive their decision-making abilities based on these studies.However, it is important to acknowledge this study's limitations.The first potential weakness is that fragility analysis is solely applicable to binary data with dichotomous endpoints.This limits FI utility for non-dichotomous studies.Studies with ordinal and continuous outcomes were filtered out to prevent overlap.Adding more study designs may have influenced the ability to further investigate CDA results.However, our team was able to comprehensively assess all possible binary outcomes by utilizing a dichotomous approach.Another limitation was that studies needed to be RCTs to be eligible for inclusion in this present review.As a result cohort studies, case-control studies, cross-sectional studies, prospective studies, and retrospective studies were excluded from analysis.Despite reduced incorporation of subject outcome events, our systematic review with RCTs provided a suitable, reliable way to assess effective CDA treatment options.
With the ever-expanding advancements in medicine, our systematic review aimed to quantify robustness in CDA literature by exploring the FI and FQ values to confirm the clinical significance of CDA.Our findings show that the majority of RCTs in cervical disc arthroscopic surgery have consistently been less robust for the last 23 years as speculated by our team.Possible implications for this loss of statistical significance may be due to solitary reliance on P value analysis, which inadvertently overlooks LTFs, impacts of varying sample size, and event outcomes [28].Integrating fragility analysis in conjunction with P value analysis can be helpful when trying to carefully measure the Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material.If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.
Fig. 1 PRISMA flow diagram of included studies

Table 1
Demonstration of reversal significance with a fragility of 1

Table 2
Overall fragility data and analysis of subgroups a

Table 3
Risk of bias assessment Domain 1: Risk of