Introduction

Disabling chronic low back pain, with or without leg pain, associated with degenerative disc disease (DDD) is the leading cause of global disability, and the lumbosacral junction is commonly affected [1, 2]. Spinal fusion is a widely used surgical treatment option for patients with DDD who do not respond to non-operative care. The surgery aims to fuse the supposedly painful segment of the spine by a bony union between the involved vertebrae.

Two principally different surgical approaches exist: anterior and transforaminal lumbar interbody fusion (ALIF and TLIF, respectively) [3]. ALIF is done through the retroperitoneal space (sometimes transabdominally), while TLIF involves a traditional dorsal approach. ALIF is commonly recommended to avoid surgical trauma to the back muscles and to restore the physiological lumbosacral height and lordosis, and it is reported to lead to less adjacent segment disease [4,5,6,7,8,9,10,11]. However, recent reviews based on cohort studies have shown similar clinical outcomes after ALIF and TLIF, and one systematic review calls explicitly for prospective multicenter register studies to assess any differences between the two surgical fusion procedures [4, 7, 9, 10].

When planning this observational study from the Norwegian Registry for Spine Surgery (NORspine), we hypothesized that, in real-world daily clinical practice, the clinical effectiveness of ALIF could be superior to that of TLIF at long-term follow-up.

Methods

We hypothesized that the long-term clinical effectiveness of ALIF was superior to that of TLIF.

Currently, 40 Norwegian hospitals report to NORspine—a mandatory national spine register. The coverage and the one-year response rate approximate 70% and 73%, respectively [12]. A NORspine dataset consists of informed consent, patient-reported outcome measures (PROMs) and socioeconomic variables, and surgeon-reported diagnostics and surgical details. Patients report clinical status at baseline during hospital admission and clinical outcome at 3 and 12 months after surgery directly to NORspine, including any postoperative complications (at 3 months). The quality of NORspine data has been assessed and found acceptable for most variables [13]. NORspine has evolved, and questionnaires have been amended somewhat during the study period. Finally, NORspine does not include radiological images.

We screened NORspine patients who received a surgical fusion of the lumbosacral junction over an eleven-year period (1 January 2007 to 31 December 2017) and included those treated by ALIF or TLIF; we only included patients with complete datasets in the final analysis of clinical outcomes. NORspine defines ALIF as fusion surgery performed by an anterior approach, with or without additional posterior fixation. We defined TLIF as a transforaminal approach to the disc through a posterior midline or paramedian (Wiltse's) incision. We did not include patients operated on with posterior lumbar interbody fusion (PLIF) or posterolateral fusion (PLF). Different types of autologous or allogeneic bone transplants, and in a few cases commercially available grafts, were used for both ALIF and TLIF.

To obtain long-term PROMs, we reached out to patients by mail and asked them to respond to the same questionnaires that they had completed at the one-year follow-up. The following PROMs were collected: (1) Oswestry Disability Index (ODI), a continuous variable from 0 (no disability) to 100 (bed bound) [14, 15]. (2) Numeric Rating Scale (NRS) for back pain and leg pain, a continuous variable from 0 (no pain) to 10 (worst imaginable pain) [16]. (3) Quality of life assessed by the EuroQol 5 Dimensions (EQ-5D) index, ranging from −0.6 (“worse than dead”) to 1.00 (“full health”) [17]. (4) Patient-perceived effect of treatment as assessed by Global Perceived Effect (GPE), a categorical transitional health scale from 1 to 7 (1 = “completely recovered”, 2 = “much improved”, 3 = “somewhat improved”, 4 = “unchanged”, 5 = “somewhat worse”, 6 = “much worse”, 7 = “worse than ever”) [18].

Outcomes

The primary outcome was a clinically relevant treatment effect, defined as an improvement in ODI of at least 30% at long-term follow-up [19].

Secondary outcomes were treatment success defined by the GPE categories “much improved” or “completely recovered”, mean differences in ODI, and NRS back and leg pain scores at long-term follow-up. We also analysed any between-group differences in working status; complications, both perioperative (surgeon-recorded during the hospital stay) and postoperative (patient-recorded at 3 months); re-operations (patient-recorded at long-term follow-up); mean length of hospital stay; and mean surgical time of the index surgery.
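The two dichotomous outcome definitions above can be illustrated with a minimal sketch. The function names and the handling of a baseline ODI of 0 are assumptions made for the illustration and are not part of the NORspine data model.

```python
def odi_responder(odi_baseline, odi_follow_up, threshold=0.30):
    """Return True if ODI fell by at least `threshold` relative to baseline.

    Lower ODI means less disability, so improvement is baseline minus follow-up.
    """
    if odi_baseline <= 0:
        # Assumption: relative improvement is undefined when baseline ODI is 0.
        raise ValueError("relative improvement is undefined for a baseline ODI of 0")
    relative_improvement = (odi_baseline - odi_follow_up) / odi_baseline
    return relative_improvement >= threshold


def gpe_success(gpe_category):
    """Return True for GPE 1 ("completely recovered") or 2 ("much improved")."""
    if gpe_category not in range(1, 8):
        raise ValueError("GPE must be an integer from 1 to 7")
    return gpe_category <= 2


print(odi_responder(40, 20))  # ODI 40 -> 20 is a 50% improvement
print(odi_responder(40, 30))  # ODI 40 -> 30 is a 25% improvement
print(gpe_success(2))
```

Because the threshold is relative, the same absolute change in ODI can count as a response for a patient with a low baseline score and as a non-response for a patient with a high one.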

Statistics

We described patient characteristics for those who responded to our long-term follow-up survey versus those who did not (respondents vs. non-respondents). We evaluated differences between patients who received ALIF and those who underwent TLIF. We performed an unadjusted comparative analysis of the aforementioned primary and secondary outcomes for the two treatment arms.

To account for any imbalances between the groups at baseline, we matched the groups by propensity score. The propensity score, derived from a logistic regression model, was defined as a patient’s baseline probability of receiving ALIF, conditioned on pre-specified plausible confounders (age, gender, smoking, BMI, working status, civil status, higher education, ASA classification, and previous spine surgery; preoperative symptoms as assessed by ODI, NRS back pain, NRS leg pain, EQ-5D, duration of back pain > 12 months, and radiological characteristics) [20]. We used 1:1 matching without replacement. ALIF patients were matched with TLIF patients if the difference in the logit of the propensity score was less than 0.2 of the standard deviation of the logit [21, 22].
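As an illustration only (the matching itself was run in statistical software), the sketch below shows one common way to implement 1:1 caliper matching without replacement once propensity scores are available. It reads the matching rule as a caliper of 0.2 standard deviations of the logit of the propensity score. The greedy nearest-neighbour order, the use of the overall standard deviation across both groups, and all names and patient identifiers are assumptions.

```python
import math
import statistics


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def caliper_match(treated, control, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated, control: dicts mapping a patient id to a propensity score.
    Returns a list of (treated_id, control_id) pairs.
    """
    t_logit = {pid: logit(ps) for pid, ps in treated.items()}
    c_logit = {pid: logit(ps) for pid, ps in control.items()}
    # Assumption: caliper based on the SD of the logit over both groups combined.
    sd = statistics.stdev(list(t_logit.values()) + list(c_logit.values()))
    caliper = caliper_sd * sd

    available = dict(c_logit)  # controls not yet used
    pairs = []
    for t_id, t_val in t_logit.items():
        if not available:
            break
        # Closest remaining control on the logit scale.
        c_id = min(available, key=lambda pid: abs(available[pid] - t_val))
        if abs(available[c_id] - t_val) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # without replacement
    return pairs


# Hypothetical propensity scores, for illustration only.
alif = {"a1": 0.30, "a2": 0.90}
tlif = {"t1": 0.31, "t2": 0.32, "t3": 0.10}
print(caliper_match(alif, tlif))
```

In this toy input, the second ALIF patient has no TLIF patient within the caliper and is therefore dropped, which is the mechanism by which matching reduces the sample size.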

Finally, we compared the matched groups regarding primary and secondary outcomes and performed post-hoc power analyses using the new population size after propensity score matching. The study hypothesis was that ALIF could be superior to TLIF in the long term. We set the superiority margin at a 15 percentage point higher proportion of patients reaching the primary outcome (at least 30% improvement in ODI). Assuming that data were missing not at random, we included only patients who completed the long-term cross-sectional survey [23, 24].

Continuous variables were displayed as means with 95% CIs and categorical variables as numbers and proportions (%). We compared means using the Student t test and proportions using Z-statistics. We used SPSS version 26 (IBM Corp., Armonk, NY, USA) and MedCalc Statistical Software version 19.2.6 (MedCalc Software bv, Ostend, Belgium; https://www.medcalc.org; 2020). We used ClinCalc.com to perform the power analysis (clincalc.org/Stats/power.aspx). The Norwegian Regional Committee for Medical and Health Research Ethics approved the study (identifier 75294). The study was conducted and presented in accordance with the STROBE statement [25].

Results

We identified 945 patients at baseline: 43 were dead, 29 had no valid postal address, and eight were duplicates, leaving 865 patients eligible for the study (Fig. 1). A total of 535 (62%) patients responded to the cross-sectional long-term questionnaire at a median of 8 years after index surgery. Among long-term respondents, 159 (30%) received ALIF and 376 (70%) TLIF. After propensity score matching, 120 patients remained in each treatment group. The mean follow-up time (95%CI) was 92.3 (87.6–97.0) months in the matched cohorts.

Fig. 1 Flowchart showing number of eligible patients, responders and non-responders, and matched cohorts

Table 1 displays baseline characteristics and indications for surgery for the patients who responded to the long-term follow-up and for the non-respondents. The respondents had a mean age (95%CI) of 50 (49–51) years, 264 (49%) were females, and the mean (95%CI) BMI was 27.1 (26.8–27.5). At baseline, the mean (95%CI) ODI score was 40.4 (39.1–41.5), and 174 (33%) had undergone previous spine surgery. Fewer respondents than non-respondents lived alone, smoked, or reported any previous spine surgery. Respondents were more often highly educated.

Table 1 Patient characteristics and indications for 865 patients that underwent anterior (ALIF) or transforaminal (TLIF) fusion of the lumbosacral junction and who were available for 8-year follow-up

Unmatched cohorts

Baseline data and indications for surgery for patients who received ALIF compared to TLIF are displayed in Table 2. ALIF patients were more often living alone (RR = 0.66 (0.43–1.00); p = 0.050) and had a lower BMI (MD = 1.1 (0.3–1.8); p = 0.003) compared to TLIF patients. ALIF patients reported a lower preoperative ODI (MD = 4.2 (1.7–6.8); p < 0.001), NRS leg pain (MD = 1.15 (0.67–1.63); p < 0.001), and NRS back pain (MD = 0.59 (0.24–0.93); p < 0.001) than TLIF patients. Furthermore, fewer ALIF patients than TLIF patients were operated on for disc herniation, lateral spinal stenosis, or degenerative spondylolisthesis (RRs 0.08–0.38).

Table 2 Patient characteristics and surgical indications for patients who received anterior (ALIF) versus transforaminal (TLIF) lumbosacral fusion

Table 3 shows the clinical outcomes for the unmatched cohort. We found no differences in the primary outcome or any other secondary clinical outcomes. Compared to TLIF, ALIF patients had a shorter duration of surgery (MD = 47 min (37–56); p < 0.001) and shorter hospital stays (MD = 0.9 days (0.3–1.4); p < 0.001). There were no differences in rates of perioperative (surgeon-recorded) or postoperative (patient-reported at 3 months) complications or re-operations (patient-reported at long-term follow-up).

Table 3 Unadjusted results for patients treated with lumbosacral fusion by ALIF versus TLIF

Propensity score–matched cohort

Table 2 and Fig. 2 demonstrate that propensity score matching created similar groups concerning the observed baseline parameter distribution.

Fig. 2 Kernel density plot, displaying distribution of propensity score in the two cohorts before and after matching

Results for matched cohorts are displayed in Table 4. The proportions of patients with an improvement in the ODI of at least 30% were 68/120 (56.7%) in the ALIF group versus 77/120 (64.2%) in the TLIF group; the difference was not statistically significant (RR (95%CI) = 0.88 (0.72 to 1.08); p = 0.237). Similarly, we found no statistically significant differences in the secondary clinical outcomes between the two treatments. However, ALIF patients had shorter operation time (109 (102–116) min vs. 150 (141–158) min; MD = 40 min (29–51); p < 0.001). The between-group difference in length of stay was no longer significant after matching (MD = 0.6 days (−0.2 to 1.4); p = 0.077).
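As an illustration of how a relative risk, its confidence interval, and a p value can be derived from such counts, the sketch below applies the standard log-scale Wald method to the counts reported for the primary outcome (68 of 120 for ALIF and 77 of 120 for TLIF). That the statistical software used this exact method is an assumption, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def relative_risk(events_a, n_a, events_b, n_b, level=0.95):
    """Relative risk of group A vs. group B with a log-scale Wald CI.

    Returns (rr, lower, upper, two_sided_p).
    """
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR) for two independent binomial samples.
    se_log_rr = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(rr) - z_crit * se_log_rr)
    upper = math.exp(math.log(rr) + z_crit * se_log_rr)
    # Wald test of log(RR) = 0.
    z = math.log(rr) / se_log_rr
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rr, lower, upper, p_value


rr, lower, upper, p = relative_risk(68, 120, 77, 120)
print(f"RR = {rr:.2f} ({lower:.2f} to {upper:.2f}), p = {p:.2f}")
# prints: RR = 0.88 (0.72 to 1.08), p = 0.24
```

With these counts the sketch gives a relative risk and interval that agree with the reported RR of 0.88 (0.72 to 1.08). An interval that spans 1 corresponds to a non-significant difference at the 5% level.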

Table 4 Propensity score-matched results for patients treated with lumbosacral fusion by ALIF vs TLIF

We found no significant differences in perioperative (surgeon-recorded) or postoperative (patient-recorded at 3 months) complications, or in re-operation rates (patient-reported at long-term follow-up).

Most (117 of 120 (97.5%)) ALIFs were stand-alone variants, i.e. without additional posterior fixation. We performed an exploratory analysis of 84 patients with isthmic spondylolisthesis who received stand-alone ALIF (n = 38) versus those who underwent TLIF (n = 46) and found no significant between-group difference in the number of patients who reported at least 30% ODI improvement (isthmic stand-alone ALIF 24 of 37 (65%) vs. isthmic TLIF 36 of 46 (78%); RR (95% CI) = 0.90 (0.60–1.33); p = 0.588).

We performed a post hoc power analysis and found that our study achieved a statistical power of 89% (unmatched cohorts) and 65% (propensity score-matched cohorts) to detect a difference of 15 percentage points in the proportions of patients reaching at least 30% ODI improvement after fusion surgery.
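As a rough illustration of this kind of calculation, the sketch below uses the normal approximation for comparing two independent proportions. The response proportions of 65% and 50% are hypothetical placeholders chosen to lie 15 percentage points apart. They, the function name, and the two-sided alpha of 0.05 are assumptions, so the result need not match the online calculator exactly.

```python
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z test for two independent proportions.

    Uses the unpooled standard error and equal group sizes.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    se = ((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group) ** 0.5
    return std_normal.cdf(abs(p1 - p2) / se - z_crit)


# Hypothetical proportions, 15 percentage points apart, 120 patients per group.
print(f"power = {power_two_proportions(0.65, 0.50, 120):.2f}")
```

Under these assumed inputs the approximate power falls between 60% and 70%, and it rises when the same proportions are evaluated with a larger group size. This is consistent with the lower power seen after matching reduced the cohort.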

Discussion

This national register-based study compared long-term results after ALIF versus TLIF of the lumbosacral junction. We used prospectively collected register data, supplemented by a long-term cross-sectional survey, and matched groups at baseline by propensity score. At an extended follow-up of 8 years after surgery, we found no differences in the proportions of patients with a clinically relevant improvement assessed by the ODI (primary outcome). Except for shorter operation time, the secondary outcomes did not support the hypothesis that ALIF is superior to TLIF.

To our knowledge, no other studies have compared ALIF versus TLIF with a comparable long-term follow-up. Previously published meta-analyses included studies with short- or middle-term follow-ups (10–71 months) and found no differences in clinical outcomes comparing ALIF versus TLIF, supporting our findings [4, 9].

Loss of sagittal balance and lumbar lordosis have been associated with back pain and disability and adjacent segment disease (ASD) [26,27,28,29,30,31]. One study hypothesized that ALIF could be superior to posterior fusion in restoring a more physiological lumbar lordosis which in turn could yield better patient-reported outcomes [11]. Additionally, superior fusion rates after ALIF versus TLIF have been suggested because the anterior approach provides a wider access to debride the disc space and allows larger implants and bone transplant volumes than posterior techniques [4,5,6,7,8]. However, a recent systematic review found similar fusion rates comparing ALIF versus TLIF [32]. Although surgeons report preoperative radiological findings, NORspine does not store any radiological images. More importantly, fusion rates and adjacent segment disease should optimally be assessed in the long term, whereas the standard follow-up time in NORspine is one year after surgery. Most comparisons of ALIF versus TLIF are short-term studies and might not detect clinically relevant long-term differences in fusion rates or ASD [8, 10, 33,34,35,36]. Finally, results from our study suggest that previously published differences in radiological lumbar lordosis after ALIF versus TLIF may not be relevant for long-term patient-reported outcomes.

In line with previously published data, we found a shorter surgical time for ALIF than for TLIF [37]. However, one study reported longer surgical time for ALIF in patients who required additional pedicle screws (360-degree fusion procedure). Our finding of shorter surgical time for ALIF could be explained by the fact that almost all ALIFs were stand-alone procedures without additional pedicle screws. Still, surgical time is relevant because the cumulative surgical trauma and the risks of hypothermia, infection, and postoperative complications depend partly on the duration of surgery. Additionally, surgical time has an economic aspect.

Different indications for surgery could also influence patient-reported outcomes, irrespective of the fusion procedure. A recent NORspine study showed superior results reported by patients who received a fusion due to isthmic spondylolisthesis [38]. In the present study, among patients with isthmic spondylolisthesis, we found no difference between stand-alone ALIF and TLIF in the proportions who reported a successful effect of lumbosacral fusion. However, this subgroup sample was too small to be statistically robust.

Finally, in the unadjusted results, the length of stay was about one day shorter for ALIF patients. However, the difference was not statistically significant (p = 0.077) after matching. Previously published data do not support this finding: Phan et al. found longer lengths of stay for ALIF patients in their meta-analysis [4]. Our data are based on operations done between 2007 and 2017; less invasive surgery could have developed during the study period, e.g. unilateral pedicle screws in TLIF procedures have become frequent [39]. Length of stay has an impact on patients' return to everyday living and on health economics.

Although there was a trend towards fewer perioperative and postoperative complications in ALIF patients, the difference was not statistically significant. Re-operation rates were similar. These findings align well with previously published data: Teng et al. reported no differences in complications between the two treatments [9]. The complication profiles are, however, reported to differ between the two techniques [4, 39,40,41,42,43]. Also, perioperative complications may be under-reported in spine registers [13, 43, 44]. Re-operations were reported by patients themselves at the long-term follow-up and could be inaccurately recorded and subject to recall bias.

Limitations

Most importantly, this study was based on register data and had an observational design. There was no randomization between the two treatments; however, we did balance the groups by propensity score matching. Still, unobserved variables not included in the propensity score could introduce some allocation bias. In addition to the differences between the comparative treatments, the two cohorts might differ regarding additional procedures and indication. ALIF is a pure fusion procedure, while TLIF necessitates nerve root decompression in addition to the fusion procedure.

The loss to follow-up at 8 years was 38% and could have introduced attrition bias; however, the proportion of patients lost was within the acceptable limit of 20–40% for registers and comparable to other spine register studies [23, 24, 45,46,47]. This study used register data to include a large patient pool; still, post-hoc power analysis showed only 65% power to detect a relevant between-group difference for the primary outcome in the propensity score-matched cohorts. Additionally, we believe this cohort is too small to assess rare events such as complications and re-operations. Increasing the study population would only be possible by decreasing the follow-up time significantly, as we would have to include more recently operated patients.

NORspine does not include postoperative radiological parameters, and neither fusion rates nor lordosis could be assessed radiologically. Finally, and crucial to recognize, the results of our study apply to a heterogeneous population of patients with a variety of degenerative lumbosacral spine conditions who underwent variants of ALIF and TLIF of the L5-S1 segment.

Conclusions

In this propensity score-matched prospective national spine register study, the type of procedure—anterior (ALIF) or transforaminal (TLIF) lumbar interbody fusion of the lumbosacral junction—was not associated with the long-term outcomes reported by the patients. The ALIF procedure was associated with somewhat shorter surgical time than TLIF.