Introduction

Primary liver cancer (PLC) is the sixth most common cancer worldwide and the third most common cause of cancer death, with hepatocellular carcinoma (HCC) accounting for about 85% of all PLC [1]. Despite the rapid advancements in medical technology, radical surgical resection of the primary tumor remains the crucial component in the treatment of HCC. Robotic liver resection (RLR) is a relatively novel technique and integrates the benefits of traditional surgery with the precision and flexibility of robotic surgical systems, the question of whether RLR outperforms traditional surgery in clinical practice remains ongoing.

Recent high-quality retrospective and prospective studies have corroborated the safety and efficacy of RLR in both short- and long-term outcomes, demonstrating its superiority over open liver resection (OLR) for liver cancer patients [2, 3]. Regarding the comparison of RLR and laparoscopic liver resection (LLR), the existing high-quality retrospective studies predominantly focus on short-term prognosis, with a severe lack of long-term outcomes concerning tumors [4, 5].

Our ultimate aim was to evaluate the potential impact of RLR on the long-term outcome of HCC patients in a single high-volume center in China, and concurrently evaluated the learning curve of this technique for HCC treatment. We present the following article in accordance with the STROBE reporting checklist.

Patients and methods

Patients

This cohort study entailed a retrospective analysis of databases meticulously maintained by the largest specialized liver disease medical center in China, The Fifth Medical Center, General Hospital of PLA. The study focused on patients who underwent RLR for HCC (RLR group) between July 1, 2016 and July 1, 2021. This cohort was compared with a control group of patients who underwent LLR for HCC (LLR group) during the same time frame. Prior to surgery, all patients provided informed consent, authorizing the anonymized collection of data and audiovisual recording of the surgical procedure. This retrospective cohort study was received approval from the institutional review board and ethics committee of The Fifth Medical Center, General Hospital of PLA (protocol KY-2022-12-76-1). This study was registered was registered as a retrospective cohort study at Research Registry (UIN: researchregistry9622, available at https://www.researchregistry.com) in compliance with the World Medical Association’s Declaration of Helsinki, 2013.

Definitions

The diagnosis of HCC was established using computed tomography or magnetic resonance imaging, in accordance with internationally approved radiological standards [6, 7]. The study excluded cases meeting the following criteria: (1) heterogeneous invasive lesions upon final pathology (e.g., adenosquamous carcinoma, etc.); (2) secondary carcinoma of the liver; (3) history of previous liver resection; (4) patients who required concomitant procedures, such as lymph-node dissection around the porta hepatis, cryoablation, radiofrequency ablation, or biliary tract exploration were not included in this study.

All robotic surgeries were performed using the da Vinci Si Surgical System (Intuitive Surgical, Sunnyvale, CA, USA). The surgical approaches of RLR and LLR were conducted using the procedures as previously described [8, 9].

The baseline characteristics examined in both study groups encompassed age, sex, body mass index (BMI), Child–Pugh–Turcotte (CPT) status, age-adjusted Charlson Comorbidity Index (aCCI) [10], American Society of Anesthesiologists (ASA) physical status score [11], IWATE criteria difficulty [12], Eastern Cooperative Oncology Group performance status (ECOG PS) score, albumin–bilirubin (ALBI) score [13], hemoglobin (HGB), platelet count (PLT), carbohydrate antigen 19-9 (CA19-9), carbohydrate antigen 125 (CA125), carcinoembryonic antigen (CEA), and alpha fetoprotein (AFP). The relationship between surgical approach and study outcomes, such as OS and RFS, was investigated using directed acyclic graphs (DAGs) to identify potential confounders (Appendix Fig. 3).

Perioperative outcomes comprised estimated blood loss (EBL), requirement for packed red blood cell (pRBC) transfusion, conversion rate to open laparotomy, operative time, duration of stay in the intensive care unit (ICU), and the length of postoperative hospital stay.

Short-term postoperative outcomes were meticulously documented, encompassing common complications (such as hepatic failure, fever, abdominal effusion, pleural effusion, and abdominal infection), along with 30-day morbidity, 30-day reoperation rate, and 90-day mortality were recorded in detail. Moreover, the severity of complications was classified using the Clavien–Dindo classification (CDC) [14].

We adhered to a standardized surveillance protocol, conducting postoperative follow-up at 30 days after the operation and subsequent patient assessments at 3-month intervals. Tumor markers, including CA 19-9, CA 125, and carcinoembryonic antigen (CEA), were assessed every 3 months, alongside a chest–abdomen–pelvis computed tomography (CT-CAP) performed at the same intervals. Comprehensive details pertaining to tumor recurrence and progression were collected using both the institutional database and patient information obtained from collaborating local hospitals.

The long-term oncological outcomes, including patients’ overall survival (OS) and recurrence-free survival (RFS), were the primary endpoints of this study. OS was defined as the duration from diagnosis to patient death or the last date of follow-up (cut-off date: August 1, 2023), while RFS was defined as the period from surgical intervention to the known date of disease recurrence or the last follow-up date (cut-off date: August 1, 2023).

Statistical analysis

Continuous data that follow a normal distribution are reported in terms of the mean (SD), while continuous data that do not follow a normal distribution are presented using the median (IQR). Categorical data are reported as counts and percentages. Comparisons between the OLR and RLR groups were performed using the Student’s t test or Mann–Whitney U test for continuous variables, Chi-squared test or Fisher exact test for categorical variables, and Cochran–Armitage test for trend for ordinal variables. Balance in covariates between treatment groups was also evaluated by the standardized mean difference (SMD). An SMD of less than 0.1 was deemed to be the ideal balance.

Randomization is sometimes difficult to achieve in observational studies. In this study, the cost of robotic surgery exceeded that of laparoscopic surgery, falling outside the coverage of basic medical insurance. Consequently, there was potential bias in the patient’s choice of surgical approach. Moreover, non-randomized grouping caused an imbalance between groups in terms of baseline features. In this context, propensity score matching (PSM) effectively reduces the confounding bias and obtains effects similar to those of randomized controlled studies. To achieve the balance between groups while making full use of the collected data, we employed a one-to-three nearest-neighbor matching algorithm with a caliper of 0.1. During the matching process, R software selects three patients with the closest propensity scores in the LLR group to match with one patient in the RLR group. However, due to the smaller size of the RLR group, only some patients in this group could find all three matched patients, resulting in an imperfect 1:3 ratio. Although the proportions are not strictly accurate, the matching model remains valid provided that it is ensured that the two matched groups are as consistent as possible on critical confounding variables, thereby conferring high internal validity to the results. Standardized mean differences (SMD) were estimated before and after matching to evaluate the balance of covariates, and the value of SMD less than 0.1 was considered relatively balanced enough. The propensity score was estimated using a multivariable logistic regression, with type of surgery as the dependent variable and age, sex, BMI, CPT status, aCCI, ASA physical status score, IWATE criteria difficulty, ECOG PS score, ALBI score, HGB, platelet count, CA19-9, CA125, CEA, and AFP as covariates.

Survival analysis of OS and RFS was performed by log-rank test and plotted by the Kaplan–Meier curve. Factors found to be significant in the univariate analysis and potential confounders were entered into the multivariate Cox regression analysis. Hazard ratio (HR) and its confidence interval (CI) were also calculated using Cox proportional hazard analysis. The PH assumptions of Cox proportional risk regression analysis were evaluated by Schoenfeld residual method.

CUSUM analysis is a statistical technique applied to surgical procedures for the quantitative estimation of the learning curve [15, 16]. The standard CUSUM analysis shows the cumulative differences between the observed data and the target value. To perform multidimensional CUSUM analysis, we designated operative time, estimated blood loss, postoperative complication, and postoperative hospital length of stay as the assessment indicators of surgical competence. These four assessment indicators were, respectively, set as quantized value \(\delta _1\), \(\delta _2\), \(\delta _3\), and \(\delta _4\) for each case. The quantized value of assessment indicator was defined as \(\delta\) = \(X_n - X_0\), where \(X_n\) was an individual attempt following the nth procedure, with \(X_n\) = 1 if a failure occurred and \(X_n\) = 0, if it did not. \(X_0\) is the established risk or failure rate of the control to which the ongoing attempts are compared. \(X_0\) can be calculated either as an overall frequency if this is known in this case, or on a case-by-case basis as with paired control trials. In this study, the mean or median of RLR group data after PSM is taken as the target value, and the proportion of each index reaching the target value is calculated to obtain the overall frequency. The \(X_0\) for these four assessment indicators were, respectively, 0.390, 0.537, 0.439, and 0.512. Therefore, the quantized value of surgical competence for each case was defined as S=\(\delta _1\)+\(\delta _2\)+\(\delta _3\)+\(\delta _4\). After each case, scores were sequentially added and then plotted graphically by the equation: \({\text {CUSUM}} = \sum\) \(S_i\). It was based on CUSUM graph that fit a restricted cubic spline (RCS) curve was used to depict the learning curve. A positive slope signified that the desired target remained unattained, while a negative slope indicated that it had been surpassed. The pivot point where the slope transitioned from positive to negative served as a reflection of the surgical procedure’s proficiency.

A 2-sided p < 0.05 was considered significant in all the analysis. All statistical analyses were performed using R software (version.4.3.1).

Results

Clinical characteristics of patients

Over a 5-year period, a total of 529 patients were included in the analysis, consisting of 107 individuals in the RLR group (85 men, 22 women) and 422 in the OLR group (348 men, 74 women). The baseline demographic and clinical characteristics for this cohort are detailed in Table 1. In the initial analysis, it was observed that patients who underwent RLR exhibited a significantly lower ASA score (p = 0.03). Additionally, there was a notable difference in the IWATE criteria difficulty between the RLR and OLR groups prior to PSM (p = 0.04). Patients in the RLR group also presented with a significantly lower ECOG PS score compared to the LLR group (p = 0.01). However, the CPT status of the RLR group was significantly higher than that of the LLR group (p = 0.01).

Table 1 Demographic and clinical characteristics of the study population before and after PSM

PSM is a statistical method employed to address causal inference problems. It achieves this by seeking observations from the experimental and control groups with similar propensity scores, thereby reducing the impact of potential confounding variables on causal estimation. After PSM involving 341 patients across all groups, no significant differences were observed between the two groups in terms of both baseline and pathologic characteristics. The SMD for all matched covariates was < 0.1 after PSM, signifying that weighted patient cohorts in both groups were comparable.

Regarding the oncological characteristics of the resected specimen, no significant differences were found between the RLR and LLR groups (Table2).

Table 2 Clinicopathological characteristics oncologic features before and after PSM

Perioperative outcomes and short-term postoperative outcomes

The perioperative outcomes for both groups are detailed in Table 3. After PSM, the RLR group showed a higher operative time (median [IQR], 210 [152.0–298.0] mins vs 183.5 [132.3–263.5] mins, p = 0.04) compared to the LLR group.

Table 3 Perioperative outcomes and short-term postoperative outcomes before and after PSM

In the postoperative phase, the RLR group exhibited a higher incidence of fever compared to the LLR group (p < 0.001). No significant differences were observed in other complications between the two groups. Patients who underwent RLR showed no significant differences in the length of postoperative hospital stay compared to those who underwent LLR (median [IQR], 8 [7–9] vs 8 [7–10] day; p = 0.18). Furthermore, the overall rates of 30-day readmission, 30-day reoperation, and 90-day mortality were similar between the groups.

Additionally, the health economics findings reveal that the surgical costs in the RLR group were significantly higher compared to the LLR group (p < 0.001). The cost of anesthesia was also notably elevated in the RLR group in contrast to the LLR group (p = 0.02).

To explore the efficacy of the RLR group and the LLR group in complex hepatectomy, we compared the primary perioperative and short-term outcomes associated with liver segments I, VII, and VIII. A combined analysis of these three segments was also performed (detailed in Appendix Tables 5, 6, 7, 8). Our statistical results showed that the length of postoperative hospital stays for segment I resection was significantly shorter in the RLR group than in the LLR group (p = 0.07). No significant differences were observed in other perioperative and short-term outcomes.

Long-term oncologic outcomes

At a median follow-up duration of 49 months (IQR, 28–67 months), the median OS and RFS times were 76 and 57 months, respectively, for the entire cohort. Following PSM, no significant difference in OS was observed between the RLR and LLR groups among patients with HCC (p = 0.43). However, patients in the RLR group exhibited a longer median RFS duration compared to the LLR group (median of 65 months vs. 56 months, p = 0.006) (Fig. 1). The estimated 5-year OS for RLR and LLR were 74.8% (95% CI: 65.4–85.6%) and 80.7% (95% CI: 74.0–88.1%), respectively. The estimated 5-year RFS for RLR and LLR were 58.6% (95% CI: 48.6–70.6%) and 38.3% (95% CI: 26.4–55.9%), respectively.

Fig. 1
figure 1

A Kaplan–Meier curve of overall survival (OS) after PSM (log-rank test, p = 0.43); B Kaplan–Meier curve of recurrence-free survival (RFS) after PSM (log-rank test, p = 0.006)

Given the observed significant difference in RFS between patients undergoing RLR and those in the LLR group, a Cox proportional hazard analysis was conducted on the entire cohort. This aimed to further affirm the impact of the surgical approach on RFS and explore significant covariates. In the multivariate Cox regression analysis for recurrence outcome, RLR (HR: 0.586, 95% CI (0.393–0.874), p = 0.008) emerged as an independent predictor associated with decreased recurrence rates and improved RFS (Table 4). Additionally, an ECOG score \(\ge\) 1, CPT status \(\ge\) 1, elevated CEA levels, and reduced HGB levels were identified as significant independent predictors associated with poorer RFS.

Table 4 Risk factors of recurrence outcome: univariate and multivariate COX analyses

The long-term results of liver segments I, VII, and VIII and the combined analysis of the three sites were also compared, and the results showed no statistically significant difference in either RFS or OS (outlined in Appendix Table 5, 6, 7, 8).

Learning curve analysis

At our center, the initial RLR case was conducted in July 2016. To reduce bias, this evaluation used data after PSM analysis. The data were collected from the entire medical center, a total of 4 liver surgery departments, mainly by eight surgeons. All the surgeons who performed the minimally invasive surgery had a master’s degree or above in surgery and had an average of more than 5 years of experience in minimally invasive surgery. They already had extensive experience in laparoscopic and open surgery before robotic surgery was introduced into the medical center. To mitigate biases, data post-PSM analysis were utilized in this assessment. To maintain consistency, 56 patients who underwent RLR performed by a surgeon other than Dr. Zhang, the most frequent practitioner of RLR surgeries during the study period, were excluded from this evaluation. Consequently, a total of 41 patients in the RLR group were considered for this analysis. The CUSUM curve for RLR is depicted in Fig. 2. It is worth noting that the transition from positive to negative slope in the CUSUM curve occurs after the 11th case, indicating that the proficiency period of the learning curve is achieved after approximately 11 RLR procedures.

Fig. 2
figure 2

The learning curve of robotic liver resection

Discussion

To our knowledge, this study represents the largest single-center, retrospective, observational cohort study in China at the time of study registration, investigating the long-term prognostic outcomes, short-term outcomes, and perioperative outcomes of consecutive patients with HCC treated with either RLR or LLR. Our study shows that OS was comparable between RLR and LLR, and RFS was improved in the RLR group after a PSM analysis based on clinical, oncologic, and technical criteria. The similar OS between the two groups may be attributed to cancer staging and the effectiveness of treatment after relapse. Most patients in our study had early stage HCC and were still in middle age with good overall physical condition, who inherently have better prognoses and longer OS. This could mask the potential survival benefits of the RLR approach in more advanced stages. In addition, in most situations, both groups received similar, aggressive, and effective treatment when they relapsed, leading to undifferentiated survival outcomes.

The higher RFS rate in the RLR group could be attributed to several key factors. A previous European multicenter comparative study by Lim et al. analyzed data from patients who underwent a multicenter comparative study for HCC [17]. The 3-year RFS rates for 3D-laparoscopic and robotic surgeries were 24% and 48% (p = 0.18), respectively. Although there was no statistical difference, the RFS rate in the RLR group was twice that in the LLR group, which is evidence that cannot be ignored. There are also single-center studies showing that the 3-year RFS rate was 50% in the LLR group and 64% in the RLR group (p = 0.30), suggesting a 14% higher recurrence-free survival rate in the RLR group compared to the LLR group [18]. Considering that these studies began and concluded approximately 5 years earlier than our own, and given the faster pace of technological iteration for RLR compared to LLR, there has been a substantial increase in the number of surgical procedures each year, as well as an expansion of the indications for robotic liver surgery [19, 20]. Various clinical advantages of RLR have been continuously reported, and its actual clinical efficacy is increasing [19, 20]. This may partly explain why the RLR group exhibits a superior recurrence-free survival rate compared to the LLR group. The rapid development of robotic surgical systems aims to reduce human error and provide feedback during the execution of standardized surgical procedures [21]. Due to the unique features of the robotic surgical system, such as motion scaling and enhanced three-dimensional vision [2, 22], providing superior visualization and high flexibility during surgery, allowing a more thorough exploration of the area around the tumor bed, facilitating finer cuts and suturing, which reduces the likelihood of tiny tumor residuals and thus the risk of recurrence. Our research shows that the rate of positive surgical margins in the RLR group was 1.0 % whereas in the LLR group, it was 4.1\(\%\). The RLR group’s rate was approximately one-quarter that of the LLR group, suggesting that RLR may facilitate a higher rate of complete tumor resection (R0) and thereby mitigate tumor recurrence. In addition, the robot platform has a unique tremor filtering function [2, 22], which can reduce the involuntary tremor of the hand, improve the stability of the operation, reduce errors during the operation, and make it easier to cope with various situations during the operation. This fact is further emphasized in our study by showing a conversion rate of 2.1\(\%\) in the RLR group and 7.4\(\%\) in the LLR group during surgery. Conversion to open surgery often leads to increased trauma, prolonged postoperative recovery, and increased risk of complications, which is more likely to lead to an increased probability of tumor recurrence. Our results of surgical complications showed that the incidence of Ascites was 20.6\(\%\) in the RLR group and 25.4\(\%\) in the LLR group. The incidence of Pleural effusion was 28.9\(\%\) in the RLR group and 34.0\(\%\) in the LLR group. Intra-abdominal infection in LLR was more than twice that in RLR (8.6\(\%\) vs 4.1\(\%\)). Complications, such as pleural effusion, ascites, or postoperative infection, may weaken the patient’s immune system, making the patient more susceptible to mutation and invasion of tumor cells. In addition, surgical complications can prevent patients from completing a series of postoperative treatments, such as interventional embolization or radiotherapy, which help to kill residual tumor cells and reduce the risk of recurrence. Compared with LLR, RLR can reduce the incidence of surgical complications and the risk of recurrence to a certain extent. Although RLR only improved RFS compared with LLR, relatively longer RFS means a healthier physical and mental state and a higher quality of life for patients. From a different perspective, RLR is only about two decades old compared to the decades-long record of LLR [23, 24] and has already achieved similar results in various aspects. The enhanced potential of LLR may still exist. However, it is limited, and robot-assisted surgery is entering a phase of rapid development with continuous optimization of the surgical learning curve [16, 25]. The prospects of RLR are undoubtedly excellent, and its expected long-term oncological benefits are destined to exceed those of conventional surgical approaches.

High-quality multicenter studies have indicated that RLR can lead to reduce blood loss, fewer conversions to open surgery, decreased postoperative morbidity, and shortened hospital stays compared with LLR [4, 5, 26,27,28,29]. These findings support the argument for RLR as the surgical method of choice, especially in improving surgical safety and facilitating rapid recovery. However, it should be noted that other studies have reached different conclusions, stating that RLR may increase blood loss, increase the need for blood transfusion, increase major postoperative morbidity, and increase mortality within 30 and 90 days compared with LLR [30]. In our analysis of the entire patient cohort, we did not observe significant differences between RLR and LLR groups in the primary perioperative outcomes, such as blood loss, transfusion rate, and length of hospital stay, despite theoretical differences in surgical approach between RLR and LLR. In addition, readmission rates within 30 days, reoperation rates within 30 days, and mortality rates within 90 days were similar between RLR and LLR groups. Considering that the main cause of liver cancer in China is HBV infection, this is quite different from Western countries. In addition, the health status of patients, the experience of surgeons, and the local medical resources are also very different, so we are more inclined to the view that robotic hepatectomy is not inferior to laparoscopic hepatectomy. We believe that both approaches can be safely used in the treatment of hepatocellular carcinoma with good results. This is also consistent with Peng Zhu’s report (3). In addition, our results showed that the operation time was longer in the RLR group than in the LLR group, which is also consistent with the previous studies [31, 32], and we believe that this is intricately linked to the learning curve. This study commenced with the initial RLR case for HCC in our center, intentionally not starting post the proficiency period of the learn. We aimed to genuinely demonstrate the results of a new surgical technique from its introduction to gradual application compared to the established traditional technique. While, as LLR in our center began many years earlier than RLR, the LLR procedures are considerably more established, leading to the actual results being in the surgical proficiency phase, whereas some actual results of RLR do not reach the expected surgical maturity phase. Consequently, we speculate that RLR in the surgical proficiency stage may outperform LLR in multiple aspects in future studies.

To map the learning curve of individual surgeons in this study, we specifically employed multidimensional CUSUM analysis. Our institution holds considerable experience in hepatectomy, having performed numerous procedures, particularly utilizing RLR since the introduction of the da Vinci Surgical System in 2016. This vast experience positions our institution well for this comparative analysis. We utilized operative time, estimated blood loss, postoperative complications, and postoperative hospital length of stay as evaluation indicators of surgical competency. The fitted curve indicated that the proficiency period began after the 11th case, marking the proficiency stage of the learning curve. A prior systematic review reported a decline in the number of cases needed to achieve surgical proficiency from 48.3 in 1995 to 23.8 in 2015 [33]. Additionally, the international consensus guidelines on robotic liver resection in 2023 suggest that an experienced surgeon typically requires around 25 consecutive cases to surpass the learning curve for major RLR and 15 cases for minor RLR, further emphasizing the influence of LLR experience on the RLR learning curve [34]. Due to variations in assessment criteria for evaluating surgeons’ proficiency across different studies and individual differences such as the actual number of surgeries and individual learning capabilities of our doctors, we consider the results of this study acceptable. However, because this learning curve is individual-specific, its general applicability may be limited. Nonetheless, it can serve as a valuable reference, particularly in the epidemiological context of liver cancer, such as in China.

As previously mentioned, RLR demonstrates a long-term oncological advantage compared to laparoscopic liver resection LLR. These findings align with our expectation, because we believe that the robotic surgical system offers superior technical precision in operation and resection, potentially reducing the tendency for tumor recurrence. Moreover, the potential interference induced by economic factors should not be overlooked. As previously mentioned, the economic circumstances of patients in the LLR group might be inferior to those in the RLR group. Consequently, postoperative aspects, such as diet recovery, medication adherence, and environmental support, may be more favorable in the RLR group, potentially contributing to a longer RFS compared to the LLR group.

This study encompasses several limitations. Primarily, its retrospective nature serves as a significant constraint and may be associated with information and selection biases. Although PSM analysis was employed to mitigate selection bias, residual selection bias due to unmeasured or unknown confounders remains inevitable in the absence of randomization. Moreover, as a single-center study, it may not capture potential variations in patient management across various high-volume institutions, as differences in surgical experience and perioperative protocols are minimized.

To address these challenges, future endeavors should focus on conducting multinational, multicenter cohort studies to ensure sample diversity, minimize unknown selection biases, and guarantee external validity. Additionally, single-target value methodologies may also be considered for application in future research.

Conclusions

The findings of this study confirm the feasibility and safety of RLR as a surgical treatment for HCC. Patients who underwent RLR for HCC exhibited improved RFS, while the OS was found to be comparable to that of LLR. This research highlights the oncological viability of RLR in treating HCC and lays the groundwork for future randomized controlled trials.