Introduction

Acute myeloid leukemia (AML) is an aggressive hematological disease that mainly affects elderly patients (median age of 68 years at AML diagnosis) [1]. In patients who are not candidates for intensive chemotherapy (due to very advanced age or associated comorbidities), the prognosis is very poor, with a median survival of 8–10 months using hypomethylating agents (HMAs) alone [2, 3, 4], 4–5 months using low-dose cytarabine (LDAC)-based regimens [5], and 1–2 months with best supportive care (BSC) only [6]. The current standard for “unfit” AML patients is the combination of venetoclax (VEN), a selective inhibitor of the anti-apoptotic protein BCL-2, with HMAs (such as azacitidine [AZA] or decitabine [DEC]). In the phase 3 randomized VIALE-A trial, the VEN + AZA combination (28 days VEN plus 7 days AZA) showed a higher initial rate of complete remission (CR) plus complete remission with incomplete recovery (CRi) [i.e., composite CR (CRc)] than AZA monotherapy (66.4% vs. 28.3%) [7]. Despite higher VEN-associated toxicity due to myelosuppression, VIALE-A showed significantly prolonged overall survival (OS) in the VEN-AZA arm (14.7 months vs. 9.6 months, p < 0.001). The OS benefit in the VEN-AZA arm was driven by an increased CRc rate, a prolonged duration of remission (DoR) (median 17.8 months vs. 13.9 months), and a higher rate of undetectable measurable residual disease (MRD) (23.4% vs. 7.8%). Thus, this regimen has been adopted as the new standard of care for unfit AML patients. It should be noted that the randomized VIALE-C trial, which explored the combination of LDAC plus VEN, showed increased CRc rates compared to LDAC monotherapy but failed to prove an OS benefit (7.2 months vs. 4.1 months, p = 0.11), perhaps due to differences in inclusion criteria compared to VIALE-A, such as prior HMA exposure [8].

Although VEN-based regimens represent a step forward for the treatment of unfit AML patients, their management remains challenging in routine practice due to several factors: 1) the greater toxicity of the VEN combinations compared to HMA monotherapy in frail patients; 2) the strict protocol-driven management imposed in a clinical trial is not always available in daily practice; and 3) the rigorous selection criteria of the VIALE-A trial cohort mean that the profile of patients treated with VEN-AZA in the real world may differ from the trial population, which may result in toxicities and safety issues that were less frequent in the trial. All these factors could limit the extrapolation of the VIALE trial results to a real-world, hard-to-treat population. Furthermore, as VEN-AZA has become the control arm for randomized phase 3 trials in these patients, it is necessary to investigate how the new therapeutic standard for unfit AML performs in non-selected populations.

The aim of this systematic review is to analyze and summarize the growing body of evidence regarding the effectiveness of VEN-based combinations in unfit adult patients with newly diagnosed AML in the real-world setting.

Material and methods

Search methodology

Following the PRISMA guidelines, two independent reviewers (PM and ASA) conducted the systematic search. The following databases were searched without restrictions: EMBASE, PubMed, the Database of Abstracts of Reviews of Effects (DARE), the Cochrane Central Register and the Web of Science. In addition, the references of relevant studies and reviews were hand-searched. Available conference abstracts from the European Hematology Association (EHA) and the American Society of Hematology (ASH) were also reviewed (in the HemaSphere and Blood supplements, respectively). Similar keywords were used across databases: venetoclax, and “AML” or “Acute Myeloid Leukemia”. The literature search was last updated on March 1st, 2024. Both reviewers conducted study selection independently. In case of disagreement, a third reviewer (RRV) decided. We screened titles and abstracts to exclude duplicate articles and read the full text and/or abstract of the remaining articles to assess their eligibility according to the selection criteria listed below.

Selection criteria

Only articles or abstracts written in English were considered. We screened all prospective or retrospective observational real-world studies including unselected newly diagnosed unfit AML patients receiving frontline treatment with doublets of VEN plus non-intensive chemotherapy (i.e., VEN combined with HMAs or LDAC). Although the inclusion criteria in the VIALE-A trial were based on the modified Ferrara criteria, in our review ineligibility for intensive chemotherapy (unfit) was defined according to the investigator's or treating physician's criteria in each article. No upper or lower age limit was applied, provided that the authors stated that patients were considered unfit for intensive chemotherapy. We included patients treated with VEN-AZA, VEN-DEC or VEN-LDAC, and we presented disaggregated results by type of doublet when possible. When the type of HMA was not described or results by HMA were not reported, we classified treatment as VEN-HMA.

We also included real-world studies that mixed unfit AML patients with relapsed or refractory (R/R) AML patients treated with VEN, or with front-line fit AML patients treated with intensive chemotherapy plus VEN, provided that the main outcomes of the unfit AML patients were reported separately (in that case, the results of the fit or R/R cohorts were disregarded).

Studies were eligible if at least one of the following primary outcomes was evaluated: 1) CR/CRi [CRc], and 2) median OS. Series not informative about CRc or median OS were not selected (Fig. 1).

Fig. 1 PRISMA 2020 flow diagram for systematic review including searches of databases and registers

The following exclusion criteria were applied: 1) reports in acute promyelocytic leukemia (APL); 2) studies analyzing only a subpopulation of unfit AML patients (e.g., selection of patients by genetic categories); 3) non-intent-to-treat analyses (e.g., studies only analyzing patients surviving more than 14 days after first VEN dose); 4) interventional studies in the context of phase 1 to 3 clinical trials; 5) real-world studies performed in unfit AML patients using VEN-based doublets with drugs other than HMAs or LDAC, or VEN-based triplets; 6) studies mixing subgroups of newly diagnosed unfit AML and fit and/or R/R AML where main outcomes (i.e., median OS and/or CRc) were not segregated by subgroup; and 7) studies with VEN-HMA or HMA monotherapy where main outcomes were not segregated by treatment modality.

In case of published manuscripts and EHA/ASH conference abstracts with series duplication, we included the most recent and detailed study.

Data extraction and endpoints

An extensive search of electronic databases was conducted to identify observational studies, patient registries, and other relevant reports. For the included studies, we recorded the first author, year of publication, research site, study type, study region, study size (number of patients overall and by VEN doublet when segregated), target population description (when applicable), publication type (published studies or EHA/ASH conference abstracts), and primary outcomes (median OS and/or CRc). When CR with partial hematologic recovery (CRh) was reported, it was grouped in the CRc category. We also included two relevant studies that did not report a segregated CRc rate but instead reported Overall Remission Rate (ORR) [9] and CR/CRi plus Partial Response (PR) [10].

We also collected secondary outcomes such as early death (ED) at 30 days (ED30) and 60 days (ED60), rate of post-VEN hematopoietic stem cell transplantation (HSCT), median DoR, median relapse-free survival (RFS), median progression-free survival (PFS), and median event-free survival (EFS), when available. All data extracted are presented in Table 1 (CRc and ED) and Table 2 (median OS, rate of HSCT and other time-to-event outcomes).

Table 1 Summary of real-world studies reporting the composite complete remission rate (CRc) in newly diagnosed unfit AML patients receiving upfront VEN-based non-intensive doublets
Table 2 Summary of real-world studies reporting the median overall survival (mOS) in newly diagnosed unfit AML patients receiving upfront VEN-based non-intensive doublets

Statistical analyses

The age of the different study cohorts was described as median and range, unless it was reported as interquartile range (IQR). We calculated the median age (and range) of the entire evaluable cohort of studies. The weighted mean age (wmAge) was also calculated, with mAge denoting the median age reported by each study, using the following formula: \(wmAge=\frac{\sum (mAge \times number\; of\; patients)}{\sum\; number\; of\; patients}\).
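The weighted mean above can be illustrated with a minimal Python sketch. The function name and the three example studies are hypothetical and are not taken from the reviewed series; the sketch only shows how each study's median age is weighted by its number of patients.

```python
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Size-weighted arithmetic mean: sum(value * n) / sum(n)."""
    if len(values) != len(weights) or len(values) == 0:
        raise ValueError("values and weights must be non-empty and of equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical studies (illustrative values only, not data from the review):
# median age in years and number of patients per study.
median_ages = [70.0, 75.0, 72.0]
n_patients = [100, 300, 50]

wm_age = weighted_mean(median_ages, n_patients)
print(f"wmAge = {wm_age:.1f} years")
```

Because the largest hypothetical study reports the highest median age, the weighted mean lies above the unweighted mean of the three medians (72.3 years).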

The CRc rate, frequency of patients undergoing allogeneic HSCT (alloHSCT), and ED rates were expressed as percentages (%). We calculated the median CRc (mCRc) and CRc range of all evaluable studies for this outcome (n = 62 studies). Median OS values for each study were shown, together with the 95% confidence interval (CI) when available. We calculated the median of median OS values (mOS) and range of all evaluable studies for this outcome (n = 55 studies). Weighted arithmetic means of CRc (wmCRc) and median OS (wmOS), with adapted CI, were calculated for all studies reporting these outcomes (n = 62 and n = 55 studies, respectively), using the following formulas, where CRc and median OS are the values reported by each study: \(wmCRc=\frac{\sum (CRc \times number\; of\; patients)}{\sum number\; of\; patients}\) and \(wmOS=\frac{\sum (median\; OS \times number\; of\; patients)}{\sum number\; of\; patients}\). The adapted interval was derived from the variance of the weighted mean, with the standard error of the weighted mean calculated as the square root of that variance. The wmCRc and wmOS were also calculated for VEN-AZA series (n = 27 and n = 24 studies, respectively) (Supplementary Tables 3 and 4). The weighted mean alloHSCT rate (wmHSCT) was calculated as \(wmHSCT=\frac{\sum (alloHSCT\; rate \times number\; of\; patients)}{\sum number\; of\; patients}\). Differences between the weighted means of published studies vs. conference abstracts were compared using the Mann-Whitney U test, with p < 0.05 considered significant. We also calculated the median of median RFS (mRFS) and the median of median DoR (mDoR).
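As an illustration of the weighted mean and its standard error, a minimal Python sketch follows. The text does not state which estimator of the variance of the weighted mean was used, so the estimator below (squared normalized weights times squared deviations from the weighted mean) and the normal-approximation 95% interval are assumptions chosen for illustration, not the authors' method. The function name and the example values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def weighted_mean_with_interval(
    values: Sequence[float], weights: Sequence[float], z: float = 1.96
) -> Tuple[float, float, Tuple[float, float]]:
    """Return (weighted mean, standard error, approximate 95% interval)."""
    if len(values) != len(weights) or len(values) == 0:
        raise ValueError("values and weights must be non-empty and of equal length")
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    # ASSUMPTION: variance of the weighted mean estimated as
    # sum(w_i^2 * (x_i - mean)^2) / (sum w_i)^2. Other estimators exist.
    variance = sum((w ** 2) * (v - mean) ** 2 for v, w in zip(values, weights)) / total ** 2
    se = math.sqrt(variance)  # standard error = square root of the variance
    return mean, se, (mean - z * se, mean + z * se)


# Hypothetical CRc rates (%) and study sizes, for illustration only.
crc_rates = [50.0, 60.0, 70.0]
n_patients = [100, 200, 100]

wm_crc, se_crc, (ci_low, ci_high) = weighted_mean_with_interval(crc_rates, n_patients)
print(f"wmCRc = {wm_crc:.1f}% (SE {se_crc:.2f}; interval {ci_low:.1f}% to {ci_high:.1f}%)")
```

With a different variance estimator the interval width would change, while the weighted mean itself would not.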

The boxplot analysis included mCRc and mOS across all VEN-based doublets together and for the VEN-HMA, VEN-AZA, and VEN-DEC groups (Figs. 2 and 3). For the boxplot figures, we defined outlier studies as those with values more than three times the interquartile range (IQR) below the first quartile (Q1) or more than three times the IQR above the third quartile (Q3), and we described the sample size (number of involved studies), the mOS [IQR], and the mCRc [IQR].
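The outlier rule described above (values beyond 3 times the IQR from Q1 or Q3) can be sketched in Python as follows. The quartile convention (the "inclusive" method of the standard library) is an assumption; Stata and SPSS may compute quartiles slightly differently. The function name and the example CRc rates are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def extreme_outliers(
    values: Sequence[float], k: float = 3.0
) -> Tuple[List[float], Tuple[float, float]]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] and the two fences."""
    # ASSUMPTION: quartiles computed with the 'inclusive' method.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return outliers, (lower, upper)


# Hypothetical per-study CRc rates (%), for illustration only.
crc_rates = [48, 50, 52, 55, 56, 58, 60, 62, 95]

outliers, fences = extreme_outliers(crc_rates)
print("fences:", fences, "outliers:", outliers)
```

In this hypothetical set, Q1 is 52 and Q3 is 60, so the fences fall at 28 and 84 and only the value of 95 is flagged.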

Fig. 2 Boxplot diagram of the CRc rates (median and IQR) in all studies and by type of VEN combination

Fig. 3 Boxplot diagram of the median OS (median and IQR) in all studies and by type of VEN combination

The statistical software packages Stata 14.2 and SPSS 26.0 were used for conducting statistical analyses.

Results

Search results

Fig. 1 shows the main results of our systematic search. In summary, we obtained 5 704 citations from databases. Of these, 195 studies were screened as potentially describing outcomes in newly diagnosed unfit AML patients, and 47 were subsequently excluded because they reported exclusively results of VEN-based regimens in clinical trials or in R/R or fit AML patients. Overall, 148 abstracts or manuscripts potentially fulfilling the inclusion criteria were reviewed in full, and 75 of those studies were finally excluded. The main causes for final exclusion were lack of segregated data (n = 26 studies), repeated cohort (n = 17 studies), specific subgroup analysis (n = 10 studies), and primary outcomes not reported (n = 8 studies) (Fig. 1). Two manuscripts were excluded due to selection bias (i.e., Freeman et al. [83] excluded patients receiving less than 28 days of VEN or only one cycle of HMAs, and Bazinet et al. [84, 85] included only patients who had achieved CR). The agreement in study selection between reviewers was excellent (kappa = 0.94). A list of excluded references and the criterion for selecting overlapping studies is provided in the supplementary material.
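The selection counts reported above can be cross-checked with simple arithmetic, sketched here in Python. All numbers are taken from the text; the variable names are illustrative, and the "not itemized" figure is only the remainder after the four main causes, not a count reported by the authors.

```python
# Counts reported in the search results.
screened = 195                     # potentially relevant studies screened
excluded_at_screening = 47         # trial-only, R/R-only or fit-only reports
excluded_after_full_review = 75    # excluded after full review

assessed_in_full = screened - excluded_at_screening
included = assessed_in_full - excluded_after_full_review

# The four main causes of final exclusion listed in the text.
main_causes = {
    "lack of segregated data": 26,
    "repeated cohort": 17,
    "specific subgroup analysis": 10,
    "primary outcomes not reported": 8,
}
# Remainder of final exclusions not itemized among the four main causes.
not_itemized = excluded_after_full_review - sum(main_causes.values())

print(assessed_in_full, included, not_itemized)
```

The result is consistent with the 148 reports reviewed in full and the 73 studies finally included.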

Finally, 73 studies were eligible and analyzed, comprising 43 studies reporting both CRc and median OS, 11 reporting only median OS, and 19 reporting only CRc. Overall, 5 831 patients were evaluable for CRc (Table 1) and 7 138 patients for median OS (Table 2). Seventeen studies (23%) included 150 or more patients [24, 37, 43, 51, 60, 62, 63, 64, 68, 70, 72, 73, 74, 77, 79, 82, 86] and accounted for the majority of the patients (n = 6 365 patients); 12 of these 17 studies (71%) were conference abstracts.

Median age was available in 68 studies, with a median of median ages of 73 years (range 17 to 92) and a wmAge of 73.0 years. In 15 studies, the median age was not segregated for unfit AML treated with VEN-based regimens and was instead reported for a mixed population that also included patients treated with HMA monotherapy, with intensive chemotherapy, or with R/R AML [15, 19, 21, 23, 30, 33, 35, 38, 42, 47, 51, 55, 65, 76, 78]. The median age among the 53 studies without mixed populations was 73.5 years (range 55 to 79), and the wmAge was 72.7 years.

Most of the collected real-world studies were retrospective (Tables 1 and 2). The vast majority included patients diagnosed after 2018, although some included data from patients diagnosed since 2013 [22, 25, 53]. While most studies reported treatment according to standard 28-day VEN cycles, a multicenter study by Willekens et al. reported a real-world series using a VEN-AZA regimen of 7 + 7 days, with a median OS of 12.8 months [87].

Complete remission

The CRc rate was reported in 62 studies (n = 5 831 patients), 55 with VEN-HMA only, and 7 also including VEN-LDAC in a small subset of cases (n < 100 patients) (Table 1). Overall, the mCRc was 56.2% (IQR 48.8% to 63.3%), 58.0% among VEN-AZA (IQR 49.2% to 64.3%), and 55.0% among VEN-DEC (IQR 47.6% to 67.7%) (Fig. 2). Overall, the wmCRc was 58.2%, with a significant difference between published manuscripts and conference abstracts (54.6% vs. 60.6%, respectively, p = 0.018) (Supplementary Table 1). In studies with disaggregated CRc for VEN-AZA (n = 1 607 patients), the wmCRc was 58.4%, with a non-significant difference between published manuscripts and conference abstracts (56.0% vs. 63.0%, respectively, p = 0.64) (Supplementary Table 3).

Early death

ED was reported in 26 studies (2 927 patients) (Table 1), with a median ED30 rate of 5% (range 0% to 26%) and a median ED60 rate of 13% (range 0% to 41%) [11, 12, 13, 14, 16, 17, 20, 22, 24, 25, 27, 28, 29, 30, 32, 37, 38, 40, 41, 48, 51, 57, 64, 65, 67, 70].

Overall survival

The median OS was reported in 54 studies (n = 7 138 patients), mostly using VEN-HMAs, except for 9 manuscripts which also included small subsets of VEN-LDAC (Table 2). The study by Matthews et al. [24] was only included in the calculation of the VEN-AZA subgroup, and is not included in the 54 studies of the overall calculation. The mOS was 10.4 months overall, 9.8 months among VEN-AZA, and 12.0 months among VEN-DEC (Fig. 3). The wmOS was 10.3 months, with no significant difference between published manuscripts and conference abstracts (10.6 vs. 10.1 months, respectively, p = 0.35). In studies with disaggregated VEN-AZA results (n = 3 571 patients), the wmOS was 10.6 months, with no significant difference between published manuscripts and conference abstracts (11.7 vs 10.3 months, p = 0.23).

Focusing on some real-world studies with 150 or more patients: 1) a multicenter study by Gross et al. with 186 patients in France [62] reported a median OS of 12.4 months and, notably, less febrile neutropenia in patients with cycles shorter than 21 days; 2) Mims et al. reported a median OS of 13.4 months in a US multicenter study with 403 patients [79]; 3) Venditti et al. reported a median OS of 14.2 months among 178 Italian patients [82]; 4) Vachhani et al. reported in 2022 a median OS of 8.6 months among 169 US patients [29, 37]; 5) Matthews et al. reported a median OS of 10 months in a US cohort of 488 patients treated with VEN-HMA [24]; 6) Gershon et al. published a study in 2023 with 619 patients, showing a median OS of 9.2 months [72]; and 7) Fuqua et al. reported at ASH 2023 a median OS of 9.6 months among 1 393 patients from an international database [77].

AlloHSCT rate

Overall, 26 studies involving 5 144 patients reported the percentage of subsequent alloHSCT, yielding a median of 10.3% (range 0% to 29%) (Table 2). The wmHSCT was 15.4% [12, 17, 19, 20, 22, 28, 32, 33, 34, 37, 39, 50, 54, 58, 59, 61, 64, 65, 72, 73, 77, 79, 88, 89, 90, 91].

Other outcomes

RFS was reported in 7 studies (930 patients), with an mRFS of 9.3 months (range 4.0 to 14.1 months) [39, 40, 43, 57, 75, 79, 88]. Seven studies (486 patients) reported DoR, with an mDoR of 10.6 months (range 3.1 to 19.8 months) (Table 2) [9, 11, 12, 28, 50, 54, 61].

Discussion

Our systematic review provides a comprehensive assessment of the effectiveness of VEN-based regimens for the upfront treatment of unfit AML patients in the real world. We show a wmOS of 10.3 months among 7 138 patients, considerably lower than expected according to the VIALE-A trial (14.7 months), while the wmCRc rate was 58.2% among 5 831 patients, slightly lower than that reported in VIALE-A (66.4%) [7]. A similar gap between trial efficacy and real-world effectiveness has been observed in other hematological diseases and with other drugs. For instance, in a series of 251 patients with high-risk myelodysplastic syndromes treated with AZA, a discrepancy was noted between the OS data from clinical trials and real-world data [92].

As far as we know, this is the most comprehensive analysis of real-world outcomes using VEN-based regimens for unfit AML patients. Du et al. performed a systematic review and meta-analysis of the VEN-AZA combination for patients with AML and myelodysplastic syndromes involving 1 615 patients (newly diagnosed and R/R) from 1 randomized clinical trial (RCT) and 18 non-RCT studies [93]. They found a CR/CRi rate of 67.5%, while the pooled median OS could not be calculated. Ucciero et al. recently published a meta-analysis involving 1 134 newly diagnosed unfit AML patients from 19 real-world studies [94]. The pooled survival curve was similar to that reported in VIALE-A during the first three months of treatment but diverged thereafter (the estimated median OS was 9.37 months, p < 0.0001). The CRc rate was not reported in that Italian study. We did not aim to perform a meta-analysis, as this methodology was not considered appropriate in the context of non-RCT studies with substantial heterogeneity between series. On the other hand, we opted to also include conference abstracts, as there might be some selection bias towards publishing more positive results. In fact, we show that the wmOS was slightly higher among published manuscripts vs. conference abstracts (10.6 months vs. 10.1 months, respectively). However, the wmCRc rate was higher among abstracts than among manuscripts, probably because of more accurate response assessment in peer-reviewed articles. In order to obtain more accurate results, we calculated the weighted means for median OS and CRc rate, giving more weight to larger studies. However, we also presented the medians and ranges for the primary efficacy outcomes (median OS and CRc), which did not differ substantially from the weighted means, probably due to the sizable number of studies involved.

Although it is generally accepted that results could be worse in the real-world setting than in clinical trials [95], we have observed a considerable difference in median OS between our study and VIALE-A (4.4 months difference, 30% decrease) [7]. However, no such differences in median OS have generally been reported between phase 3 trials and real-world studies with AZA monotherapy for unfit AML. In fact, median OS after AZA was 10.4 months in the pivotal AZA-AML001 (which included some fit patients) [96], 9.6 months in VIALE-A [7], 9.8 months in the PETHEMA-FLUGAZA (for patients > 65 years old) [97], and 8.7 months in the ASTRAL-1 (comparing with guadecitabine) [98]. Similarly, several unfit AML population-based studies have reported median OS of 7.1 months (n = 1 114 patients) [99], 9.1 months (n = 710 patients) [100], 10.4 months (n = 486 patients) [101], 9.2 months (n = 1 073 patients), and 9.9 months (n = 809 patients) [102, 103]. Moreover, it will be useful to know the results of the VEN-AZA control arm in phase 3 trials (e.g., ENHANCE-3 [NCT05079230]), in order to test the reproducibility of the median OS of the VIALE-A study. Although real-world CR/CRi rates remained higher than the 20–30% observed with HMA monotherapy [4], this did not clearly translate into an OS advantage, probably because many of the CRi obtained with VEN-HMA merely reflect a pancytopenic state without evident clinical improvement.
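The absolute and relative differences in median OS quoted above follow directly from the two figures reported in this review (14.7 months in VIALE-A and a wmOS of 10.3 months in the real-world series). A minimal Python check, with illustrative variable names:

```python
# Values taken from the text.
viale_a_median_os = 14.7   # months, VEN-AZA arm of VIALE-A
real_world_wm_os = 10.3    # months, weighted mean of real-world median OS

absolute_difference = viale_a_median_os - real_world_wm_os
relative_decrease = absolute_difference / viale_a_median_os

print(f"difference: {absolute_difference:.1f} months "
      f"({relative_decrease:.0%} decrease)")
```

This is a descriptive comparison of two summary values, not a statistical test between the trial and the real-world cohorts.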

We can speculate about potential causes for the discrepancy between mOS in the real world and in the VIALE-A trial: 1) patients could be frailer in routine practice, as VIALE-A excluded some patients with severe comorbidities and/or an ECOG performance status of 4. However, the median age in our systematic review was 73.0 years (vs. 76.0 years in VIALE-A) [7], and the weighted mean rate of alloHSCT was 15.4% (vs. < 1% in VIALE-A) [104]. Thus, we can infer that patients “ineligible” for intensive chemotherapy treated with VEN-AZA in real-world cohorts were even less frail than those included in the pivotal RCT; 2) an excess of initial toxicity could lead to an increased death rate in real-world patients managed with VEN-based regimens. Nevertheless, the ED rate at 30 days in our study was similar to that in VIALE-A (5% vs. 7%); 3) VEN-LDAC and VEN-DEC series have been included in our review, and this could worsen the overall results. However, fewer than 100 out of 7 138 patients were treated with VEN-LDAC, the VEN-DEC mOS was 12.0 months, and the VEN-AZA mOS was 9.8 months (Fig. 3); 4) the VIALE-A trial could be enriched with AML subgroups who benefit more from VEN-AZA. This is plausible, as the rate of secondary AML was low in the trial (25% vs. up to 40% for unfit patients) [105], in part because prior HMA exposure and a history of myeloproliferative neoplasms were exclusion criteria, and lower CRc rates and median OS have been reported in the real world for secondary AML [16, 29, 43, 46, 52, 55, 72, 106, 107, 108, 109]. On the other hand, with the exception of low cytogenetic risk, which was excluded, the genetic profile of the VIALE-A patients (i.e., 36% poor-risk cytogenetics, 23% P53, 25% IDH, and 17% NPM1 mutation) could be similar to that reported by epidemiologic registries for unfit AML [110, 111].

In our opinion, patient selection and the management of VEN could be the main factors leading to the reduced median OS observed in real-world cohorts. In order to improve outcomes, it could be reasonable to use VEN cautiously in very frail patients (e.g., with severe comorbidities and/or extreme age) and in subsets where there is no evidence of substantial benefit from adding VEN (e.g., prior HMA exposure, P53 mutated, or secondary AML) [4, 29, 37, 52, 60, 98, 99, 100, 101, 102, 103, 104, 105, 107, 112, 113]. Regarding management, initial hospitalization followed by tight monitoring by skilled teams during aplastic phases should be recommended, as infectious complications, among others, could jeopardize the tolerability and long-term feasibility of VEN schedules. Furthermore, dose adjustments and drug-drug interactions might be challenging in older AML patients, leading to potential toxicities and overdosage. One option might be therapeutic drug monitoring of VEN, given the observed variability in pharmacokinetics and the potential for varying responses due to factors such as adherence, food effects, and the influence of P-gp or CYP3A4 inhibitors on drug exposure [114]. In addition, Philippe et al. stated that the duration of VEN treatment has generally been reduced from 28 days to 21 or 14 days in routine practice to limit cytopenia and the risk of complications while maintaining a satisfactory CR rate [114].

Our study has limitations, as it is a literature review mainly based on retrospective series, many of which were reported only as conference abstracts (not peer-reviewed). Obviously, our findings should not replace the best evidence to date on VEN-AZA efficacy, which is the randomized double-blinded VIALE-A trial. However, our systematic review emphasizes the need to optimize the indication and management of VEN regimens in the real world. Moreover, although VEN has emerged as a new standard of care for unfit AML patients, we highlight that we might be far from achieving dramatic improvements in the real-world setting. Also, our results should caution against replacing intensive chemotherapy with VEN-HMA for older but fit AML patients, at least until well-designed trials demonstrate the superiority of this strategy. Our data should be critically re-assessed in the coming years, exploring the long-term benefits while physicians refine the management of this relatively novel therapy. Finally, our study could be useful for regulators when advising on the design of new clinical trials with VEN-based schedules (i.e., triplets or doublets for fit or unfit AML patients).

In conclusion, the groundbreaking median OS reported in the VIALE-A trial using VEN-AZA was not well reproduced in the real world for unfit newly diagnosed AML patients (14.7 vs. 10.3 months), while ED and CRc rates were more consistent. Strategies to optimize patient selection, dosing regimens, and supportive care management are crucial to improve outcomes in real-world practice.