Background

According to the updated National Cancer Institute-Working Group (NCI-WG) guidelines, indication for treatment of chronic lymphocytic leukemia (CLL) still depends on clinical stage and disease activity [1]. In this context, measurements of biological prognostic markers, namely CD38, ZAP-70 and mutational status of immunoglobulin heavy chain variable gene segments (IGHV), are judged mandatory in clinical trials but not in general practice, since they fail to influence therapeutic decisions [1]. The only exception is the analysis of chromosomal aberrations by interphase fluorescence in-situ hybridization (FISH), since high-risk cytogenetic lesions (del11q and del17p) may predict resistance to chemotherapy-based treatments [2]. Wierda et al. [3] proposed to combine a set of clinical risk factors, i.e. age, gender, Rai staging, absolute lymphocyte count (ALC) and number of involved lymph node regions (LNR), with an inexpensive and widely available serum marker, beta2-microglobulin (β2 M), to develop a prognostic index (PI) stratifying patients into three risk groups with different expected median survival, and a nomogram estimating individual patient survival. This model was subsequently validated in independent patient series, also using time to first treatment as end-point [4-8]. A reduction of this model from six to four variables, i.e. age, gender, β2 M levels and Binet staging, was also shown to predict survival with equal or even better performance [8]. The aim of the present study was to provide evidence that prognostic models for overall survival based on clinical variables [4-8] could be improved by information on biological risk factors. 
By retrospectively analyzing a multicenter CLL population of over 600 untreated patients, the most significant and independent biological and clinical prognosticators were integrated in a new clinical-biological prognostic index for group stratification and in a novel nomogram for estimating individual survival.

Methods

Patient population

Between 1996 and 2008 a cohort of 620 CLL patients was collected in the context of a larger multicenter patient dataset (n = 1037), previously utilized to propose a modified prognostic model and nomogram [8], according to the availability of the following biological prognosticators: IGHV mutational status, chromosomal abnormalities, as detected by interphase FISH, and flow cytometric expression of CD38 and ZAP-70. Moreover, since most of the diagnoses of the original patient set were made before the publication of the revised NCI-WG guidelines [1], all cases of previously defined CLL that could be re-classified as monoclonal B cell lymphocytosis (MBL) were removed accordingly. The percentage of recruited cases in the different centers was: 30% at Roma Catholic University, 25% at Novara, 15% at Roma Tor Vergata, 8% at Siena, 6% at Milano, 4% each in the other 4 centers. Cut-points for LNR were as previously reported [3]. Continuous variables age and β2 M levels were categorized using cut-points at 65 years for age and at the upper limit of normal (ULN) for β2 M, as deduced by the analysis of martingale residuals plots [9]; ALC was categorized at the median, since the martingale residual plots did not show any suitable cut-point.

Biological prognosticators

Evaluation of biological prognosticators was centralized in a few reference laboratories, utilizing previously validated common procedures; in detail, 5 centers performed IGHV mutational analysis, and 6 centers performed cytogenetics and flow cytometry. IGHV mutational status was determined as previously reported [10]. Cytogenetic abnormalities involving chromosomes 11 (del11q22; hereafter del11), 12 (trisomy 12), 13 (13q14.3) and 17 (del17p13; hereafter del17) were investigated by interphase FISH, as reported [11]. Results of FISH analyses were classified as unfavourable when high-risk genomic aberrations (del17p and/or del11q) were present [12-14]. ZAP-70 expression was determined by flow cytometry, using 20% of positive CLL cells as the cut-off to discriminate between ZAP-70 positive and negative cases [15-18]. CD38 measurements were performed as reported [19], using a threshold of 30% expression to define positive cases. All the variables were measured at or within one year from diagnosis and always before treatment on either fresh or frozen samples. Data were used upon informed consent from patients and approval by Institutional Review Boards (Centro di Riferimento Oncologico, Aviano; Catholic University of the Sacred Heart, Rome), and in accordance with the Declaration of Helsinki.
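The dichotomization rules stated above can be written down compactly. The following Python sketch is only an illustration and not the code used in the study: the names `Biomarkers` and `classify_biomarkers` are hypothetical, and treating values exactly at the cut-off as positive is an assumption, since the text does not say how boundary values were handled.

```python
from dataclasses import dataclass

# Cut-offs taken from the text; handling of values exactly at the
# cut-off (>=) is an assumption made for this sketch.
ZAP70_CUTOFF_PCT = 20.0  # % of positive CLL cells
CD38_CUTOFF_PCT = 30.0   # % expression


@dataclass
class Biomarkers:
    """Hypothetical container for one patient's measurements."""
    zap70_pct: float
    cd38_pct: float
    del17p: bool
    del11q: bool


def classify_biomarkers(b: Biomarkers) -> dict:
    """Dichotomize the biological markers as described in the text."""
    return {
        "zap70_positive": b.zap70_pct >= ZAP70_CUTOFF_PCT,
        "cd38_positive": b.cd38_pct >= CD38_CUTOFF_PCT,
        # FISH is unfavourable when del17p and/or del11q is present.
        "fish_unfavourable": b.del17p or b.del11q,
    }
```

For example, a patient with 25% ZAP-70 positive cells, 10% CD38 expression and an isolated del11q would be classified as ZAP-70 positive, CD38 negative and FISH unfavourable.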

Statistical methods

All analyses were performed in R, an open source statistical package (http://www.r-project.org/). Median follow-up was computed using the reverse censoring method. The primary end points were overall survival (OS) and time-to-first-treatment (TTT), defined as described [1, 20, 21]. OS was estimated using Kaplan-Meier plots and compared between groups by log-rank test. Univariate and multivariate Cox models were used to verify the independent prognostic power of each parameter. Model minimization was performed by stepwise backward elimination. A p value < 0.05 was considered to be statistically significant. Departure from proportionality in hazard was tested in all Cox models. The predictive accuracy of the various Cox models was evaluated by calculating the concordance index (c-index), which is the probability of concordance between predicted and observed survival, equal to the area under the receiver operating characteristic curve for censored data [22]. A c-index of 0.5 indicates that predictions are no better than chance, whereas a c-index of 1 indicates that the model is a perfect predictor. Prediction error was calculated as 1 - c-index. U-statistics were applied to test the significance of differences between c-index values [22]. The nomogram was developed and calibrated following published methods [22]. Final risk group scoring was developed in four steps: 1. selection of independent predictive variables; 2. fitting of a Cox model with the selected variables; 3. score assignments based on regression coefficients; 4. identification of the best cut-points to split the score into 3 risk groups by recursive partitioning [23]. Internal validation for steps 1 and 2 was performed with the bootstrap .632+ method [24, 25] with B = 620 bootstrap samples and (step 2) with cross-validation [26]. Variables selected with a frequency greater than 50% were entered in the final model. 
The categorical risk score model obtained by recursive partitioning was internally validated by bootstrap methods applied to tree-based analysis [27]. Finally, the whole model building procedure was validated by a comprehensive leave-one-out cross validation (see Additional file 1: supplementary statistical methods). All p values are based on two-tailed tests.
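To make the c-index and the derived prediction error more concrete, the following Python sketch computes a pairwise concordance estimate for right-censored data. It is a simplified illustration in the spirit of Harrell's estimator, not the R implementation used for the analyses: the function names are hypothetical, pairs with tied survival times are simply skipped, and tied risk scores are counted as half concordant, all of which are assumptions of this sketch.

```python
def concordance_index(times, events, risk_scores):
    """Pairwise concordance for right-censored survival data.

    times       -- follow-up times
    events      -- 1/True if death was observed, 0/False if censored
    risk_scores -- model output, higher means worse predicted survival
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure
        for j in range(n):
            # A pair is usable only if subject i died strictly before
            # subject j's last observed time (ties in time are skipped).
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


def prediction_error(times, events, risk_scores):
    """Prediction error as defined in the text: 1 - c-index."""
    return 1.0 - concordance_index(times, events, risk_scores)
```

With this definition, a model whose risk scores order all usable pairs correctly gives a c-index of 1, a constant score gives 0.5, and a perfectly inverted ordering gives 0.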

Results

Patient characteristics

Patient characteristics are reported in Table 1. Treatment was administered according to NCI-WG indications. Deaths occurred mostly in treated patients (83%). Deaths among untreated patients aged beyond 70 years accounted for 11% of all deaths. All patient characteristics were balanced across the age groups <55, 55-64, 65-74 and ≥75 years (chi-square tests), except for a greater proportion of males in the <55 age group and a greater proportion of high β2 M levels and death events in the ≥75 age group. Kaplan-Meier plots of OS and TTT are shown in Figure 1.

Table 1 Patient characteristics (n = 620)
Figure 1

Overall survival (OS) and time to treatment (TTT) in the whole cohort of 620 CLL patients.

Univariate and multivariate analysis for OS and TTT

In univariate analysis for OS all clinical and biological variables were significant, except for del11q (Table 2). The effect of chemoimmunotherapy was small and not statistically significant. The effect of period of diagnosis was significant, with patients diagnosed in 2001-2005 and after 2005 at increasing risk compared to patients diagnosed before 2001. Based on these results, multivariate analyses were adjusted for the year of diagnosis, introduced in the model as a three-level stratification factor (≤2000, 2001-2005, >2005). Age-dependent variations of variable effects were explored by including an age interaction term, either in continuous form or as a four-group (<55, 55-64, 65-74, ≥75) ordinal variable, in multivariate models. No significant variations in hazard ratio (HR) values were found (p > 0.05 for all interactions). The selection of the variables entered in the final model was internally validated by the bootstrap .632+ method [24]. All the variables introduced in the final model were selected in more than 50% of bootstrap samples (Table 3); prediction error in this step of model building was 0.244. The final model fitting was also validated by the bootstrap .632+ method, showing at this step a prediction error of 0.247. Leave-one-out cross validation [24, 26] showed that neither β2 M nor gender, the least important variables, could be safely removed from the model (Table 4). Univariate and multivariate analyses of TTT, performed on the subset of Binet A patients below 70 years of age, are shown in Table 5.

Table 2 Univariate and multivariate Cox regression analysis for overall survival
Table 3 Prognostic score for overall survival with clinical and biological risk factors and bootstrap validation
Table 4 Leave-one-out cross validation of the final model
Table 5 Univariate and multivariate analysis of TTT in Binet A patients below 70 years of age (n = 291, no missing cases)

Clinical-biological prognostic index

The 4-variable clinical model previously proposed by us [8] was refitted in the present CLL cohort (see Additional file 2: Table S1). The 4-variable clinical model had a greater discriminatory power (c-index 0.72) than the 6-variable clinical model by Wierda et al. (c-index 0.62; p = 1.3 × 10⁻⁷). According to the β regression coefficients, a novel clinical-biological prognostic score was developed by assigning 3 points for Binet C stage, 2 points each for Binet B stage and age > 65 years, and 1 point each for male gender, high β2 M levels, unmutated IGHV gene status and 17p deletion (Table 3). The score point distribution is reported in Figure 2a. To this distribution we applied a recursive partitioning method [23], which yielded three prognostic groups, with scores 0-1, 2-5 and 6-9. The Kaplan-Meier plots of the three risk groups defined by the prognostic score are shown in Figure 2b; for comparison, the risk group partition by the Wierda PI [3] is shown in Figure 2c. In particular, 21% of patients (score 0-1) were at low risk, 63% (score 2-5) were at intermediate risk, and 16% of patients (score 6-9) were at high risk. Projected survival in the low, intermediate and high-risk groups was 98%, 90% and 58% at 5 years, and 98%, 69% and 9% at 10 years, respectively. Predictive accuracies were significantly greater in the clinical-biological model, compared to the 6-variable clinical model by Wierda et al. (c-index 0.73 vs 0.62, p < 0.0001) [3], or to the 4-variable clinical model (c-index 0.73 vs 0.72, p < 0.0001) [8], or to the Binet (c-index 0.73 vs 0.65, p < 0.0001) or Rai (0.73 vs 0.62, p < 0.0001) staging systems. To show the combination of predictive variables in each patient and in each group we used a heat-map plot (Figure 3). In the low-risk group, comprising 133 cases, 52 patients had no adverse predictors (score 0), 50 patients were male, 16 patients had β2 M above the ULN and 15 patients had unmutated IGHV gene status. 
Of note, low-risk patients were never aged >65, nor had Binet stage B or C disease, nor were affected by a CLL bearing del17p (Figure 4). Conversely, in the high-risk group, only 3 and 4 patients, respectively, were aged <65 years or had Binet stage A disease; these patients, however, had all the other prognosticators in their unfavourable configuration. Moreover, the 51 patients in Binet stage B of the high-risk group mostly had an unmutated IGHV gene status (37/51) or high β2 M levels (42/51). Finally, the 29 patients classified in Binet stage C and belonging to the high-risk group mostly had high β2 M levels (26/29) (Figure 4). Kaplan-Meier plots of the individual variables are reported in Figure 4.
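The point assignment and the three risk groups described above can be expressed as a short calculation. The Python sketch below only restates the scoring rules given in the text; the function and argument names are hypothetical, and the code is an illustration rather than a validated clinical tool.

```python
# Points per Binet stage as stated in the text (stage A scores 0).
BINET_POINTS = {"A": 0, "B": 2, "C": 3}


def prognostic_score(binet_stage, age_years, male,
                     beta2m_above_uln, ighv_unmutated, del17p):
    """Clinical-biological score, ranging from 0 to 9."""
    score = BINET_POINTS[binet_stage]
    score += 2 if age_years > 65 else 0     # age > 65 years: 2 points
    score += 1 if male else 0               # male gender: 1 point
    score += 1 if beta2m_above_uln else 0   # high beta2-microglobulin
    score += 1 if ighv_unmutated else 0     # unmutated IGHV
    score += 1 if del17p else 0             # 17p deletion
    return score


def risk_group(score):
    """Map a score to the groups found by recursive partitioning."""
    if not 0 <= score <= 9:
        raise ValueError("score must be between 0 and 9")
    if score <= 1:
        return "low"
    if score <= 5:
        return "intermediate"
    return "high"
```

As a worked example, a 60-year-old woman in Binet stage B with unmutated IGHV and no other adverse factor scores 2 + 1 = 3 points and falls in the intermediate-risk group.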

Figure 2

Clinical-biological index. a) Histogram of score points according to the clinical-biological prognostic model. Vertical red lines show the positions of the cut points splitting the sample into 3 risk groups. b) Kaplan-Meier plot showing prognostic stratification in 3 risk groups according to the clinical-biological score. c) Prognostic stratification in 3 risk groups according to the Wierda et al. prognostic score [3].

Figure 3

Heatmap of individual patient clinical-biological scores. Columns refer to individual patients; rows refer to predictors. In the heatmap, each dichotomous predictor is shown in green or red if present in its favourable or unfavourable configuration, respectively. Binet stages A, B and C are shown in green, red and black, respectively. Yellow bars show the splits between low, intermediate and high-risk groups. The number of patients in each score class is reported at the bottom of the columns.

Figure 4

Nomogram for predicting overall survival according to the clinical-biological prognostic index. To read the nomogram, draw a vertical line from each tick marker indicating the status of a predictor to the top axis labeled Points. Sum the points and find the corresponding number on the axis labeled Total Points. Draw a vertical line down to the axes showing 5- and 10-year overall survival rates and median survival. Beta2M, β2 microglobulin; ULN, upper limit of normal; OS, overall survival.

Nomogram for estimating prognosis in individual patients

Although individual estimates of survival, such as those obtained from nomograms, are more likely to be inaccurate than group estimates [28], a nomogram was developed as described previously [8] to allow survival estimation in individual patients. It was based on the final model with clinical and biological prognostic factors shown in Table 3, modified to use age and β2 M as continuous variables (Figure 5). The clinical-biological nomogram showed a better predictive accuracy than the clinical nomogram proposed by Wierda et al. [3] (c-index 0.79 and 0.76, respectively; p = 0.046).

Figure 5

Kaplan-Meier plots of overall survival (OS) for the 6 variables of the clinical-biological prognostic index.

Discussion

Survival time at CLL diagnosis may be simply estimated by means of six variables, four of them clinical-demographic (stage, LNR, sex, age) plus two quantitative assays (ALC, β2 M) [3, 4, 8]. Two independent studies [4, 8] failed to confirm the predictive power of ALC. A simplification of the PI from six to four variables was previously proposed by us as able to stratify patients with equal or better performance [8]. In the present study, the aim was to improve these clinical prognostic models by adding information on biological variables, in particular those identified by the updated NCI-WG guidelines [1] as mandatory at least in the context of clinical trials. We demonstrated that a PI for OS prediction based on clinical variables could be improved only by IGHV gene mutational status and del17p, but not by CD38, ZAP-70 and del11q. The lack of prognostic power of CD38 and ZAP-70 is not totally unexpected. Similar findings have been reported in analyses of either OS [13, 14, 29] or TTT [30, 31], although none of these reports included both biological and clinical prognosticators in a comprehensive clinical-biological PI, as proposed here. It has often been emphasized that assays evaluating ZAP-70 and, at least in part, CD38 expression suffer from inherent weaknesses and lack of proper standardization [13, 32-34]. As a consequence, considerable analytic variability still exists in the measurement of these parameters [35]. Such variability could be more relevant in multi-center series like the one investigated in this study. Indeed, at variance with our results, ZAP-70 or CD38 turned out to be among the strongest prognosticators in mono-center studies [36, 37], with time-to-first-treatment or time-to-progression as end-points. Lack of reproducibility and standardization of biological markers can affect the results of prognostic tools applied at different institutions. 
Our model might be less subject to this bias, since it includes IGHV and del17p, but not the less standardized measurements of CD38 and ZAP-70. Krober et al. [13, 14] have previously shown the importance of molecular risk factors in CLL by stratifying patients by IGHV gene mutational status and presence of high-risk genomic aberrations (del17p or del11q), although the authors did not test whether their model was independent of clinical and demographic risk factors. Here we had the chance to complement the data by Krober et al. [13, 14] by showing the independent prognostic relevance of unmutated (UM) IGHV gene status and del17p in a model that also included clinical and demographic risk factors. Of note, the effect of these molecular prognosticators was found to be additive and of equal importance. The unexpectedly limited relevance of molecular risk factors in our model, and the lack of predictive power of CD38 and ZAP-70, may be in part explained by a relatively small number of deaths and a median follow-up of only 5 years, despite the large number of patients collected. Future analyses with longer follow-up and more events might restore significance to some biological variables, showing the need to update the score. Compared to our previous clinical model [8], we confirmed the value of β2 M, although with the smallest coefficient and the weakest level of significance. We had no data to adjust for renal function impairment, particularly in aged patients, or for other comorbidities. However, β2 M was shown to be important in other retrospective and prospective studies [38-40]. The value of prognostic factors in aged CLL patients has been recently questioned by a report showing that FISH aberrations (del11q or del17p) and IGHV lost their predictive power for OS in patients aged above 75 years [41]. In our CLL series, we specifically addressed this issue by testing age-dependent variations of the predictive power of all the variables included in the final model. 
No significant interaction effect was found for age. We found a greater proportion of death events in the oldest age group. The risk of death in this group may be influenced by other factors, not related to disease. However, epidemiological data from cancer registries show the frequent occurrence of late deaths attributable to CLL also in aged patients [42, 43]. The effect of chemoimmunotherapy with anti-CD20 was small, with a non-significant trend for longer survival (p = 0.10). Results of the present study differed in part from those of a randomized prospective trial [40], where Binet stage and gender, in addition to del11q, CD38 and ZAP-70, all failed to be independent prognostic markers in a multivariate model for OS which also included IGHV gene mutational status, usage of the IGHV3-21 gene, del17p, age and β2 M. Notably, this study, which investigated a population of selected patients in need of treatment (i.e. with active or progressive disease), selected del17p as the strongest risk factor [40]. Conversely, in our retrospective study dealing with untreated patients at diagnosis, the relative weight of del17p appeared equal to or lower than that of other variables. Therefore, while the model described in [40] seems to better predict the outcome of CLL patients with progressive or active disease, our model appears to be more suited for estimating survival in untreated patients at diagnosis or before clinical progression. The analysis of TTT in Binet A patients below 70 years of age showed that demographic factors (age, gender), important for OS estimation, lost their prognostic power for TTT. Conversely, it might be expected that biological prognosticators, particularly those with limited or even absent significance in the OS analyses, would have gained more importance in the TTT analyses in this subset of patients. However, only IGHV and β2 M confirmed their role, with IGHV the most important predictor of TTT. 
CD38 and ZAP-70 were again not significant, in spite of a good representation of positive cases (21% and 36%, respectively); del17p lost its power whereas del11q gained significance. For these two cytogenetic lesions, the low percentage of positive cases may, at least in part, explain the fluctuating results.

Conclusions

In the present study we showed that the survival of untreated CLL patients may be estimated by a limited set of clinical and biological variables, integrated in a prognostic index and in a nomogram, allowing group and individual estimation, respectively. CD38, ZAP-70 and del11q gave redundant prognostic information. Both the proposed PI and the nomogram were only internally validated. Even in internally validated models, the performance of prognostic tools may be influenced or biased by the composition of the population in which they are developed and lack of standardization of biological variables. Therefore the prognostic tools proposed should be used with caution until externally validated on independent, prospective patient series.