Introduction

Current World Health Organization guidelines recommend using CD4 counts to monitor treatment response to highly active antiretroviral therapy (HAART) in regions where HIV viral load testing is unavailable [1]. However, recent reports suggest that monitoring CD4 counts does not accurately classify individuals who have not successfully suppressed HIV RNA levels [24]. One study, from Uganda, examined whether CD4 counts and CD4 percentages could be used to classify individuals as above or below four thresholds of HIV RNA (50, 500, 1000, and 5000) and at three time points (6, 12, and 18 months) after the initiation of treatment [3]. Various classification schemes based upon CD4 counts (e.g. an increase in CD4 count from 0 to 6 months) or CD4 percentage provided a sensitivity range of only 0.04-0.62 for detecting individuals with HIV RNA above 500 [3]. We examined whether other clinical markers that are routinely assessed within the Johns Hopkins HIV Clinical Cohort (JHHCC) could provide better classification of individuals who do not have suppressed HIV RNA levels using a novel approach.

Methods

The JHHCC was established to prospectively quantify the processes and outcomes of care for HIV-infected individuals seen in clinical practice in the Baltimore metropolitan area [5]. All patients give informed consent and the JHHCC is conducted in accordance with the ethical standards of the Johns Hopkins Institutional Review Board and with the Helsinki Declaration of 1975. Subjects included in this analysis were individuals who initiated HAART after January 1, 2000 and had an HIV RNA measurement at least 4 months after initiation. Each individual also had to have at least one of the biological markers (listed below) measured within 60 days before or 30 days after the time of HIV RNA measurement. Only a single record of HIV RNA (the first measurement occurring at least 4 months after HAART initiation) and clinical markers for each individual was included in the analyses. All individuals were still on treatment at the time of their HIV RNA measurement.

We utilized a random forest approach to evaluate the ability of routinely collected clinical markers to classify individuals as greater or less than 500 HIV RNA copies/ml. Random-forests are an algorithmic, non-parametric approach to identify prognostic variables and are robust to over fitting the data [6, 7]. These methods are an extension of classification and regression trees (CART) which by introducing randomness in variable selection and have been shown to have lower error and better classification rates [6, 8]. Briefly, individual classification trees were generated from random bootstrap samples from the data set. Each node of the tree (or branch point) was created by selecting a random subset of candidate classification variables. As with standard CART methods, nodes were split by variables that optimize a splitting criteria and each tree is grown to full size. Because each classification tree was developed from a bootstrap sample of the study population, a subset of the study population remained unused for that tree; this subset was used to validate the tree and estimate the classification error. Ultimately, the random forest approach provides a measure of each variable's importance by examining (in the validation subset) the increase in error rate when the variable is ignored [6, 8]. This consists of running the data from the subset of individuals not chosen in the bootstrap sample through the tree while permuting each covariate in turn. Thus each covariate has a set of error rates (obtained from each tree) for when the specific covariate has and has not been permuted. The change in the error rate is summarized over all trees in the random forest and divided by the standard error to provide a standardized change in error rate. If a variable did not truly have prognostic importance then the change in error rate should be distributed around 0 and normally distributed.

The random-forest approach was used to search for prognostic variables among the following measures: absolute CD4, percent CD4, serum albumin, alanine aminotransferase, aspartate aminotransferase, creatinine, hemoglobin, total lymphocyte, eosinophil, and neutrophil counts, potassium, calcium, chloride, CD3 counts, red blood cell count, mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), mean corpuscular volume (MCV), packed cell volume, platelet count, alkaline phosphatase, CO2 , direct billirubin, and HAART regimen (protease inhibitor [PI], non-nucleoside [NNRTI], triple nucleoside [NRTI], and dual regimen containing both PI and NNRTI based regimens). The measured value up to 1 year preceding HAART initiation, and the corresponding difference between post-HAART and pre-HAART values, were included (e.g. change in MCH = [post-HAART MCH] - [pre-HAART MCH]). Individuals who were missing marker measurements had values imputed using the approach for imputation in random-forests as outlined by Brieman [9] (R Foundation for Statistical Computing, Vienna, Austria: http://www.R-project.org). Imputation in random forests maintains accuracy when up to 80% of the data are missing [9].

Initially the random forest included all covariates to determine an overall error rate and order of the variable importance. Utilizing this information, we constructed another random forest limiting the covariates to the 12 most important variables. Subsequently we continued to prune covariates from the random forest by eliminating the least predictive variables until we reached a random forest consisting of variables which had a variable importance above 1.96 as a cutoff since a non-prognostic variable should be normally distributed.

While, the random-forest approach provides an excellent method for identification of important, prognostic variables, it unfortunately does not produce a familiar regression equation that can be easily disseminated through printed material. Furthermore, the random-forest may contain thousands of trees and therefore cannot be easily included in a figure. Thus, we utilized the random-forest results to identify variables that had the most predictive capability based upon the variable importance measure. We then used these variables to construct a logistic model with HIV RNA above 500 cps/ml as the outcome and the concurrent markers (or change from pre-HAART levels) were included as covariates. Because of the missing data, we re-imputed the missing data 20 times to account for the variability in the imputation process and summarized the results of the logistic model for multiple imputation [10]. To optimally assess the classification error of both the random-forest and logistic model, we reserved one-half of the study population as a cross-validation set. The models calibration was examined by splitting the predicted probabilities from the logistic model into 8 quantiles (each consisting of 98 indviduals) and assessing the observed probability of having an HIV RNA above 500 copies/ml as compared to the mean predicted probability in each quantile group [11].

To determine how well the models were able to discriminate between those who did and did not have an HIV RNA above 500 copies/ml, we relied primarily on the receiver operating characteristic (ROC) curve and the area under the receiver operating characteristic (AUROC) curve. An ROC curve is the relationship between sensitivity and specificity when different cutoff of a distribution is utilized. The AUROC provides the probability that one can discriminate between two individuals (one randomly chosen from those who are above 500 copies/ml and one randomly chosen from those that are below 500 copies/ml) which individual is above 500 copies/ml [12].

Results

The study population was comprised of 1,568 individuals; 784 were used for the random-forest analysis and 784 for the validation set. Study population characteristics are shown in Table 1. The median (interquartile range, IQR) time the HIV RNA was measured after HAART initiation was 0.48 (IQR: 0.39-0.65) years. The majority of individuals had an HIV RNA level below 500 copies/ml (1017 [65%]). The median CD4 count just prior to HAART initiation was 190 (IQR: 66-315) cells/mm3 and 278 (IQR: 143-426) at the time of HIV RNA measurement. Most were on a PI-based regimen (47%) followed by a NNRTI-based regimen (38%) with the rest on either a dual PI and NNRTI or triple NRTI-based regimen (15%). A total of 696 (59%) were on a regimen containing a thymidine analogue (49% containing zidovudine and 33% containing stavudine).

Table 1 Study population characteristics

Utilizing the variables listed above, the random-forest method was able to correctly classify 659/784 individuals: 473/509 individuals with HIV RNA < 500 copies/ml, for a specificity of 0.93 [95% confidence interval, CI: 0.90-0.95] and 186/275 individuals ≥ 500 copies/ml, for a sensitivity of 0.68 [95% CI: 0.62-0.73]. The most important variable when all variables were included was the MCH levels followed by change in MCH from pre-HAART levels. Using the variable importance as a guide, a new random-forest was grown eliminating the least important variables from the model. The final random-forest included with four different markers: MCH (both current and change from pre-HAART level), current CD4 count, change in percent CD4, and MCHC (both current and change from pre-HAART level). This final, reduced random-forest was able to correctly classify 643/784 individuals for an overall error rate of 18% (specificity: 459/409 = 0.90 [95% CI: 0.87-0.93]; sensitivity: 184/275 = 0.67 [95% CI: 0.61-0.72]).

A logistic model was constructed based upon these final variables. The final logistic model is shown in Table 2 and shows approximately 20% decrease in odds having a HIV RNA above 500 copies/ml for every pg/cell higher of either MCH level or change in MCH (from pre-HAART levels). A calibration plot (Figure 1) demonstrated that the logistic model was fairly well calibrated as the curves lowess curves for the training (solid line) data set followed the 45 degree line. The area under the receiver operating curve (AUROC) was 0.85 (95% CI: 0.82-0.88). Using a predicted probability from the logistic model of >0.5 as having HIV RNA ≥500 copies/ml the sensitivity, specificity, positive and negative predictive values are shown in Table 3. The operating characteristics of this logistic model relative to the random-forest prediction approach resulted in a slight decrease in specificity (0.89 vs. random forest: 0.90) and an increase in sensitivity (0.70 vs. random forest: 0.67) with a positive predictive value of 0.77 and negative predictive value of 0.84 among the training set.

Table 2 Results of logistic model after screening for variables by the random forest approach***
Table 3 Results of applying the logistic model to both the training set and validation set using a predicted probability of 0.5 as the cutoff*.
Figure 1
figure 1

Calibration curve. A calibration curve resulting from the logistic model presented in Table 2, which shows good calibration overall when applied to the training (solid diamonds, solid line) and validation (open circles, dash-dot line) sets, despite that those with a predicted probability between 0.21 and 0.37 the actual probability appears to be lower than predicted in the validation set. Vertical lines correspond to 95% confidence intervals for the corresponding quintile group.

Change in MCHC from pre-HAART levels and MCHC were not included in the final model as these two variables were not significant in the logistic model (p = 0.71 and 0.36, respectively). Modeling a non-linear relationship of these two variables did provide a significant association (χ2 = 18.64; 5 degrees of freedom; p = 0.002). However, when included in the model, these two variables and non-linear terms did not substantially improve prediction (AUROC = 0.86). Therefore, these variables were left out of the final logistic model in favor of a more parsimonious model.

The final random-forest applied to the validation set resulted in a sensitivity of 0.71 (95% CI: 0.66-0.76) and a specificity of 0.91 (95% CI: 0.88-0.93). These were not significantly different compared to the training set (sensitivity p = 0.27; specificity p = 0.67). When the logistic model was applied to the validation data set, the calibration curve (Figure 1, open circles and dashed-dot line) suggests that the actual probability was lower than the predicted. However, this was mainly for those with a predicted probability between 0.21 and 0.37 and otherwise the overall curve and confidence intervals suggest a fairly well calibrated model. Nevertheless, the sensitivity and specificity from the logistic model (Table 3), was not significantly different in the validation set as compared to the training set (p = 0.64 and p = 1.0, respectively). Furthermore, the AUROC in the validation set was unchanged at 0.85 (95% CI: 0.82-0.88) and the receiver operating curves for the training and validation data sets in addition to the two data sets combined were similar (Figure 2). Using the combined training and validation data sets, the point on the ROC curve that maximized both the sensitivity and specificity at both 0.80 was a cutoff in the predicted probability from the logistic model of 0.31.

Figure 2
figure 2

Receiver operating characteristic curve. The receiver operating characteristic curve (ROC) for the combined training and validation data set (solid line), training (dashed line), and validation (dash-dot line) data based upon the logistic model presented in Table 2.

For comparison to a logistic model based solely on CD4 at time of HIV RNA measurement, the training and validation data had an AUROC of 0.73 (95% CI: 0.70-0.77) and 0.75 (95% CI: 0.71-0.78), respectively indicating that CD4 by itself had a lower ability to discriminate between those who were and were not above 500 copies/ml. A cutoff in the predicted probability of 0.5 from this logistic model resulted in a sensitivity of 0.49 (95% CI: 0.44-0.55) and specificity of 0.87 (95% CI: 0.84-0.89) in the training data. Similar results were seen in the validation set (sensitivity: 0.55 [95% CI: 0.49-0.60]; specificity: 0.85 [95% CI: 0.82-0.88]). Inclusion of change in CD4 from pre-HAART levels slightly improved these results (AUROC of 0.77 and 0.78 for training and validation data sets, respectively).

Discussion

There are two notable conclusions to this study. First, we expected traditional markers to be the most predictive (e.g. CD4, total lymphocyte counts) of current HIV RNA status. The importance of MCH and change in MCH was unexpected. There is a paucity of information on MCH with treatment and HIV RNA levels. Previous studies have suggested that mean corpuscular volume may change with NRTI use [13, 14]. Another suggested that among treated individuals, those on an indinavir, nelfinavir, or saquinavir regimen had higher MCV and MCH than individuals on non-PI based regimens [15]. Perhaps the most compelling data is a recent study that examined hematological differences among Thai patients with and without antiretroviral therapy stratified by thalassemia (both alpha and beta) status [16]. Focusing on those without thalassemia, individuals treated with antiretrovirals had a higher MCH level (36.13 vs. 28.7 pg; p < 0.001) and higher MCV (107.26 vs 87.1 fL; p < 0.001) [16]. However, HIV RNA levels were not reported. Therefore, whether or not the importance of MCH is due to a correlation with HIV RNA levels or due to antiretrovirals, remains to be answered. While a significant portion of our study population was on a regimen containing a zidovudine, which has been associated with anemia [1720], these antiretrovirals were not likely to have had an effect because treatment would have likely attenuated the association of MCH with an HIV RNA above 500 due to the inverse relationship. Furthermore, including variables indicating whether zidovudine was used did not significantly contribute to the random forest analysis. In the logistic model, the point estimates for MCH and change in MCH remained virtually unchanged (less than 5% of the estimate in Table 2) suggesting that zidovudine and stavudine are unlikely potential confounders of the MCH HIV RNA relationship. Furthermore, inclusion of zidovudine and stavudine in the model did not substantially improve the AUROC (0.86 vs. 0.85). Nevertheless, the relationship between HIV RNA and MCH should be further investigated in longitudinal studies to confirm this relationship.

Second, the results suggest that a binary rule for classifying individuals as either above or below 500 copies/ml is too simplistic. Rather it may need to be multiple markers as a set of complex binary partitions (random-forest) or a linear combination (on a logit scale) of multiple markers. The algorithmic random-forest approach has not been used extensively in HIV/AIDS applications but shows promise as a powerful tool to identify important variables that may classify individuals as above or below a certain HIV RNA threshold. As we have demonstrated, this approach may be used in conjunction with a regression model. For example, building a logistic model using a backwards stepwise selection approach with Akaike's information criteria upon our training data resulted in a model with 30 variables. Additionally it had an AUROC of 0.91 with a sensitivity of 0.75 and specificity of 0.92. However, this model was overly optimistic had an attenuated AUROC, sensitivity and specificity that was 0.84, 0.69, and 0.87, respectively. This demonstrates that our approach resulted in a much simpler model of 6 variables and an AUROC that remained constant at 0.85 in both the training and validation data sets. Thus our analysis did not result in an overly optimistic model (i.e. model was transportable to the validation set).

Our goal was to assess whether routinely collected clinical markers in addition to CD4 could potentially predict individuals who had an HIV RNA above 500 copies/ml after initiation of effective treatment. It is possible that additional information such as adherence data would improve the prediction and discrimination between those who do and do not have an HIV RNA above 500 copies/ml. Recent studies by Bisson [21] and Cambiano [22] have shown that adherence measures may be useful for detecting virologic failure and rebound, respectively. Thus inclusion of good adherence data is likely to improve our prediction model that focused on clinical markers.

We do not know whether these results will generalize to regions which need a method for identifying individuals whose HIV RNA levels remain above 500 copies/ml. Regional conditions may affect hematologic parameters such as MCH in ways (e.g., nutrition, endemic diseases, etc.) that would make the MCH less predictive. In addition, patients in a developed country can afford routine complete blood count testing, which may be less affordable and available in developing countries. Finally, the prevalence of HIV RNA suppression will also contribute to the usefulness of this predictor since the prevalence affects the positive and negative predictive values. However, our approach of utilizing a random forest to screen through variables to identify important predictors for a prediction model may be applied to resource limited settings.

We believe that our approach is more powerful for determining predictors of a suppressed viral load than previous approaches in that it was able to identify important prognostic markers from a large number of variables while providing a parsimonious model without loss (as compared to automatic backwards selection) in ability to discriminate between those who do and do not have an HIV RNA above 500 copies/ml. These methods could be used to determine whether the MCH or other biomarkers can be used in resource-limited settings where the viral load is not routinely available.