Background

As an important measure of self-reported health and well-being, health-related quality of life (QOL) has been widely applied in evaluating treatment effects among different populations[1]. The effectiveness of highly active antiretroviral therapy (HAART) in arresting viral replication and reducing HIV-related morbidity and mortality has been consistently demonstrated[2–4]; however, its impact on QOL has been unclear.

Published findings have varied, with reports of both positive[5, 6] and negative[7, 8] effects of HAART on QOL, and documented improvements have often been minimal or modest[9–12]. A number of these studies have been nested in clinical trials, which are typically of short duration and enroll selected study populations[13, 14], resulting in under-representation of women, minorities, and substance users, who now comprise an increasingly important demographic component of the HIV epidemic[14, 15]. Observational studies offer an opportunity to examine long-term changes in more heterogeneous populations. However, without randomized treatment assignments, these studies may be influenced by unbalanced distributions of disease stage and background covariates that complicate unconfounded comparisons of treatment groups[16]. Although HAART has been available since the introduction of protease inhibitors in 1996, its long-term effect on QOL has rarely been assessed in large prospective cohort studies[17].

The primary objective of this study was to assess the effect of HAART on QOL change by comparing HIV-infected women using HAART with women remaining HAART naïve. To evaluate this question, we utilized data from the Women's Interagency HIV Study (WIHS), one of the largest prospective cohort studies of HIV-infected and at-risk women in the U.S. Acknowledging the challenges encountered in the analysis of observational data, we utilized methods that balanced the distributions of many background covariates through matching based upon a propensity score, the estimated probability of HAART initiation, and effectively handled informative drop-out by using a pattern mixture model.

Methods

Study population

The WIHS is a multicenter prospective study designed to explore the natural and treated history of HIV disease among women since 1994. The WIHS study design and methods are detailed elsewhere[18]. Briefly, a total of 3,768 HIV-seropositive or high risk HIV-negative women aged 13 years or older were recruited from six consortia sites located in Chicago, Los Angeles, San Francisco, Washington D.C., Brooklyn and Bronx in New York City. The study was approved by the local institutional review board at each site and informed consent was obtained for all participants. Research visits are conducted semiannually and include extensive questionnaire-based interviews, specimen collection, physical and obstetric/gynecologic examination. Self-reported quality of life was ascertained at each semiannual visit through 1999 and annually thereafter. This analysis uses data collected through September 2004 (study visit 20). For this study, a matched cohort design was adopted and our analyses were restricted to the HIV-positive participants who enrolled in WIHS during 1994–1995 and had at least one QOL measurement after the matching (baseline) visit as described in detail below.

Study variables

Among many QOL instruments used for HIV-infected populations, the Medical Outcome Study (MOS)-HIV has been one of the most widely used disease specific instruments. In WIHS, a shortened version of MOS-HIV developed by Bozzette et al[19] was adopted to measure QOL. With this instrument, item redundancy is reduced while excellent reliability is maintained and construct validity is comparable to that of MOS-HIV. The shortened form has 21 items representing 9 domains: physical functioning, role functioning, energy/fatigue, social functioning, cognitive functioning, pain, emotional well-being, perceived health index and current health perception. The domain scores are derived by averaging the recoded raw scores for corresponding items of each domain expressed on a 0–100 scale, with higher values for better functioning and well-being according to an established scoring recommendation. In addition, one summary score is generated from six domains (physical functioning, role functioning, energy/fatigue, social functioning, pain and emotional well-being) on the basis of a published algorithm[19]. The summary and nine domain scores are the outcomes of interest in this study.

HAART was defined, following the Department of Health and Human Services/Kaiser Panel guidelines[20], as: (a) two or more nucleoside reverse transcriptase inhibitors (NRTIs) in combination with at least one protease inhibitor (PI) or one non-nucleoside reverse transcriptase inhibitor (NNRTI); (b) one NRTI in combination with at least one PI and at least one NNRTI; or (c) an abacavir- or tenofovir-containing regimen of three or more NRTIs in the absence of both PIs and NNRTIs. Combinations of zidovudine (AZT) and stavudine (d4T) with either a PI or NNRTI were not considered HAART. While HAART use can vary over time, in this analysis we consider trends following first HAART initiation.
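
The regimen rules above can be illustrated with a minimal Python sketch. It is not the classification code used in WIHS. The drug-class lookup table `DRUG_CLASS` and the function name `is_haart` are hypothetical, the drug list is deliberately incomplete, and the AZT plus d4T exclusion is assumed to apply whenever both drugs appear together with a PI or NNRTI.

```python
from collections import Counter

# Hypothetical, incomplete lookup table used only for illustration.
DRUG_CLASS = {
    "zidovudine": "NRTI", "stavudine": "NRTI", "lamivudine": "NRTI",
    "didanosine": "NRTI", "abacavir": "NRTI", "tenofovir": "NRTI",
    "indinavir": "PI", "nelfinavir": "PI", "ritonavir": "PI",
    "nevirapine": "NNRTI", "efavirenz": "NNRTI",
}


def is_haart(regimen):
    """Return True if a list of drug names meets definition (a), (b) or (c)."""
    drugs = {name.lower() for name in regimen}
    # Drugs missing from the lookup table are counted as "other" and ignored.
    counts = Counter(DRUG_CLASS.get(d, "other") for d in drugs)
    n_nrti, n_pi, n_nnrti = counts["NRTI"], counts["PI"], counts["NNRTI"]

    # Assumed reading of the exclusion: AZT and d4T together with a PI or NNRTI.
    if {"zidovudine", "stavudine"} <= drugs and (n_pi or n_nnrti):
        return False

    rule_a = n_nrti >= 2 and (n_pi >= 1 or n_nnrti >= 1)
    rule_b = n_nrti == 1 and n_pi >= 1 and n_nnrti >= 1
    rule_c = (n_nrti >= 3 and n_pi == 0 and n_nnrti == 0
              and bool(drugs & {"abacavir", "tenofovir"}))
    return rule_a or rule_b or rule_c
```

For example, under these assumptions `is_haart(["zidovudine", "lamivudine", "indinavir"])` satisfies rule (a), whereas a triple-NRTI regimen without abacavir or tenofovir does not qualify.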

On the basis of results from prior studies and data available in WIHS, we selected a number of variables possibly affecting a participant's and/or provider's decision to initiate HAART or the participant's QOL. Age was determined at the matching visit. Race/ethnicity was categorized as White non-Hispanic, Black non-Hispanic, Latina/Hispanic and other. Education level at study entry was coded as less than high school, completed high school, and above high school. Annual gross income was dichotomized as greater than $12,000 or not. The number of HIV-related constitutional symptoms, including fever, diarrhea, memory problems, neuropathy symptoms (numbness, tingling or burning), unintentional weight loss, confusion and night sweats, was aggregated for each visit. Standardized three or four color flow cytometry was used to determine total CD4+ cells/mm3 at laboratories concurrently[21] at each visit. Plasma HIV-1 RNA levels were measured using the isothermal nucleic acid sequence based amplification (NASBA/Nuclisens) method (bioMérieux, Boxtel, NL) in laboratories participating in the NIH/NIAID Virology Quality Assurance Laboratory proficiency testing program. The current lower limit of quantification was 80 copies/ml using 1.0 ml sample input. Self-reported depressive symptoms were measured using the 20-item Center for Epidemiological Studies Depression Scale (CES-D)[22], with a total score of 16 or greater used to define the presence of depression. Current employment, any insurance coverage, clinical AIDS diagnosis, and the number of outpatient visits, hospitalizations and medications taken (antiretroviral and non-antiretroviral) since the last visit were also included in our analysis. As calendar time affected the chance of HAART initiation[3, 16], it was also included as a covariate in estimating the propensity score.

Statistical analysis

Propensity score matching

Unlike in randomized trials, use of therapies in observational studies does not result from random assignment, and thus unbalanced distributions of background confounders may bias the estimated exposure effects. To account for this, conventional matching or stratification methods can sometimes be used to create groups of exposed and unexposed individuals with similar measured covariates. Given the large number of background covariates and limited sample size in most observational studies, it is often impractical to control all covariates at one time in this way. As an alternative, propensity score methods have been developed[23] that attempt to match or stratify on a scalar propensity score that reflects an individual's estimated probability of taking a treatment conditional on other variables. By selecting exposed and unexposed individuals matched on the propensity score, we aim to remove the associations between HAART initiation and these measured covariates, so that these factors no longer act as confounders when we evaluate the effect of HAART. As many factors could affect HAART initiation in WIHS, it is reasonable to use propensity score matching to help reduce indication bias.

To construct the propensity score of initiating HAART in our analysis, multiple logistic regression was used. For the HAART users, we selected the last visit before HAART initiation as the matching visit. For the HAART naïve HIV-positive women, we included all of their QOL visits as candidate matching visits. The matching visit data from the HAART exposed group and the candidate matching visit data from the HAART naïve group were pooled, and a propensity score was obtained for each participant at each visit conditional on a number of variables, including age, education, race/ethnicity, income, employment, health insurance, CD4+ cell counts, viral load, history of AIDS diagnosis, clinical depression, number of symptoms, outpatient visits, hospitalizations and medications, QOL scores and calendar time. Every HAART user was matched to one randomly selected HAART naïve participant at a baseline visit with an equivalent (within 0.1% rounding level) propensity score of HAART initiation. For any HAART unexposed individual selected as a control, the rest of her visits were removed to ensure 1:1 matching. To evaluate the effect of propensity score matching, t tests and chi-square tests were performed to test differences in the distributions of background variables between the exposed and unexposed groups before and after matching.
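
The matching step described above can be sketched in Python as follows. This is a hedged illustration, not the procedure actually run in SAS or Splus. It assumes the propensity scores have already been estimated, interprets "within 0.1%" as an absolute difference of at most 0.001, and processes HAART users in the order given (a greedy choice that the text does not specify). The function name `match_one_to_one` and the tuple layouts are hypothetical.

```python
import random


def match_one_to_one(treated, control_visits, tol=0.001, seed=0):
    """Match each HAART user to one randomly chosen HAART-naive visit.

    treated:        list of (treated_id, propensity_score)
    control_visits: list of (control_id, visit_number, propensity_score)
    Returns a dict mapping treated_id -> (control_id, visit_number).
    """
    rng = random.Random(seed)   # fixed seed so the sketch is reproducible
    used_controls = set()       # a control, once chosen, loses her other visits
    pairs = {}
    for t_id, t_score in treated:
        candidates = [
            (c_id, visit)
            for c_id, visit, c_score in control_visits
            if c_id not in used_controls and abs(c_score - t_score) <= tol
        ]
        if not candidates:
            continue            # no sufficiently close control: leave unmatched
        c_id, visit = rng.choice(candidates)
        pairs[t_id] = (c_id, visit)
        used_controls.add(c_id)
    return pairs
```

Because unmatched HAART users are simply skipped, the number of pairs returned can be smaller than either group, which is consistent with matching only a subset of the cohort.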

Pattern mixture model analysis

After matching, the differences of the QOL summary score and the nine domain scores at each visit from their values at the matching baseline visit were used as the study outcomes. To evaluate the effect of HAART, a conventional random effects mixture model could be fit if data were missing only at random, i.e., if missingness were unrelated to study outcomes. However, in our analysis, a substantial proportion (33%) of participants, especially those from the HAART naïve group (46%), died during the study follow-up. To obtain a better estimate of changes over time, we utilized a pattern mixture model approach in which data were stratified by the pattern of follow-up and distinct models were constructed within each stratum[24]. To implement this approach, we grouped the drop-out times into 4 categories (≤ 2, 2.1–4, 4.1–6, and ≥ 6 years) and assumed that the distribution of response would be a weighted mixture over drop-out categories[25]. The overall estimates of variable coefficients and standard errors were obtained by combining the estimates across the patterns.
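
As a rough illustration of how pattern-specific estimates can be combined into an overall estimate, the following Python sketch weights each drop-out pattern by its share of participants. It is an assumption-laden simplification: the weights are treated as fixed constants, so the standard error ignores sampling variability in the pattern proportions, and the function name `combine_patterns` is hypothetical. The text does not state the exact variance formula that was used.

```python
import math


def combine_patterns(estimates, counts):
    """Weighted mixture of pattern-specific coefficients.

    estimates: {pattern_label: (coefficient, standard_error)}
    counts:    {pattern_label: number_of_participants_in_pattern}
    Returns (overall_coefficient, overall_standard_error).
    """
    total = sum(counts[p] for p in estimates)
    overall = 0.0
    variance = 0.0
    for pattern, (beta, se) in estimates.items():
        weight = counts[pattern] / total      # observed share of this pattern
        overall += weight * beta
        variance += (weight ** 2) * (se ** 2)  # weights assumed fixed
    return overall, math.sqrt(variance)
```

In use, the keys would be the four drop-out categories listed above, with one fitted coefficient and standard error per category for each model term.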

In each model, we included an overall intercept term, a binary indicator for HAART vs. HAART-naïve groups, and a variable reflecting the time (in 6-month units) from the baseline visit; these terms formed Model 1. Thus, the HAART indicator reflects short-term effects of HAART and the term for time reflects whether this change persists over follow-up. To assess whether HAART affects the overall long-term trend, we fit interaction terms between HAART and time. Furthermore, in order to account for residual confounding and explore possible mediators of the effect of HAART on QOL, a series of models were fit with different combinations of covariates added to previous models: Model 2 added baseline age, ethnicity, and education variables to Model 1; Model 3 added time-varying socioeconomic variables of income, employment, and health insurance to Model 2; Model 4 added time-varying CD4+ cell counts and viral load to Model 3; and Model 5 added time-varying symptoms, outpatient visits, hospitalizations, medications, AIDS and depression to Model 4. All statistical analyses were performed using SAS version 9.1 (SAS Institute, Cary, NC) and Splus 7.0 (Insightful, Seattle, WA).
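
For orientation, Model 1 with the HAART-by-time interaction can be written schematically as below. The notation is introduced here for illustration only and does not appear in the original analysis: $Y_{ij}$ is the QOL score of woman $i$ at visit $j$, $Y_{i0}$ her score at the matching visit, $t_{ij}$ the time since the matching visit in 6-month units, and $b_i$ an assumed subject-level random intercept. In the pattern mixture approach, a model of this form would be fit separately within each drop-out category.

```latex
Y_{ij} - Y_{i0} = \beta_0 + \beta_1\,\mathrm{HAART}_i + \beta_2\, t_{ij}
                + \beta_3\,(\mathrm{HAART}_i \times t_{ij}) + b_i + \varepsilon_{ij}
```

Under this parameterization, $\beta_1$ corresponds to the short-term HAART effect, $\beta_2$ to the time trend in the HAART-naïve group, and $\beta_3$ to any difference in trend attributable to HAART; Model 1 without the interaction sets $\beta_3 = 0$.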

Results

Table 1 displays the differences in the distributions of baseline covariates between the HAART users and HAART-naïve groups before and after matching. Prior to propensity score matching, the distributions of risk factors affecting HAART initiation were compared between 1,271 HAART exposed (the last visits before matching) and 555 HAART naïve participants (at candidate matching visits). Thirteen of the 24 background covariates, including education level, race/ethnicity, income, insurance, CD4+ cell counts, viral load, AIDS diagnosis, number of symptoms, outpatient visits and medications, physical functioning, perceived health index and health rating, were significantly different between the groups, which necessitated the matching of these covariates in our study. Using a tolerance of 0.1% in the propensity score, we were able to obtain 458 matched pairs of HAART initiators and HAART naïve women. No statistically significant differences were observed for any of these background covariates after matching (Table 1), indicating that the matching balanced the covariates as intended. The resulting distributions of propensity scores for the two groups before and after matching are displayed in Figure 1. Before matching, the average propensity scores for the HAART-using and HAART-naïve groups were 0.42 and 0.22, respectively. After propensity score matching, the distributions of propensity scores were nearly identical (mean: 0.36; standard deviation: 0.17 for both groups).

Table 1 Study Participant Characteristics Before and After Propensity Score Matching. Numbers indicate mean value unless otherwise noted.
Figure 1 Boxplots of QOL summary score between HAART users and HAART naïve groups before and after propensity-score matching. Box widths are proportional to the number of observations in each group.

The 916 matched participants had a mean age of 38.5 years at baseline and contributed a total of 4,292 person-visits, with a median follow-up time of 4 years (interquartile range (IQR): 1–6 years). Among these women, about 58% were Black non-Hispanic, 60% had completed high school and 42% had an AIDS history at the matching visit. At baseline, the average CD4+ cell count was approximately 340 cells/mm3, the mean viral load was approximately 10,000 copies/ml and the mean QOL summary score was 62. About 63% of HAART naïve women dropped out during the first two years, compared with only 11% of women using HAART. In contrast, only 11% of HAART naïve women were followed for 6 or more years, compared with 38% of women using HAART.

To evaluate how HAART affected QOL change, we fit a series of pattern mixture models with different subsets of covariates (Table 2). In each model, HAART use and time from the matching visit were included. We first examined whether there were any significant interactions between time and HAART use to assess any long-term HAART effect on QOL score changes. As the interaction term was not statistically significant in any model (though its direction was positive), it was dropped from our analyses. We then evaluated the overall effect of HAART on changes in QOL scores (the summary score and nine specific QOL domain scores) without time-varying intermediate variables (Models 1–2) and the direct effects of HAART after adjusting for different possible mediating covariates (Models 3–5).

Compared with the HAART naïve group in the bivariate model (Model 1), with HAART use and time as the only covariates, the HAART users had improved QOL scores from the matching visit for almost all domains except energy/fatigue, with the improvements in role functioning (mean change: 5.08; P = 0.01), social functioning (4.33; P = 0.01), pain (4.53; P = 0.01) and perceived health index (4.25; P < 0.01) reaching statistical significance. A second model (Model 2) was fit by adding fixed personal characteristics, including age at baseline, race/ethnicity and education at study enrollment, to the bivariate model. The model estimates for HAART and time changed only slightly, except for cognitive functioning, which became statistically significant (3.51; P = 0.02). In Model 3, we added the time-dependent socioeconomic variables (income, employment and health insurance) to Model 2. No significant change in the HAART effect was observed. After further adding markers of disease progression (CD4+ cell counts and HIV viral load) in Model 4, the HAART effects remained stable except for health perception (3.43; P = 0.04). In the final model (Model 5), the clinical variables (number of symptoms, outpatient visits, hospitalizations and medications, history of AIDS diagnosis and clinical depression) were added as covariates to Model 4. Except for cognitive functioning, health perception and perceived health index, adding clinical variables to the models was associated with the largest changes in HAART effect estimates. In addition, the direct HAART effect on summary QOL change became significant (3.25; P = 0.02). Furthermore, although the QOL scores decreased over time for almost all domains in all models, only the decreases in summary QOL, role functioning, emotional well-being and health perception were statistically significant in the final model after controlling for many time-varying covariates.

As HIV-infected individuals at different disease stages might respond differently to HAART, we further examined the association of HAART and QOL among women who were AIDS-free at the matching visit (Table 3). Again, all QOL domain scores remained stable or decreased (for health perception) during follow-up, and HAART use did not modify these trends. Compared with Table 2, fewer QOL domains showed significant short-term HAART effects (social functioning, pain and health rating), and the effect was negative for the energy/fatigue domain.

In addition to HAART use and time, a number of the covariates were significantly associated with QOL changes from baseline. Evaluating the results from Model 5 for the summary QOL change, women having less than high school education had slightly higher summary QOL change (3.12; P = 0.02) compared to women with college education at study enrollment. In addition, all clinical variables were significantly associated with summary QOL change. Having one more symptom, outpatient visit, hospitalization or medication was associated with a decrease of 2.17 (P < 0.01), 0.11 (P < 0.02), 1.57 (P < 0.01) or 0.24 (P < 0.01) in summary QOL change respectively. Depression was strongly related to a decline in summary QOL change (-9.78; P < 0.01), while having a history of clinical AIDS was associated with improved QOL change (2.13; P = 0.04). All other demographic, socioeconomic and biological (CD4+ cell counts and HIV viral load) variables were not significantly associated with QOL changes from baseline.

Table 2 Estimates of the Impact of HAART on the Mean Change in QOL Scores from Propensity-Score Matched Pattern Mixture Models.
Table 3 Estimates of the Impact of HAART on the Mean Change in QOL Scores from Propensity-Score Matched Pattern Mixture Models among AIDS-free Women at Matching Visits

Discussion

In our study, we attempted to obtain unbiased estimates of HAART effects on QOL in WIHS by minimizing indication bias and further adjusting for the effect of informative drop-out using several innovative statistical methods. By balancing the distributions of observed background covariates through propensity score matching, an observational study comes closer to mimicking a randomized clinical trial in which participants have an equivalent probability of receiving treatment. In addition, joint modeling techniques such as the pattern mixture model offer one way to handle informative drop-out, which may bias effect estimates in longitudinal studies.

Our study showed that HAART improved most QOL domains relatively quickly. Most of these domains were stable or showed slight declines over subsequent follow-up, and there was no indication that HAART modified these trends. These results suggest that continued use of HAART did not result in continued improvement in QOL domains. This lack of long-term effect might reflect a balance between reduced HIV-related symptoms and added side effects from HAART. As many time-dependent variables were already controlled, the decrease in QOL over time is likely due to aging or other uncontrolled factors. It should be noted that the decreasing QOL trends were not entirely homogeneous. Examining results from different drop-out patterns revealed that women with the shortest maximum follow-up time had the highest rate of QOL decrease in both groups (data not shown). As early drop-out due to causes such as death is usually associated with faster disease progression and quicker deterioration of QOL, handling informative drop-out with a pattern mixture model was justified in our analysis.

By adding different combinations of covariates step by step into the models, we could explore the possible mediators through which HAART exerts its effect. In the bivariate models, HAART use had positive overall effects for almost all QOL domains, which is congruent with some clinical trial results with relatively short follow-up periods[10, 11]. Because fixed demographic covariates were already controlled at baseline by matching, it is not surprising that adding these variables did not substantially alter the estimated HAART effects. Addition of time-varying socioeconomic variables did not change the estimates much either, indicating that these covariates had been stable through the study follow-up. Though HAART could decrease viral load dramatically and increase CD4+ cell counts accordingly, the observed HAART effects did not differ substantially with and without these variables in the models. This phenomenon might be explained by the weak associations between these biological variables and QOL[1, 26]. Finally, with the inclusion of the time-varying clinical variables, the estimates of the HAART effect showed the largest increase for most QOL domains, providing evidence that these clinical covariates served as mediating factors and had negative impacts on QOL. In addition, the significance of direct HAART effects on most QOL domain scores implies that HAART might have exerted its effect through pathways other than improving the patient's immune status or changing the clinical profile. One of several possible explanations may simply be a placebo effect resulting from relieved stress among infected individuals[27] using HAART, because the effectiveness of HAART in reducing AIDS-related morbidity and mortality has been demonstrated.

Similar to previous studies[6, 7], HAART had different effects for individuals at different disease stages, with short-term improvements in all QOL domains for AIDS patients and deterioration of certain QOL domains for AIDS-free HIV-infected individuals. Thus, it would be advisable to consider the timing of HAART initiation, especially for individuals at an early stage of HIV disease, to maximize their quality of life.

The propensity score method has been widely applied in observational studies through matching, stratification, or weighting to obtain estimates that may be less biased, more robust and more precise[28]. By generating a propensity score from many risk factors affecting HAART initiation, the overall effect of these factors on starting HAART can be represented by this scalar summary score. Through matching on the propensity score, the associations between these risk factors and HAART initiation are blocked and these covariates no longer act as confounders. Notably, the distributions of all covariates that were substantially different before matching no longer differed significantly after matching, which indicates that the matching performed as expected. Furthermore, the HAART effect estimates were relatively stable across models with different combinations of covariates, indicating indirectly that the matching successfully turned many covariates into non-confounders. However, two possible limitations should also be noted. First, we could not find a sufficiently close match for all individuals. In our dataset, the HAART naïve group was smaller (N = 555) than the group of HAART initiators (N = 1,271). In order to have a 1:1 match, we had to restrict to the smaller group, and could only find a match for 83% of these individuals. This is common in propensity score analyses. Second, although the propensity score adjustment method is very effective in balancing the known confounders across groups, omission of important unobserved confounders might still lead to residual confounding in estimating the treatment effect. In our study, we included many possible confounders identified from prior studies in estimating the propensity score, and examination of other potential variables such as substance abuse and violence history did not show any difference. Thus, the chance of leaving out important confounders was minimized. Of course, omission of unmeasured confounders is a constant threat to the validity of non-interventional studies as well.

In our intent-to-treat analysis, we assumed that individuals who started HAART would remain on HAART throughout the follow-up. Though some participants may have discontinued HAART for a few visits, our data showed that the HAART users had been on HAART for about 80% of their follow-up visits. We did not take adherence to HAART into account in our analysis, though we controlled for some variables, including age and viral load, that contribute to lower adherence to HAART[29]. In our analysis, we examined the effects of HAART as a whole, rather than the effects of specific HAART regimens on QOL. As HAART regimens vary from individual to individual and over time within the same individual in WIHS, it is nearly impossible to assess the effect of every regimen on QOL change given the large number of HAART regimens used. In addition, we did not analyze the effect of HAART-related side effects on QOL due to insufficient data. However, as we controlled for clinical variables which are related to both HAART effectiveness and HAART-related side effects, the heterogeneity of HAART regimen effects could be predicted and the effect of drug side effects could be partially controlled. In addition, our study population comprised women at relatively advanced stages of disease; thus, the observed HAART effects may not be representative of the general HIV-infected population.

In WIHS, a shortened version of the MOS-HIV form was used to assess QOL change among the participants. The reliability and construct validity of this instrument have been demonstrated, and the burden for both investigators and patients was alleviated by the reduced administration time[19]. Though the MOS-HIV form has been frequently used in HIV research over the last decade[30], it has had relatively limited application among women, minorities and individuals with lower socioeconomic status[31]. As the largest HIV/AIDS prospective cohort of women in the US, the WIHS represents an ethnically diverse, socioeconomically disadvantaged group with complex risk factors whose QOL status has not been well studied. Thus, our analysis provides important initial information on QOL change for women in the HAART era.

In summary, we evaluated the effects of HAART on QOL among women in the WIHS. HAART did not show any long-term effect on QOL changes, but had short-term direct effects not mediated through clinical variables.