Introduction

Cocaine use is common among persons with chronic human immunodeficiency virus (HIV) infection, with prevalence estimates for current or recent use ranging from 5 to 30% [1,2,3,4,5,6], compared with 2% in the US general population [7]. Previous studies have shown that cocaine use accelerated HIV progression [8,9,10,11]. However, the biological mechanism of cocaine’s effect on HIV outcomes remains largely unknown. Some studies have suggested that cocaine use may worsen HIV outcomes due to poor adherence to antiretroviral therapy (ART) among HIV-positive participants [2, 12]. Other studies have demonstrated that cocaine’s adverse effect on HIV outcomes is independent of ART [10, 11, 13,14,15], supporting the hypothesis that cocaine exposure may lead to long-lasting pathophysiological changes in the immune system that worsen HIV outcomes.

DNA methylation (DNAm) is an important mechanism associated with many environmental exposures such as smoking, alcohol, and drug misuse [16,17,18,19,20,21,22,23,24,25,26] and diseases such as cancer, diabetes, and cardiovascular diseases [27,28,29,30,31,32]. Our previous study showed that two DNAm CpG sites in NLRC5 were differentially methylated between HIV-positive and HIV-negative participants in peripheral blood [33]. DNAm may play an important mediation role linking environmental exposure and disease outcomes [34,35,36,37,38,39,40]. Environmental exposures such as substance use or toxicants can directly or indirectly affect DNA methyltransferases, causing global or site-specific DNAm changes that may lead to disease [41]. A recent study reported that DNAm sites in PIM3 (energy metabolism) and ABCG1 (lipid metabolism) mediated the association between prenatal famine exposure and long-term metabolic outcomes [38]. Another study reported the mediation effect of cg05575921 (AHRR) between smoking and the risk of bladder cancer among postmenopausal women [42].

Previous studies have shown that the use of cocaine enhances HIV-1 replication and undermines immune function by dysregulating gene expression of HIV-1 entry coreceptors, enhancing HIV-1 cellular toxicity, and dysregulating interleukins (IL) in the host [43, 44]. Cocaine use increases the release of cytokines in immune cells and alters the cytokine profile in HIV-infected individuals [45, 46]. Specifically, cocaine use was positively associated with IL-4 and IL-10 [47], which likely worsens HIV severity and disease progression. Epigenetic mechanisms may play a role in cocaine’s effect on HIV severity because cocaine exposure has been shown to increase the expression of methyl CpG binding protein 2 (MeCP2) [48] as well as DNMT3A and DNMT3B [49] in animal brains. In a well-matched, case-control human pilot study, cocaine use altered the DNA methylation profile in blood [50]. It is plausible that cocaine use may lead to DNAm changes in immune response genes and gene expression changes in the cytokine gene family, which further affect HIV progression. Thus, we hypothesized that DNAm may mediate the effect of cocaine exposure on HIV severity.

In this study, we first validated previous findings by examining cocaine’s adverse effect on HIV severity and mortality. We further conducted mediation analyses to assess the mediation role of DNAm sites (or CpGs) in cocaine’s effect on HIV severity using the Veterans Aging Cohort Study Biomarker Cohort (VACS-BC, n = 1435). To assess how sensitive our results were to violation of model assumptions and to validate our findings using a different approach, we performed a sensitivity analysis [51] and a two-step epigenetic Mendelian randomization (MR) analysis that used genetic variants as instrumental variables to assess the mediation role of DNAm between cocaine use and HIV severity [52]. Our results provide new insights into the role of DNAm in how cocaine affects HIV severity.

Methods

Study samples

VACS is a prospective cohort study of veterans designed to study substance use and HIV-related outcomes with patient surveys, electronic medical records, and biospecimen data [53]. A baseline survey was conducted at enrollment [53]. The follow-up survey of 5 visits occurred at approximately 1-year intervals [53]. Blood samples were collected in the middle of follow-up for a subset of participants in the cohort (VACS-BC) [54]. A total of 1435 HIV-positive participants from the VACS-BC were used to examine cocaine’s effect on mortality and HIV severity, and a subset of participants (n = 875) with DNAm data available were used for mediation analyses (Fig. 1). Demographic and clinical information of baseline samples and a subset of the samples at the time of blood collection are summarized in Table 1.

Fig. 1 Timeline of data and blood sample collection for each analysis

Table 1 Sample characteristics in HIV-positive participants

Assessment of cocaine use

The timeline of cocaine use assessment for each analysis is illustrated in Fig. 1. Information on cocaine use status was self-reported through telephone interviews for a total of 5 visits. We defined the “persistent cocaine use” group as self-reported cocaine use across all 5 visits and the “no cocaine use” group as self-reported no cocaine use across all 5 visits. This definition led to a subset of samples with 265 persistent cocaine users and 202 nonusers for the mediation analyses to eliminate the inconsistent response across 5 visits and examine the effect of long-term cocaine exposure on DNAm and HIV severity.
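The grouping rule above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `classify_cocaine_use` is hypothetical, and it assumes each participant has a complete yes/no self-report for all 5 visits (handling of missing visits is not described in the text).

```python
from typing import Optional, Sequence


def classify_cocaine_use(visits: Sequence[bool]) -> Optional[str]:
    """Classify a participant from 5 self-reported visit responses.

    Returns "persistent" if use was reported at every visit, "none" if
    no use was reported at any visit, and None for inconsistent reports
    (such participants are left out of the mediation subset).
    """
    if len(visits) != 5:
        # Assumption: exactly 5 complete reports are required.
        raise ValueError("expected exactly 5 visit reports")
    if all(visits):
        return "persistent"
    if not any(visits):
        return "none"
    return None


if __name__ == "__main__":
    print(classify_cocaine_use([True] * 5))
    print(classify_cocaine_use([False] * 5))
    print(classify_cocaine_use([True, False, True, True, True]))
```

Participants for whom the function returns None correspond to the inconsistent responders excluded by this definition.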

The frequency of cocaine use was also assessed at baseline (Fig. 1). Each participant was asked “how often in the past year have you used cocaine or crack?”, from which cocaine frequency of use was coded as an ordinal variable as follows: 0 = never tried, 1 = no use in the last year, 2 = less than once a month, 3 = 1–3 times a month, 4 = 1–3 times a week, and 5 = 4 or more times a week.

Assessment of mortality and HIV severity

The timeline of HIV severity measurement and survival information for each analysis is shown in Fig. 1. Mortality and survival year information were based on medical records. The VACS index was used as a measure of HIV severity [55,56,57,58] and was obtained at each visit and at the time of blood collection (Fig. 1). The VACS index was calculated by summing preassigned points for age, routinely monitored indicators of HIV disease (CD4 count and HIV-1 RNA), and other general indicators of organ system injury [55]. A high VACS index corresponds to worsened HIV outcomes, and the VACS index is positively associated with increased mortality [59]. The VACS index and DNAm profiling were measured at the same time for the selection of candidate mediator CpGs, and the average VACS index after blood collection was used for mediation analyses (Fig. 1).

DNA methylation profiling and quality control

DNA samples were extracted from blood for a subset of 875 HIV-positive participants (Fig. 1). DNAm was profiled using two different methylation arrays, with 475 samples profiled by the Infinium Human Methylation 450K BeadChip (HM450K, Illumina Inc., CA, USA) and 400 samples later profiled by the Infinium Human Methylation EPIC BeadChip (EPIC, Illumina Inc., CA, USA) [54]. DNA samples were randomly selected for each methylation array regardless of cocaine use status or other clinical demographic variables.

The quality control (QC) for samples measured by each array was conducted separately using the same pipeline as previously described [60] with the R package minfi [61]. After QC, a total of 408,583 CpGs measured by both the HM450K and EPIC arrays remained for analysis. Six cell-type proportions (CD4+ T cells, CD8+ T cells, NK cells, B cells, monocytes, and granulocytes) were estimated for each participant using the established method [62]. Negative control probes were designed to capture background signals in Illumina arrays, and negative control principal components (PCs) were extracted by minfi to control for background noise [61]. Batch effect removal was conducted with ComBat after QC [63].

Genotyping and quality control

The 1177 samples were genotyped using the Illumina HumanOmniExpress Beadchip and imputed for 18,960,156 single nucleotide polymorphisms (SNPs). IMPUTE2 (ver 2.3.2) was used for imputation with the 1000 Genomes Phase 3 reference panel [64]. QC was conducted using PLINK (ver 1.90b21) [65]. SNPs and samples with low call rate less than 0.05 were removed. The Hardy-Weinberg equilibrium test cutoff was set to 1E−06. SNPs with minor allele frequency less than 0.01 were filtered.

Statistical analysis

Cocaine survival analysis among HIV-positive participants at baseline

Survival analysis was conducted using baseline information among 1435 HIV-positive participants with cocaine use frequency (0–5) and other covariates (Fig. 1). Kaplan-Meier analyses on 10-year follow-up among HIV-positive and HIV-negative participants by cocaine use frequency (0–5) at baseline were conducted, and the Kaplan-Meier curves were plotted by using the R package survminer [66]. A test on ordered differences of Kaplan-Meier curves by cocaine use frequency was conducted by survminer [66].
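For readers unfamiliar with the estimator behind these curves, the following is a minimal sketch of the Kaplan-Meier product-limit calculation for a single group. It is illustrative only: the analysis itself relied on R packages, the function name `kaplan_meier` is hypothetical, and the example times are made up.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct death time.

    times  : follow-up time for each participant
    events : 1 if the participant died at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # deaths and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up years; the second participant is censored.
    for t, s in kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]):
        print(t, round(s, 3))
```

Computing one such curve per cocaine-frequency category gives the stratified curves that are then compared by the ordered-difference test.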

To adjust for confounding factors, a Cox proportional hazards model was used to assess the hazard ratio of baseline cocaine use frequency (0–5) on mortality during the follow-up using the R package survival [67]. The following model was used to calculate the adjusted hazard ratio among HIV-positive participants:

$$ h(t)={h}_0(t)\exp \left({\beta}_1\ \mathrm{cocaine}\ \mathrm{use}\ \mathrm{frequency}+{\beta}_2\ \mathrm{sex}+{\beta}_3\ \mathrm{baseline}\ \mathrm{age}+{\beta}_4\ \mathrm{race}+{\beta}_5\ {\log}_{10}\left(\mathrm{viral}\ \mathrm{load}\right)+{\beta}_6\ \mathrm{CD4}\ \mathrm{count}+{\beta}_7\ \mathrm{antiviral}\ \mathrm{medication}\ \mathrm{adherence}\right) $$
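Because cocaine use frequency enters this model as a single ordinal term, the hazard ratio for a one-category increase is exp(β1), and under the model's log-linear assumption a k-category difference corresponds to exp(kβ1). The sketch below shows that conversion together with a Wald-type confidence interval; the function names and the coefficient and standard error used in the example are hypothetical and are not taken from the study.

```python
import math
from typing import Tuple


def hazard_ratio(beta: float, steps: int = 1) -> float:
    """Hazard ratio for a difference of `steps` ordinal categories."""
    return math.exp(beta * steps)


def hazard_ratio_ci(beta: float, se: float,
                    z: float = 1.96) -> Tuple[float, float]:
    """Wald-type 95% confidence interval for a one-category hazard ratio."""
    return math.exp(beta - z * se), math.exp(beta + z * se)


if __name__ == "__main__":
    beta1, se1 = 0.10, 0.04  # hypothetical log hazard ratio and standard error
    print(hazard_ratio(beta1))            # one-category increase
    print(hazard_ratio(beta1, steps=5))   # lowest versus highest category
    print(hazard_ratio_ci(beta1, se1))
```

Treating the 0 to 5 scale as linear on the log-hazard scale is a modeling choice; a categorical coding would relax it at the cost of more parameters.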

Association between cocaine use frequency and HIV severity among HIV-positive participants at baseline

This analysis was conducted using baseline information on cocaine use frequency (0–5), the VACS index, and other covariates (Fig. 1). The following linear regression model was performed to test the association of cocaine use frequency and HIV severity, adjusting for confounders as shown in the following model:

$$ \mathrm{HIV}\ \mathrm{disease}\ \mathrm{severity}={\beta}_1\ \mathrm{cocaine}\ \mathrm{use}\ \mathrm{frequency}+{\beta}_2\ \mathrm{sex}+{\beta}_3\ \mathrm{age}+{\beta}_4\ \mathrm{race}+{\beta}_5\ {\log}_{10}\left(\mathrm{viral}\ \mathrm{load}\right)+{\beta}_6\ \mathrm{CD4}\ \mathrm{count}+{\beta}_7\ \mathrm{antiviral}\ \mathrm{medication}\ \mathrm{adherence} $$

Selection of candidate CpGs by epigenome-wide association (EWA) of persistent cocaine use and HIV severity

To select candidate CpGs for mediation analysis, we conducted two separate EWAs, one for persistent cocaine use and the other for HIV severity (Fig. 1). Each EWA model adjusted for sex, baseline age, race, smoking, self-reported antiviral medication adherence, white blood cell count, estimated cell-type proportions, and negative control PCs. We used the linear regression model with methylation as the dependent variable for EWA as described previously [33, 60, 68]. Since CD4+ T cell count is one component of the VACS index, to avoid overrepresented CpGs associated with CD4+ T cells in the EWA results, we extracted the top 1000 CD4+ T cell-type relevant CpGs based on data from FlowSorted.Blood.450k [69]. The top 2 PCs that in total account for > 80% of the variation of the 1000 CD4+ T cell CpGs were used as covariates in the VACS index EWA model. CpGs with p < 0.001 in both EWAs for persistent cocaine use and HIV severity were selected as candidate CpGs for mediation analyses. A liberal selection threshold was arbitrarily set to make sure there would be a sufficient number of candidate CpGs for the mediation analysis. To limit confounding by use of other substances, we tested the association of each candidate CpG site with alcohol use, cannabis use, and opioid use based on self-reported data. Alcohol use was assessed by using the 3 items of the Alcohol Use Disorders Identification Test-Consumption (AUDIT-C). Cannabis and opioid use were assessed by asking the same questions as for cocaine use, described earlier.
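The selection step amounts to intersecting two lists of EWA results at the same p-value cutoff. A minimal sketch follows; the function name `select_candidates`, the dictionary layout, and the probe identifiers are hypothetical placeholders rather than study data.

```python
from typing import Dict, List


def select_candidates(p_cocaine: Dict[str, float],
                      p_severity: Dict[str, float],
                      threshold: float = 0.001) -> List[str]:
    """Return CpGs with p < threshold in both EWA result sets."""
    return sorted(
        cpg for cpg, p in p_cocaine.items()
        # A CpG missing from the second scan is treated as not significant.
        if p < threshold and p_severity.get(cpg, 1.0) < threshold
    )


if __name__ == "__main__":
    # Hypothetical p values keyed by placeholder probe names.
    cocaine = {"probe_a": 2e-5, "probe_b": 4e-4, "probe_c": 0.2}
    severity = {"probe_a": 9e-4, "probe_b": 0.03, "probe_c": 1e-6}
    print(select_candidates(cocaine, severity))
```

Only probes passing both scans are carried forward, which is why the candidate list is far shorter than either individual list of EWA hits.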

Single-site mediation analysis and joint mediation analysis

The selected candidate CpGs were assessed as potential mediators of the association between persistent cocaine use and HIV severity among HIV-positive participants (n = 467). We performed single-site mediation analysis using the mediation method as previously described [51] and the R package mediation [70]. Here, we used the average VACS index after DNAm profiling to ensure the temporality of our mediation hypothesis that DNAm measurement preceded the HIV severity measurement. In our mediation model, we adjusted for sex, age, race, smoking, self-reported antiviral medication adherence, white blood cell count, and estimated cell-type proportions as confounding factors.

We used M to represent the candidate CpG (mediator), X to represent persistent cocaine use status (exposure), Y to represent the average VACS index after blood collection (outcome), and Ci to represent the k confounding variables (sex, age, race, smoking, self-reported antiviral medication adherence, white blood cell count, and estimated proportions of CD8 T cells, granulocytes, NK cells, B cells, and monocytes). The mediator model f(M| X, C) examined the association between persistent cocaine use and CpGs:

$$ f\left(M|X,C\right)={\beta}_0\ X+\sum \limits_{i=1}^k{\beta}_i\ {C}_i $$

The outcome model f(Y| X, M, C) examined both the direct effect of persistent cocaine use on VACS index and the mediation effect by CpG:

$$ f\left(Y|X,M,C\right)={\alpha}_0\ X+{\alpha}_1\ M+\sum \limits_{i=1}^k{\alpha}_{i+1}\ {C}_i $$

Thus, the mediation effect, or the average causal mediation effect (ACME) of CpG M, was α1β0, the total effect was α0 + α1β0, and the proportion mediated was α1β0/(α0 + α1β0). The confidence interval and p value were estimated by bootstrapping 1,000,000 iterations.
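To make the product-of-coefficients quantities concrete, the sketch below fits the mediator and outcome models by ordinary least squares and forms α1β0, the total effect, and the proportion mediated, with a percentile bootstrap interval for the ACME. It is a simplified illustration under loud assumptions: covariates Ci are omitted, the helper names (`ols`, `mediation_effects`, `bootstrap_ci`) are hypothetical, the data are made up, and it does not reproduce the algorithm of the R package mediation.

```python
import random
from typing import List, Sequence, Tuple


def ols(columns: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    # Augmented system [X'X | X'y].
    a = [[sum(r[j] * r[k] for r in design) for k in range(p)]
         + [sum(r[j] * yi for r, yi in zip(design, y))] for j in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(p):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    return [a[j][p] / a[j][j] for j in range(p)]


def mediation_effects(x, m, y) -> Tuple[float, float, float]:
    """Return (ACME, total effect, proportion mediated)."""
    beta0 = ols([x], m)[1]              # exposure -> mediator
    _, alpha0, alpha1 = ols([x, m], y)  # direct effect, mediator -> outcome
    acme = alpha1 * beta0
    total = alpha0 + acme
    return acme, total, acme / total


def bootstrap_ci(x, m, y, n_boot: int = 1000, seed: int = 0):
    """Percentile 95% interval for the ACME.

    Degenerate resamples (e.g. all participants in one exposure group)
    are not handled in this sketch.
    """
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(mediation_effects([x[i] for i in idx],
                                           [m[i] for i in idx],
                                           [y[i] for i in idx])[0])
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]


if __name__ == "__main__":
    x = [0, 0, 0, 0, 1, 1, 1, 1]               # hypothetical exposure
    m = [-1, 1, -1, 1, 1, 3, 1, 3]             # hypothetical mediator
    y = [3 * xi + 0.5 * mi for xi, mi in zip(x, m)]
    print(mediation_effects(x, m, y))
```

In a real analysis the covariates would be added as extra columns in both regressions, and the number of bootstrap iterations would be chosen to match the p-value resolution required.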

To assess the robustness of the results if the sequential ignorability assumption was violated, we conducted a sensitivity analysis developed by Imai et al. [51] using the R package mediation [70]. Sequential ignorability consists of two assumptions: (a) conditional on the covariates Ci, the exposure X was independent of all potential values of the outcome Y and mediator M; and (b) the observed mediator M was independent of all potential outcomes Y given the observed exposure X and covariates Ci. The sensitivity parameter ρ was evaluated on a grid with a step of 0.05, and the ρ at which ACME = 0 was calculated. For each mediator, sensitivity plots were drawn to show the estimated ACME and its 95% confidence interval as a function of ρ (Figure S2). A ρ at which ACME = 0 that was close to 0 indicated that the mediation analysis was sensitive to violation of the sequential ignorability assumption.

The joint mediation analysis of all significant mediator CpGs was conducted as previously described [71]. The mediator model f(Mj| X, C) for multiple mediators Mj (M1, M2, …, Mn) was:

$$ f\left({M}_j|X,C\right)={\beta}_{0j}\ X+\sum \limits_{i=1}^k{\beta}_{ij}\ {C}_i $$

The outcome model f(Y| X, M1, …, Mn, C) was:

$$ f\left(Y|X,{M}_1,\dots, {M}_n,C\right)={\alpha}_0\ X+\sum \limits_{j=1}^n{\alpha}_j{M}_j+\sum \limits_{i=1}^k{\alpha}_{i+n}\ {C}_i $$

The joint mediation effect of CpGs M1, …, Mn is \( {\sum}_{j=1}^n{\alpha}_j{\beta}_{0j} \), the total effect is \( {\alpha}_0+{\sum}_{j=1}^n{\alpha}_j{\beta}_{0j} \), and the proportion mediated is \( {\sum}_{j=1}^n{\alpha}_j{\beta}_{0j}/\left({\alpha}_0+{\sum}_{j=1}^n{\alpha}_j{\beta}_{0j}\right) \). The confidence interval and p value were estimated by bootstrapping 1,000,000 iterations.
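Once the coefficients of the mediator and outcome models are available, the joint quantities reduce to simple arithmetic. The sketch below evaluates them for given coefficients; the function name `joint_mediation` and the numbers in the example are hypothetical.

```python
from typing import Sequence, Tuple


def joint_mediation(alpha0: float,
                    alphas: Sequence[float],
                    betas: Sequence[float]) -> Tuple[float, float, float]:
    """Return (joint mediation effect, total effect, proportion mediated).

    alpha0 : direct effect of the exposure in the outcome model
    alphas : outcome-model coefficients alpha_j of the n mediators
    betas  : exposure coefficients beta_0j of the n mediator models
    """
    if len(alphas) != len(betas):
        raise ValueError("alphas and betas must have the same length")
    indirect = sum(a * b for a, b in zip(alphas, betas))
    total = alpha0 + indirect
    return indirect, total, indirect / total


if __name__ == "__main__":
    # Hypothetical coefficients for two mediators.
    print(joint_mediation(0.2, [0.5, 0.25], [0.4, 0.4]))
```

Because the α_j are estimated with all mediators in the same outcome model, correlated mediators share credit, so the joint proportion need not equal the sum of the single-site proportions.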

Two-step epigenetic Mendelian randomization of cocaine and HIV severity

To evaluate whether the results from the mediation analysis were influenced by reverse causation or unmeasured confounding, we conducted a two-step epigenetic MR analysis [52] (n = 1177) on cocaine use, candidate mediator CpGs, and HIV severity using the inverse-variance weighted (IVW) method by R package MendelianRandomization [72].

In step 1, we conducted a two-sample MR on the effect of cocaine use on candidate CpGs (n = 1177). Based on a recent meta-analysis of a cocaine dependence genome-wide association study (GWAS) [73], 8 SNPs genotyped in our samples pruned at linkage disequilibrium (LD) r2 < 0.1 by the R package LDlinkR [74] were used as instrumental variables (p < 1E−05) (Table S4). We tested the associations between the 8 SNPs and the candidate CpGs, adjusting for age, sex, race, and 5 ancestry PCs using a linear regression model in our sample (n = 1177). Based on these summary statistics, we conducted MR using the IVW method to evaluate the effect of cocaine use on candidate CpGs.

In step 2, we conducted a one-sample MR on the effect of candidate CpGs on HIV severity (n = 1177). Here, cis-methylation quantitative trait loci (meQTLs) were used as instrumental variables. cis-meQTLs were defined by the distance between a candidate CpG and a SNP within 1 Mb. A linear regression analysis was performed to identify cis-meQTLs, adjusted for age, sex, race, and 5 ancestry PCs. For each candidate CpG, cis-meQTLs with p < 0.01 after pruning (LD r2 < 0.1 using 1000 genome African ancestry samples as references [75]) were used as instrumental variables in the MR analysis (Table S4) by the R package LDlinkR [74]. Association between each cis-meQTL and HIV severity was assessed by linear regression, adjusting for age, sex, and 5 ancestry PCs. Similar to the first step, we conducted an MR using the IVW method to evaluate the effect of candidate CpGs on HIV severity.
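Both steps combine per-SNP summary statistics with the IVW method. As a minimal sketch, the fixed-effect IVW estimate weights each SNP's ratio estimate by the inverse variance of its outcome association. The code below assumes uncorrelated instruments and a fixed-effect model; the function name `ivw_estimate` and the numbers are hypothetical, and the R package MendelianRandomization offers further options (for example random-effects models) that are not shown here.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def ivw_estimate(beta_exposure: Sequence[float],
                 beta_outcome: Sequence[float],
                 se_outcome: Sequence[float]) -> Tuple[float, float, float]:
    """Fixed-effect IVW estimate, its standard error, and two-sided p value.

    beta_exposure : SNP-exposure associations
    beta_outcome  : SNP-outcome associations
    se_outcome    : standard errors of the SNP-outcome associations
    """
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    total_weight = sum(bx ** 2 / se ** 2
                       for bx, se in zip(beta_exposure, se_outcome))
    estimate = numerator / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    z = estimate / std_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return estimate, std_error, p_value


if __name__ == "__main__":
    # Hypothetical summary statistics for two instruments.
    print(ivw_estimate([0.1, 0.2], [0.05, 0.10], [0.01, 0.01]))
```

In step 1 the exposure is cocaine use and the outcome is a candidate CpG; in step 2 the exposure is the CpG (instrumented by cis-meQTLs) and the outcome is HIV severity.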

Results

Cocaine use affects HIV severity and mortality among HIV-positive participants

We found that among HIV-positive participants, higher cocaine use frequency was associated with increased mortality (p = 0.008, Fig. 2a). This difference was not found among HIV-negative participants (p = 0.180, Fig. 2b). Using the Cox proportional hazards model, this trend remained significant with a hazard ratio (HR) of 1.10 (95% CI 1.02–1.19, p = 0.011), controlling for sex, baseline age, race, viral load, CD4 count, and antiviral medication adherence (Table 2). A higher frequency of cocaine use at baseline was also significantly associated with a higher VACS index (i.e., higher HIV severity, β = 1.00, p = 0.00027) after adjusting for sex, age, race, viral load, CD4 count, and antiviral medication adherence (Table 2). To account for other drug use, we further adjusted for baseline use of alcohol, cigarette smoking, cannabis, and opioids in the model. Cocaine use frequency remained significantly associated with HIV severity after adjusting for use of other substances (p = 0.049). Our results suggest that cocaine use accelerated HIV progression and increased mortality independent of antiviral medication adherence, which is consistent with previous reports [10, 11, 13,14,15].

Fig. 2 Kaplan-Meier curves of cocaine use frequency at baseline among HIV-positive (n = 1435, a) and HIV-negative (n = 795, b) participants. The higher frequency of cocaine use is associated with lower survival probability among HIV-positive participants but not among HIV-negative participants

Table 2 Association between cocaine use frequency and HIV severity and survival analysis among HIV-positive participants (n = 1435)

Selection of candidate DNAm sites for mediation analysis by EWA scan

The EWA scan of persistent cocaine use showed good control of inflation (λ = 1.034, Figure S1). A total of 497 CpGs met our candidate selection threshold (p < 0.001). The top ranked CpG site, cg22917487, was close to the epigenome-wide significance threshold with a p value of 1.69E−07. This CpG site is located in CX3CR1, a gene that encodes a coreceptor for HIV-1 and leads to rapid HIV progression (Table S1).

The EWA scan of the VACS index also showed good control of inflation (λ = 1.116, Figure S1). There were 876 CpGs that reached the candidate selection threshold (p < 0.001) (Table S2). Of note, 6 CpGs reached the epigenome-wide significance threshold (p < 1.2E−07). These CpGs were located near the genes involved in the viral and immune response (PARP9, IFITM1, CD247, IFIT3, VASN, and RUNX1).
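The inflation factor λ quoted for both scans is commonly defined as the median observed 1-degree-of-freedom chi-square statistic divided by its expected median under the null (about 0.4549). The sketch below uses that common definition and converts two-sided p values back to chi-square statistics; whether the study computed λ in exactly this way is an assumption, and the function name `genomic_inflation` is hypothetical.

```python
from statistics import NormalDist, median
from typing import Sequence

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(p_values: Sequence[float]) -> float:
    """Genomic inflation factor from two-sided p values in (0, 1]."""
    normal = NormalDist()
    # A two-sided p value maps to z = inv_cdf(p / 2); z squared is chi-square(1).
    chi2 = [normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Hypothetical, evenly spread p values behave like a null scan.
    print(genomic_inflation([i / 100 for i in range(1, 100)]))
```

Values near 1 indicate that the bulk of test statistics follow the null distribution, which is the sense in which inflation is described as well controlled.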

We selected candidate CpGs that were both associated with cocaine use and HIV severity (p < 0.001) by two separate EWA scans of 408,583 CpGs for mediation analysis. Fourteen CpGs met both candidate selection thresholds. Additionally, cg22917487 in CX3CR1 showed a strong association with cocaine (p = 1.69E−07) and a marginal association with the VACS index (p = 1.73E−03). Given its biological plausibility, this CpG was also included as a candidate mediator for mediation analysis. Five of the top 10 VACS index EWA CpGs were selected as candidate mediator CpGs (cg08122652, PARP9, p = 2.30E−10; cg03038262, IFITM1, p = 7.65E−09; cg06188083, IFIT3, p = 4.76E−08; cg08818207, TAP1, p = 2.11E−07; cg26312951, MX1, p = 2.50E−07). Overall, a total of 15 CpGs were selected as candidates to assess their potential mediation roles on the association between persistent cocaine use and HIV severity. Notably, the DNAm from each of the 15 CpGs was not associated with cannabis, opioid, or alcohol use (p > 0.05, Table S3).

Mediation analysis of candidate CpGs between persistent cocaine use and HIV severity

We examined the mediation role of DNAm between persistent cocaine use and HIV severity. Twelve out of the 15 candidate CpGs showed significant mediation effects on the association between persistent cocaine use and the VACS index, with p values ranging from 1.00E−06 to 0.003 (Table 4). These results remained significant after Bonferroni correction (p < 0.0033). Each CpG mediator explained between 11.3 and 29.5% of the effect of persistent cocaine use on HIV severity. Notably, the direction of the mediation effects was the same among these 12 mediator CpGs. The average direct effects of cocaine on HIV severity were attenuated from 0.329 to 0.231–0.291 after adjusting for each mediator CpG. These 12 CpGs collectively mediated 47.2% of cocaine’s effect on HIV severity by joint mediation analysis.
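The threshold given with Fig. 4 (p < 0.0033) is what a Bonferroni correction yields if one assumes a family-wise α of 0.05 divided across the 15 candidate CpGs. That assumption is spelled out in the short sketch below; the function name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


if __name__ == "__main__":
    # Assumed inputs: alpha = 0.05 and 15 candidate CpGs tested.
    print(bonferroni_threshold(0.05, 15))
```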

We also conducted a sensitivity analysis on these 15 candidate CpGs to assess the robustness of our mediation analysis when the sequential ignorability assumption was violated [51]. The absolute sensitivity parameters at which ACME = 0 of the 12 significant mediator CpGs were relatively higher (|ρ| ≥ 0.15) than 3 nonsignificant CpGs (|ρ| ≤ 0.10) (Table 4, Figure S2). Notably, 6 significant mediator CpGs had |ρ| of 0.30, indicating that these mediation effects were robust even when the assumptions are slightly violated. The sensitivity analysis showed that our mediation results were relatively stable.

Significant mediator CpGs are located near 11 viral and immune response genes: MX1, PARP9, IFIT3, IFITM1, NLRC5, EPSTI1, PLSCR1, TAP2, TAP1, CX3CR1, and RIN2. Five CpGs are located on 5’ gene regulatory regions, 4 CpGs on gene bodies, 2 CpGs on transcription start sites, and 1 CpG on 3’ gene regulatory region. Notably, these 12 CpGs were mostly less methylated in the persistent cocaine use group than in the no cocaine use group (Table 3, Fig. 3). Figure 4 illustrates the mediation effect of cg26312951 (MX1), cg08122652 (PARP9), cg07839457 (NLRC5), and cg22917487 (CX3CR1) on persistent cocaine use affecting HIV severity.

Table 3 The selected candidate CpG sites by epigenome-wide association (EWA) scan on persistent cocaine use (n = 467) and HIV severity (n = 875)

Fig. 3 DNA methylation level of the significant CpG mediators by persistent cocaine use status

Fig. 4 Significant mediation effect of cg26312951 (MX1), cg08122652 (PARP9), cg07839457 (NLRC5), and cg22917487 (CX3CR1) between persistent cocaine use and HIV severity (p < 0.0033)

Two-step epigenetic Mendelian randomization of cocaine and HIV severity

To validate our mediation results while eliminating unmeasured confounding and reverse causation, we used the two-step epigenetic MR method [52] to test our mediation hypotheses (n = 1177): whether cocaine use has a causal effect on candidate CpGs (step 1) and whether candidate CpGs have causal effects on HIV severity (step 2).

In step 1, we conducted the MR analysis based on summary statistics of a meta-analysis of GWAS on cocaine dependence [73]. The effect estimates of the association between 8 SNP instrumental variables and 15 candidate CpG sites were obtained in our sample. Our MR analysis showed that cocaine had significant MR estimates (p < 0.05) on 4 CpGs (cg03753191, EPSTI1; cg06188083, IFIT3; cg26312951, MX1; cg22917487, CX3CR1), as shown in Table 5. Three of these CpGs were also among the top significant mediators in our previous mediation analysis (Table 4).

Table 4 Mediation analyses on candidate CpGs between cocaine use and VACS index (n = 467)

In step 2, we conducted the MR analysis based on cis-meQTLs of the candidate CpGs and their association with HIV severity in our sample. Seven CpGs showed significant MR estimates on HIV severity (Table 5). Of note, 3 significant CpGs in the MR analysis in step 1 were also significant in step 2.

Table 5 Two-step epigenetic Mendelian randomization on cocaine and HIV severity (n = 1177)

Overall, 3 mediator CpGs discovered by the mediation analysis were validated as significant mediators by two-step epigenetic MR analysis (cg03753191, EPSTI1; cg06188083, IFIT3; cg26312951, MX1). Three CpGs without significant mediation effects in the mediation analysis were also found to be nonsignificant in the two-step MR analysis (cg26396492, RIN2; cg22385827, C2orf67; cg08623256).

Discussion

Our findings provide evidence that cocaine use worsens HIV severity and increases mortality among HIV-positive participants and that cocaine’s adverse effects are partially mediated by DNAm in the blood. We identified 12 CpGs that collectively accounted for 47.2% of the effect of cocaine on HIV severity. Three of the 12 mediator CpGs were further validated by a two-step epigenetic MR approach, which provides supporting evidence that our mediation results were not affected by unmeasured confounders or reverse causation. The sensitivity analysis showed that our mediation analyses are relatively robust to slight violation of assumptions. These 12 mediator CpGs offer new insights into how cocaine use may affect HIV outcomes through DNAm.

Methodological considerations are important for examining the mediation effect of DNAm. It is possible that our mediation analyses could be undermined by violation of model assumptions, reverse causation, and unmeasured confounding. To address these concerns, we performed the sensitivity analysis and two-step epigenetic MR analysis to further evaluate the mediation effects of the 12 CpG sites. The results from the sensitivity test showed that the 12 mediator CpGs were robust to slight violation of the sequential ignorability assumption. The two-step epigenetic MR analysis confirmed 3 of the 12 CpG sites as mediators of cocaine’s effect on HIV severity and was not affected by reverse causation and unmeasured confounding. Of note, the 8 SNPs used in the MR analysis showed only marginal association with cocaine use, which limited their utility as instrumental variables and may explain why 9 CpG sites did not show significant mediation effects in the two-step epigenetic MR analysis. In addition, in the mediation analysis, HIV severity was measured after the blood collection for DNAm profiling to ensure that the measurement of the mediator preceded the measurement of the outcome. Our study design was intended to match the temporality of exposure, mediator, and outcome and to avoid reverse causation. Of note, we observed a discrepancy in the direction of the effect of cocaine use on DNA methylation between the EWA scan and the step 1 MR analysis. This may have happened because the EWA scan assessed association while MR evaluated the causal effect by removing reverse causality. This difference might also be due to the different ways of adjusting for confounding factors in the two models. Additionally, to assess whether cocaine use influenced cell-type proportions as reflected by DNA methylation, we conducted an MR analysis of cocaine’s effect on the six cell-type proportions using the same SNP instruments as in the step 1 MR. We found no significant MR estimates across the six cell types (p > 0.1) (Table S5), suggesting that cocaine use does not directly affect cell-type proportions in our sample. Overall, we took various measures to make sure our mediation results are valid and robust.

We observed that the sum of the individual mediation proportions for the 12 mediator CpGs exceeded 100%. An alternative approach is to test the joint mediation effect of all mediators [71]. We found that the 12 mediator CpGs jointly accounted for 47% of the total effect (effect size = 0.329) of cocaine use on HIV severity. This finding indicates that these mediators may affect one another or that there is an interaction effect [71]. For example, several mediator CpG sites are near genes in the response-to-cytokine pathway (PARP9, PLSCR1, CX3CR1, IFITM1, IFIT3, MX1, and NLRC5). It is possible that these CpGs share a common biological pathway in mediating the effect of cocaine on HIV severity.

These 12 CpGs are located in or near 11 biologically meaningful genes that were previously reported to be involved in inflammation, HIV-1 viral replication, and other pathways that play critical roles in HIV progression. Specifically, cg06188083 on IFIT3 mediated 28.8% of the variation, and IFIT3 encodes an IFN-induced antiviral protein which acts as an inhibitor of viral processes and viral replication [76]. Another significant mediator CpG site, cg03038262, is located near the interferon-induced gene IFITM1. We previously reported the hypomethylation of cg07839457 due to HIV infection, which is located in the promoter region of NLRC5 [33]. This CpG site was also a significant mediator between cocaine and HIV severity in this study. NLRC5 plays an important role in the cytokine response and antiviral immunity through its inhibition of NF-kappa-B activation and negative regulation of type I interferon signaling pathways [77]. The converging evidence on cg07839457 (NLRC5) warrants further investigation of its role in HIV infection and progression. Another interesting CpG site, cg22917487 on CX3CR1, showed both a strong association with persistent cocaine use and a significant mediation effect of cocaine affecting HIV severity. CX3CR1 is involved in leukocyte adhesion and migration and was recently identified as an HIV-1 coreceptor [78]. Some studies also showed that genetic variants on CX3CR1 were associated with HIV susceptibility and rapid HIV progression to AIDS [79]. cg25114611, located in the promoter region of FKBP5, is also biologically plausible, given evidence that chronic cocaine administration upregulates FKBP5 expression in rats [80].

Cocaine use commonly co-occurs with the use of other substances, and this may confound cocaine’s effects on HIV severity and the mediation effects of CpGs between cocaine use and HIV outcomes. However, our results show that the association between cocaine use and HIV severity remained significant after accounting for smoking, alcohol, cannabis, and opioid use. Additionally, our cocaine use EWA model adjusted for smoking as a covariate, and the selected candidate CpGs were not associated with alcohol, cannabis, or opioid use (Table S3).

There are several strengths of this study. First, instead of selecting candidate mediator CpGs based on the literature or hypotheses, we applied an unbiased epigenome-wide screening to select CpGs associated with both cocaine use and HIV severity. Second, to limit self-reporting bias of cocaine use, we leveraged longitudinal data in defining persistent cocaine use and no cocaine use. We included only those participants who consistently reported cocaine use or no cocaine use across all 5 visits for the selection of candidate CpGs and the mediation analyses. Last, we used the average VACS index after blood collection so that DNAm measurements (mediator) preceded HIV severity (outcome) for the mediation analyses.

One limitation of the study is that our sample size for the mediation analyses is small. However, the strict definition of cocaine use helped reduce self-reporting bias and can potentially increase power by comparing extreme groups. In addition, we used a less stringent criterion when selecting candidate CpGs for mediation analysis because the sample size was too limited to achieve epigenome-wide significance. Although this approach has also been adopted by previous studies [42, 81], using epigenome-wide significant CpG sites as candidate mediators may show stronger signals in future studies with larger sample sizes. To our knowledge, there are no sufficiently sized independent cohorts for replication. Additionally, other unmeasured confounding factors such as socioeconomic status may not be fully addressed in the mediation model. Lastly, our samples consisted of mostly male veterans, which may limit the generalizability of our findings.

Conclusions

We validated previous reports that the use of cocaine worsened HIV severity and increased the risk of all-cause mortality among HIV-positive participants. For the first time, this study found that several biologically meaningful DNAm sites mediated the adverse effect of cocaine use on HIV severity. These results merit future studies to further explore the biological mechanisms revealed by these DNAm sites on how cocaine affects HIV disease outcomes.