Introduction

Antiretroviral therapy (ART) has reduced but not eliminated the incidence of AIDS, prompting a continuing search for new drug targets. The goal is to elucidate virus–host interactions, identify genes involved in HIV resistance, and restore functionally active lymphocytes in order to minimize pill burden and facilitate remission. Such a strategy requires a deeper understanding of how HIV infection dysregulates metabolic pathways. HIV has a complex life cycle during which it engages multiple host cellular components: it undermines immune function by using immune cells for virus replication, and it exploits host transcription factors and enzymes for virus production and subsequent infection. HIV dysregulates host genes, resulting in aberrant immune responses, disease progression, and opportunistic infections. Recent developments in gene array technology and high-throughput screening have furthered our understanding of virus–host interactions and genome-wide dysregulation during HIV infection (Fig. 1). Because proteins do not work in isolation, gene arrays have revolutionized the way we assess host cellular pathways in the context of HIV and other diseases. This technology has the potential to decipher the role of host genes during HIV infection.

Fig. 1

Chronological analysis of developments in gene array studies related to HIV and the outcomes and novel concepts that emerged from these studies. The graph shows the average number of HIV-related gene array studies published per year, retrieved from database searches, manual literature searches, and cross-referencing.

Gene Array Technology in a Nutshell

Gene expression arrays are designed to measure the expression levels of large numbers of genes simultaneously. The array chips hold small DNA or oligonucleotide fragments as probes that hybridize to the complementary sequences present in the sample of interest. The development of lithographic techniques for imprinting thousands of oligonucleotide signature sequences for different genes, combined with hybridization principles, resulted in miniaturized blotting surfaces known as biochips, genechips, or DNA chips. These are primarily glass or nylon membrane platforms that support stable imprinting of oligonucleotides representing a signature sequence from each gene. A single assay can rapidly measure the expression of thousands of genes in a sample and can distinguish the expression profiles of two or more sets of samples (such as infected and uninfected). Many variations have been developed: oligonucleotide arrays (Affymetrix chips), cDNA-based microarrays (two-color, biotin-labeled arrays spotted on glass; 33P-labeled nylon filters), amplified RNA arrays, and PCR-based arrays (gene arrays and miRNA arrays). Detection methods have become more sensitive and can now identify minor changes in gene expression.
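To make the infected-versus-uninfected comparison concrete, the sketch below shows one way a two-color spotted array could be summarized: each spot's infected/uninfected intensity ratio is converted to a log2 ratio and then centered on the array-wide median. The channel assignment, the assumption that intensities are already background-subtracted, and the choice of global median normalization are illustrative assumptions, not the procedure of any particular platform or study cited here.

```python
import math
from statistics import median


def log2_ratios(red, green):
    """Per-spot log2(red/green) ratios, centered so the median spot has ratio 0.

    Assumes (hypothetically) red = infected-sample channel and green = uninfected
    reference channel, both background-subtracted and strictly positive.
    """
    raw = [math.log2(r / g) for r, g in zip(red, green)]
    center = median(raw)  # global median normalization corrects overall dye/labeling bias
    return [m - center for m in raw]


print(log2_ratios([200, 400, 800, 100], [100, 200, 100, 100]))
```

In this toy call, a centered value of 2 marks a spot with a fourfold higher infected/uninfected ratio than the typical spot, and -1 marks one with half the typical ratio.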

Analytical software packages are available to evaluate the voluminous data and draw significant conclusions (Table 1). Although current approaches to data analysis vary considerably, most use a three-tiered approach. First, differentially expressed genes are identified using Student’s t-test and ANOVA or the permutation-based significance criteria used in Significance Analysis of Microarrays (SAM). Second, multiple comparisons are corrected for, and false discovery rates estimated, using well-established procedures such as Bonferroni, Student–Newman–Keuls, Tukey, and Benjamini–Hochberg. Additional statistical validation is performed using sophisticated multivariate statistics and machine-learning techniques such as support vector machines and penalized discriminant analysis. Third, genes are functionally annotated using public databases such as Gene Ontology, DAVID/Expression Analysis Systematic Explorer (EASE), Ingenuity Pathways Analysis, GenMAPP, STRING, the Cancer Genome Anatomy Project, and BioCarta. For details of analytical approaches, please refer to other publications [1•, 2, 3]. To validate candidate genes, more sensitive real-time PCR-based assays are being developed into high-throughput PCR array platforms.
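The first two tiers can be illustrated with a short, dependency-free sketch: a two-sided permutation test on the difference in group means (in the spirit of, but much simpler than, SAM's permutation-based criteria), followed by the Benjamini–Hochberg adjustment of the resulting p-values. The function names, the 2,000-permutation default, and the fixed random seed are assumptions made for illustration; established statistical packages should be preferred for real array data.

```python
import random
from statistics import mean


def permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in mean expression of one gene."""
    rng = random.Random(seed)  # fixed seed for reproducibility (illustrative choice)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # add-one correction keeps p > 0 with a finite number of permutations
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In use, one would call `permutation_pvalue` once per gene (infected versus uninfected values for that probe) and pass the list of p-values to `benjamini_hochberg`; genes whose adjusted value falls below the chosen FDR threshold (for example, 0.05) would be reported as differentially expressed.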

Table 1 Methods applicable to gene array data analysis and online resources

Gene Array Studies of HIV-Specific Target Cells: In Vivo Analysis

HIV-1 mainly targets immune system cells expressing the CD4 surface receptor. These include monocytes, macrophages, lymphocytes, and dendritic cells. The virus also directly or indirectly affects uninfected bystander cells such as CD8+ T cells, NK cells, and neurons [4–7]. T-cell defects are thought to underlie AIDS pathogenesis [8]. Betts and colleagues [9] have shown that CD8+ T cells in long-term non-progressors (LTNPs) are of higher quality (based on five markers) than those in progressors, suggesting that these cells protect the host from developing AIDS. Recent studies have focused on differentially expressed genes in macrophages and CD4+ and CD8+ T cells from HIV-infected individuals. Results showed increased involvement of genes regulating complement activation, actin filaments, the proteasome, and the proton-transporting ATPase complex. Enriched pathways showed mitochondrial signatures of disease progression and pathways linked to metabolism, energy production, apoptosis, and cell-cycle dysregulation [10•]. Compared with lymphocytes, infected monocytes and macrophages are relatively resistant to apoptotic cell death, show anti-apoptotic gene signatures, and serve as a viral reservoir [11]. The adipocytokine NAMPT/visfatin has been identified as a potential contributor to monocyte dysfunction in HIV infection [12•].

Gene Regulation in HIV-1–Infected Cells: In Vitro Analysis

In vitro gene array studies of cells infected with HIV-1 have identified viral and cellular factors that modulate gene expression and have highlighted changes in different cell functions. Most studies report gene modulation associated with immune dysregulation, virus replication, and persistence following in vitro or in vivo infection with, or exposure to, HIV-1 or viral proteins. Individual viral proteins such as Env, Vpr, Nef, and Tat induce death in different cell types (reviewed in [13–16]). Temporal analysis of HIV infection in the CEM-GFP CD4+ T-cell line showed subjugation of the host transcription machinery by viral mRNAs (up to a 30% increase), coupled with upregulation of apoptotic genes during later stages of infection, suggesting that an overburden of intracellular HIV proteins initiates apoptosis [17]. This study also found increased expression of proapoptotic genes such as p53-induced Bax, and activation of caspase-2, -3, and -9 [17].

Microarray data have also yielded novel information about potential mechanisms of HIV-mediated pathogenesis, including modulation of cholesterol biosynthetic genes in CD4 T cells (relevant to virus replication and infectivity) and modulation of proteasomes and histone deacetylases in chronically infected cell lines (relevant to virus latency) [1•]. HIV-induced deregulation of host gene responses resembles that induced by exposing those cells to heat shock, interferons, or influenza A virus [18]. One in vitro gene expression profile of HIV-resistant human T-cell clones found 29 genes differentially expressed relative to susceptible clones. These include cell-surface adhesion glycoproteins and receptors (eg, LFA-1, CD3ε), a nuclear pore protein (Nup214), and transcription factors (eg, STAT, IRF-2, ErgB) that are important at different stages of the viral life cycle [19].

Studies of HIV-infected macrophages show upregulation of genes related to inflammation and immune response, transcription factors, and the cell cycle. Upregulated inflammatory genes include β2M, CCL2, CCL8, PKR, OAS, MX1, CD16, MCP-1, and CXCL10; transcription factors and signaling molecules such as c-MYC, STAT-1, p38, MAPK, ERK, STAT5A, and IFIT-1 are also altered [1•, 20, 21]. Binding of the gp120 envelope glycoprotein has been shown to shift gene expression profiles toward a state conducive to viral replication. Cell cycle–related genes upregulated in HIV-infected monocytes and monocyte-derived macrophages (MDMs) include p21, RBBP, MCC, and YWHAE; downregulated genes include UbcH6, UbcH7, Ndr, PP2Aalpha, and BM28 [22]. HIV-1 infection produced similar dysregulation of most genes in a monocytic line (U937) and a lymphocytic line (Hut-78); genes such as c-myc, CD71, CD69, and β-chemokines were differentially regulated between the two [22]. Further, dysregulated genes in infected U937 cells are involved in divergent functions such as apoptosis (FAS, Fas ligand, PIN, HSP90β, bcl-2, bcl-x), cell-signal transduction (Ras, RGS1, IRF-1, STAT3), receptor-mediated signal transduction (CD71, CD69, CD3δ), cell cycle and growth (c-myc, cytokines, kinases), transcriptional regulation (EWS, CREB-2), and chemotaxis (β-chemokines, RANTES) [22]. A study of HIV-infected monocyte-derived dendritic cells found enhanced expression of genes, of which 20% were related to signal transduction, 14% to transcription, 7% to cell proliferation and cell cycle, and 7% to immune response. Interferon-stimulated genes (ISGs) such as STAT1, together with MAPK1/ERK2 kinase, the chemokine CXCL3, and SHC1, were differentially upregulated after infection with HIV subtypes C and A/E compared with subtype B [23].

In Vivo Gene Profiling: HIV-Infected Subjects with Distinct Disease Progression or Phenotypes

HIV-1–infected individuals show remarkable variation in virus replication and disease progression. HIV-1 or AIDS phenotypic groups include rapid progressors (RPs), chronic progressors (CPs), viremic non-progressors (VNPs), LTNPs, and elite controllers (ECs; previously called elite suppressors), ordered by the decreasing occurrence of phenotypic markers of disease progression (severity, CD4 counts, and viral load). The latter groups effectively suppress HIV infection, maintain normal CD4 counts, and are currently the focus of research interest. Gene array studies of ECs, who maintain undetectable viral loads without ART, aim to identify genes for immune-mediated control of HIV or for vaccine development [24–27]. Both host and viral factors are implicated in resistance to HIV-1 disease [28]. It was once thought that the viruses infecting ECs were defective [24]. However, whole-genome sequencing of six isolates showed that they were functionally intact [24], suggesting that host cellular factors are involved. In contrast to progressors, in whom mitochondrial pathways related to apoptosis predominate, non-progressors show MAPK, WNT, and AKT pathways that contribute to cell survival and antiviral responses [10•]. Most in vivo gene array data come from longitudinal or cross-sectional cohort studies, and most of these use PBMCs, partly because patient samples are limited. The use of PBMCs may limit interpretation because of dilution, and PBMC profiles may not reflect those of individual subpopulations [29]. It has been suggested that researchers limit studies using PBMCs in favor of studies of individual subpopulations [30].

Gene expression profiling of PBMCs isolated from HIV drug-naïve mothers showed a broad spectrum of innate immune response gene sets, including toll-like receptor, ISG, and antiviral RNA response pathways. HIV-specific host genetic profiling is believed to be a useful tool for preventing HIV infection and transmission [31]. Another gene array study of PBMCs from HIV-infected patients found reduced IL-7Rα and increased perforin expression in antigen-experienced mature CD8+ T cells [32]. Comparison of CD3+ T cells from LTNPs and matched disease progressors showed distinct profiles. LTNPs expressed genes involved in cytokine–cytokine receptor interaction, negative regulation of apoptosis, and regulation of the actin cytoskeleton at higher levels, whereas progressors showed increased expression of viral genes that interact with host cellular partners [33].

In contrast to T cells, circulating monocytes from HIV-infected subjects show specific anti-apoptotic gene signatures compared with those from healthy individuals [11, 12•]. These signatures include enhanced expression of TNF, CD40/CD40L, ERK/MAP kinase, G-protein signaling-related genes, PPAR, and the p53 transcription factor. These genes modulate major monocyte functions, including inflammatory response, lipid metabolism, and survival, indicating that an HIV-1–resistant transcriptome is present in monocytes (Table 2) [11]. Differential regulation of the transcriptome in these cell types can provide insight into HIV resistance genes. Transcriptome analyses recently identified subgroups of ECs: one resembling HIV-infected, ART-treated aviremic individuals (EC-ART) and another with a profile similar to that of HIV-negative controls (EC-NC) [34•]. Microarray studies of such subgroups are expected to identify genes that confer HIV resistance. It has yet to be determined whether these two subgroups differ in HIV-related complications later in treatment. That knowledge may provide specific gene data related to effective immune reconstitution.

Table 2 Molecular pathways deregulated in in vitro and in vivo studies

Overview of Gene Arrays in HIV Latency

HIV-1 persists in a latent state during HAART: the viral genome is transcriptionally silenced and the viral load becomes undetectable. HIV-1–infected resting memory T cells are a major latent reservoir and pose a hurdle to eradicating the virus. The use of immune-activating agents such as anti-CD3 antibody or IL-2 to purge the virus from these reservoirs has had limited success, and other transcription factors must be identified as targets for reactivating cells from latency [35]. A comparative analysis of resting CD4+ T cells from viremic patients and from aviremic and healthy controls (n = 5) identified 370 genes that were differentially upregulated upon activation. These genes, which fall mainly into three functional categories, are essential to sustain virus production. They include genes encoding early signal transduction molecules, transcription factors known to modulate HIV-1 transcription (YY1, TFCP2, RUNX1), and heterogeneous nuclear ribonucleoproteins [36]. Other genes upregulated in viremic patients are related to protein and vesicle transport, including ER and Golgi proteins, vesicle coat–associated proteins, and ubiquitination-related proteins, suggesting enhanced activity of secretory pathways [36]. Additional gene array studies using ACH-2, a latently infected cell line, indicate that HIV-1 represses genes involved in the glycolytic pathway [37]. Together, these findings suggest that in viremic patients, cells may receive adequate external stimuli and have the intracellular environment and metabolic energy needed to sustain virus replication. Latency is supported by suppression of various immune-response genes, including IL-7, GZM, TLR2, and IFN, and of transcription factors such as MYB, MYC, and STAT5A [1•]. Genes related to the cell cycle and cytoskeleton were upregulated, whereas decreased expression of transcription factors and of NCoA3, SRC1, p300, RAS, RAF, MAPK, and cytoskeleton-related EIF5 was observed.

The Future: The Use of Gene Arrays in Disease Management

The discovery of biomarkers that correlate with disease severity will facilitate the diagnosis of AIDS and improve our understanding of disease development. Although CD4+ T-cell count and viral load remain the gold standard, other immunological correlates, including CRP, soluble TLR2, anti-leukocyte antibodies, β2-microglobulin, neural markers (sphingomyelin), and HLA, are of limited use in predicting disease outcome [38–43]. There is a gradual shift from immunological to molecular biomarkers for determining disease progression. Microarray studies also provide additional targets for antivirals. By identifying novel drug targets and pathways, these studies hold tremendous promise for better health care.

Substantial morbidity occurs during the initial years of ART in patients with advanced disease whose viral load is not suppressed. HAART patients also develop metabolic abnormalities such as lipodystrophy (40%–70% of patients), insulin resistance, and lactic acidosis associated with NRTIs and protease inhibitors [44–46]. In an in vitro model of adipogenesis (3T3-L1 preadipocytes), the protease inhibitors indinavir, saquinavir, and lopinavir inhibited adipocyte differentiation by upregulating Wnt signaling genes and suppressing genes encoding master adipogenic transcription factors (C/EBPα, PPARγ), estrogen receptor β, and adipocyte-specific markers (adiponectin, leptin, Mrap, cd36, S100A8) [47]. Microarray analysis identified the molecular mechanism behind protease inhibitor–induced dyslipidemia and linked enhanced lipidemia to enhanced proteasome activity [48]. To identify predictors of drug toxicity and facilitate informed decision making, the InnoMed PredTox Consortium has performed large-scale gene array projects combined with conventional toxicology (toxicogenomics) [49–51]. Another study, STALWART, revealed that an increase in CD4 count or IL-2 therapy alone is not sufficient to reduce opportunistic infections or death from AIDS [52], suggesting that additional immunoprotective strategies are required. Raes and coauthors proposed CCL1 and CYP2C19 as potential biomarkers for the development of abacavir hypersensitivity reaction. Other candidates are the cytoplasmic enzyme CA2, the transcription factor NFIB, the transmembrane receptor NRP2, and the uncharacterized nuclear factor ANP32E [12•]. However, their study lacked statistical power due to its small sample size and needs additional validation [12•].

The Future: The Use of Gene Arrays to Predict Disease Progression

It is important to be able to predict how patients will respond to a given drug and which opportunistic infections they may be susceptible to during infection. Studies have sought to identify predictors of disease progression, clinical outcomes, latency, and the emergence of drug-resistant mutants. Existing diagnostic biomarkers offer limited predictive value. What is needed is a set of sophisticated multivariate molecular biomarkers that, in combination, would predict disease outcome. A study of 21 HIV patients from Uganda showed that host gene subsets predictive of disease prognosis were mainly related to immune response, T-cell differentiation, apoptosis, and active HIV replication (Table 1), suggesting that active destruction and regeneration of T cells are associated with rapid disease progression [53]. Early detection of these predictors may help physicians monitor overall disease status and drug resistance, change the course of treatment, and avoid worsening of the disease. Plasma soluble CD14 (sCD14), the soluble form of a lipopolysaccharide receptor on monocytes, is considered predictive of mortality in HIV infection [54, 55]. Although it is a good marker of disease severity, it does not provide information about immune status or disease outcome. Boulware and colleagues [56] reported that increased levels of markers of inflammation (CRP and IL-6), coagulation (D-dimer), and tissue fibrosis (hyaluronic acid, HA) were positively associated with disease progression and inversely associated with plasma sCD14. Patients experiencing immune reconstitution inflammatory syndrome (IRIS) had higher TNF-α and HIV RNA levels, followed by significant increases in CRP, D-dimer, IL-6, IL-8, CXCL10, TNF-α, and IFN-γ levels, compared with those who experienced non-IRIS events [56].
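A multivariate biomarker can be as simple as a classifier that combines several gene-expression values into one call. As a hedged illustration, the sketch below uses a nearest-centroid rule, swapped in here for the more sophisticated support vector machine or penalized discriminant methods mentioned earlier, and estimates its accuracy by leave-one-out cross-validation. The two-gene toy profiles and the "slow"/"rapid" outcome labels are hypothetical and are not data from the Ugandan cohort [53].

```python
import math
from statistics import mean


def centroids(samples, labels):
    """Mean expression vector for each outcome class."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return {y: [mean(col) for col in zip(*rows)] for y, rows in by_class.items()}


def predict(x, cents):
    """Assign a profile to the class whose centroid is nearest (Euclidean distance)."""
    return min(cents, key=lambda y: math.dist(x, cents[y]))


def loo_accuracy(samples, labels):
    """Leave-one-out accuracy: each sample is classified by centroids built without it."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(samples[i], centroids(train_x, train_y)) == labels[i]:
            correct += 1
    return correct / len(samples)


# Hypothetical two-gene expression profiles for six patients
X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
y = ["slow", "slow", "slow", "rapid", "rapid", "rapid"]
print(loo_accuracy(X, y))
```

Leave-one-out evaluation is used here because small cohorts make overfitting easy; scoring each patient with a model built without that patient gives a less optimistic accuracy estimate than scoring the training data directly.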

Specific gene expression profiles have been shown to indicate AIDS progression in HIV-1–infected individuals. Compared with VNPs, RPs upregulate genes encoding the apoptosis-related cysteine peptidase caspase 1 (IL-1β–converting enzyme) and lymphocyte-activation gene 3 (LAG3) in CD4+ and CD8+ T cells [57•]. Caspase 1 proteolytically cleaves inactive pro-IL-1β into its active form, which can drive inflammation and septic shock. Subsets of regulatory T cells that express LAG3 (CD223) have potent immunosuppressive activity and release immunosuppressive cytokines such as IL-10 and TGF-β1. Upregulation of these genes could therefore lead to immunosuppression and disease progression. In contrast, expression levels of suppressor of cytokine signaling 1 (SOCS1) and eukaryotic translation elongation factor 1 delta (EEF1D) are inversely associated with disease progression [57•]. SOCS1 is involved in negative regulation of the JAK-STAT cascade, the insulin-receptor signaling pathway, and the interferon γ–mediated signaling pathway. The SOCS1 gene encodes STAT-induced STAT inhibitor (SSI), which functions downstream of receptors in a negative feedback loop to attenuate cytokine signaling. HIV is known to interfere with SOCS1 and SOCS3, thus driving immune activation [58–60].

Recent reports suggest that cumulative viral load correlates better with disease severity, a finding that should be further validated [61]. PCR-based gene arrays can be used to assess viral loads, genotypes, and variants in patients’ samples during HAART. When coupled with biomarkers of disease progression, assessment of a patient’s viral load could lead to robust diagnostics and better disease management. Genotyping can be combined on the same array to identify clade-specific disease outcomes and facilitate surveillance. Data integration is the key to producing and disseminating useful information for clinical research.
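One common way to express cumulative viral load is as the area under the viral load curve over time ("viremia copy-years"), computed with the trapezoidal rule. The sketch below assumes that definition; the study cited [61] may define or scale cumulative viral load differently (for example, on a log10 scale), and the function name and inputs are hypothetical.

```python
import math


def viremia_copy_years(days, copies_per_ml):
    """Trapezoidal area under the viral load curve, in copy-years per mL.

    days: measurement times in days since baseline, strictly increasing
    copies_per_ml: HIV-1 RNA copies/mL measured at each time point
    """
    if len(days) != len(copies_per_ml) or len(days) < 2:
        raise ValueError("need at least two paired measurements")
    area = 0.0
    for i in range(1, len(days)):
        dt_years = (days[i] - days[i - 1]) / 365.25
        if dt_years <= 0:
            raise ValueError("times must be strictly increasing")
        # mean of the two bracketing measurements times the interval length
        area += dt_years * (copies_per_ml[i] + copies_per_ml[i - 1]) / 2
    return area


# 1,000 copies/mL sustained for one year corresponds to 1,000 copy-years (log10 = 3)
cvl = viremia_copy_years([0, 365.25], [1000, 1000])
print(cvl, math.log10(cvl))
```

Summing area rather than averaging visits means that a patient with a brief high peak and a patient with prolonged moderate viremia can accumulate similar totals, which is the property that distinguishes a cumulative measure from a single time-point viral load.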

Led by the nonprofit Critical Path Institute, the Predictive Safety Testing Consortium is a successful public–private partnership that focuses on qualifying preclinical safety biomarkers for drug-induced nephrotoxicity [49, 50]. Similar large-scale collaborations should be encouraged to find biomarkers of HIV-1 disease progression. Most biomarkers found in such studies require evaluation in larger, long-term trials.

Suggestions for Improving Gene Array Analysis

Many investigators use gene array analyses for biomarker identification, applying widely divergent biological samples, platforms, and analytical approaches. Results must therefore be interpreted with care to reach appropriate conclusions. We offer the following suggestions to improve the quality of microarray analysis.

First, choose an appropriate biological sample, since the sample chosen may affect the overall interpretation. For instance, PBMCs dilute the signal of genes active in monocytes: highly dysregulated genes in the more abundant T cells predominate, masking gene signatures with small changes [29]. PBMCs could still be used to identify “highly dysregulated genes,” bypassing unnecessary separation protocols for individual cell types.
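The dilution effect can be made concrete with simple mixing arithmetic. Under the simplifying, hypothetical assumptions that a gene has the same baseline expression per cell in every PBMC subpopulation and changes in only one subpopulation, the fold change observed in bulk PBMCs is a weighted average of the responding and non-responding fractions. The 10% fraction used below is an illustrative value, not a measured one.

```python
def bulk_fold_change(fraction, cell_fold_change):
    """Apparent fold change in a mixed sample such as PBMCs.

    fraction: share of cells belonging to the responding subpopulation (0 to 1)
    cell_fold_change: fold change within that subpopulation
    Assumes equal per-cell baseline expression in all cell types (hypothetical).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # responding cells contribute fraction * FC; the rest remain at baseline (1.0)
    return fraction * cell_fold_change + (1.0 - fraction)


# A fourfold change confined to a subpopulation making up 10% of cells
print(round(bulk_fold_change(0.10, 4.0), 2))
```

With these illustrative numbers, a fourfold change within the subpopulation appears as only about a 1.3-fold change in bulk PBMCs, which could fall below fold-change cutoffs such as 1.5 or 2.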

Second, take into account that current approaches for imprinting gene chips show poor reproducibility between batches and need to be standardized. A particular gene chip may overrepresent genes from certain pathways, which might bias the results and future studies.

Regarding other technical variables, we discourage using reagents from different batches and using the same samples for PCR validation as were used in the gene array. Reusing the same samples validates only the microarray measurement rather than providing conceptual validation. To avoid false subclasses in the analysis, minimize nonspecific biases arising from differences in sample processing. For instance, cell isolation methods such as continuous leukapheresis may induce stress-related genes, so results must be interpreted with care [62]. Guidelines for a standard method of imprinting a set of genes need to be implemented to maintain uniformity should gene array technology become a diagnostic tool.

Our next suggestions concern sample size and statistical considerations. It is important to calculate the sample size required to classify microarray data. Establish and standardize statistical stringency as part of routine diagnostic use of gene arrays to minimize analytical variables. Control of the false discovery rate (FDR) has become standard in current analysis. Giri and colleagues [1•] cautioned that, without FDR correction, testing an array of 10,000 genes at α = 0.05 means that ~500 genes may appear differentially regulated by chance alone, which can skew interpretations. Pay particular attention to pathway analysis using differentially regulated genes. Consider results with caution, and take care not to overlook underrepresented pathways, which are easy to miss because genes from highly studied pathways (apoptosis, for instance) are overrepresented in databases. When that happens, not all relevant pathways may be represented in the results.
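Both cautions in this paragraph rest on simple calculations. Without multiple-testing correction, the expected number of false positives among null genes is the number of genes tested multiplied by α. Pathway over-representation is commonly scored with a one-sided hypergeometric test, whose result depends on the chosen background and on how many genes a database annotates to each pathway. The sketch below assumes that the genes on the array form the appropriate background; enrichment tools such as DAVID use variants of this test rather than this exact function, and the function names are hypothetical.

```python
from math import comb


def expected_false_positives(n_genes, alpha):
    """Expected number of null genes called significant without multiple-testing correction."""
    return n_genes * alpha


def enrichment_pvalue(n_total, n_pathway, n_selected, n_overlap):
    """One-sided hypergeometric P(X >= n_overlap) for pathway over-representation.

    n_total: genes on the array (the background)
    n_pathway: background genes annotated to the pathway
    n_selected: differentially expressed genes
    n_overlap: differentially expressed genes that fall in the pathway
    """
    denom = comb(n_total, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(comb(n_pathway, k) * comb(n_total - n_pathway, n_selected - k)
               for k in range(n_overlap, upper + 1))
    return tail / denom


print(expected_false_positives(10000, 0.05))
```

Pathways with few annotated genes give such a test little power, so poorly characterized pathways can be missed even when biologically relevant, whereas heavily annotated pathways such as apoptosis can attract many differentially expressed genes simply because those genes are annotated there.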

Be cautious of false interpretations based on genomic data. For instance, CD4+ T cells from HIV-infected patients show high levels of SOCS mRNA even though protein levels remain low [58]. Similarly, some genes are regulated at the translational rather than the transcriptional level, and such regulation can be missed by these arrays. In large-scale transcriptome studies, transient surges of viral load in elite controllers or non-progressors may skew the results and the interpretation of dysregulated genes [63]. Therefore, use caution in removing outliers, especially in in vivo studies: they might represent a subgroup of the population rather than an experimental anomaly. Whenever possible, perform follow-up functional validation of selected targets of interest to reach valid conclusions about biological relevance.
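In keeping with the advice to be cautious about removing outliers, one option is to flag unusual samples for review rather than delete them automatically. The sketch below uses a median absolute deviation (MAD) based robust z-score; the 0.6745 scaling factor and the 3.5 cutoff follow a common convention for this "modified z-score," but both, and the function name, are assumptions for illustration rather than a recommendation from the studies cited.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return indices whose robust (MAD-based) z-score exceeds threshold; nothing is removed.

    0.6745 rescales the MAD so the score is comparable to an ordinary z-score
    for normally distributed data.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread, so no robust score can be computed
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical expression values for one gene across six subjects
print(flag_outliers([1.0, 1.1, 0.9, 1.0, 1.2, 8.0]))
```

A flagged sample, such as an elite controller sampled during a transient viral blip, can then be examined as a possible subgroup instead of being discarded as an experimental anomaly.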

Conclusions

Gene array technology in HIV-1 research has opened new avenues and encouraged collaboration across fields, including basic science and clinical, statistical, and bioinformatics approaches. Sophisticated high-throughput platforms have been developed and coupled with various databases to share information efficiently across the globe (Fig. 1). This technology offers tremendous potential for studying host–pathogen interactions and discovering new drug targets in HIV-1 and other diseases. Insights from gene array studies are expected to help find ways to prevent virus replication, boost antiviral immune responses, and dampen deleterious host cell loss. The literature shows that an HIV/AIDS-specific gene array needs to be developed to extract meaningful information that can predict disease outcome and help manage treatment. Such an array would combine HIV viral load, gene expression, host gene markers, AIDS progression markers, and relevant host genotype markers.

Current gene profile studies focus on subsets of HIV-infected individuals showing resistance to AIDS progression in order to find the gene signatures that make them resistant. These subsets include LTNPs, ECs, and highly exposed seronegative (HESN) individuals (lacking the CCR5-Δ32 mutation). Together these studies indicate that pathways related to HIV resistance appear to be linked to cell survival and anti-inflammation, whereas disease progression is related to apoptotic, cell-cycle, and metabolic pathways. Understanding the differences in the regulation of these genes will further our understanding of HIV pathogenicity and enable the discovery of novel therapeutic approaches for AIDS.

Gene modulation studies are a rapidly developing field. The challenge is how to analyze, share, and unify data from a wide array of platforms. With the expanding knowledge of gene profiles and individualized medicine, we must be prepared for future possibilities. In particular, we need an efficient way to share data not only among researchers but also between patients and their health care providers. Towards this end, we support the use of public databases for knowledge discovery and data sharing.