Human Immunodeficiency Virus (HIV)-1 Integration Sites in Viral Latency

The persistence of human immunodeficiency virus type 1 (HIV-1) in latent reservoirs is a major barrier to HIV cure. Reservoir establishment depends on low viral expression that may be related to provirus integration sites (IS). In vitro, in cell lines and primary T cells, latency is associated with specific IS through reduced viral expression mediated by transcriptional interference by host cellular promoters, reverse orientation, and the presence of specific epigenetic modifiers. In primary T cell models of latency, specific IS are associated with intracellular viral antigen expression that is not directly related to cell activation. In contrast, in patient CD4+ T cells, there is enrichment for IS in genes controlling cell cycle and survival and in some clonally expanded T cell subpopulations. Multiple insertion sites within some specific genes may suggest that integrated HIV can increase the host’s T cell survival.


Introduction
Despite combination antiretroviral therapy (cART) that can effectively control viral replication and normalize immune function, human immunodeficiency virus (HIV) remains a global health issue. Stopping therapy is almost invariably associated with recurrence of HIV viremia because of HIV persistence in latent reservoirs that remain one of the major barriers to HIV cure [1].
The latent viral reservoir exists as an immunologically undetected pool of infection in patients on cART [2][3][4]. The latent reservoir is established early during infection [5,6], and latent proviruses can be found, albeit at different frequencies, in all CD4+ T cell subsets including naive (T NA ), stem cell memory (T SCM ), central memory (T CM ), effector memory (T EM ), and terminally differentiated (T TD ) T cells [7][8][9][10] as well as in monocytes and macrophages [11][12][13]. The ability of HIV-1 to generate reservoirs in different cell subsets leads to the formation of a heterogeneous population of infected CD4+ T cells with different life spans, different frequencies of viral latency, and potentially different requirements for cellular and viral activation [10,14]. It has been suggested that the heterogeneity of the reservoir may play a role in the stability and persistence of latency [14][15][16].
Although latent HIV is defined as replication-competent but transcriptionally silent infection, only some of the full-length intact proviruses can be induced following cellular stimulation [17, 18, 19•], and there is a pool of integrated provirus that is apparently noninducible. Activation of these pools of latent HIV-1 by latency reversing agents (LRA) that induce viral expression is one of the cornerstones of strategies to eliminate latency by "kick and kill." Studies of the cellular and molecular establishment of the latent reservoir have used in vitro models of latency in cell lines and primary CD4+ T cells, together with analysis of residual HIV in resting CD4+ T cells from HIV-infected patients on cART. It is well established that HIV-1 integration sites (IS) are not randomly distributed in the genome and differ in preferred genomic sites from those of other retroviruses [20]. The regulation of integration site selection may be critical for understanding HIV latency and the potency of LRA in eliminating HIV, particularly for those that effect epigenetic modification. In this review, we focus on understanding the effect of HIV-1 integration in the in vitro models of latency and the recent data identifying some IS that may directly determine the persistence of latency either by favoring infected cell survival or by maintaining low viral expression.

In Vitro HIV-1 Integration Occurs Predominantly Within Transcriptional Units Outside Promoter Regions
Initial studies of the genomic distribution of HIV-1 provirus in T cell lines found IS in gene-rich regions [21][22][23] (Table 1). Integration of provirus at sites of active transcription promotes viral expression, while integration at sites with a low level of transcription can delay proviral transcription and promote latency [23]. One of the major elements regulating transcription is host cell chromatin. The fundamental structure of chromatin is the nucleosome, a repeating unit containing the histones H2A, H2B, H3, and H4 [24]. Further analysis showed that nucleosomes at the site of HIV-1 integration can be remodeled via changes in histone acetylation or via the action of specific chromatin-remodeling agents such as histone deacetylase inhibitors (HDACi), which increase transcriptional activity [24]. These findings suggested a correlation between proviral IS and epigenetic modifications of the chromatin environment at the site of integration. This notion was further supported by studies on the molecular mechanism of IS selection in the host genome [25,26]. It was shown that depletion of chromatin reassembly factors (CRFs) such as Spt6, Chd1, and FACT, as well as the histone chaperones ASF1a and HIRA, at the site of integration resulted in chromatin relaxation that promoted proviral expression [27]. This highlighted that the presence of these CRFs limits the accessibility of the HIV promoter to transcription factors and blocks elongation, leading to latency through transcriptional interference [26,27].

HIV-Specific Determinants of Integration Sites
Comparative studies of IS in retroviral infection with murine leukemia virus (MLV), HIV, and avian sarcoma-leukosis virus (ASLV) showed that IS selection differs among retroviruses [28,29]. HIV-1 predominantly targets the transcriptional units (TU) of active genes, while MLV and ASLV favor transcription start sites (TSS), particularly regions in promoters and upstream of start codons that are uncommon IS for HIV [20,30]. The differences in IS selection among retroviruses (ASLV, MLV, and HIV) have been attributed to the interaction between the viral integrase enzyme and the host cellular factor LEDGF/p75 [31]. The cellular protein lens epithelium-derived growth factor (LEDGF) and coactivator protein p75 have been shown to form a stable tetramer structure with the HIV-1 integrase enzyme [31]. This structure forms at regions rich in 5′-GT(A/T)AC-3′ [32], enhancing the strand transfer activity of viral integrase. In addition, depletion of LEDGF/p75 led to loss of the preferential integration of HIV-1 into TU of the host genome [33], and replacing the LEDGF/p75 chromatin interaction domain with fusion proteins redirected HIV-1 IS to G/C-rich regions [32,34,35]. The role of host cellular factors in viral integration was further supported by substitution of LEDGF/p75 with hepatoma-derived growth factor-related protein 2 (HRP2) [35, 36••], where the conserved pro-trp-trp-pro (PWWP) domain common to both proteins could bind to modified histones, directing proviral integration into active TU [37•]. In contrast, in MLV, the association between the integrase enzyme and LEDGF/p75 was weaker, which might explain the different preferences of integration site selection in these two viruses [28]. Subsequently, the interaction of the viral integrase enzyme with several other cellular proteins including barrier-to-autointegration factor (BAF) [38,39] and high-mobility-group family 1 (HMG 1 Y) [40,41] was also described.
These two cellular proteins are small DNA binding proteins, which are able to modify DNA structure at the site of integration and increase the probability of integration. Collectively, these data suggested that interaction between host cellular proteins and the HIV-1 integrase enzyme results in chromatin remodeling at the site of integration and selectively favors proviral integration into actively transcribed genes [42,43]. Although preferential integration of HIV-1 provirus in transcriptionally active regions was supported in subsequent studies [20,25,44], transcriptionally silent but replication-competent proviruses were also reported in regions with a low level of transcription, including gene deserts [45] and alphoid regions in heterochromatin [46].
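The distinction drawn in this section between integration within a TU and integration near a TSS can be illustrated with a minimal sketch. The `Gene` record, the toy coordinates, and the 1 kb `tss_window` are assumptions made purely for illustration; they are not taken from the studies cited, and real analyses rely on curated genome annotations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int   # leftmost coordinate of the transcriptional unit
    end: int     # rightmost coordinate of the transcriptional unit
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        # The TSS is the 5' end of the gene on its own strand.
        return self.start if self.strand == "+" else self.end


def classify_site(chrom: str, pos: int, genes: list[Gene],
                  tss_window: int = 1000) -> str:
    """Label one integration site as 'TSS-proximal', 'TU', or 'intergenic'.

    tss_window is an assumed cut-off (in bp), not a value from the literature.
    """
    on_chrom = [g for g in genes if g.chrom == chrom]
    if any(abs(pos - g.tss) <= tss_window for g in on_chrom):
        return "TSS-proximal"
    if any(g.start <= pos <= g.end for g in on_chrom):
        return "TU"
    return "intergenic"


# Hypothetical annotation and sites, for illustration only.
genes = [
    Gene("GENE_A", "chr1", 10_000, 60_000, "+"),
    Gene("GENE_B", "chr1", 100_000, 150_000, "-"),
]

print(classify_site("chr1", 35_000, genes))   # inside GENE_A, far from its TSS
print(classify_site("chr1", 149_800, genes))  # near the TSS of GENE_B (minus strand)
print(classify_site("chr1", 80_000, genes))   # between the two genes
```

Under this toy scheme, an HIV-like profile would be dominated by the `TU` label, whereas an MLV-like or ASLV-like profile would show more `TSS-proximal` sites.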

HIV Integration Site Can Determine Expression In Vitro
In the Jurkat T cell line, the site of integration is related to both viral expression and viral latency. Integration of provirus in an actively transcribed gene resulted in efficient expression of viral proteins, while integration at sites with low transcriptional activity resulted in delayed viral expression or latency [24]. The critical factors appear to be the position and the orientation of the provirus relative to actively transcribed genes as well as the methylation of CpG [19•, 27, 42].

Orientation of Provirus in IS
The effect of provirus orientation in promoting latency was supported by subsequent studies in which HIV-1 provirus was integrated into active genes: integration in the same orientation as the host gene could increase HIV-1 transcription by >10-fold, while integration in the opposite orientation reduced HIV-1 gene expression by 4-fold [47]. These observations highlighted that a low level of viral expression is not caused by a lack of transcriptional activity at the site of proviral integration; rather, it may be due to factors that block transcription at the site of integration through transcriptional interference (TI) [19•, 47-49].
Table 1 (fragment): Patients on cART; rCD4 T cells; 74; TU; IS in actively transcribed genes [52]
Table 1 abbreviations: GDR gene dense region, NP nuclear periphery, TAHM transcriptional associated histone modification, GD gene desert, TSS transcription start site, TU transcriptional unit, NR not reported, rCD4 resting CD4 cells, HAG highly active genes, AG active gene, PBMC peripheral blood mononuclear cells, MLV murine leukemia virus, ASLV avian sarcoma-leukosis virus, Ac acute, Ch chronic, lat latent, pat patient

Transcriptional Interference
TI occurs when transcription from the host promoter prevents the transcription of the provirus at the 5′ long terminal repeat (LTR) region downstream of the transcribed gene [50]. The effect of TI has been shown previously in ALV, where active transcription from the 5′ LTR was able to block transcription from the 3′ LTR region [51]. Therefore, in TI, there is ongoing transcriptional activity from the upstream promoter [19•, 22, 52-54]. TI can inhibit the expression of proviruses and promote latency through promoter competition caused by neighboring genes. It has been shown that when HIV-1 provirus is integrated face to face with the host gene promoter (i.e., in a convergent position), the assembly of the initiation complex necessary for transcription is blocked by promoter occlusion, resulting in silencing of the HIV-1 promoter [50]. It is clear that virus integrated into a gene in the reverse orientation is transcribed at a low level because of transcriptional interference [19•, 27]. Therefore, it is likely that the relationship between the IS and the transcriptional activity of the host gene can shift the balance from proviral expression toward latency.
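As a minimal sketch of how orientation relative to the host gene might be tabulated for a set of intragenic integrants, the following compares strand labels. The function name, the strand encoding, and the toy integrants are assumptions for illustration and are not taken from the cited studies.

```python
from collections import Counter


def relative_orientation(provirus_strand: str, gene_strand: str) -> str:
    """Return 'same' or 'opposite' for a provirus integrated inside a host gene."""
    for strand in (provirus_strand, gene_strand):
        if strand not in ("+", "-"):
            raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    return "same" if provirus_strand == gene_strand else "opposite"


# Hypothetical intragenic integrants: (provirus strand, host gene strand).
integrants = [("+", "+"), ("-", "+"), ("-", "-"), ("+", "-"), ("+", "+")]

# Count how many integrants fall in each orientation class.
tally = Counter(relative_orientation(p, g) for p, g in integrants)
print(tally["same"], tally["opposite"])  # prints: 3 2
```

A tally of this kind only describes orientation; whether a given orientation leads to readthrough transcription or to interference also depends on the activity of the host promoter, as discussed above.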

HIV Integration Site Analysis in Primary T Cells
HIV-1 latency can be generated through postactivation and preactivation pathways. Postactivation latency refers to direct infection of the activated CD4 T cell population, where viral production results in depletion of infected cells, but some cells revert to a resting state containing integrated HIV-1 provirus [55]. The integrated proviruses generated through this path are commonly replication competent; however, due to epigenetic modifications at the site of integration, they become transcriptionally silent [56]. Infection of resting cells may result in preintegration or postintegration latency. Preintegration latency occurs after direct infection of resting CD4 T cells and incomplete reverse transcription or a block at steps prior to integration [57]. This transient infection may be rescued by TcR ligation or mitogen activation, resulting in viral antigen expression and virus production [58,59]. If viral expression is blocked after integration, a postintegration form of latency, which we have called preactivation latency, can be generated. This occurs in vitro by direct infection of chemokine-treated resting CD4 T cells [60,61], by spinoculation, or by coculture with dendritic cells [62] or endothelial cells [63].
[…], of which 92.9 % were found in the transcriptionally active unit. These proviruses had an intact genome, with an unmethylated promoter, and the potential to express infectious virus following cellular activation. How replication-competent but transcriptionally silent provirus can persist in patients on therapy is still unclear. Although homeostatic proliferation of latently infected cells has been proposed as a crucial factor for persistence of the viral reservoir [16], two recent studies have highlighted the role of IS in well-suppressed patients.

Enrichment of IS in Genes Regulating Cell Growth and Specific Cancer-Associated Genes
In the study by Maldarelli et al., a total of 2410 IS isolated from five patients were analyzed [68••], of which 43 % (1022) of IS were associated with more than one host gene. This showed that a large population of infected cells was generated from expansion of a single clone. In one of the five patients tested in this study, 20 % of integration events (i.e., 62 out of 317) were found at the same site in the HORMAD2 gene (64 integrants (INT)). This analysis showed that almost 58 % of IS in patients were derived from a single HIV-1-infected cell. Subsequent analysis showed that approximately 70 % (21 of 29) of genes targeted more than once were known to be directly involved in cell growth (STAT5B (18 INT)) or mitosis (MAP4 (7 INT)) (Table 2). These data suggested clonal expansion of HIV-1-infected cells as a potential mechanism for persistence of the HIV-1 reservoir in well-suppressed patients on cART. In a separate study by Wagner et al., a total of 534 integrated proviruses isolated from three patients were analyzed, and proviral IS were found at the same location in multiple cells within each patient. Identical IS were derived from ≥2 individual cells, thus highlighting the proliferation of infected cells in these patients [69••]. Subsequent analysis of the proviral sites of integration in this population revealed a preference for integration in cancer-associated genes. Among 1332 unique genes examined for overrepresentation of IS, 12.5 % (36/288) were cancer-associated genes. In a comparison of IS derived from patient samples with those from the acutely infected Jurkat T cell line, a similar pattern of integration was observed in cancer-related genes (12.70 vs 11.4 %, respectively). However, IS detected in proliferating cells derived from patients were enriched in cancer-associated genes compared to IS detected in Jurkat cells (i.e., 15.97 vs 11.14 %, respectively).
Genes such as CREBBP, STAT5B, BACH2, C2CD3, and MKL2 (Table 2) were targeted in two or three of the participants tested in this study, suggesting preferential integration of HIV-1 provirus in these regions. These observations raised the possibility that integration alters the function of these genes in a way that favors long-term cell survival.
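The inference of clonal expansion from identical IS recovered in ≥2 cells can be sketched as a simple counting exercise. The tuple encoding of a site, the function names, and the toy observations below are assumptions for illustration; they do not reproduce the pipelines or the data of the studies cited.

```python
from collections import Counter

# One observation per sampled cell: (chromosome, position, provirus strand).
Site = tuple[str, int, str]


def expanded_clones(observations: list[Site], min_cells: int = 2) -> dict[Site, int]:
    """Return sites seen in at least min_cells cells, with their cell counts."""
    counts = Counter(observations)
    return {site: n for site, n in counts.items() if n >= min_cells}


def fraction_in_expanded_clones(observations: list[Site], min_cells: int = 2) -> float:
    """Fraction of sampled cells whose integration site belongs to an expanded clone."""
    if not observations:
        return 0.0
    clones = expanded_clones(observations, min_cells)
    return sum(clones.values()) / len(observations)


# Hypothetical observations, for illustration only.
observations = (
    [("chr22", 30_000_000, "+")] * 3   # same site recovered from three cells
    + [("chr6", 90_000_000, "-")] * 2  # same site recovered from two cells
    + [("chr1", 5_000, "+"), ("chr2", 7_000, "-")]  # sites seen once
)

clones = expanded_clones(observations)
print(len(clones))                                  # prints: 2
print(fraction_in_expanded_clones(observations))    # 5 of 7 cells
```

Treating an identical site in two or more cells as evidence of proliferation rests on the assumption that independent integration at exactly the same position and orientation is very unlikely.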

Viral Gene Expression and Latent Infection In Vivo
In the early stages of HIV-1 infection, there is a high turnover of both virus and infected cells [70], leading to activation of the adaptive immune system and subsequent clearance of the infected cells [4]. Differential clearance of virally infected cells will lead to accumulation of cells that do not express viral proteins and therefore evade immunological recognition or are not lost to cytopathic effects. Avoiding both pathways of loss depends on defective provirus or HIV latency. The in vitro models of latency have characterized factors affecting proviral expression, including the orientation of the provirus relative to neighboring host genes [19•] as well as TI [47] and epigenetic modifiers. In the study by Maldarelli et al., multiple proviruses were found in the MKL2 gene in a patient on suppressive antiviral therapy. The proviruses shared the same transcriptional orientation and were located upstream of the start codon of the MKL2 gene. There were also 15 independent IS found in the BACH2 gene. The integrants shared the same orientation as the BACH2 gene and were located in introns 4 and 5 of the gene [68••]. A comparative analysis of IS detected in the patient samples with acutely infected HeLa cells and human CD34+-infected cells (in vitro) showed higher frequencies of IS in both MKL2 (7 vs 0.03 % in HeLa and CD34+ cells) and BACH2 (i.e., 1.5 vs 0.002 % in HeLa and 0.01 % in CD34+ cells) in patient samples. The IS were found to be more widely spread and had multiple orientation patterns in acutely infected primary cells and cell lines compared to patient samples [68••]. In the study by Wagner et al., multiple HIV-1 proviruses were found in the BACH2 gene in two of the three participants tested in the study [69••]. Comparative analysis with previously reported recurrence of HIV-1 IS in the BACH2 gene in vivo [67••, 71•, 72] showed that all integration events occurred in intron 5 of the gene, upstream of the start codon and in the same orientation as the BACH2 gene.
The integration patterns were distinct from the patterns reported in a clonally infected cell line (Jurkat), where they found only […]
[…] Integration site analysis showed that the provirus was integrated within the intragenic region, in the opposite orientation to the SMC5 gene. The same mutation was observed in an additional four patients. The expression of full viral envelope proteins was detected from all defective proviral RNA, while there was partial, low, or no expression of Gag-Pol proteins compared to wild-type provirus. It is unclear whether the expression of viral proteins from a defective provirus is a result of readthrough transcripts [19•, 47] or epigenetic modifications associated with the site of integration [49]. However, a low level of viral protein expression from latently infected cells in patients on long-term therapy may contribute to ongoing immune activation and production of cytokines and chemokines, and may promote homeostatic proliferation of infected cells and maintenance of the reservoir [16].
To understand the mechanisms that maintain the latent reservoir, further work is needed to determine whether the latent proviruses in well-suppressed HIV-infected patients are defective proviruses that have selectively accumulated through survival of a particular clone or represent a large pool of unactivatable virus integrated at genomic sites that promote a temporary latent state.

Integration Sites in Specific Subsets of CD4 T Cells
Several studies have shown that clonal expansion of infected T cells in patients on long-term therapy is selective and driven by homeostatic proliferation. In a study of 31 aviremic patients, >50 % of cells harboring replication-competent proviruses were detected in memory T cell subsets [16]. Proviruses were mainly found in T CM and transitional memory T cells (T TM ) but not in T EM . Both T cell populations (i.e., T CM and T TM ) have been shown to have a low level of proliferation due to immune activation caused by long-term therapy. These cells also have a long half-life, which would help maintain the viral reservoir in these subsets [16].
In addition, replication-competent viruses have been detected in CD4 T cells with stem cell-like memory properties (T SCM ) isolated from three patients on continuous treatment with HAART [8]. Much higher levels of HIV-1 DNA occur in CD4 T SCM cells compared to CD4 T CM , CD4 T EM , and CD4 T TD cells isolated from patients. The contribution of CD4 T SCM to the total size of the reservoir was most pronounced in patients with a small reservoir size. These data suggested that HIV-1 in infected CD4 T SCM can persist as a stable viral reservoir. Considering that CD4 T SCM subsets are permissive to viral infection, have a low rate of apoptosis, and can survive for a long period [73], this raises the question of whether there is also selective targeting of this population or differences in IS.
In other T cell populations, long-term persistence may be associated with defective provirus. Such expansion of defective DNA in CD4 T EM subsets was found in a well-suppressed infected individual [71•]. Longitudinal analysis of the integration sites found the same site in 17 independent events, and identical viral sequences carrying the W42Stop mutation suggested expansion of this particular subset over 15 years. Effector memory T cells are terminally differentiated T cells with an estimated half-life of 3 to 6 days [74,75]. Thus, survival of the CD4 T EM in an HIV-1-infected individual over 15 years strongly suggested that the persistence of proviruses in this subset is selected by the lack of viral antigen expression. It will be important now to expand studies of IS in T cell subpopulations isolated from patients on long-term therapy to determine how viral expression may be controlled by particular epigenetic markers and gene-specific control mechanisms in the specific subpopulations that contribute most to the size of the latent reservoir.
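The contrast between a half-life of 3 to 6 days and persistence over 15 years can be made explicit with a back-of-the-envelope calculation. The sketch below assumes simple exponential decay of a non-dividing cell population, which is an illustrative simplification and not a model used in the cited studies; the function name is hypothetical.

```python
import math


def log10_surviving_fraction(years: float, half_life_days: float) -> float:
    """log10 of the fraction of cells remaining after simple exponential decay.

    Assumes no proliferation and a constant half-life (illustrative only).
    Working in log10 avoids floating-point underflow for very small fractions.
    """
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    n_half_lives = years * 365.25 / half_life_days
    return -n_half_lives * math.log10(2)


# Half-life range quoted in the text (3 to 6 days), over 15 years.
for half_life in (3, 6):
    print(half_life, round(log10_surviving_fraction(15, half_life)))
```

Even with the longer 6-day half-life, 15 years corresponds to more than 900 half-lives and a surviving fraction on the order of 10^-275, so under this assumption the original cells cannot have persisted without ongoing proliferation of the infected clone.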

Conclusion
Although correlation of IS with expression in T cell lines has provided some evidence of specific characteristics of the genomic regions that downregulate expression and favor latency, and has provided tools such as the latently infected cell line J-LAT [59] that can be used in screening for LRA, a clear demonstration of similar associations in primary cells has been more difficult. A correlation between viral antigen expression and specific genomic markers has been shown within different in vitro models of HIV latency, but these correlations have not translated into general features of IS that extend across multiple models of latency in primary CD4+ T cells. Further, correlation of epigenetic markers with antigen expression across different states of cellular activation limits the usefulness of IS analysis as a predictor for efficacy or selection of LRA in reversing latency. The expansion of specific integration sites in patient CD4+ T cells has provided novel insights into the reservoir of latent HIV in vivo. Two mechanisms may account for this expansion. The first is the clonal expansion of cells infected with virus that is identical in both insertion site and viral sequence. This follows the observations made with viral sequence analysis alone and raises the question of whether the IS resides in a site favoring latency or whether the virus is noninducible or defective. The second possibility, raised by the recent findings of a clear enrichment for near-identical or multiple IS at specific sites within a gene, is that integration in specific genes provides a survival signal for the cells carrying the provirus. That the enriched genes are associated with cell survival and cell cycle raises the intriguing possibility that HIV IS may alter the function of those genes and provide cell survival signals. It will be important now to determine if this is a direct genomic effect or if expression of specific HIV gene products is involved.
Are readthrough HIV transcripts involved in controlling gene expression or is this an "oncogene" effect?