Introduction

HIV-1 displays extensive genetic diversity. Binding of HIV-1 to host cells is mediated by the envelope glycoprotein. When the HIV-1 envelope protein binds to its primary receptor, CD4, it undergoes conformational changes and then binds to one of the coreceptors (chemokine receptor CCR5, CXCR4 or others) via its V3 loop. This tri-molecular interaction leads to viral membrane fusion [1]. The HIV-1 envelope is composed of relatively conserved (C1 to C5) and variable (V1 to V5) regions. The V3 region elicits neutralizing antibodies and also governs co-receptor usage [1, 2]. Replacements in the V3 region with basic amino acids are associated with CXCR4 usage [2, 3]. Subtypes A and C usually contain a highly conserved GPGQ amino acid motif, while GPGR is the predominant motif in the V3 loop of subtype B envelopes [4, 5]. Mutational patterns in the V3 loop region are likely to be of clinical significance because they can influence viral susceptibility to known CCR5 inhibitors. Although all HIV-1 genetic subtypes originated in Africa, it is not fully understood how certain subtypes came to dominate different regions of the world. For example, subtype B predominates in the US and UK, whereas subtype C is predominant in India, some other parts of Asia and Africa [6].

It is fairly well established that HIV-1 that uses the CCR5 chemokine receptor (R5-tropic) is transmitted preferentially over virus that uses the CXCR4 chemokine receptor [7]. Individuals with a 32 bp deletion in the CCR5 open reading frame (ORF) are largely protected against HIV-1 infection [79]. Approximately 50% of HIV-1 subtype B infected individuals show a co-receptor switch from CCR5 to CXCR4, which is associated with rapid progression of HIV/AIDS [10]. This is observed mainly in the US and UK, where subtype B predominates. However, in India, where subtype C predominates, the coreceptor switch has not been observed [11]. Replacements of charged amino acids within the V3 region are known to alter co-receptor usage [2, 3, 12]. Genetic variations in subtype C HIV-1 envelope sequences have recently been reported from Southern India, with some strains exhibiting multiple co-receptor usage, including use of the CXCR4 chemokine receptor, which is present predominantly on T-helper lymphocytes [13, 14]. It is noteworthy that we recently reported novel B/C LTR [15] and Vpr B/C/D sequences from North India [16].

Given the large size of India, and with increasing global travel, it is likely that subtypes other than B may also co-circulate, creating an ideal situation for the formation of recombinants. With this in mind, we genetically characterized the HIV-1 envelope sequences from HIV-1 infected individuals from Northern India and report the presence of HIV-1 CRF02_AG for the first time.

Methods

Genomic DNA isolation and Polymerase chain reaction

Genomic DNA was isolated from fresh peripheral blood collected in EDTA using a Qiagen kit (QIAamp Blood Minikit) as described previously [8, 9]. All requisite ethical clearances were obtained before initiating this study. All polymerase chain reactions (PCRs) were performed with high fidelity Taq DNA polymerase (Ex-Taq, Takara, Japan) using the following primers:

Forward primer: 5'-ATGGGATCAAAGCCTAAAGCCATGTG

Reverse primer: 5'-AGTGCTTCCTGCTGCTCCCAAGAACCCAAG

A DNA fragment of approximately 1.25 kb, corresponding to the V1 to V5 region, was amplified initially. Thereafter, a 700 bp fragment (V3 to V5) was amplified using internal primers with the following sequences:

Forward primer: CTGTTAAATGGCAGTCTAGC

Reverse primer: CACTTCTCCAATTGTCCCTCA

The cycling conditions for amplifying both fragments were: 35 cycles of 98°C for 15 sec, 55°C for 30 sec and 72°C for 1 min, with a final extension at 72°C for 10 min. The final concentration of MgCl2 was 20 mM for both PCRs. PCR-amplified DNA was cloned into the pGEM-T vector (Promega Biotech, WI, USA) and sequenced in both directions using T7 and SP6-specific primers. The sequence of one representative clone from each sample was used for phylogenetic analysis and sequence comparisons. Mother and child samples were processed separately to avoid cross contamination.
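
As a quick sanity check on the primers and cycling programme listed above, the following minimal Python sketch computes primer length, GC fraction and reverse complement, and the nominal programme duration. The dictionary labels (`outer_forward`, `inner_forward`, and so on) and the function names are hypothetical and were chosen here for illustration; the inner primers are assumed to be written 5' to 3' like the outer ones, and the duration ignores ramp times and any initial denaturation step, neither of which is stated in the text.

```python
# Primer sequences copied from the Methods; the keys are illustrative labels only.
PRIMERS = {
    "outer_forward": "ATGGGATCAAAGCCTAAAGCCATGTG",
    "outer_reverse": "AGTGCTTCCTGCTGCTCCCAAGAACCCAAG",
    "inner_forward": "CTGTTAAATGGCAGTCTAGC",
    "inner_reverse": "CACTTCTCCAATTGTCCCTCA",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cycling_time_seconds(cycles: int, step_seconds, final_extension: int) -> int:
    """Nominal hold time of a PCR programme, excluding ramp times."""
    return cycles * sum(step_seconds) + final_extension


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.1%}, "
              f"reverse complement = {reverse_complement(seq)}")
    # 35 cycles of 15 s + 30 s + 60 s, then a 600 s final extension.
    total = cycling_time_seconds(35, (15, 30, 60), 600)
    print(f"Nominal programme time: {total / 60:.1f} min")
```

Checks of this kind only flag obvious transcription problems in a primer list; they do not replace primer design software or empirical optimisation.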

Patient population and genetic analysis

We carried out genetic analysis of 13 HIV-1 envelope sequences from Northern India. Nine unrelated individuals and 2 mother-child pairs (Pair 1, D & E 57 and Pair 2, D & E 58) were selected randomly from two locations: samples ND1 to ND5, all from commercial sex workers (CSWs), were from GTB Hospital, Delhi, and the rest were from the Punjab/Haryana region. Primers were designed to carry out nested PCR as described earlier. It is noteworthy that we were unable to amplify envelope sequences from several samples, which may be due to extreme genetic variability; it is therefore difficult to draw conclusions about the frequency of any genetic subtype from this study. Alternatively, since most of the HIV-1 infected individuals were on antiretrovirals, the amounts of proviral DNA may have been too small to amplify. Sequences were compared with reference strains (figure 1) from the Los Alamos HIV database (http://www.hiv.lanl.gov). At least 4 independent clones were analyzed from each sample, and only one representative clone from each sample was used for genetic analysis. Multiple sequence alignment was performed with ClustalW 1.8.3, accessed through the DNA Data Bank of Japan (DDBJ) website http://clustalw.ddbj.nig.ac.jp/top-e.html. Phylogenetic analysis was carried out using MEGA 4.1 (beta) software. Genotyping was carried out using the viral genotyping tool located at NCBI http://www.ncbi.nlm.nih.gov/projects/genotyping/formpage.cgi, the REGA subtyping tool ver 2.0 http://www.bioafrica.net/subtypetool/html and the Recombination Identification Program (RIP) 3.0 http://www.hiv.lanl.gov/content/sequence/RIP/RIP.html. Potential N-glycosylation sites were calculated using the N-GlycoSite program http://www.hiv.lanl.gov/content/sequence/GLYCOSITE/glycosite.html.
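
The counting of potential N-glycosylation sites can be illustrated with a minimal Python sketch. It assumes the standard N-linked sequon definition (N-X-S/T, where X is any residue except proline) and is not the N-GlycoSite implementation; the function names are hypothetical, and the example sequence is a V3-like sequence used purely for illustration, not a sequence from this study.

```python
import re

# N-linked sequon: Asn, then any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. "NNTS") be counted separately.
_SEQUON = re.compile(r"(?=N[^P][ST])")


def pngs_positions(protein: str) -> list:
    """Return 1-based positions of the Asn in each potential N-glycosylation site.

    Alignment gaps ('-') are removed first, so positions refer to the
    ungapped sequence.
    """
    seq = protein.upper().replace("-", "")
    return [m.start() + 1 for m in _SEQUON.finditer(seq)]


def count_pngs(protein: str) -> int:
    """Return the number of potential N-glycosylation sites in a protein sequence."""
    return len(pngs_positions(protein))


if __name__ == "__main__":
    # Illustrative V3-like sequence (an assumption, not data from this study).
    example = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
    print(count_pngs(example), pngs_positions(example))
```

Applied to the translated V3 to V5 fragment of each clone, a count of this kind gives the per-sample number of potential sites, although a dedicated tool may apply additional rules that this sketch does not.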

Figure 1

HIV-1 envelope sequence comparison and coreceptor usage. HIV-1 envelope gene was amplified from infected individuals and subjected to sequencing as described in the text. Only the V3 loop region sequences with short flanking constant regions are shown with their accession numbers, their subtype assignment and possible co-receptor usage. Dots in the sequence indicate identity with consensus C, B, 02_AG and A sequences; asterisk indicates identical amino acids; single dot at the bottom of four groups of samples represents semi-conserved substitution of amino acids and double dots represent conserved substitution. Subtypes were determined using Viral Genotyping Tool, REGA Subtyping Tool and RIP 3.0 with maximum blast identity.

Results and discussion

All of the HIV-1 infected individuals were infected through the heterosexual route (except the mother-child pairs), and their CD4 counts ranged from 120-150 (samples A81 and A82) to 400-500 (D57 and D58). Most of them were on first-line antiretroviral treatment. The GPGQ motif in the middle of the V3 loop was conserved among all subtype C and CRF02_AG strains. Remarkably, 5 of the subtype C samples showed a conserved A residue just downstream of the GPGQ motif (not observed in consensus C), and 4 of them showed an H to Y change just before the second cysteine of the V3 region (figure 1). The subtype B sample (VT5) possessed the GPGR amino acid motif at the crown of the V3 loop, as expected. It is noteworthy that we recently reported novel mosaic B/C HIV-1 LTR and B/C/D recombinant Vpr structures from the same region of India (Punjab/Haryana region) [15, 16]. Group M subtype reference sequences, along with outlier sequences, were downloaded from the Los Alamos HIV database. The sequences were subjected to various genetic subtyping tools (phylogenetic analysis, RIP 3.0, the viral genotyping tool and REGA subtyping). This analysis indicated that 6 of the 13 sequences were related to subtype C and one to subtype B, and the remaining 6 resembled CRF02_AG (figure 2). Mother-to-child transmission was detected in both pairs, as judged by high bootstrap values (99 in pair 1 and 71 in pair 2) (figure 2). It is noteworthy that no changes in the V3 sequences were observed in either of the mother-infant pairs. Maximum intra-patient proviral diversity was observed in two samples (A81 and C5) (manuscript under preparation). It was reported earlier that subtype determination based on phylogenetic analysis should also be confirmed using other tools or signature sequences present in the V3 region [17]. Representative subtype sequences identified by the RIP 3.0 program are given (additional file 1).
Each curve is a comparison between the envelope region being analyzed (the query, as indicated at the top of each square) and multiple reference sequences downloaded from the database. Using this kind of analysis, HXB2 (panel A) and the isolate with accession number FJ769836 (panel B) were identified as subtype B, isolate FJ968673 as CRF02_AG (panel C) and the isolate with accession number FJ968672 as subtype C.

Figure 2

Phylogenetic analysis of the HIV-1 envelope sequences from North India. All the reference sequences from the M group and the outlier group were retrieved from the Los Alamos database and used for constructing a neighbor-joining phylogenetic tree. Indian CRF02_AG strains are represented as 'Black Circles', subtype C as 'Black Squares' and subtype B as 'Black Diamonds'. The evolutionary history was inferred using the Neighbor-Joining method in MEGA4. A similar evolutionary pattern was detected when Maximum Likelihood and UPGMA methods were used (data not shown). Mother-child pairs are shown by filled and empty circles and are numbered (1 and 2). Empty circles denote maternal envelope sequences, while filled circles denote infant envelope sequences.

The most remarkable finding was the predominance of the CRF02_AG strain among the unrelated commercial sex workers (CSWs) from the Delhi region (the capital of India). All the isolates from the Punjab/Haryana region showed relatedness with consensus C. This recombinant form is predominantly found in Africa (Côte d'Ivoire, Mali, Senegal, Ghana, Cameroon, etc.), followed by Korea, Spain and France. The number of potential glycosylation sites present in the V3 to V5 region varied from 7 (A81) to 15 (ND5) (data not shown). This is important because in some instances hypo-glycosylated forms of envelope have been associated with better transmission and with differences in their ability to interact with neutralizing antibody [18].

It was remarkable that sample A81 clearly showed CXCR4 coreceptor usage with both programs (WebPSSM and Geno2Pheno) designed to predict HIV-1 coreceptor usage. This is important because earlier studies with Indian subtype C envelopes showed exclusive use of the CCR5 co-receptor [11]. It is important to note that samples A82 and C5 showed a discrepancy in their predicted coreceptor usage; this is because the two programs use different parameters [19, 20].
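
The link between basic residues in V3 and CXCR4 usage can be illustrated with a simple sequence heuristic. The Python sketch below is not the WebPSSM or Geno2Pheno algorithm: it applies the widely cited "11/25 rule" (a basic residue at V3 position 11 or 25 suggests CXCR4 usage) to a 35-residue V3 loop, and also reports net charge and the crown motif. The function names are hypothetical, histidine is ignored in the charge count as a simplifying assumption, and the example sequences are illustrative V3-like sequences rather than sequences from this study.

```python
import re

BASIC = set("KR")    # positively charged residues counted here
ACIDIC = set("DE")   # negatively charged residues counted here


def v3_net_charge(v3: str) -> int:
    """Net charge of a V3 loop, counting K/R as +1 and D/E as -1."""
    v3 = v3.upper()
    return sum(aa in BASIC for aa in v3) - sum(aa in ACIDIC for aa in v3)


def rule_11_25_predicts_x4(v3: str) -> bool:
    """Apply the 11/25 rule to an ungapped 35-residue V3 loop."""
    v3 = v3.upper()
    if len(v3) != 35:
        raise ValueError("the 11/25 rule as sketched here expects a 35-residue V3 loop")
    return v3[10] in BASIC or v3[24] in BASIC


def crown_motif(v3: str):
    """Return the first GPGX tetrapeptide (e.g. GPGQ or GPGR), or None."""
    match = re.search(r"GPG[A-Z]", v3.upper())
    return match.group(0) if match else None


if __name__ == "__main__":
    examples = {
        # Illustrative sequences only (assumptions, not data from this study).
        "B-like": "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC",
        "C-like": "CTRPNNNTRKSIRIGPGQTFYATGDIIGDIRQAHC",
        "B-like S11R": "CTRPNNNTRKRIHIGPGRAFYTTGEIIGDIRQAHC",
    }
    for name, seq in examples.items():
        print(name, crown_motif(seq), v3_net_charge(seq), rule_11_25_predicts_x4(seq))
```

Because such a rule inspects only two positions, it can disagree with position-specific scoring or machine-learning predictors on the same sequence, and its performance may differ between subtypes.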

Successful transmission of virus (judged by high bootstrap values) was observed in both mother-child pairs. It is important to study the functional implications of changes in viral gene sequences between mother-infant pairs in order to understand the molecular basis of successful transmission [21]. The VT5 (subtype B) sample, as expected, showed CXCR4 usage, and all of the CRF02_AG strains showed CCR5 usage.

In summary, we show for the first time the presence and transmission of the CRF02_AG HIV-1 strain in India (Delhi, the capital of India) and the presence of subtypes B and C in North India. These observations have implications for T-cell epitope based vaccines. The existence of multiple HIV-1 genetic subtypes in this region is likely to generate novel and complex recombinants.