Background

Guangdong is one of the areas in China most heavily affected by HIV-1. By the end of October 2019, Guangdong had reported the fourth highest number of HIV cases (66,558) in China [1]. Nationwide antiretroviral therapy (ART) has substantially curbed rampant HIV transmission [2] and has significantly reduced HIV infection-associated mortality and morbidity [3, 4]. However, drug-resistant HIV variants emerging under long-term ART selection pose a threat to HIV prevention and control [5].

Transmitted drug resistance (TDR) of HIV is prevalent, but its prevalence varies worldwide. For example, the prevalence of TDR has been reported to be 4.1% in south/southeast Asia and 6.0% in sub-Saharan Africa [6], 14% in southwestern Siberia [7], 7.8% in Greece [8], 8.0% in Brighton [9], and 13.1% in Portugal [10]. In 2015, a nationwide cross-sectional survey revealed that the overall prevalence of TDR was 3.6% in China [11]. More recently, higher TDR rates have been reported in several regions of China: 4.5% in Beijing [12], 7.21% in Guangxi [13], 11.1% in Zhejiang [14], and 7.8% in Tianjin [15].

Molecular transmission clusters can be identified by molecular phylogeny based on evolutionary theory and sequence analysis [16, 17]. The analysis of transmission clusters has been widely used to study HIV-1 transmission kinetics and to develop real-time precision interventions [18, 19]. International guidelines recommend that newly diagnosed HIV patients be tested for ART drug resistance, both to detect potential TDR and to guide antiviral drug selection [16, 17]. Given that first-line ART drugs have been used in Guangdong for thirty years, it is essential to investigate the prevalence and transmission of TDR among HIV-1-infected adults in Guangdong. Here, we performed a large cross-sectional study of ART-naïve HIV-1-infected individuals in Guangdong.

Methods

Study population

Between January 2018 and December 2018, 2368 HIV-1 patients were enrolled in this study based on the following criteria: (1) adult residents over 16 years old living in Guangdong Province; (2) diagnosed with HIV infection within 3–6 months and never having received ART; and (3) not infected via mother-to-infant transmission. The epidemiological data of the patients (including age, sex, marital status, education level, ethnicity, route of infection, and CD4+ T cell count) were acquired from the China Information System for Disease Control and Prevention.

HIV-1 RNA extraction and pol gene amplification

Blood samples mixed with the anticoagulant ethylene diamine tetraacetic acid (EDTA) were centrifuged at 3000 rpm for 5 min to collect plasma. Viral RNA was extracted from the plasma using the QIAamp Viral RNA Mini Kit (Qiagen, Germany) following the manufacturer’s instructions. The extracted RNA was reverse transcribed and amplified by nested PCR using the PrimeScript One Step RT-PCR Kit (Takara, China) and PrimeSTAR HS DNA Polymerase (Takara, China). The PCR products were analysed using agarose gel electrophoresis, and the positive products (approximately 1300 bp in the HIV-1 pol gene corresponding to HXB2 2147–3462 nt, encoding the protease and the first 299 residues of reverse transcriptase) were sent to a commercial company (Tianyi Huiyuan, China) for sequencing on an ABI3730 instrument. The sequences obtained were assembled and cleaned with Sequencher software.

Genotype determination and analysis

Sequences were aligned, adjusted manually and merged with HIV-1 subtyping references downloaded from the Los Alamos HIV Sequence Database using BioEdit software. To determine the HIV-1 genotypes, sequences were assessed with the Context-based Modeling for Expeditious Typing (COMET) genotyping tool, developed by Daniel Struck [20], and the REGA HIV-1 Subtyping Tool Version 3.0, developed by Tulio de Oliveira [21]. A maximum likelihood (ML) phylogenetic tree was used for confirmation. The tree was constructed under the GTR substitution model with PhyML 3.0 [22], and branch support was estimated using the approximate likelihood ratio test (aLRT) [23].

TDR and drug resistance mutation analysis

TDR was defined as the presence of at least one surveillance drug resistance mutation (SDRM) [10]. The Stanford Calibrated Population Resistance (CPR) tool 8.0 (last updated on 1st July 2019) was used to identify SDRMs according to the WHO 2009 surveillance list [21]. The Stanford HIVdb Program 8.9 (last updated on 7th Oct. 2019) was used to infer resistance to antiretroviral drugs, including protease inhibitors (PIs), nucleoside reverse transcriptase inhibitors (NRTIs) and non-nucleoside reverse transcriptase inhibitors (NNRTIs) [24]. Sequences with low-level, intermediate-level, or high-level resistance to any drug were defined as drug resistant.
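The resistance definition above can be expressed as a simple rule. The following is a minimal sketch, not the Stanford HIVdb program itself: it assumes that each sequence has already been interpreted and that the per-drug results are available as a hypothetical dictionary mapping drug abbreviations to HIVdb's five-level labels. The label strings and the example input are assumptions and should be checked against the actual HIVdb output.

```python
# Levels counted as "drug resistant" under the definition in the text.
# The exact label wording is an assumption modelled on HIVdb's five levels.
RESISTANT_LEVELS = {
    "Low-Level Resistance",
    "Intermediate Resistance",
    "High-Level Resistance",
}


def is_drug_resistant(levels_by_drug):
    """Return True if any drug is scored low-, intermediate- or high-level resistant.

    "Susceptible" and "Potential Low-Level Resistance" do not count.
    """
    return any(level in RESISTANT_LEVELS for level in levels_by_drug.values())


# Hypothetical interpretation results for two sequences (illustration only).
seq_a = {"EFV": "High-Level Resistance", "NVP": "High-Level Resistance", "AZT": "Susceptible"}
seq_b = {"EFV": "Potential Low-Level Resistance", "AZT": "Susceptible"}

print(is_drug_resistant(seq_a), is_drug_resistant(seq_b))
```

Keeping the set of qualifying levels in one constant makes the cut-off explicit and easy to change if a different definition is preferred.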

Transmission cluster construction

The HyPhy program 2.2.4 was used to calculate the pairwise Tamura-Nei 93 (TN93) genetic distance for the aligned sequences [25]. The network visualisation program Cytoscape 3.2.1 was used to analyse sequences with a threshold genetic distance of 0.015 and to visualise the transmission network as nodes (sequences), edges (links) and clusters (groups of linked sequences) [26]. This genetic distance threshold has been validated to identify partners with epidemiological links [27] and has been widely used [28, 29].
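The clustering step can be illustrated as follows. This is a minimal sketch rather than the HyPhy/Cytoscape workflow used in the study: it assumes the pairwise TN93 distances have already been computed, links two sequences when their distance is at or below 0.015 (whether the cut-off is inclusive is an assumption), and reports clusters as connected components of the resulting graph. The function name, sequence identifiers and distance values are hypothetical.

```python
from collections import defaultdict

THRESHOLD = 0.015  # TN93 genetic distance cut-off stated in the text


def build_clusters(distances, threshold=THRESHOLD):
    """Return clusters as sets of sequence IDs.

    distances: dict mapping (id_a, id_b) -> pairwise genetic distance.
    A cluster is a connected component of the graph whose edges are the
    pairs with distance <= threshold; unlinked sequences are omitted.
    """
    neighbours = defaultdict(set)
    for (a, b), dist in distances.items():
        if dist <= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    clusters, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


# Hypothetical distances for six sequences (illustration only).
example = {
    ("s1", "s2"): 0.004,
    ("s2", "s3"): 0.012,
    ("s1", "s3"): 0.020,  # above the threshold, but s1 and s3 are linked via s2
    ("s4", "s5"): 0.014,
    ("s5", "s6"): 0.031,  # s6 stays outside the network
}
clusters = build_clusters(example)
print(sorted(sorted(c) for c in clusters))
```

Note that cluster membership is transitive: two sequences can share a cluster without being directly linked, which is one reason why very large clusters can form at a fixed threshold.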

Statistical analysis

All statistical analyses were performed using IBM SPSS program version 25.0. Qualitative statistics are described using the frequency. Quantitative statistics are described using the median (IQR). Univariate and multivariate logistic regression analyses were performed to identify potential risk factors. A P-value < 0.05 was considered statistically significant. Variables with a P-value < 0.05 in the univariate logistic regression analysis were included in the multivariate logistic regression analysis. Odds ratios (ORs) and adjusted odds ratios (aORs) with their 95% confidence intervals (95% CIs) are reported.
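For a single binary factor, the unadjusted odds ratio is the cross-product ratio of the corresponding 2 × 2 table. The sketch below shows this calculation with a Wald-type 95% confidence interval on the log scale. It is an illustration only, not the SPSS procedure used in the study; the function name and the counts are hypothetical, and adjusted odds ratios require fitting a multivariate model, which is not shown.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed, outcome present      b: exposed, outcome absent
    c: unexposed, outcome present    d: unexposed, outcome absent
    z: normal quantile (1.96 for a 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from Table 1.
odds_ratio, lower, upper = odds_ratio_wald(a=12, b=288, c=10, d=890)
print(f"OR = {odds_ratio:.3f} (95% CI {lower:.3f}-{upper:.3f})")
```

An interval that excludes 1 corresponds to a P-value below 0.05 for the Wald test of that factor.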

Results

Demographic and clinical characteristics of the subjects

A total of 2368 HIV-1 pol sequences were successfully amplified and sequenced from the 2437 enrolled participants (97.17%). The participants ranged in age from 16 to 90 years, with a median age of 36 years. In total, 86.53% (2049/2368) of the subjects were male. The most common infection route was men who have sex with men (MSM, 46.75%, 1107/2368), followed by heterosexual contact (HETs, 42.40%, 1004/2368) and intravenous drug use (IDUs, 3.38%, 80/2368). Approximately half of the participants were unmarried (46.28%, 1096/2368), and 36.95% were married or cohabiting (875/2368). The most common educational level was junior high school (34.76%, 823/2368). The median (range) CD4+ T cell count was 247 (1–1425) cells/mm3, and 37.80% (895/2368) of the subjects exhibited a CD4+ T cell count of < 200 cells/mm3 (Table 1).

Table 1 Demographic characteristics and factors associated with drug resistance

Distribution of HIV-1 genotypes

The main HIV-1 genotypes circulating in Guangdong were found to be CRF07_BC (35.90%, 850/2368), CRF01_AE (35.56%, 842/2368) and CRF55_01B (10.30%, 244/2368), accounting for 81.76% of total infections. HIV-1 subtype B (2.96%, 70/2368), CRF08_BC (2.79%, 66/2368) and CRF59_01B (2.24%, 53/2368) were less frequently observed. HIV-1 subtype C (0.46%, 11/2368), subtype G (0.13%, 3/2368), CRF02_AG (0.13%, 3/2368) and CRF12_BF (0.04%, 1/2368) were classified as minor in this study. In addition, 225 recombinant strains were observed (REGA tool ‘Recombination’, ‘Recombination-like’, ‘potential-Recombination’, or ‘check the report’; and COMET tool ‘unassigned’ and not clustered with any reference sequences by the phylogenetic tree). Minor HIV-1 genotypes and recombinant strains were classified as ‘other’ genotypes (Fig. 1A).
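The genotype proportions above follow directly from the reported counts. As a short consistency check, the sketch below recomputes the percentages, the combined share of the three main genotypes, and the number of sequences left over as recombinant strains. The counts are taken from the text; the variable names are illustrative.

```python
TOTAL = 2368  # sequences analysed

# Counts of assigned genotypes as reported in the text.
genotype_counts = {
    "CRF07_BC": 850,
    "CRF01_AE": 842,
    "CRF55_01B": 244,
    "B": 70,
    "CRF08_BC": 66,
    "CRF59_01B": 53,
    "C": 11,
    "G": 3,
    "CRF02_AG": 3,
    "CRF12_BF": 1,
}

# Percentage of all sequences for each assigned genotype.
percentages = {g: round(100 * n / TOTAL, 2) for g, n in genotype_counts.items()}

# Combined share of the three main genotypes.
main_three = sum(genotype_counts[g] for g in ("CRF07_BC", "CRF01_AE", "CRF55_01B"))
main_three_pct = round(100 * main_three / TOTAL, 2)

# Sequences not assigned to any listed genotype (recombinant strains).
recombinants = TOTAL - sum(genotype_counts.values())

for genotype, pct in percentages.items():
    print(f"{genotype}: {pct}%")
print(f"Three main genotypes: {main_three_pct}%; recombinants: {recombinants}")
```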

Fig. 1

Genotypic analysis based on the sequences of the human immunodeficiency virus 1 pol gene. A Distribution of genotypes according to the HIV-1 pol gene. B Distribution of HIV-1 genotypes in each risk group. Other genotypes include subtype C, subtype G, CRF02_AG, CRF12_BF, and some recombinant forms

The distribution of HIV-1 genotypes varied among different risk groups (Fig. 1B). CRF07_BC (40.65%, 450/1107), CRF01_AE (29.63%, 328/1107) and CRF55_01B (12.74%, 141/1107) were the dominant genotypes circulating among MSM, and CRF08_BC (0.36%, 4/1107) was rarely detected in this risk group. CRF01_AE (42.43%, 426/1004), CRF07_BC (30.28%, 305/1004) and CRF55_01B (8.27%, 83/1004) were the main genotypes circulating among HETs. CRF07_BC accounted for more than half of the genotypes circulating among IDUs (53.75%, 43/80), followed by CRF01_AE (22.50%, 18/80) and CRF08_BC (17.50%, 14/80).

HIV surveillance drug resistance mutations (SDRMs)

Twenty-one SDRMs were identified among fifty-two drug-resistant strains by the CPR program. M46L (0.17%, 4/2368) was the most prevalent mutation in the protease region. K103N (0.42%, 10/2368), Y181C (0.21%, 5/2368), and G190A (0.21%, 5/2368) were the most common NNRTI-associated mutations, and M184V (0.21%, 5/2368), L210W (0.21%, 5/2368), and T215S (0.13%, 3/2368) were the most common NRTI-associated mutations (Fig. 2). Patients infected with the CRF01_AE strain (0.29%) were most likely to carry a PI-associated SDRM, followed by those infected with the CRF07_BC strain (0.04%). Patients infected with the CRF07_BC strain were most likely to carry an NRTI-associated SDRM, followed by those infected with the CRF01_AE and CRF55_01B strains. Patients infected with the CRF01_AE strain were most likely to carry an NNRTI-associated SDRM, followed by those infected with the CRF07_BC and subtype B strains (Fig. 2).

Fig. 2

Distribution of surveillance drug resistance mutations among ART-naïve HIV-1-infected individuals from Guangdong, China

HIV TDR and its associated factors

The clinical impact of these mutations was assessed with the Stanford HIVdb tool. In total, 2.20% (52/2368) of patients had TDR (Table 2). Among them, 8 (0.34%) had TDR to PIs, 22 (0.93%) to NRTIs, and 23 (0.97%) to NNRTIs (Table 2). Two (0.08%) strains showed dual-class resistance to NRTIs and NNRTIs, and no strains showed triple-class resistance. For NNRTIs, the drugs most frequently affected by TDR were EFV and NVP (both 1.01%, 24/2368). For NRTIs, the drug most frequently affected was D4T (0.63%, 15/2368), followed by AZT (0.46%, 11/2368). All seven patients with TDR to PIs were resistant to NFV.

Table 2 Transmitted drug resistance among ART-naïve HIV-1-infected individuals from Guangdong, China

Factors associated with HIV TDR are listed in Table 1. In the univariate logistic regression analysis, two factors were significantly associated with HIV TDR. The OR for patients whose CD4+ T cell count was above 500 cells/mm3 versus patients whose CD4+ T cell count was below 200 cells/mm3 was 3.437 (95% CI 1.636–7.219), and that for patients infected with the CRF07_BC strain versus patients infected with the CRF01_AE strain was 0.406 (95% CI 0.193–0.854). In the multivariate logistic regression model, both factors remained independently associated with TDR: a CD4+ T cell count above 500 cells/mm3 was associated with higher odds of TDR (aOR 4.062, 95% CI 1.904–8.668), and CRF07_BC infection with lower odds (aOR 0.360, 95% CI 0.170–0.764).

Genetic transmission cluster analysis

All 2368 sequences were used to construct the genetic transmission network, of which 1066 (45.02%) were segregated into 194 clusters at a genetic distance threshold of 1.5%, with cluster sizes ranging from 2 to 414 sequences (Fig. 3). A total of 93.30% (181/194) of clusters had a size ≤ 5 and 6.70% (13/194) of clusters had a size > 5. The largest cluster (cluster A) was a CRF07_BC cluster with 414 sequences, followed by a CRF55_01B cluster (cluster B) with 124 sequences (Fig. 3). A total of 50.86% (563/1107) of sequences from MSM were included in the networks and dispersed among 53.09% (103/194) of the transmission networks, and 40.64% (408/1004) of sequences from HETs were included in the networks and dispersed among 69.59% (135/194) of the transmission networks. We also observed that 28.85% (15/52) of patients with TDR were included in 9 clusters, and an analysis of shared mutations revealed that cluster C contained two TDR sequences with the K103N mutation (Fig. 3). These individuals with TDR included 10 patients with resistance to NRTIs, 4 patients with resistance to NNRTIs, and 1 patient with resistance to PIs. The proportion of patients with TDR entering the network was lower than that of those without TDR, and the difference was statistically significant (χ2 = 5.617, p = 0.023).
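The comparison of network entry between patients with and without TDR can be checked from the counts given above. The sketch below derives the 2 × 2 table (15 of 52 patients with TDR in the network; 1066 of 2368 sequences in the network overall) and computes the Pearson χ2 statistic without continuity correction. That the reported statistic is an uncorrected Pearson χ2 is an assumption, and the sketch computes only the statistic, not a p-value, which depends on the test variant used.

```python
def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table

        a  b
        c  d
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Counts stated in the text.
total_sequences, total_in_network = 2368, 1066
tdr_total, tdr_in_network = 52, 15

# Derived cells of the 2 x 2 table.
tdr_out = tdr_total - tdr_in_network                       # TDR, not in network
non_tdr_in = total_in_network - tdr_in_network             # no TDR, in network
non_tdr_out = (total_sequences - tdr_total) - non_tdr_in   # no TDR, not in network

chi2 = pearson_chi2_2x2(tdr_in_network, tdr_out, non_tdr_in, non_tdr_out)
print(f"chi-square = {chi2:.3f}")
```

With one degree of freedom, a statistic above 3.841 corresponds to significance at the 0.05 level.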

Fig. 3

Risk factor- and drug resistance-associated genetic transmission networks. A The largest cluster included 414 CRF07_BC sequences. B The second largest cluster included 124 CRF55_01B sequences. C The CRF01_AE cluster with the surveillance drug resistance mutation K103N

Patients were divided according to whether they entered the transmission network, and the factors listed in Table 3 were examined. The multivariate logistic regression model showed that infection through intravenous drug use, a CD4+ T cell count between 200 and 499 cells/mm3, CRF07_BC and CRF55_01B were significantly associated with entering the network, with aORs of 0.266 (95% CI 0.144–0.493), 1.339 (95% CI 1.095–1.636), 3.435 (95% CI 2.789–4.232) and 2.498 (95% CI 1.850–3.372), respectively (Table 3).

Table 3 Factors associated with transmission within clusters

Discussion

In this study, we investigated the genetic characteristics and prevalence of TDR among ART-naïve HIV-1-infected individuals newly diagnosed in Guangdong, China, in 2018. The major epidemic HIV-1 genotypes detected in Guangdong were CRF07_BC (35.90%), CRF01_AE (35.56%), and CRF55_01B (10.30%). The distribution of HIV-1 genotypes in Guangdong has changed over the last three decades. Before 2000, subtype C (46.2%) and subtype B (30.7%) were the major prevalent strains [30]. CRF01_AE (49.68%), CRF07_BC (22.26%), and CRF08_BC (21.93%) were the major strains circulating in 2006 [31]. CRF01_AE (43.2%), CRF07_BC (26.3%), CRF55_01B (8.5%) and CRF08_BC (8.4%) became the predominant strains circulating in 2013 [32]. By 2018, the proportion of individuals infected with CRF07_BC had increased, while the proportion of individuals infected with CRF01_AE had declined gradually. CRF07_BC was first identified in IDUs in the early 1990s and has since spread to MSM [33]. In this study, CRF07_BC was confirmed as the dominant HIV-1 genotype among MSM (40.65%, Fig. 1B); the proportion of CRF07_BC in MSM had previously increased from 33.3% in 2006 [31] to 34.2% in 2013 [32]. The number of CRF07_BC-infected cases is likely to keep increasing if HIV infection among MSM continues to spread rapidly. Our findings highlight the importance of CRF07_BC for HIV control in Guangdong.

The overall prevalence of TDR was 2.20% in Guangdong. This prevalence is low according to WHO categorisation methods [34] and is lower than that in other regions of China [12,13,14,15,16]. The prevalence of TDR was significantly associated with CD4+ T cell count and genotype, consistent with previous results [13]. When patients were categorised by CD4+ T cell count, those with a count above 500 cells/mm3 were most likely to harbour TDR. Of the six main genotypes, CRF07_BC had the lowest prevalence of TDR. In this study, TDR to NNRTIs and NRTIs was more common than TDR to PIs. This may be because NRTIs and NNRTIs are frequently used as first-line treatments. As TDR can compromise antiretroviral therapy and allows drug resistance mutations to spread, TDR should continue to be monitored.

The SDRMs identified in our study differed from those in other regions. The most frequent PI-associated mutation in our study was M46L, whereas it is Q56E in southwest China [13], M46I in Iceland [35], and L90M in the south-central United States [36]. The most frequent NRTI-associated mutations in our study were M184V and L210W, while they are M41L and D67G in southwest China [13] and T215C/D in Iceland and the south-central United States [35, 36]. The most frequent NNRTI-associated SDRM in our study was K103N, while it is V179E and V106I in southwest China [13] and K103N/S and E138A in Iceland and the south-central United States [35, 36]. The dominant SDRMs in our study are consistent with the main drug resistance sites among ART-treated patients in Guangdong [37]. The differences in SDRMs among regions may be due to different genotype distributions or ART regimens.

To elucidate the transmission dynamics in the surveilled population, we constructed transmission clusters based on HIV-1 sequences. Of all the transmission networks, 53.09% included sequences from MSM. Moreover, sequences from MSM made up more than half of the largest cluster, cluster A, and of the second largest cluster, cluster B (68.36% and 54.84%, respectively). These results indicate that MSM may contribute significantly to the spread of the virus, and additional HIV prevention and control efforts should focus on this population. Additionally, 28.85% (15/52) of patients infected by TDR strains were included in 9 clusters, so that networks containing TDR strains accounted for 4.64% (9/194) of all networks. A cluster (cluster C) containing HIV strains sharing the same SDRM (K103N) was found in the present study. These results indicate that HIV TDR may have spread within the transmission network, and the surveillance of TDR should be factored into treatment and prevention policies. Logistic regression analysis revealed that a CD4+ T cell count between 200 and 499 cells/mm3, the CRF07_BC strain and the CRF55_01B strain were associated with a higher probability of entering the transmission network. The reasons for these associations should be investigated further.

Conclusions

In summary, this study of 2368 treatment-naïve HIV-1 patients shows that there is high genetic heterogeneity of HIV-1 in Guangdong, China. Although the overall prevalence of TDR is low, it is still necessary to remain vigilant for some important SDRMs.