Background

HIV drug resistance (DR) compromises successful clinical outcomes in both treatment naïve and treatment experienced patients [1, 2]. In the context of emergent HIV therapy such as occupational exposure, occult HIV DR may lead to failure of post-exposure prophylaxis [3, 4]. For these reasons and also for the purposes of mapping the epidemic, population level surveillance of transmitted DR (TDR) remains important [5, 6]; however, inherent in population analyses of TDR is the loss of resolution at the local level. True local HIV TDR prevalence is obscured through estimates based on larger data sets generated at the regional or national levels. For example, if two clinics with highly divergent HIV TDR prevalence’s are sampled, then when analyzed in aggregate, the average prevalence is inaccurate for both clinics. With new interventions such as treatment as prevention being considered, accurate local TDR data is critical for program implementation and for outcome measures.

HIV TDR was evaluated among anti-retroviral (ART) naïve patients referred over a period of 3 years to the Immunodeficiency Clinic in the 2nd largest city in Ontario, Canada. The Ottawa Hospital Immunodeficiency Clinic routinely carries out baseline HIV DR testing [79] which provided access to the genetic sequence of the virus from the clinical HIV genotype report. Using this information we re-interpreted the HIV genotype in order to determine the local TDR trends and analyzed the distribution of TDR within the sub-populations of the clinic cohort using molecular epidemiology.

Methods

Data collection

Clinical and demographic data were obtained via chart review for all treatment-naive HIV patients referred to the Ottawa Hospital Immunodeficiency Clinic from August 2007 until June 2010. Variables extracted from the chart included: date of birth, country of origin, date of initial HIV diagnosis, likely HIV exposure category likely date of infection according to recall, earliest available CD4 count, viral load, and history of previous anti-retroviral drug (ART) exposure. Exposure category for HIV infection was hierarchically categorized as follows: men who have sex with men (MSM), intravenous drug use (IDU), HIV endemic country, heterosexual contact, and other. Data were entered into a database that was then anonymized for analysis through the assigning of a random study number. Blood specimens were taken as part of standard care. This study was approved by the Ottawa Hospital Research Ethics Board.

HIV genotyping

Plasma was separated from EDTA anticoagulated blood within 4 hours of collection and frozen at -80°C. Frozen plasma was shipped to the HIV Drug Resistance Testing Program at the BC Centre for Excellence in HIV/AIDS, Vancouver, British Columbia for genotyping. Hard copy Vircotype clinical HIV drug resistance reports that included the HIV pol sequence submitted for interpretation were returned to the clinician and placed on the chart.

Digitization of nucleotide sequences

Nucleotide sequences from the Vircotype reports were digitized using Readiris Pro 11 optical character recognition (OCR) software (Iris Technologies, Delray Beach, Florida, USA). Digitized sequences were aligned and scanning errors corrected. To verify the fidelity of the scanning software, 10 digitized sequences were chosen at random and twice manually compared to the original Vircotype hard copies.

Genotypic resistance analysis

155 quality controlled sequences were submitted in FASTA format to the Stanford HIV Database Calibrated Population Resistance tool (CPR) [10] in order to derive an unambiguous and stable measure of TDR [11]. TDR mutations were identified using the World Health Organization (WHO) surveillance drug resistance mutation list [12]. Integral quality control assessment of the CPR analysis revealed 12 further sequences with non-IUPAC letter codes that were corrected and those sequences were re-submitted.

Phylogenetic analysis

Genotype sequences were edited, and aligned using Muscle [13], then trimmed to identical lengths (986 bp) within Geneious 5.4.3 [14]. 136 reference pol sequences from the Los Alamos HIV Sequence database [15] and 119 pol sequences from the Canadian national HIV drug resistance surveillance were aligned and trimmed as above. Codons that harboured TDR mutations identified in the CPR analysis were stripped from all of the sequences within the alignment in order to minimize convergent evolution artefact. A maximum likelihood tree was constructed using the general time reversible (GTR) model in PHYML [16] in order to determine phylogenetic interrelationships among viral sequences. Robustness of relationships among sequences was evaluated using bootstrap analysis with 100 replicates. Clusters having bootstrap values >98% and within cluster branch lengths of less than 0.04 were further evaluated [17]. The similarity of sequences within clusters was compared to background subtype B sequences using mean pair-wise distance within a Maximum Composite Likelihood model with pair-wise deletion [18].

Statistical analysis

303 patients were referred to the Ottawa Hospital Immunodeficiency clinic between August 2007 and June 2010 for assessment and care of their HIV infection. 196 patients (65%) were ART naïve of which 162 had undergone baseline genotyping. 155 of the baseline genotype reports were successfully digitized and used in the subsequent analysis. Reasons for the unavailability of a genotype were: insufficient plasma (13); genotyping was performed elsewhere (10); viral load was too low (9); or genotype not ordered (2).

Statistical analysis was performed using EpiInfo 2000 [19]. Descriptive statistics included frequency analysis (percentages) for categorical variables and means for normally distributed continuous variables. To compare the means of the continuous data an unpaired t-test was used. Chi-square or Fisher’s exact probability tests were used for categorical data. To determine whether the HIV infected patients seen at the Ottawa Hospital clinic patients were similar to those seen in the province as whole, comparisons of the characteristics of our cohort were compared with those from all reported Ontario provincial HIV cases [20]. To evaluate the internal validity of the cohort, characteristics of patients with genotyping results were compared with those from all patients seen concurrently in the clinic. Odds Ratios (OR) and 95% Confidence Intervals (CI) were calculated. P-values less than 0.05 were regarded as statistically significant. Patients with incomplete data were excluded from the analysis.

Results

Sample characteristics

155 sequences from ART naïve subjects were analyzed. Of the 155 patients for whom diagnosis date was recorded (n=145), 75% were genotyped within 1 year of diagnosis. The mean elapsed time from patient’s recollection of diagnosis date, to genotype was 487 days, with a median of 56 days (IQR 234 days).

Representativeness of the data

Epidemiologic characteristics of the 155 study cohort patients, for whom a genotype sequence was available, were compared with those from cases in the Report on HIV/AIDS in Ontario 2008 [20] and also to other clinic patients. There was no significant difference with respect to age at diagnosis, gender or risk factor for acquisition of HIV (Table 1) between either the study cohort or the patients in the clinic. Similarly, when compared to all other clinic patients, there were no significant differences among characteristics of patients for whom genotyping results were available.

Table 1 Comparison of HIV epidemiologic characteristics of study patients with provincial surveillance report

The mean age of patients in the study cohort was 37 years (median 36 years IQR (31, 44)) with 75% being male. The most commonly reported risk categories for HIV acquisition were MSM followed by birth in an HIV endemic country, with each accounting for approximately 1/3 of the group. Analyzed by gender, 58% of the men reported MSM as their risk factor for HIV infection, while heterosexual contact was the dominant risk factor among women (86%). The study cohort had an mean viral load of 4.17 log10 copies/ml (median 4.26 log10 copies/ml, IQR (3.67,4.72)). The mean CD4 count was 332 cells/ul (median 303.0 cells/ul IQR (149.5,473.5)). Sixty-eight percent of the study cohort was infected with subtype B virus with the next most common subtype being subtype C (23%).

Drug resistance

Of the 155 ART-naïve subjects with digitized HIV pol sequence that were subjected to further analysis, OCR fidelity validation trials identified only 2 incorrect base identification errors out of 11,600 bases analyzed (99.98% concordance). Neither of these errors influenced the TDR interpretation.

The crude prevalence of any drug resistance for the period of 2007-2010 was 12.3%. Although the TDR prevalence appeared to increase over the study period this trend was not significant (Chi-square, p=0.65). Fifteen of the 19 cases of TDR were resistant to NRTIs only. Two patients had 2-class resistance with one having a combination of NNRT/NRTI resistance and the other patient exhibiting combined NRTI/PI resistance. One patient had resistance to all 3 classes of ART. Details of the patterns of TDR for each patient can be found in Table 2.

Table 2 Surveillance drug resistance mutations patterns for patients

TDR was not evenly distributed within the sub-populations of the study cohort. Although 30% of the infections were non-B subtype, the prevalence of TDR was only 2% as compared to 17% among subtype-B infections. In addition, the prevalence of TDR among women was 2.5% versus 16% among the men. Consistent with the gender based distribution of drug resistance, the TDR prevalence was 21% in MSM and only 5.5% among IDU. Finally, there was a trend towards increased rates of TDR among patients who had genotyping within 6 month after diagnosis (Table 3).

Table 3 Comparison of patient characteristics between those with and without drug resistance

Phylogenetic analysis

The most common pattern of drug resistance mutations seen was the combination of RT mutations D67N and K219Q conferring low-level resistance to AZT. In fact, 15 of the 19 drug resistant specimens collected over the three-year period contained this identical dug resistance pattern (Table 2) which warranted further investigation. A phylogenetic analysis using a GTR model in PHYML, established that 13 out of the 15 sequences containing the D67N/K219Q formed a distinct out-group with a bootstrap value of 100% (Figure 1). The monophyletic cluster possessed a within cluster pair-wise distance of less than 1%, compared with the background variation of 6%, which is consistent with HIV transmission over a shorter period of time. There were two additional D67N/K219Q containing sequences that were not found to be part of this cluster. These two sequences were not related to each other. 17 additional clusters were found within this cohort with most (60%) having only two members. There were 4 clusters with 3 members and 3 clusters with 4 members. No TDR was found within any of these other clusters (Figure 1).

Figure 1
figure 1

Maximum likelihood tree of HIV pol sequences generated using GTR model in PHYML with 100 bootstrap replicates. Sequences were trimmed to 986 bp with identified TDR mutation codons removed. 155 study cohort sequences are present in a background of 255 sequences further described in the methods. Orange box shows the D67N/K219Q cluster that was identified. For comparison purposes another cluster identified in the study cohort is identified in blue. ** Indicates the out-group N sequence N.CM.97.YBF106.

The phylogenetically inferred relationships are strongly supported by the observed epidemiological characteristics. All of the patients with the D67N/K219Q resistance pattern are male, are infected with a clade B virus and either have MSM as their risk factor for acquisition or it is not reported (2 cases). These characteristics were significantly associated with this group (all OR > 6 and all p-values <0.02) when compared to all other genotyped patients. All but one of the cases with D67N/K219Q were less than 6 months induration, however the small numbers in the study prevented the correlation with recent infections being significant. The relationship between the pattern of clustering observed on the phylogenetic tree and the characterization of a clearly defined sub-population supports the interpretation of cluster identification as means to identify a group at risk of HIV infection.

Discussion

HIV TDR Surveillance projects are typically designed to avoid sample bias by seeking out an ever enlarging (n) [5, 21, 22]. Paradoxically, the most relevant information for the clinician arrives from a biased but more relevant, local sample. Similar to other national surveillance projects, [23] ideal HIV TDR data should be systematically obtained at the local level, where it most accurately reflects the ecology of the virus, and then collated at a regional level. This approach would produce clinically relevant data and facilitate the population level study of HIV TDR dynamics [24, 25]. This study is unique in that it compares the local viral HIV TDR landscape with that predicted to be present in a large population based study [26]. While the cohort of patients in this analysis was not significantly different with respect to age, gender, risk factors or ethnicity from the HIV infected population in Ontario; there were striking differences in the TDR distribution within our clinic sub-populations when compared to the province as a whole.

Initially appearing to be consistent with results from other groups [17, 2733], the average prevalence of TDR at 12.3% is heterogeneously distributed. TDR prevalence is lower in women, people with non subtype-B infections and acquisition of HIV through non-MSM contact [33, 34]. However, in our cohort 1 in 5 patients who acquired their infection through sex with men harboured some form of TDR, almost double the average prevalence. More striking is that most recent available data from on Ontario-wide study revealed that TDR prevalences were three-fold higher among intravenous drug users (IDU) than among MSM [26]. Despite our clinic cohort appearing to be representative of the population of the province as whole, we observed the exact opposite pattern of TDR with MSM having four times the prevalence of TDR than was found among IDU. The distinct and contrasting TDR distribution observed among our clinic sub-populations was lost completely when data was analyzed at the provincial level highlighting the importance of local analysis of TDR patterns.

One of the significant drivers of the distribution of TDR, within our local sub-populations, was the cluster of related infections. Comparing the group of patients with the D67N/K219Q (n=15) with those patients with another pattern of TDR (n=4), it is clear that the epidemiologic characteristics of patients with TDR in our clinic were made up of those from this larger group. The D67N/K219Q combination has a low fitness cost to the virus, is durable in the recipient after infection, is readily transmitted and serves as an excellent marker for TDR [35]. The similar pattern of drug resistance found in more than three quarters of the resistant cases, cued the molecular epidemiology investigation that identified the cluster of related infections among MSM. However, in the absence of this TDR marker, a cluster of related viruses would go unrecognized and, along with that, the opportunity to recognize an outbreak. Phylogenetics are often applied to large data sets to illuminate transmission patterns or trends in drug resistance at the population level [3642]. Our findings continue to support the value in applying routine phylogenetics at the clinical level to uncover related infections even in the absence of common patterns of TDR [41, 42].

While having clinical implications, there is potential public health benefit in identifying a cluster of HIV transmission [43]. Although the limitations of phylogenetic analysis include the inability to attribute directionality of a transmission event [44, 45], highly related viral sequences may correlate with shared social or risk-behaviour patterns [46]. If a cluster of related infections goes unrecognized by public health, an opportunity may be lost to mitigate ongoing transmission; especially originating from those individuals recently infected or from those not yet diagnosed. Public health education within the communities in which higher HIV TDR is found, more frequent testing of those within those communities and appropriate and early treatment are all public health interventions can reduce transmission and improve the outcomes within that community. From the outcome perspective, it is extremely important to understand the reasons for successes and failures of public health interventions. An occult transmission cluster, reflecting a negative result such as incident infection or drug resistance, can negatively bias the interpretation of a program that is actually effective. Studies are underway to examine how best to balance the use of phylogenetics in public health investigations with individual privacy [47].

There are some limitations to be considered in this study. Nearly 90% of the specimens that contained drug resistance were obtained less than 6 months after diagnosis while only 63% of the specimens without TDR were collected in the same timeframe. While our evaluation of TDR may be biased toward identifying less durable mutations the D67N/K219Q, which represented the majority of our TDR, has a low fitness cost and should be durable. The over-representation of these mutations among those recently diagnosed is likely consistent with the mutations acting as a marker for recent cluster formation. While we were diligent in identifying all new patients seen at the largest HIV referral center in Ottawa, there is the possibility of sample bias due to the unavailability of genotypes from some patients. Consequently, even though our cohort appears to be statistically representative of the epidemic in Ontario, our findings on TDR are valid only for the sub-populations seen at the Ottawa Hospital. This sample bias may also affect the depth of sampling in the phylogenetic analysis resulting in the possibility that the clustering of the D69N/K219Q mutations among MSM may be more widespread. Although our study possesses the theoretical limitations associated with sample bias, it is precisely these “limitations” that exposed the divergent TDR trends among our clinic sub-populations. Indeed, similar disparities in DR surveillance patterns have been observed in bacterial infections resulting in a similar call for local surveillance [48].

Conclusion

We have shown that analysis of HIV genotypes, at the local level, reveals patterns of DR that are distinct from those described in surveillance reports encompassing larger geographic areas. Local molecular epidemiological analysis of these genotypes may provide insight into the reasons why these HIV DR patterns differ and expose occult infection clusters. Using the model for antibiotic resistance surveillance, collecting and analyzing HIV TDR data at the clinic level and then collating these results at provincial and national levels would both optimize patient care and provide more complete surveillance [23]. Finally, local phylogenetic analysis of clinical specimens may provide public health with additional tools for outbreak investigations.