Background

Human immunodeficiency virus type 1 (HIV-1) sequences of the major group M are classified into nine subtypes (A-D, F-H, J, and K), six sub-subtypes (A1–A4, and F1–F2), and a variety of circulating recombinant forms (CRFs) and unique recombinant forms (URFs) (Los Alamos HIV sequence database: http://hiv-web.lanl.gov/). The subtype F1 causes a small number of infections globally (<1%) [1]; but it is particularly prevalent in some specific countries from Europe, South America, and Central Africa, either in its non-recombinant form, or as part of recombinant genomes.

In Europe, non-recombinant subtype F1 strains reach a high prevalence (>70%) among Romanian children and adults [2–5]. This Romanian epidemic was probably caused by the introduction of one subtype F1 virus into the adult population sometime before it appeared in 1989 among institutionalized children [6]. In South America, subtype F1 and mainly BF1 recombinant variants are prevalent (>10%) in countries of the Southern Cone (Argentina, Brazil, Chile, Paraguay, and Uruguay), particularly among intravenous drug users and heterosexual populations [7–22]. A previous study suggested that the subtype F1 and BF1 epidemics in South America were initiated by the introduction, through Brazil, of a single founder subtype F1 strain around the mid-to-late 1970s [23]. Several studies have shown that the South American and Romanian epidemics are the result of distinct subtype F1 introductions [3, 4, 6, 23], but the geographic epicenter(s) of these subtype F1 epidemics remain one of the most puzzling aspects of the worldwide spread of HIV-1.

The overall prevalence of subtype F1 in Africa is very low, and most of the subtype F1 infections initially described on the continent were from the Democratic Republic of Congo (DRC). Although subtype F1 forms represent a small percentage (<5%) of the HIV-1 strains circulating in the DRC [24–30], sporadic cases of subtype F1 and CRF05_DF infections have been reported in Belgium and the Netherlands among individuals with a direct epidemiological link to the DRC [31–34], suggesting that the DRC could be an important epicenter of the worldwide dispersion of both pure and recombinant subtype F1 strains. The most extensive phylogenetic analysis of subtype F1 strains circulating worldwide performed to date, however, revealed that although sequences from the DRC fell in a basal position within the subtype F1 phylogeny, they were only weakly associated with the South American and Romanian clades [23].

Angola is a Central African country bordered by the DRC, Republic of the Congo, Namibia and Zambia. As in the DRC, the HIV-1 epidemic in Angola is characterized by the circulation of all group M subtypes and sub-subtypes, a high number of URFs, and several unclassifiable sequences [35–37]. Two recent studies, based on the analysis of partial genome regions, described an unusually high prevalence of subtype F1 infections in Angola, ranging from 8% to 16% [35, 37]. Angola maintains strong social, cultural and economic relationships with Brazil. Notably, the estimated onset date of the subtype F1 epidemic in Brazil (and South America) was around 1975–1980 [23, 38, 39], coinciding with the beginning of the Angolan civil war in 1975, which was followed by a wave of emigration. These observations led us to suggest that the subtype F1 found in South America could have originated from Angola [23].

To further test this hypothesis, we recovered nine partial and four full-length genome sequences from subtype F1 HIV-1 isolates from Angola and investigated the phylogenetic relationship of these strains with other subtype F1 strains isolated worldwide. We also estimated the onset year of the HIV-1 subtype F1 epidemic in Central Africa based on the analysis of 49 non-contemporary env gene sequences of African origin sampled over a period of 22 years (1984–2006).

Results

Identification of "pure" HIV-1 subtype F1 viruses in Angola

Among 46 samples collected in Angola in 2006, nine (20%) were classified as subtype F1 based on the analysis of the env-gp41 region (data not shown). These samples were further analyzed in the env-gp120 and pol-rt regions, and all nine Angolan sequences were confirmed as subtype F1 in both genomic regions (Figs. 1 and 2). To confirm the circulation of non-recombinant subtype F1 strains in Angola, we expanded the genetic characterization of four strains to near full-length genome sequencing. According to the REGA HIV-1 Subtyping Tool and bootscanning analyses, no evidence of inter-subtype recombination was found among the four samples analyzed, confirming their classification as "pure" subtype F1 strains. Twelve "pure" subtype F1 strains have been described in the literature to date (Table 1). Six were isolated in South America: five in Brazil [15, 40, 41] and one in Argentina [23]. The other six strains have a reported epidemiological link to the DRC, Kenya, Chad, Romania and Angola, but were isolated in Belgium [33], Finland [33], France [42], and Spain (Sierra et al, unpublished results) (Table 1). Thus, the four new subtype F1 Angolan sequences described in the present study are the first "pure" HIV-1 subtype F1 strains isolated in Central Africa to date.

Figure 1

Majority-rule Bayesian consensus tree of HIV-1 subtype F1 env-gp120 (310 bp) sequences. Posterior probabilities are shown for key nodes. The names of HIV-1 isolates include reference to subtype, country of isolation, and year of isolation. The color of each branch within the subtype F1 cluster represents the country (or geographic region) of origin of the sequence corresponding to that branch, according to the legend in the figure. Asterisks mark the subtype F1 Angolan sequences described in the present work. Brackets indicate the different monophyletic clusters identified. The tree was rooted using subtype C reference sequences as outgroups. Horizontal branch lengths are drawn to scale, with the bar at the bottom indicating 0.1 nucleotide substitutions per site.

Figure 2

Majority-rule Bayesian consensus tree of HIV-1 subtype F1 pol-rt (866 nt) sequences. See legend of Fig. 1. Horizontal branch lengths are drawn to scale with the bar at the bottom indicating 0.05 nucleotide substitutions per site.

Table 1 HIV-1 subtype F1 data sets.

Phylogenetic analysis of env, pol, and full-length subtype F1 sequences

To investigate the phylogenetic relationship between the HIV-1 subtype F1 strains isolated in Angola and other subtype F1 strains sampled worldwide, the new subtype F1 env-gp120 (310-bp) Angolan sequences and six subtype F1 env-gp120 Angolan sequences described previously were aligned with all strains of African and Romanian origin and a subset of strains of South American origin that were subtype F1 in the genome fragment analyzed. The sequences of this South American subset are available at the Los Alamos HIV database. This approach resulted in a final data set of 82 subtype F1 env sequences (Angola = 15, DRC = 29, Romania = 10, South America = 21, others = 7) (Table 1). The resulting Bayesian phylogenetic tree (Fig. 1) showed that isolates from the DRC occupy the most basal positions in the subtype F1 phylogeny, confirming their older radiation and indicating that the original diversification of subtype F1 probably occurred within or near the DRC. Two previously described isolates from Angola (AO.01.CSE126 and AO.01.HDC240) and isolates from other African countries also fell at the base of the tree, intermixed among strains from the DRC. Most subtype F1 sequences from Angola, however, segregated in a monophyletic group nested among the DRC strains, together with subtype F1 sequences from Romania. Within this Angola/Romania clade, isolates from Angola occupied basal positions whereas Romanian isolates branched as a monophyletic sub-cluster. The South American sequences and the subtype F1 env segments derived from CRF05_DF strains formed two additional (and independent) monophyletic clusters nested within the basal strains from the DRC. The support of these distinct monophyletic clusters was generally low (posterior probability, PP < 70%), with the exception of the South American monophyletic group (PP = 96%).

To confirm this tree topology, a phylogenetic analysis of the larger pol-RT gene fragment (866-bp) was performed. The new subtype F1 pol-RT Angolan sequences and 12 subtype F1 pol-RT Angolan sequences previously described, were aligned with all strains from African and Romanian origin and a subset of strains of South American origin, available at the Los Alamos HIV database, that were subtype F1 in the genome fragment analyzed. The final dataset contained a total of 67 subtype F1 pol sequences (Angola = 21, DRC = 3, Romania = 18, South America = 21, others = 4) (Table 1). Ten F1/D recombinant pol sequences from CRF05_DF strains were also included. The overall topology of the resulting pol Bayesian tree was similar to the env tree (Fig. 2). This analysis showed the clustering of the Angolan and Romanian strains in a highly supported monophyletic group (PP = 94%). Within this monophyletic group, strains from Angola and Romania segregated into two separate sub-groups (PP = 100%). One isolate from France (FR.96.MP411), and another one from Mozambique (MZ.04.UNV49) also segregated within the Romanian and Angolan sub-clusters, respectively. South American strains formed an independent monophyletic lineage (PP = 91%) that was only weakly associated with two reference strains isolated in Belgium (BE.93.VI850) and Finland (FI.93.FIN9363). The three strains from the DRC occupied the most ancestral positions within subtype F1, together with the Angolan strains AO.01.CSE126 and AO.01.HDC240, consistent with the env tree. As expected, the F1/D recombinant pol fragments of the CRF05_DF strains appeared outside the subtype F1 group.

Finally, the evolutionary relationship among all 16 "pure" subtype F1 strains was investigated. Phylogenetic analysis of near-FL (~8.5-kb) subtype F1 strains showed that Angolan and Romanian sequences segregated into two separate but highly related clusters, consistent with the pol tree topology, whereas South American strains formed an independent monophyletic lineage (Fig. 3). The reference strain isolated in France (FR.96.MP411) segregated with the Romanian sequences, while the two strains isolated in Belgium (BE.93.VI850) and Finland (FI.93.FIN9363) clustered with the South American monophyletic lineage (Fig. 3). Maximum Likelihood analyses yielded env, pol, and FL trees with essentially the same topology (data not shown). These analyses confirmed that subtype F1 strains from Angola and Romania are more closely related to each other than to any other subtype F1 strains isolated elsewhere.

Figure 3

Majority-rule Bayesian consensus tree of HIV-1 subtype F1 near full-length (~8.5 kb) strains. Posterior probabilities are shown for key nodes. The names of HIV-1 isolates include reference to subtype and country of isolation. The color of each branch within the subtype F1 cluster represents the country (or geographic region) of origin of the sequence corresponding to that branch, according to the legend in the figure. Asterisks mark the subtype F1 Angolan sequences described in the present work. Brackets mark the different subtype F1 monophyletic clusters identified. Subtype F2 and C reference sequences were used as outgroups. Horizontal branch lengths are drawn to scale, with the bar at the bottom indicating 0.02 nucleotide substitutions per site.

Timing the origin of the HIV-1 subtype F1 in Central Africa

To estimate the date of the root of the Central African subtype F1, a total of 49 subtype F1 env (310-bp) sequences of African origin (Angola = 14; DRC = 28; others = 7) sampled from 1984 to 2006 were used, including 11 subtype F1 env sequences derived from CRF05_DF strains. Evolutionary parameters were estimated under strict and relaxed clock models as implemented in BEAST v1.7. The coefficient of variation under the relaxed clock model was 0.24 (95% Highest Posterior Density [HPD]: 0.12–0.36), indicating detectable variation in evolutionary rates among branches. Despite this, the median evolutionary rates estimated for this data set under the strict (3.3 × 10⁻³ [95% HPD: 2.2 × 10⁻³–4.5 × 10⁻³] subst./site/year) and relaxed (3.2 × 10⁻³ [95% HPD: 1.9 × 10⁻³–4.5 × 10⁻³] subst./site/year) clock models were very close. According to this env data set, the time of the most recent common ancestor (Tmrca) of the subtype F1 epidemic was estimated to be around 1955 (95% HPD: 1932–1970; strict clock) and 1958 (95% HPD: 1934–1971; relaxed clock).

Discussion

Two previous studies described a high prevalence (8–16%) of HIV-1 subtype F1 infections in Angola, based on the analysis of partial genomic regions [35, 37]. In the present work, nine (20%) out of 46 HIV-1 samples recovered in Angola in 2006 were classified as subtype F1 after analysis of the env-gp41 region, confirming the previous observations. A considerable proportion (~40%) of env subtype F1 viruses originating in the DRC display an F1/D recombinant profile in pol and are, in fact, CRF05_DF strains. All nine env subtype F1 Angolan sequences identified in our study, however, exhibited a non-recombinant subtype F1 profile in pol. Four samples were further subjected to near full-length genome sequencing and were classified as non-recombinant F1 strains, showing for the first time that "pure" HIV-1 subtype F1 viruses are common in Angola. The detection of "pure" subtype F1 genomes at relatively high prevalence in Angola is surprising considering the co-circulation of all group M subtypes and the large number of URFs described in the country [35–37].

Phylogenetic analyses of subtype F1 strains isolated worldwide suggest that the original diversification of subtype F1 occurred within the DRC and that the subtype subsequently spread to Angola, Romania and South America. Despite the intense population mobility between Angola and the neighboring DRC, subtype F1 variants from the two countries were not highly intermixed in the phylogenetic trees. Most Angolan subtype F1 strains segregated in a monophyletic cluster nested within the strains from the DRC, suggesting that most subtype F1 infections in Angola derived from a single founder event. Only two previously described Angolan samples branched outside this major Angolan cluster, and these probably represent independent introductions of subtype F1 into Angola. Thus, the subtype F1 epidemic in Angola seems to be mainly driven by internal spread of a local subtype F1 variant introduced sometime in the past, rather than by repeated introductions of subtype F1 variants from neighboring countries.

The high prevalence of subtype F1 in Angola, the strong relationships between Angola and Brazil, and the coincidence between the estimated onset date of the subtype F1 epidemic in Brazil (around 1975–1980) [23, 38, 39] and the beginning of the Angolan civil war (in 1975) led us to suggest that the subtype F1 found in Brazil (and South America) probably originated from Angola. All phylogenetic analyses conducted in sub-genomic and full-genome regions showed, however, that the South American and Angolan F1 sequences shared a distant common ancestor and segregated in independent clusters within the subtype F1 phylogeny, suggesting no direct epidemiological link between these lineages. Two strains isolated in Belgium (BE.93.VI850) and Finland (FI.93.FIN9363), but probably originating in the DRC and Kenya, were the African sequences most closely related to the South American cluster. Whether the South American subtype F1 lineage originated from the DRC or from some lineage circulating in Angola as a minor form is unclear.

Surprisingly, all phylogenetic analyses conducted in this study revealed that the subtype F1 variants from Angola and Romania are more closely related to each other than to any other subtype F1 strains isolated elsewhere. Analysis of the env region revealed that isolates from Angola occupied basal positions within the Angola/Romania clade, whereas Romanian isolates branched as a monophyletic sub-cluster, indicating that Angola could be the epicenter of the subtype F1 epidemic that spread into Romania. Analyses of pol and full-length genomes, however, showed that Romanian and Angolan subtype F1 sequences segregate in two highly related but distinct sub-clusters, suggesting that both epidemics could have arisen independently from closely related subtype F1 strains probably introduced from the DRC.

The median Tmrca of the African subtype F1 epidemic was estimated to be around the late 1950s, more than 30 years later than the estimated emergence of HIV-1 group M in Central Africa [43, 44]. Previous studies also place the date of origin of the subtype C epidemic in Africa between the mid-1950s and the mid-1960s [45–47], supporting a similar emergence date for the two subtype epidemics. The simultaneous appearance of group M subtypes suggests that some extrinsic factors could have played a role in producing and spreading high numbers of HIV-1 infections in the DRC during the period 1950–1960. Of note, this period coincides with massive population movements from rural areas to the major cities (e.g., the population of Kinshasa increased almost 10-fold from 1940 to 1960) [48], and with dramatic increases in the number of unsterile injections [49] in Central Africa. In such new high-risk social networks, some of the pre-existing HIV-1 group M lineages may have had more opportunity for rapid local expansion and subsequent dissemination out of the epicenter, resulting in the global subtypes.

Conclusion

The subtype F1 epidemic, as with most HIV-1 group M clades, probably emerged in the DRC around the late 1950s and subsequently spread locally and globally. Our results suggest that non-recombinant subtype F1 HIV-1 strains are present in Angola at a prevalence higher than in any other African country, and this could have important implications for vaccine design. The subtype F1 epidemic in Angola has its own characteristics, different from the neighbouring DRC; and it is highly related to the subtype F1 epidemic spreading in Romania. The exact origin of the Romanian and South American subtype F1 epidemics remains unclear; and a denser sampling of subtype F1 sequences from Central Africa will be needed in order to reconstruct the history of these epidemics.

Methods

Study population

Forty-six HIV-1-positive samples were collected in 2006 at the "Hospital Esperança" in Luanda, Angola, with the approval of the local Ethical Committee. Nine samples were classified as subtype F1 based on the analysis of the envelope (env)-gp41 region, and were further subjected to the analysis of the env-gp120 and the polymerase-reverse transcriptase (pol-rt) regions. Four samples were subjected to near full-length (FL) genome analysis. Additionally, five env-gp120 and 11 pol-rt subtype F1 Angolan sequences described previously [36, 37], and one FL subtype F1 sequence of Angolan origin isolated in Spain (Sierra et al, unpublished results), were downloaded from the Los Alamos HIV Sequence database.

Amplification and sequencing of HIV-1 DNA

DNA samples were extracted from 200 μl of whole blood using a QIAamp DNA kit (Qiagen Inc., CA, U.S.A.), according to the manufacturer's protocol. Partial env and pol regions were PCR-amplified using nested primers as previously described [50]. Near-FL HIV-1 genomes (~8.5-kb) were amplified by nested PCR of four overlapping fragments of 2,100-bp to 3,100-bp each. Primer sequences and PCR conditions used for nested amplifications are available upon request. PCR products were purified using the Qiagen PCR purification kit (Qiagen) according to the manufacturer's protocol. Purified DNA was sequenced using the ABI BigDye Terminator v.3.1 cycle sequencing ready reaction kit (Applied Biosystems, CA, U.S.A.), and processed with an automated ABI 3100 Genetic Analyzer (Applied Biosystems). Sequence electropherograms were edited and assembled with the Seqman v.7.0 program (DNASTAR).

Sequence alignments

Nucleotide sequences were aligned using the CLUSTAL X program [51] and later edited manually. All positions with alignment gaps and regions of ambiguous alignment were removed. Three distinct alignments were used to investigate the phylogenetic relationship among subtype F1 sequences isolated worldwide (Table 1). The first alignment of 310-bp spanned the V3 region of the env gene (positions 7050 to 7374 relative to HXB2) and included a total of 82 subtype F1 sequences isolated worldwide. The second alignment of 866-bp covered part of the pol (reverse transcriptase) gene (positions 2550 to 3415 relative to HXB2) and contained 67 subtype F1 sequences from all over the world, and 10 F1/D recombinant sequences derived from CRF05_DF strains. The third alignment of ~8,400-bp (positions 790 to 9084 relative to the HXB2 reference strain) contained 16 subtype F1 FL sequences (four new and 12 available in the Los Alamos HIV database). A fourth alignment of 310-bp, spanning the same region of the env gene described above, was used to estimate the onset date of the subtype F1 epidemic in Central Africa and contained 49 subtype F1 sequences of African origin with a known sampling year. All alignments are available from the authors upon request.
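
The gap-removal step can be illustrated with a minimal Python sketch. It is not the procedure actually used (the alignments were edited by hand), and the function name, the gap characters and the toy sequences are assumptions introduced only for illustration. Every alignment column in which at least one sequence carries a gap is dropped, which is why an HXB2 interval of 325 positions can yield a shorter final alignment.

```python
def strip_gap_columns(alignment, gap_chars="-."):
    """Return a copy of the alignment without any column that contains a gap.

    `alignment` maps a sequence name to its aligned sequence; all sequences
    must have the same length. Names and gap symbols here are illustrative.
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    # Keep only the column indices where no sequence has a gap character.
    keep = [i for i in range(length)
            if all(s[i] not in gap_chars for s in seqs)]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


# Toy alignment (hypothetical sequences, not data from this study).
toy = {"seqA": "ACG-TTA", "seqB": "ACGATTA", "seqC": "A-GATTA"}
stripped = strip_gap_columns(toy)
```

Regions of ambiguous alignment would still need to be identified and removed by inspection; the sketch covers only the mechanical gap-stripping part.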

HIV-1 subtype classification

Analyses of HIV-1 subtypes and recombination were performed using: 1) the REGA HIV-1 Subtyping Tool [52]; and 2) bootscanning analysis (sliding window of 400 bp, incremental steps of 10 bases, and the Kimura two-parameter model) as implemented in Simplot version 2.5 [53]. Bootstrap support was calculated based on 100 re-samplings.
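
The two numerical ingredients of the bootscanning settings, the sliding window and the Kimura two-parameter distance, can be sketched in Python as follows. This is an illustration under stated assumptions, not Simplot's implementation: the function names are hypothetical, window coordinates are 0-based half-open intervals, and the tree building and bootstrap steps that Simplot performs within each window are omitted.

```python
import math

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Sites where either sequence is not A, C, G or T are ignored.
    d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)), with P and Q the observed
    proportions of transitions and transversions.
    """
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    n = len(pairs)
    p = sum(1 for a, b in pairs if (a, b) in TRANSITIONS) / n
    q = sum(1 for a, b in pairs if a != b and (a, b) not in TRANSITIONS) / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def sliding_windows(length, window=400, step=10):
    """Return (start, end) pairs for a window moved along an alignment."""
    return [(start, start + window)
            for start in range(0, length - window + 1, step)]


# A 400-bp window moved in 10-base steps over a hypothetical 420-bp alignment.
windows = sliding_windows(420)
```

In a bootscan, distances of this kind would be computed for each window separately, so that a change in the closest reference subtype along the genome points to a possible recombination breakpoint.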

Phylogenetic analyses

The best-fit model of nucleotide substitution was selected using Modeltest [54], resulting in the general time-reversible nucleotide substitution model with gamma-distributed rate heterogeneity among sites and a proportion of invariable sites (GTR+I+Γ) in all cases. Phylogenetic trees were reconstructed by the Bayesian method using MrBayes version 3.1.2 [55]. For each data set, two runs of four chains each (one cold and three heated, temp = 0.20) were run for 50 × 10⁶ generations, with a burn-in of 10 × 10⁶ generations. Convergence of parameters was assessed by calculating the Effective Sample Size (ESS) using TRACER v1.4 [56], excluding an initial 10% for each run. All parameter estimates for each run showed ESS values >100. A final Bayesian majority-rule consensus tree was obtained for each data set. Maximum Likelihood trees were reconstructed with PhyML [57] using an online web server [58], and phylogenetic confidence was assessed by bootstrap with 100 replicates. Trees were visualized using the FigTree v1.1.2 program available at http://tree.bio.ed.ac.uk/software/figtree/.
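
The relationship between the sampled trees, the burn-in and the posterior probabilities shown on a majority-rule consensus tree can be sketched in Python as below. This is a simplified illustration and not the MrBayes implementation: each sampled tree is assumed to be already reduced to a set of clades (frozensets of taxon names), the function names and toy taxa are hypothetical, and the default burn-in fraction of 0.2 simply mirrors 10 × 10⁶ out of 50 × 10⁶ generations.

```python
from collections import Counter


def clade_support(sampled_trees, burnin_fraction=0.2):
    """Posterior probability of each clade among post-burn-in trees.

    `sampled_trees` is a list in sampling order; each tree is an iterable
    of clades, and each clade is a frozenset of taxon names.
    """
    n_burn = int(len(sampled_trees) * burnin_fraction)
    kept = sampled_trees[n_burn:]
    if not kept:
        raise ValueError("no trees left after burn-in")
    # Count each clade at most once per tree.
    counts = Counter(clade for tree in kept for clade in set(tree))
    return {clade: count / len(kept) for clade, count in counts.items()}


def majority_rule(support, threshold=0.5):
    """Keep only the clades found in more than `threshold` of the trees."""
    return {clade: pp for clade, pp in support.items() if pp > threshold}


# Toy sample of 10 trees (hypothetical taxa): the first 2 are burn-in.
ao_pair = frozenset({"AO_1", "AO_2"})
mixed_pair = frozenset({"AO_1", "RO_1"})
trees = [{mixed_pair}] * 2 + [{ao_pair}] * 6 + [{mixed_pair}] * 2
support = clade_support(trees)
consensus = majority_rule(support)
```

In the toy sample the AO_1/AO_2 clade is present in six of the eight retained trees and therefore appears in the consensus with a support of 0.75, whereas the competing clade (0.25) is collapsed.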

Estimation of evolutionary rates and dates

Estimates of the evolutionary rate (μ, in nucleotide substitutions per site per year, subst./site/year) and the time of the most recent common ancestor (Tmrca, years) of the Central African subtype F1 epidemic were obtained using a Bayesian Markov Chain Monte Carlo (MCMC) approach as implemented in BEAST v1.7 [59, 60]. The time span covered by the African subtype F1 env sequences (i.e., 22 years) was sufficient to reliably estimate the evolutionary parameters on a chronological time-scale using the sampling dates of the sequences. Analyses were performed with a Bayesian Skyline coalescent tree prior [61], under the GTR+I+Γ nucleotide substitution model, and using both a strict and a relaxed (uncorrelated lognormal model) molecular clock [62]. Three separate MCMC chains were run for 1 × 10⁷ generations with a burn-in of 1 × 10⁶. BEAST output was analyzed using TRACER v1.4, with uncertainty in parameter estimates reflected in the 95% Highest Posterior Density (HPD) intervals. All independent Bayesian MCMC runs converged to almost identical values for all parameters, and the ESS values for parameter estimates were >100. The results reported are the combined estimates of the three independent runs.
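
The way dated sequences carry information about both rate and root age can be illustrated with a root-to-tip regression, sketched in Python below. This is a deliberately simpler technique than the Bayesian MCMC analysis performed with BEAST and is shown only to convey the strict-clock intuition: under a constant rate, genetic distance from the root grows linearly with sampling year, so the slope estimates μ and the x-intercept estimates the Tmrca. The function name and the toy values are assumptions, not data from this study.

```python
def root_to_tip_regression(years, distances):
    """Least-squares fit of root-to-tip distance against sampling year.

    Returns (rate, root_year): the slope in subst./site/year and the year
    at which the fitted line crosses zero divergence.
    """
    n = len(years)
    if n < 2 or n != len(distances):
        raise ValueError("need at least two (year, distance) pairs")
    mean_x = sum(years) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("all sequences share one sampling year")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, distances))
    rate = sxy / sxx
    if rate <= 0:
        raise ValueError("no positive temporal signal in the data")
    root_year = mean_x - mean_y / rate
    return rate, root_year


# Toy data generated from a perfect clock (hypothetical, for illustration):
# rate 0.003 subst./site/year and a root placed in 1955.
toy_years = [1984, 1990, 1997, 2002, 2006]
toy_distances = [0.003 * (y - 1955) for y in toy_years]
rate, root_year = root_to_tip_regression(toy_years, toy_distances)
```

Unlike this point estimate, the Bayesian approach integrates over tree topologies and, under the relaxed clock, over branch-specific rates, which is what yields the HPD intervals reported for μ and the Tmrca.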

GenBank accession numbers

Sequences were deposited in GenBank under accession numbers FJ900256 to FJ900269.