Human Genetics, Volume 128, Issue 6, pp 577–588

Long-term balancing selection maintains trans-specific polymorphisms in the human TRIM5 gene

Authors

  • R. Cagliani
    • Scientific Institute IRCCS E. Medea
  • M. Fumagalli
    • Scientific Institute IRCCS E. Medea
    • Bioengineering Department, Politecnico di Milano
  • M. Biasin
    • Chair of Immunology, DISP LITA Vialba, University of Milano
  • L. Piacentini
    • Chair of Immunology, DISP LITA Vialba, University of Milano
  • S. Riva
    • Scientific Institute IRCCS E. Medea
  • U. Pozzoli
    • Scientific Institute IRCCS E. Medea
  • M. C. Bonaglia
    • Scientific Institute IRCCS E. Medea
  • N. Bresolin
    • Scientific Institute IRCCS E. Medea
    • Dino Ferrari Centre, Department of Neurological Sciences, University of Milan, IRCCS Ospedale Maggiore Policlinico, Mangiagalli and Regina Elena Foundation
  • M. Clerici
    • Chair of Immunology, Department of Biomedical Sciences and Technologies LITA Segrate, University of Milano
    • Fondazione Don C. Gnocchi IRCCS
    • Scientific Institute IRCCS E. Medea
Original Investigation

DOI: 10.1007/s00439-010-0884-6

Cite this article as:
Cagliani, R., Fumagalli, M., Biasin, M. et al. Hum Genet (2010) 128: 577. doi:10.1007/s00439-010-0884-6

Abstract

The human TRIM5 gene encodes a retroviral restriction factor (TRIM5α). Evolutionary analyses of this gene in mammals have revealed a complex and multifaceted scenario, suggesting that TRIM5 has been the target of exceptionally strong selective pressures, possibly exerted by recurrent waves of retroviral infections. TRIM5 displays inter-individual expression variability in humans and high levels of TRIM5 mRNA have been associated with a reduced risk of HIV-1 infection. We resequenced TRIM5 in chimpanzees and identified two polymorphisms in intron 1 that are shared with humans. Analysis of the gene region encompassing the two trans-specific variants in human populations identified exceptional nucleotide diversity levels and an excess of polymorphism compared to fixed divergence. Most tests rejected the null hypothesis of neutral evolution for this region and haplotype analysis revealed the presence of two deeply separated clades. Calculation of the time to the most recent common ancestor (TMRCA) for TRIM5 haplotypes yielded estimates ranging between 4 and 7 million years. Overall, these data indicate that long-term balancing selection, an extremely rare process outside MHC genes, has maintained trans-specific polymorphisms in the first intron of TRIM5. Bioinformatic analyses indicated that variants in intron 1 may affect transcription factor-binding sites and, therefore, TRIM5 transcriptional activity. Data herein confirm an extremely complex evolutionary history of TRIM5 genes in primates and open the possibility that regulatory variants in the gene modulate the susceptibility to HIV-1.

Introduction

The TRIM5 gene encodes a member of the tripartite motif protein family, which counts more than 70 members in the human genome. TRIM5 is located on human chromosome 11, in a cluster of four TRIM genes. Several transcripts originate from TRIM5 by alternative splicing; the longest splicing isoform (TRIM5α) contains a SPRY domain and possesses antiviral activity (Stremlau et al. 2004). Human TRIM5α has been shown to restrict some retroviruses but it is scarcely efficient against HIV (Stremlau et al. 2004; Kaiser et al. 2007). Conversely, orthologs from macaque and other primates are highly efficient in restricting HIV, possibly by binding to the incoming viral capsid and leading to its premature disassembly (Stremlau et al. 2006).

The species-specificity of TRIM5α against retroviruses is thought to be the result of amino acid variations that have been selected along primate evolution to fend off the threat imposed by ancient or ongoing retroviral infections, and TRIM5 genes have been selection targets in many mammalian species (reviewed in Johnson and Sawyer 2009). Thus, Sawyer and co-workers (2007) showed that the SPRY domain has undergone multiple episodes of positive selection in primates and the same protein region has experienced length variation and segmental duplications in different primate lineages (Song et al. 2005). Expansions, deletions and duplications of the entire TRIM5 gene have also been observed in different mammalian species (Sawyer et al. 2007; Tareen et al. 2009) and chimeric TRIM5-cyclophilin genes have arisen independently at least twice during the evolutionary history of primates (Virgen et al. 2008).

Recent data indicated that TRIM5 has evolved under long-term balancing selection in some primate species, and trans-specific polymorphisms in macaques and sooty mangabeys have been identified (Newman et al. 2006). Old polymorphisms shared by multiple species are extremely rare and are generally considered compelling evidence of long-term balancing selection, as the maintenance of a neutral allele over long evolutionary times is quite unlikely (unless species are very closely related) (Charlesworth 2006). The best known examples of trans-specific polymorphisms involve MHC loci in multiple species (including humans) and the self-incompatibility genes of certain plants and fungi (Charlesworth 2006).

In humans, a recent survey for polymorphisms shared with chimpanzees outside the MHC revealed no instance that could be ascribed to the action of balancing selection; the shared variants were instead attributed to coincidental mutation (Asthana et al. 2005).

We analyzed nucleotide variation at the TRIM5 locus in humans and chimpanzees; results indicate that intron 1 harbours trans-specific polymorphisms maintained by long-term balancing selection and displays extreme nucleotide diversity levels.

Materials and methods

DNA samples and sequencing

Human genomic DNA was obtained from the Coriell Institute for Medical Research; all individuals have been included in the HapMap project. These samples only partially coincide with those resequenced by the NIEHS SNP discovery Program; therefore, all analyses were performed separately using either our resequencing data (population genetic analyses) or the NIEHS sample data (sliding window analysis). The genetic material of three unrelated chimpanzees (Pan troglodytes) was purchased from the European Collection of Cell Cultures. All analyzed regions were PCR amplified and directly sequenced; primer sequences are available upon request. PCR products were treated with ExoSAP-IT (USB Corporation, Cleveland, OH, USA), directly sequenced on both strands with a Big Dye Terminator sequencing Kit (v3.1 Applied Biosystems) and run on an Applied Biosystems ABI 3130 XL Genetic Analyzer (Applied Biosystems). Sequences were assembled using AutoAssembler version 1.4.0 (Applied Biosystems), and inspected manually by two distinct operators.

Data retrieval and haplotype construction

Genotype data for 2-kb regions from 238 resequenced human genes were derived from the NIEHS SNPs Program web site. In particular, we selected genes that had been resequenced in populations of defined ethnicity including Europeans (EU), Yoruba (YRI) and Asians (AS) (NIEHS panel 2).

Haplotypes were inferred using PHASE version 2.1 (Stephens et al. 2001; Stephens and Scheet 2005), a programme for reconstructing haplotypes from unrelated genotype data through a Bayesian statistical method. When inferring haplotypes using PHASE, we monitored confidence probabilities associated with each phase call. Most of them are nearly 1 and the overall probabilities associated with each individual are quite high (>75%), with only a few exceptions due to singleton assignments. Haplotypes for individuals resequenced in this study are available as supplemental material (Supplementary Table 1).

Linkage disequilibrium analyses were performed using Haploview (v. 4.1) (Barrett et al. 2005) and blocks were identified through an algorithm implemented in the software (Gabriel et al. 2002).

Statistical analysis

Tajima’s D (1989), Fu and Li’s D* and F* (1993) statistics, as well as diversity parameters θW (Watterson 1975) and π (Nei and Li 1979) were calculated using libsequence (Thornton 2003), a C++ class library providing an object-oriented framework for the analysis of molecular population genetic data. Calibrated coalescent simulations were performed using the cosi package (Schaffner et al. 2005) and its best-fit parameters for YRI, EU and AS populations with 10,000 iterations. The maximum-likelihood-ratio HKA test was performed using the MLHKA software (Wright and Charlesworth 2004), as previously proposed (Fumagalli et al. 2009). Briefly, 16 reference loci were randomly selected among NIEHS loci shorter than 20 kb that have been resequenced in the three populations; the only criterion was that Tajima’s D did not suggest the action of natural selection (i.e. Tajima’s D is higher than the 5th and lower than the 95th percentiles in the distribution of NIEHS genes). The reference set was accounted for by the following genes: VNN3, PLA2G2D, MB, MAD2L2, HRAS, CYP17A1, ATOX1, BNIP3, CDC20, NGB, TUBA1, MT3, NUDT1, PRDX5, RETN and JUND.
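The selection criterion for reference loci described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_reference_loci`, the input layout (gene name mapped to length in kb and Tajima's D) and the toy values are hypothetical, and the percentile definition used by `statistics.quantiles` is not necessarily the one applied in the original analysis.

```python
import random
import statistics

def select_reference_loci(loci, k=16, max_length_kb=20.0, seed=0):
    """Randomly pick k loci shorter than max_length_kb whose Tajima's D lies
    strictly between the 5th and 95th percentiles of all loci.

    loci maps gene name -> (length in kb, Tajima's D).
    """
    d_values = [d for _, d in loci.values()]
    # n=20 yields 19 cut points: the first is the 5th percentile, the last the 95th.
    cuts = statistics.quantiles(d_values, n=20)
    low, high = cuts[0], cuts[-1]
    eligible = sorted(
        gene for gene, (length_kb, d) in loci.items()
        if length_kb < max_length_kb and low < d < high
    )
    # A seeded generator keeps the random draw reproducible.
    return random.Random(seed).sample(eligible, k)

# Hypothetical toy data: 100 made-up loci, not NIEHS genes.
toy_loci = {f"locus{i}": (5.0 + (i % 30), -2.0 + 0.04 * i) for i in range(100)}
chosen = select_reference_loci(toy_loci)
print(chosen)
```

Fixing the seed is a design choice for reproducibility; the original text only states that the loci were selected at random.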

In all analyses, the chimpanzee sequence was used as the out-group because the orthologous regions in orangutan and macaque are interrupted by several transposon insertions.

In order to test for gene conversion events, we applied Sawyer’s gene conversion algorithm (Sawyer 1989) implemented in the GENECONV program. Significance was assessed using the approximate p value method described by Karlin and Altschul (1990, 1993). We performed several tests by varying the mismatch penalty from 0 to 3. In all these runs, no pairwise or global p value involving TRIM5 intron 1 was significant, suggesting no apparent gene conversion in this region.

Haplotype analysis and TMRCA calculation

The reduced-median network to infer haplotype genealogy was constructed using NETWORK 4.5 (Bandelt et al. 1995). An estimate of the time to the most recent common ancestor (TMRCA) was obtained using a phylogeny-based approach implemented in NETWORK 4.5 using a mutation rate based on the number of fixed differences between chimpanzee and humans (Forster et al. 1996). A second TMRCA was obtained by applying a previously described method (Evans et al. 2005) that calculates the average pairwise difference between all chromosomes and the MRCA: this value was converted into years based on the mutation rate. A third TMRCA estimate was derived from application of a maximum-likelihood coalescent method implemented in GENETREE (Griffiths and Tavare 1994, 1995). Again, the mutation rate μ was obtained on the basis of the divergence between human and chimpanzee, assuming that the species separation occurred 6 million years (MY) ago (Glazko and Nei 2003) and a generation time of 25 years. The migration matrix was derived from previously estimated migration rates (Schaffner et al. 2005). Using this μ and θ maximum likelihood (θML), we estimated the effective population size parameter (Ne). With these assumptions, the coalescence time, scaled in 2Ne units, was converted into years. For the coalescence process, 10^6 simulations were performed.
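The second TMRCA approach (average difference between each chromosome and the MRCA, converted into years through the mutation rate) can be illustrated with a minimal Python sketch. The function name, the toy haplotypes and the toy MRCA sequence are hypothetical; the rate is derived from fixed human–chimpanzee differences and the split time, with divergence assumed to accumulate along both lineages. It is not the published implementation.

```python
def tmrca_from_mrca_distance(haplotypes, mrca, fixed_differences, split_time_years):
    """Mean number of differences from the MRCA, divided by the per-year
    mutation rate of the whole region."""
    # Average count of sites at which a chromosome differs from the inferred MRCA.
    mean_diff = sum(
        sum(a != b for a, b in zip(hap, mrca)) for hap in haplotypes
    ) / len(haplotypes)
    # Fixed differences accumulate on both lineages since the split,
    # hence the factor of 2 in the denominator.
    rate_per_year = fixed_differences / (2.0 * split_time_years)
    return mean_diff / rate_per_year

# Toy example (made-up sequences): mean distance to the MRCA is 1 mutation.
toy_mrca = "AAAAAA"
toy_haplotypes = ["AAAAAT", "AATAAT", "AAAAAA", "TAAAAA"]
toy_tmrca = tmrca_from_mrca_distance(toy_haplotypes, toy_mrca, 28, 6e6)
print(round(toy_tmrca))
```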

All calculations were carried out in the R environment (R Development Core Team 2008).

Analysis of transcription factor-binding sites

Transcription factor-binding sites (TFBS) analysis was performed using TFSEARCH (http://www.cbrc.jp/research/db/TFSEARCH.html) with a threshold score of 80.0. Following this prediction, single matrices for TFBS were retrieved from the Transfac 7.0 database (Heinemeyer et al. 1998) and manually inspected. In addition to the results reported in the text, TFSEARCH predicted the loss of an Arp-1-binding site for the CTC deleted allele. Yet, inspection of the Arp-1 matrix revealed that the consensus was based on a small number (n = 9) of sequences, and this prediction was therefore ignored. Pictograms for E2F1, AP-1 and AML-1-binding sites were derived from M00516, M00517, M00271 Transfac consensus matrices, respectively; each cell value was converted into bits by multiplying its frequency by the information content of that position.
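The conversion of matrix cells into bits can be sketched as follows. This is a minimal illustration, assuming the standard sequence-logo definition of information content for DNA (2 bits minus the Shannon entropy of the column, without small-sample correction); the function names and the toy frequency matrix are hypothetical and are not taken from Transfac.

```python
import math

def column_information(freqs):
    """Information content (in bits) of one matrix column of base frequencies."""
    entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
    return 2.0 - entropy  # log2(4) = 2 bits is the maximum for DNA

def letter_heights(freqs):
    """Height of each base: its frequency multiplied by the column's information content."""
    ic = column_information(freqs)
    return {base: f * ic for base, f in freqs.items()}

# Toy two-column matrix (hypothetical values, not a Transfac matrix).
toy_matrix = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},  # fully conserved position
    {"A": 0.5, "C": 0.0, "G": 0.5, "T": 0.0},  # two equally frequent bases
]
for column in toy_matrix:
    print(letter_heights(column))
```

A fully conserved position carries 2 bits, whereas a position with two equally frequent bases carries 1 bit, split evenly between the two letters.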

Results

Trans-specific polymorphisms and nucleotide diversity

The presence of trans-specific polymorphisms in the TRIM5α-coding region has been recently reported in macaques and sooty mangabeys (Newman et al. 2006). In order to test whether polymorphisms shared between humans and chimpanzees also exist, we resequenced the genomic region encompassing TRIM5 in three unrelated chimpanzees. Specifically, a total of 13.6 kb were resequenced leaving a gap in the large intron 4 and covering the whole gene including 1.6 kb upstream of the transcription start site (Fig. 1). A total of 40 segregating sites were identified; these sites were compared with human polymorphism data. In particular, TRIM5 has been almost fully resequenced by the NIEHS SNP discovery program in five human populations, the major resequencing gap being located in intron 1. In order to fill the gap and obtain human diversity data for this region, we resequenced intron 1 in 20 Yoruba (YRI) individuals. By comparing genetic variation in humans and chimpanzees, we identified two trans-specific polymorphisms, both located in intron 1 (Fig. 1): a 3-bp deletion (CTC) and a G>A substitution (rs34506684). Both polymorphisms occur within transposable elements (LTR12C and PABL_A, respectively) and the latter involves a CpG dinucleotide. The genotype at the two trans-specific polymorphisms for the three resequenced chimpanzees, their ancestral status, and the allele frequencies in human populations are reported in Table 1. As evident, inference of the ancestral state is not robust for the G>A substitution as reference sequences differ in gorilla and orangutan/macaque.
https://static-content.springer.com/image/art%3A10.1007%2Fs00439-010-0884-6/MediaObjects/439_2010_884_Fig1_HTML.gif
Fig. 1

Exon–intron structure of TRIM5. Exonic regions are shown in green. Grey horizontal bars denote the genomic regions we resequenced in chimpanzee. Red bars indicate regions resequenced by the NIEHS SNP discovery Program. The intron 1 gene region we resequenced in human HapMap populations is indicated by the hatched box and the position of the two trans-specific variants is shown. LD in the resequenced region is also shown (D’/Lod) with the inclusion of the two variants we genotyped in exon 2 (rs3740996 and rs10838525, variants 55 and 56, respectively)

Table 1

Frequency in human populations, ancestral alleles and chimpanzee genotypes for the two trans-specific polymorphisms

| Trans-specific variant | Ancestral allele | Frequency in YRI | Frequency in EU | Frequency in AS | Pan troglodytes CP132 | Pan troglodytes WES | Pan troglodytes EB176 |
|---|---|---|---|---|---|---|---|
| CTC insertion/deletion | CTC insertion (gorilla and orangutan) | 33/40 | 40/40 | 40/40 | ins/del | ins/ins | ins/ins |
| A>G rs34506684 | A (gorilla), T (orangutan), T (macaque) | 19/40 | 27/40 | 25/40 | G/G | A/G | G/G |

As reported above, trans-specific polymorphisms are extremely rare and represent a hallmark of long-standing balancing selection. In order to verify whether this is the case for TRIM5, we resequenced a 2,230-bp intron 1 region encompassing the two trans-specific variants in two additional HapMap populations, namely Europeans (EU) and East Asians (AS) (as shown in Fig. 1, this region was only partially covered by NIEHS data). Including data from YRI, we identified a total of 57 variants, 3 and 54 of them being accounted for by small indels and single base substitutions, respectively. The region displays relatively high linkage disequilibrium (LD) in the three populations and is covered by a single LD block which excludes a few SNPs at the 3′ end (Fig. 1).

While the G>A trans-specific variant was polymorphic in all populations, the CTC deletion only segregated in populations with African ancestry. We next calculated nucleotide diversity for the 2.2-kb region by using two indexes: θW, an estimate of the expected per site heterozygosity (Watterson 1975), and π, the average number of pairwise sequence nucleotide differences (Nei and Li 1979). In order to obtain an empirical comparison, we calculated θW and π for 2 kb regions deriving from 238 genes resequenced by the NIEHS program in the same populations; the percentile rank corresponding to the TRIM5 intron 1 region is reported in Table 2 and indicates that it displays extremely high nucleotide diversity (all ranks higher than the 99th percentile) in all populations.
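The two diversity indexes just defined can be computed with a short Python sketch. The function names and the toy haplotypes are hypothetical, and the region length of 2,230 bp used in the last lines is taken from the text as an approximation of the aligned length actually analyzed.

```python
from itertools import combinations

def segregating_sites(haplotypes):
    """Number of alignment columns showing more than one allele."""
    return sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)

def watterson_theta(n_chromosomes, n_segregating, length):
    """Per-site Watterson's theta: S / a1 / L, with a1 = sum of 1/i for i = 1..n-1."""
    a1 = sum(1.0 / i for i in range(1, n_chromosomes))
    return n_segregating / a1 / length

def nucleotide_diversity(haplotypes):
    """Per-site pi: mean number of differences over all pairs of sequences."""
    length = len(haplotypes[0])
    pairs = list(combinations(haplotypes, 2))
    total = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return total / len(pairs) / length

# Toy alignment (made-up sequences).
toy = ["AAAA", "AAAT", "AATT", "AATT"]
print(segregating_sites(toy), nucleotide_diversity(toy))

# YRI summary values from the text: 40 chromosomes, 41 segregating sites.
theta_yri = watterson_theta(40, 41, 2230)
print(theta_yri * 1e4)
```

With the YRI summary values the per-site θW comes out at roughly 43 × 10^-4, close to the value reported for this population; any small difference presumably reflects the exact length of the analyzed alignment.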
Table 2

Nucleotide diversity and neutrality tests for the TRIM5 intron 1 region

| Pop. (a) | N (b) | S (c) | θW (×10^-4) value | θW rank (d) | π (×10^-4) value | π rank (d) | Tajima’s D value (p) (e) | Tajima’s D rank (d) | Fu and Li’s D* value (p) (e) | Fu and Li’s D* rank (d) | Fu and Li’s F* value (p) (e) | Fu and Li’s F* rank (d) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| YRI | 40 | 41 | 43.15 | >0.99 | 67.73 | >0.99 | 2.01 (0.0013) | 0.99 | 1.85 (0.025) | >0.99 | 1.11 (0.0016) | 0.93 |
| EU | 40 | 34 | 35.78 | >0.99 | 56.29 | >0.99 | 1.99 (0.023) | 0.96 | 1.15 (0.073) | 0.90 | 1.71 (0.021) | 0.96 |
| AS | 40 | 34 | 35.78 | >0.99 | 62.06 | >0.99 | 2.55 (0.0058) | 0.99 | 1.15 (0.078) | 0.94 | 1.94 (0.0001) | 0.99 |

(a) Population

(b) Sample size (chromosomes)

(c) Number of segregating sites

(d) Percentile rank relative to a distribution of 2-kb windows from NIEHS genes (n = 238)

(e) p value calculated by coalescent simulations

Neutrality tests

Under neutral evolution, values of θW and π are expected to be roughly equal; for the TRIM5 intron 1 region this is not the case, π being definitely higher than θW in all populations (Table 2). Tajima’s D (DT) (1989) evaluates departure from neutrality by comparing θW and π. Positive values of DT indicate an excess of intermediate frequency variants and are a hallmark of balancing selection. Fu and Li’s F* and D* (1993) are also based on SNP frequency spectra and differ from Tajima’s D in that they also take into account whether mutations occur in external or internal branches of a genealogy. The statistical significance of these statistics is calculated by performing coalescent simulations. Since, in addition to selective processes, population demographic history affects allele frequency spectra, we performed coalescent simulations using population genetics models that incorporate demographic scenarios (Schaffner et al. 2005; Voight et al. 2005; Marth et al. 2004) (Supplementary Table 2). Also, in order to disentangle the effects of selection and population history, we exploited the fact that selection is a locus-specific force while demography affects the whole genome. Thus, we compared data obtained for the TRIM5 intron 1 region to those of 2 kb windows deriving from NIEHS genes. Neutrality tests for the TRIM5 intron 1 region indicated departure from neutrality in all populations with significantly positive values for most statistics and using all demographic models. In line with these findings, DT and Fu and Li’s F* calculated for the TRIM5 intron 1 region rank above the 95th percentile in the distribution of 2-kb reference windows in all populations.
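As an illustration of how Tajima's D compares π and θW, the following Python sketch implements the standard formula from Tajima (1989) using summary values only. The function name is hypothetical, and the total π for YRI is obtained by multiplying the per-site value in Table 2 by an assumed region length of 2,230 bp; the analysis in the text was performed with libsequence, not with this code.

```python
import math

def tajimas_d(n, s, pi_total):
    """Tajima's D from sample size n, segregating sites s and the mean
    number of pairwise differences over the whole region (pi_total)."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Numerator: pi minus Watterson's estimate S / a1 (both per region).
    # Denominator: estimated standard deviation of that difference.
    variance = e1 * s + e2 * s * (s - 1)
    return (pi_total - s / a1) / math.sqrt(variance)

# YRI summary values from Table 2; the 2,230-bp length is an assumption.
d_yri = tajimas_d(n=40, s=41, pi_total=67.73e-4 * 2230)
print(round(d_yri, 2))
```

With these inputs the statistic comes out near 2.0, in line with the value reported for YRI; its significance still has to be assessed by coalescent simulations, as described in the text.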

Another commonly used test to verify departure from selective neutrality is the HKA (Hudson–Kreitman–Aguadé) test (Hudson et al. 1987); it is based on the assumption that under neutral evolution, the amount of within-species diversity correlates with levels of between-species divergence, since both depend on the neutral mutation rate. In particular, an excess of intra-specific diversity compared to divergence (k > 1) is considered a signature of balancing selection. Here, we performed a multi-locus maximum likelihood HKA (MLHKA) test for the TRIM5 intron 1 region using the MLHKA software (Wright and Charlesworth 2004) and data from 16 reference genes (see “Materials and methods”). Results are shown in Table 3 and indicate that for the TRIM5 intron 1 gene region a significant excess of polymorphisms compared to divergence is observed in all populations.
Table 3

MLHKA test

| Pop. | k (a) | p |
|---|---|---|
| YRI | 5.09 | 4.06 × 10^-5 |
| EU | 5.97 | 1.25 × 10^-5 |
| AS | 6.28 | 6.08 × 10^-6 |

(a) Selection parameter (k > 1 indicates an excess of polymorphism relative to divergence)

We selected the region to resequence on the basis of its carrying trans-specific polymorphisms, and signatures of long-standing balancing selection are usually localized over relatively short regions (Charlesworth 2006). Nonetheless, we wished to verify that the selection signature we identified in intron 1 is not influenced by the presence of a linked balanced polymorphism within, for example, exon 2, where nonsynonymous variants have been described. We exploited the availability of resequencing data on the whole gene (available through the NIEHS SNP discovery website) and calculated human–chimpanzee divergence, as well as θW and π in sliding windows. For this analysis only individuals resequenced by the NIEHS SNP discovery program were used. As shown in Fig. 2 (and in Supplementary Figs. 1 and 2 for EU and AS), divergence is relatively constant across the entire gene region, whereas a peak in nucleotide diversity is observed in intron 1, within the region we analyzed. In line with the findings of a previous analysis (Sawyer et al. 2006), nucleotide diversity does not support the action of natural selection in exon 2 or other coding regions. In an attempt to infer the possible function of variants in intron 1, we analyzed whether the two trans-specific polymorphisms potentially alter known TFBS (see “Materials and methods”). No alteration of TFBS was observed for the CTC deletion polymorphism compared to the ancestral allele. Conversely, as shown in Fig. 3, the trans-specific SNP affects putative TFBSs, with the G allele creating a site for E2F1, and the A allele introducing binding sites for AP-1 and AML-1 (official gene name: RUNX1).
https://static-content.springer.com/image/art%3A10.1007%2Fs00439-010-0884-6/MediaObjects/439_2010_884_Fig2_HTML.gif
Fig. 2

Sliding window analysis of nucleotide diversity and divergence. Genotype data were retrieved from the NIEHS website and refer to YRI. θW (red line), π (red hatched line) and human–chimpanzee divergence (black line) were calculated in windows of 3,000 bp moving with a step of 25 bp. Please note that values on the y axis do not correspond to those in Table 2 as the latter refer to per site θW and π. Grey shading represents resequencing gaps in NIEHS data; green regions denote exons and the region we resequenced is reported in yellow
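A sliding-window calculation of this kind can be sketched in Python as follows. The function name, the half-open window convention and the toy SNP positions are assumptions made for illustration; the sketch returns θW per window (not per site) and ignores resequencing gaps, which a real analysis would need to handle.

```python
from bisect import bisect_left

def sliding_theta(snp_positions, region_length, n_chromosomes, window=3000, step=25):
    """Watterson's theta per window for half-open windows [start, start + window)."""
    positions = sorted(snp_positions)
    a1 = sum(1.0 / i for i in range(1, n_chromosomes))
    results = []
    for start in range(0, region_length - window + 1, step):
        end = start + window
        # Number of segregating sites whose position falls inside the window.
        count = bisect_left(positions, end) - bisect_left(positions, start)
        results.append((start, count / a1))
    return results

# Toy input (made-up positions): 3 SNPs in a 6,000-bp region, 2 chromosomes.
toy_windows = sliding_theta([100, 3050, 3100], region_length=6000, n_chromosomes=2)
print(len(toy_windows), toy_windows[0], toy_windows[5])
```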

https://static-content.springer.com/image/art%3A10.1007%2Fs00439-010-0884-6/MediaObjects/439_2010_884_Fig3_HTML.gif
Fig. 3

Analysis of transcription factor-binding sites. Analysis of TFBS for the two alleles of the G>A trans-specific variant. The location of TFBS that differ between the two alleles is shown with the polymorphic base being reported in red. Consensuses for AP-1, AML-1 and E2F1 are shown as pictograms: the height of each position reflects the information content at that position. The relative height of a letter in each position reflects the frequency of observing the nucleotide. The corresponding sequence in TRIM5 intron 1 is reported below pictograms and the polymorphic position is denoted with an asterisk

As reported above, TRIM5 is part of a large gene family and is located within a cluster of TRIM genes, raising the possibility that non-homologous gene conversion is responsible for the high nucleotide diversity we observed. Although sequence homology among TRIM genes is low in intronic regions, we wished to exclude that TRIM5 intron 1 undergoes non-allelic gene conversion with other paralogous TRIM genes on chromosome 11. To this purpose, we applied Sawyer’s gene conversion algorithm (Sawyer 1989) which identified no region of apparent gene conversion within TRIM5 intron 1 (see “Materials and methods”).

Finally, a copy number polymorphism encompassing human TRIM5 has been described (Zogopoulos et al. 2007): the variant is quite rare, with two duplication instances described in 1,190 subjects, indicating that it is unlikely to affect our results.

Haplotype analysis and TMRCA estimates

In order to study the genealogy of the TRIM5 intron 1 region haplotypes we constructed a reduced-median network (Bandelt et al. 1995) (Fig. 4). The topology indicates the presence of two major clades separated by long branches and a few recurrent mutations (positions 7, 19, 26, and 36). These latter may be due to undetected recombination/gene conversion events or to multiple mutational hits at the same site (true homoplasies). However, the homoplasies created by recombination tend to cluster together on the DNA molecule, whereas true homoplasies do not. This observation and the occurrence of three homoplasic positions (7, 26, and 36) at CpG dinucleotides suggest that they may be due to repeated mutation events. The two trans-specific variants are indicated in the network: while the G>A polymorphism is located along one of the two basal branches (variant 36), the CTC deletion defines a minority of African haplotypes in clade A.
https://static-content.springer.com/image/art%3A10.1007%2Fs00439-010-0884-6/MediaObjects/439_2010_884_Fig4_HTML.gif
Fig. 4

Genealogy of TRIM5 intron 1 haplotypes reconstructed through a reduced-median network. Each node represents a different haplotype, with the size of the circle proportional to frequency. Nucleotide differences between haplotypes are indicated on the branches of the network. Circles are colour-coded according to population (green YRI, blue EU, red AS). The most recent common ancestor (MRCA) is also shown (black circle). The black star indicates the trans-specific CTC insertion/deletion polymorphism

In order to estimate the time to the most recent common ancestor (TMRCA), we applied a phylogeny-based method. For this calculation only SNPs were used (i.e. the CTC and the other two indels were not included). Using a mutation rate based on 28 fixed differences between chimpanzee and humans and a separation time of 6 MY (Glazko and Nei 2003), we estimated a TMRCA of 6.25 MY (SD 1.16 MY). A second TMRCA estimate of 7.08 MY (SD 1.7 MY) was obtained by applying a previously described method (Evans et al. 2005) that calculates the average sequence divergence separating the MRCA and each of the chromosomes. Yet, as evident from Fig. 4, the TRIM5 intronic region we analyzed shows some recurrent mutation and/or recombination events that might inflate the TMRCA; we therefore wished to verify these results using GENETREE, which is based on a maximum-likelihood coalescent analysis (Griffiths and Tavare 1994, 1995). The method assumes an infinite-site model without recombination and, therefore, haplotypes and sites that violate these assumptions need to be removed: in this case, one haplotype and six single segregating sites had to be removed. The resulting gene tree, rooted using the chimpanzee sequence, is partitioned into two deep branches (Fig. 5). A maximum-likelihood estimate of θ (θML) of 5.6 was obtained, resulting in an estimated effective population size (Ne) of 24,000. Using this method, the TMRCA of the TRIM5 haplotype lineages amounted to 4.23 MY (SD 0.762 MY). Therefore, all the obtained TMRCA estimates are much deeper than those observed for neutrally evolving autosomal loci (Tishkoff and Verrelli 2003) and support the hypothesis whereby long-standing balancing selection has maintained trans-specific polymorphisms in TRIM5 intron 1.
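The relationship between θML, the mutation rate and Ne can be made explicit with a short Python sketch. It assumes that the rate used with GENETREE derives from the same 28 fixed differences, a 6 MY split and a 25-year generation time, and that θ = 4Neμ for a diploid autosomal locus; the function names are hypothetical.

```python
def mutation_rate_per_generation(fixed_differences, split_time_years, generation_years):
    """Per-region mutation rate per generation derived from divergence."""
    # Divergence accumulates along both lineages, hence the factor of 2.
    per_year = fixed_differences / (2.0 * split_time_years)
    return per_year * generation_years

def effective_population_size(theta, mu):
    """Ne from theta = 4 * Ne * mu (diploid autosomal locus)."""
    return theta / (4.0 * mu)

def coalescent_units_to_years(t_scaled, ne, generation_years):
    """Convert a time scaled in units of 2 * Ne generations into years."""
    return t_scaled * 2.0 * ne * generation_years

mu = mutation_rate_per_generation(28, 6e6, 25)
ne = effective_population_size(5.6, mu)
print(round(ne))
print(coalescent_units_to_years(1.0, ne, 25))
```

Under these assumptions the calculation gives an Ne of about 24,000, matching the estimate reported above, and one coalescent time unit corresponds to roughly 1.2 MY.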
https://static-content.springer.com/image/art%3A10.1007%2Fs00439-010-0884-6/MediaObjects/439_2010_884_Fig5_HTML.gif
Fig. 5

Estimated haplotype tree for the TRIM5 intron 1 gene region. Mutations are represented as black dots and named for their physical position along the regions. The absolute frequency of each haplotype is also reported. Please note that polymorphism numbering does not correspond to that in Fig. 4

Two nonsynonymous variants (H43Y and R136Q, rs3740996 and rs10838525, respectively) in the first coding exon of TRIM5 (exon 2) have been shown to alter antiviral activity (Sawyer et al. 2006; Javanbakht et al. 2006) and their role in modulating the clinical course of HIV-1 infection has been addressed in several studies (Javanbakht et al. 2006; Speelmon et al. 2006; Goldschmidt et al. 2006; van Manen et al. 2008). In order to study how these two variants relate to intron 1 haplotypes, we typed them in YRI, EU and AS. Results indicated that a similar proportion of chromosomes in clades A and B carry the low-frequency 43Y and 136Q alleles (not shown). This is likely the result of historical recombination events along intron 1, as the two variants are not in LD with SNPs in the region we analyzed (Fig. 1).

Discussion

Host–pathogen interactions are a major driver of molecular evolution. TRIM5, with its complex evolutionary history, perfectly exemplifies this concept. Along mammalian evolutionary history the gene has undergone copy number variation, acquisition of new domains by exon capture, protein sequence diversification, and maintenance of balanced polymorphisms (Johnson and Sawyer 2009). These events testify to the central role of TRIM5 in antiviral response and suggest that the gene has been constantly subjected to exceptionally strong selective pressures.

Our data add further complexity to the evolutionary history of TRIM5 by showing that a region in intron 1 has been the target of long-standing balancing selection. We resequenced the TRIM5 gene in chimpanzees and identified two polymorphisms that are shared with humans (CTC deletion and G to A substitution). Analysis of the gene region encompassing the two variants indicated that it displays exceptional nucleotide diversity levels and an excess of polymorphism compared to fixed divergence. Consistent with these data, most tests rejected the null hypothesis of neutral evolution for this gene region and calculation of TMRCA estimates yielded extremely deep coalescence times. Specifically, TMRCAs ranging between 4 and 7 MY were obtained, suggesting that the two haplotype clades have been maintained since the time when the human and chimpanzee lineages split (Glazko and Nei 2003). Such deep coalescence times are extremely rare in the human genome (Tishkoff and Verrelli 2003). In humans, the only instances of trans-specific polymorphisms that are thought to be maintained by long-term balancing selection are located in the MHC (Charlesworth 2006). Additionally, we have previously described a trans-specific SNP in the defensin beta 1 (DEFB1) promoter, a region that is subjected to long-standing balancing selection (Cagliani et al. 2008). In analogy to one of the two shared variants we identified here, the trans-specific polymorphism in DEFB1 occurs at a CpG dinucleotide, raising the possibility that these SNPs result from coincidental mutation in humans and chimpanzees. Asthana and co-workers (2005) performed a genome-wide search for polymorphisms shared between humans and chimpanzees and retrieved only 11 such instances, a number similar to the one that would be expected by chance (i.e. as a consequence of coincidental mutation).
As the likelihood of recurrent mutations is higher at CpG dinucleotides, it is quite possible that the G>A variant we identified arose independently in humans and chimpanzees, rather than being maintained by a selective process. Nonetheless, the two possibilities are not mutually exclusive as balancing selection might maintain variants that independently arose at the same position in two species.

The situation is different for the CTC deletion. Although the evolutionary dynamics of indels are less well understood compared to those of SNPs, the frequency of small (<10 bp in length) insertion and deletion events in mammals is about 20 times smaller than that of single base pair substitutions (Cooper et al. 2004); therefore, the possibility that a deletion of the same size arose independently in two species is extremely low, suggesting that the indel polymorphism represents a true trans-specific variant. As shown in Fig. 4, the polymorphic CTC deletion only segregates in the African population, suggesting that either different selective pressures have acted in distinct geographic locations or that demographic effects have influenced the distribution of this variant in different populations.

As mentioned above, both trans-specific polymorphisms occur within transposable elements, namely ERV1 sequences. This observation does not detract from the possibility that these variants are functional, as a large portion of human regulatory sequences was acquired from repetitive elements (van de Lagemaat et al. 2003; Jordan et al. 2003). The localization of the region subjected to balancing selection within intron 1 suggests that it may harbour variants that modulate TRIM5 expression, and we noticed that the G>A variant in intron 1 potentially affects TFBSs. Although these data suggest that the two SNP alleles might result in different regulation of TRIM5 transcription, caution should be used in interpreting bioinformatic predictions, as TFBS consensuses are typically short and degenerate. Moreover, knowledge of the binding specificity of several transcription factors is presently too limited to allow prediction of their binding sites in DNA sequences. Therefore, further studies will be needed to clarify the role of intron 1 in regulating TRIM5 expression. In this respect, it is worth mentioning that signatures of balancing selection have previously been identified at the promoter/cis-regulatory regions of HLA genes (Cagliani et al. 2008; Tan et al. 2005; Loisel et al. 2006; Liu et al. 2006) and other loci involved in immune response (e.g. CCR5 and DEFB1) (Cagliani et al. 2008; Bamshad et al. 2002); these findings have been interpreted in terms of nucleotide diversity conferring increased regulatory flexibility. Specifically, different alleles/haplotypes might confer preferential expression in a tissue- or cell type-specific manner, as well as modulate transcription in response to distinct stimuli. This same hypothesis holds for TRIM5, especially in light of its almost ubiquitous expression in human tissues (Sawyer et al. 2007).
Interestingly, high levels of TRIM5 mRNA in peripheral blood mononuclear cells (PBMC) have been associated with a reduced risk of HIV-1 infection (Sewram et al. 2009), suggesting that inter-individual variability in TRIM5 expression does exist (at least in PBMC) and may affect susceptibility to viral infections. A number of studies have also addressed the role of TRIM5 variants in modulating the clinical course of HIV-1 infection. Most studies have focused on two nonsynonymous variants located in exon 2, H43Y and R136Q, but contrasting results have been obtained (Javanbakht et al. 2006; Speelmon et al. 2006; Goldschmidt et al. 2006; van Manen et al. 2008). Herein, we analyzed the distribution of H43Y and R136Q in clade A and B haplotypes. Results showing that both variants are equally common in both clades suggest that studies based on the sole typing of the two coding SNPs result in analyses of subjects having the same amino acid alleles but different alleles in intron 1. Assuming a functional role for variants in intron 1, this observation might partially explain the low consistency among studies.

Sawyer and co-workers (2006) have shown that the 43Y variant decreases TRIM5 restriction activity against two distantly related retroviruses, namely HIV-1 and N-MLV (a murine γ retrovirus). The derived, low-efficiency allele is widespread in many human populations. In line with the data herein, Sawyer et al. (2006) ruled out the possibility that balancing selection acting on exon 2 is responsible for the maintenance of the impaired allele and suggested that relaxation of selective pressure on antiviral response genes together with genetic drift effects might result in the persistence of this variant in human populations. Our data suggest an alternative possibility: although balancing selection is typically limited to narrow genomic regions (Charlesworth 2006) (as is evident from Fig. 2), depending on the degree of linkage disequilibrium and on physical proximity, variants in flanking genomic regions may be affected by the maintenance of balanced polymorphisms. Therefore, the persistence of the 43Y allele might in part be explained by its location very close to the region under balancing selection.

The introduction of HIV into human populations likely occurred too recently for its selection signatures to be detectable in the human genome. Therefore, we must assume that both the balancing selection signature we identified and the multiple selective events that have acted upon TRIM5 genes in primates and, more generally, in mammals have resulted from other infectious agents, possibly extinct retroviruses. The role of past infections in shaping the repertoire and diversity of human antiviral genes has recently been discussed (Emerman and Malik 2010). Our data add further insight into this complex scenario and suggest that a regulatory region in the first intron of TRIM5 has conferred (and possibly still confers) protection against viral infections.

Acknowledgments

MC is supported by grants from Istituto Superiore di Sanità “Programma Nazionale di Ricerca sull’AIDS”, the EMPRO and AVIP EC WP6 Projects, the nGIN EC WP7 Project, the Japan Health Science Foundation, 2008 Ricerca Finalizzata [Italian Ministry of Health], 2008 Ricerca Corrente [Italian Ministry of Health], Progetto FIRB RETI: Rete Italiana Chimica Farmaceutica CHEM-PROFARMA-NET [RBPR05NWWC], and Fondazione CARIPLO. MS is a member of the Doctorate School in Molecular Medicine, University of Milan.

Supplementary material

Supplementary material 1: 439_2010_884_MOESM1_ESM.doc (DOC, 365 kb)
Supplementary material 2: 439_2010_884_MOESM2_ESM.xls (XLS, 110 kb)

Copyright information

© Springer-Verlag 2010