Contrasting Patterns of Sequence Evolution at the Functionally Redundant bric à brac Paralogs in Drosophila melanogaster
Cite this article as: Bickel, R.D., Schackwitz, W.S., Pennacchio, L.A. et al. J Mol Evol (2009) 69: 194. doi:10.1007/s00239-009-9265-y
Genes with overlapping expression and function may gradually diverge despite retaining some common functions. To test whether such genes show distinct patterns of molecular evolution within species, we examined sequence variation at the bric à brac (bab) locus of Drosophila melanogaster. This locus is composed of two anciently duplicated paralogs, bab1 and bab2, which are involved in patterning the adult abdomen, legs, and ovaries. We have sequenced the 148 kb genomic region spanning the bab1 and bab2 genes from 94 inbred lines of D. melanogaster sampled from a single location. Two non-coding regions, one in each paralog, appear to be under selection. The strongest evidence of directional selection is found in a region of bab2 that has no known functional role. The other region is located in the bab1 paralog and is known to contain a cis-regulatory element that controls sex-specific abdominal pigmentation. The coding region of bab1 appears to be under stronger functional constraint than the bab2 coding sequences. Thus, the two paralogs are evolving under different selective regimes in the same natural population, illuminating the different evolutionary trajectories of partially redundant duplicate genes.
Keywords: Drosophila melanogaster; bric à brac; Population genetics; Pigmentation; Duplicated genes
Gene duplication makes an important contribution to the evolution of novel functions and the modifications of existing functions (reviewed in Prince and Pickett 2002), and duplicated genes are prevalent throughout metazoans (Holland et al. 1994; Amores et al. 1998; Force et al. 1999; Holland 1999; Cresko et al. 2003; Amores et al. 2004). Two major theories have been advanced for the maintenance of gene duplications. One theory (the neo-functionalization model) postulates that one of the duplicated genes evolves a novel function while losing some aspect of the ancestral function. Thus, both genes are maintained by natural selection, one for the ancestral function and the other for the new function (Ohno 1970). Alternatively, each gene could accumulate complementary degenerative mutations in either coding or regulatory regions, resulting in the loss of a subset of the pre-duplication gene activity. Both copies will then be maintained by selection since both are needed to preserve the ancestral function (the sub-functionalization model) (Force et al. 1999; Lynch and Force 2000).
Both models predict that, once duplicated genes occupy different functional niches, they may come under different selective regimes. Paralogous gene regions responsible for redundant functions may experience similar selective pressures if their overall activity affects fitness traits. On the other hand, selection acting on different functions may shape sequence evolution at functionally differentiated regions, and the mode and intensity of this selection may be different for each trait. As a result, duplicated genes would evolve in different modes and at different rates, with different functional elements dominating the evolution of each paralog. These ideas have primarily been tested using recently duplicated genes, but even old duplicates can share some functions. Here, we analyze the bric à brac (bab) locus of Drosophila melanogaster, which contains the duplicated paralogs bab1 and bab2, to determine whether we can detect these patterns using intraspecific variation.
Many of the structures patterned by the bab genes are sexually dimorphic, including the gonad (Sahut-Barnola et al. 1995), the sex combs on the front legs of males (Godt et al. 1993; Barmina and Kopp 2007; Randsholt and Santamaria 2008), ventral abdominal bristles (Kopp et al. 2000), and the dorsal abdominal pigmentation pattern (Kopp et al. 2000; Williams et al. 2008), which suggests that sexual selection may have acted on the bab locus. Furthermore, it has been demonstrated that sex combs are important for male mating success (Ng and Kopp 2008), ovaries are critical for reproduction and fecundity, and abdominal pigmentation plays a role in thermoregulation and desiccation resistance (Gibert et al. 1996; Brisson et al. 2005), suggesting that the bab1 and bab2 genes may experience selection on a variety of functions.
The bab1 and bab2 genes show sequence conservation in two protein domains, BTB/POZ and BabCD (bric à brac conserved domain) (Couderc et al. 2002). The BTB (Broad, Tramtrac, Bab) domain is shared by a large number of developmentally regulated genes and is involved in protein–protein interactions (Zollman et al. 1994), including bab1 homodimerization in vitro (Chen et al. 1995). The BabCD is composed of Psq and AT-hook domains, both of which are involved in DNA binding, suggesting that the bab genes may act as transcriptional regulators (Reeves and Nissen 1990; Lehmann et al. 1998; Couderc et al. 2002; Lours et al. 2003). Both bab genes contain a single large intron (20 and 50 kb, respectively) that is present in an evolutionarily conserved position (Couderc et al. 2002). In both genes, this intron separates the 5′ exons, which contain the protein interaction domain (BTB), from the 3′ exons, which contain the DNA binding region (BabCD) (Fig. 1).
bab1 and bab2 have largely overlapping expression patterns, with bab1 present in a subset of bab2-expressing cells. In the ovary, bab1 is expressed exclusively in the terminal filaments, while bab2 is expressed strongly in the terminal filaments and more weakly in apical cells of the ovary (Couderc et al. 2002). Flies with bab mutations that affect both paralogs show defects in terminal filament formation, apical cells, and basal stalk primordium, resulting in sterile females and ovaries with only a few rudimentary ovarioles, while mutations that affect a single bab gene result in weaker phenotypes (Godt and Laski 1995; Couderc et al. 2002). Both duplicate genes also contribute to the patterning of distal antennae and legs during larval and pupal development (Godt et al. 1993; Chu et al. 2002; Couderc et al. 2002). Again, the strongest phenotypes result from bab mutations that affect both bab1 and bab2, causing a complete fusion of the second through fifth tarsal segments, while mutations that affect only one of the genes result in intermediate phenotypes.
In the abdomen, the bab genes play a central role in specifying sexually dimorphic pigmentation patterns (Kopp et al. 2000). bab mutations have a dominant effect resulting in wider pigmentation bands, with the strongest phenotype seen in the most posterior segments (Couderc et al. 2002). Moreover, genetic variation at the bab locus is associated with intraspecific variation in the pigmentation of posterior abdominal segments in D. melanogaster females (Kopp et al. 2003). bab1 and bab2 are expressed in similar spatial patterns in the developing abdominal epidermis (Kopp et al. 2000; Williams et al. 2008), and artificial over-expression experiments show that both genes are capable of partially rescuing the bab mutant phenotypes (Bardot et al. 2002). In all tissues, despite slight differences in bab expression, bab1 and bab2 mutations have very similar phenotypes.
Detailed functional analysis of the bab locus has revealed a number of distinct cis-regulatory elements (CREs) (Williams et al. 2008). Separate enhancers were identified for pupal abdominal epidermis (large intron of bab1), legs (intergenic region between bab1 and bab2), and oenocytes (large intron of bab2). Surprisingly, only a single regulatory element was identified for each tissue that expresses both bab1 and bab2, raising the possibility that both paralogs may be controlled by the same “core” CREs. This does not rule out the existence of other, paralog-specific regulatory elements that modulate the expression of each gene in a more subtle way. If such modifier elements exist, the expression of each paralog could evolve independently and be subject to different selective regimes.
In summary, the two bab genes have largely overlapping expression and developmental roles, yet they show evidence of distinct functional specificities. At the same time, their involvement in a variety of sex-specific processes suggests that these genes could experience many competing selective pressures. In principle, both paralogs could be dominated by similar selective pressures, reflecting their shared functions. Alternatively, bab1 and bab2 could show different patterns of selection, suggesting that unique functions of the paralogs are shaping sequence evolution in the region. To distinguish between these modes of evolution, we analyzed intraspecific variation throughout the bab genomic region.
Materials and Methods
We have resequenced the bab genomic region including the bab1 and bab2 genes and the flanking intergenic regions from 94 inbred strains extracted from a single natural population at the Wolfskill orchard in Winters, CA. The 35 Wolfskill-1 (W1), 56 Wolfskill-3 (W3), and 3 A1 lines were all collected from the same orchard but in separate years. Eighty-three of the Wolfskill lines were chosen at random, while the remaining lines were chosen for inclusion because of their light abdominal pigmentation pattern. The removal of these lines from the analysis did not significantly change the results. All lines from the Wolfskill collections were inbred by full-sib mating for a minimum of 20 generations, while the A1 lines were inbred for at least 10 generations by the same method.
Sanger-based sequencing (ABI 3730xl) was performed at the Joint Genome Institute. Overlapping 1 kb amplicons were designed across the region; successful amplicons were sequenced from both strands. Base calls and polymorphisms were initially identified using Phred and PolyPhred 6.11 (Ewing and Green 1998; Ewing et al. 1998; Stephens et al. 2006). Using Consed, insertion/deletions (indels) were identified and polymorphisms were checked for accuracy (Gordon et al. 1998). Although an effort was made to obtain complete coverage, we were unable to sequence any of the strains for two regions that together cover approximately 5 kb. These regions are identified as repetitive by RepeatMasker (Smit 1996–2004), and each region contains a transposable element in the D. melanogaster reference genome sequence (Adams et al. 2000). Since transposable elements present in the reference annotation are rarely found in other strains at appreciable frequencies (Petrov et al. in preparation), we did not attempt to verify their presence in our lines. On average, we have sequence information from 90% of the lines for any given polymorphism.
Sliding window analysis was used to calculate population-genetic test statistics in 10 kb windows that were moved in 2 kb steps across the length of the bab region. Theta values (π, θW, and θH), Tajima’s D, Fu and Li’s D, and Fu’s F were calculated using the compute implementation of the libsequence library (Thornton 2003) and custom scripts, using the D. simulans genome sequence as an outgroup when appropriate (Tajima 1989; Fu and Li 1993; Fay and Wu 2000; Thornton 2003; Zeng et al. 2006). Fst was calculated as described in Hudson et al. (1992). Polarized and unpolarized McDonald–Kreitman (MK) tests (McDonald and Kreitman 1991) were performed as described by Begun et al. (2007). All figures were produced using the R statistical package (http://www.R-project.org).
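The sliding-window calculation described above can be illustrated with a minimal Python sketch. This is not the libsequence code or the custom scripts used in the study; it assumes biallelic SNPs, a constant sample size with no missing data, and at least four sequences. The function names and input layout are hypothetical. It computes per-window totals of π and θW and Tajima's D (Tajima 1989) in 10 kb windows moved in 2 kb steps.

```python
import math


def diversity_stats(n, allele_counts):
    """Return (pi, theta_w, tajimas_d) for one window of biallelic SNPs.

    n             -- number of sampled sequences (assumed constant, n >= 4)
    allele_counts -- count of one allele at each segregating site (1..n-1)
    Values are window totals, not per-site averages.
    """
    s = len(allele_counts)
    if s == 0:
        return 0.0, 0.0, float("nan")  # D is undefined without segregating sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    # pi: expected number of pairwise differences, summed over sites
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in allele_counts)
    # Watterson's estimator from the number of segregating sites
    theta_w = s / a1
    # Variance constants from Tajima (1989)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return pi, theta_w, d


def sliding_windows(positions, allele_counts, n, region_length,
                    window=10_000, step=2_000):
    """Return a list of (start, end, pi, theta_w, tajimas_d) tuples.

    positions     -- 0-based SNP coordinates within the region
    allele_counts -- allele count at each SNP, parallel to positions
    """
    results = []
    start = 0
    while start + window <= region_length:
        end = start + window
        counts = [k for p, k in zip(positions, allele_counts)
                  if start <= p < end]
        results.append((start, end) + diversity_stats(n, counts))
        start += step
    return results
```

Under these assumptions, a window dominated by rare variants yields a negative D and a window dominated by intermediate-frequency variants yields a positive D. Handling the roughly 10% missing data per polymorphism reported above would require per-site sample sizes, which this sketch omits.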
Summary statistics of sequence variation in the bab region
Nucleotide diversity (π) and estimates of the population mutation rate (θW) were generally higher in non-coding than in coding DNA, suggesting that non-coding sequences are under less functional constraint (Table 1). Intronic, intergenic, and UTR regions have similar values of Tajima’s D, which is consistent with genome-wide studies in D. melanogaster (Andolfatto 2005). The bab region shows little linkage disequilibrium (LD), with the average correlation between polymorphisms (r²) dropping off rapidly within 300 bp (Fig. 2a). D′, a quantitative measure of LD normalized for allele frequency (Lewontin 1964), declines more slowly and is constant after 1 kb (Fig. 2b). The short range of LD suggests that different regions of the bab locus can, in principle, evolve independently of one another.
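The two LD measures mentioned above can be made concrete with a short Python sketch. It assumes phased haplotypes (reasonable for highly inbred lines) coded 0/1 at two biallelic sites; the function name and coding scheme are hypothetical and are not taken from the study. It returns r² and the absolute value of Lewontin's D′.

```python
def ld_stats(site_a, site_b):
    """Return (r2, abs_d_prime) for two biallelic sites.

    site_a, site_b -- alleles coded 0/1 for the same haplotypes, in the
                      same order; None marks missing data and is skipped.
    """
    pairs = [(a, b) for a, b in zip(site_a, site_b)
             if a is not None and b is not None]
    n = len(pairs)
    p_a = sum(a for a, _ in pairs) / n          # frequency of allele 1 at site A
    p_b = sum(b for _, b in pairs) / n          # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in pairs if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                        # raw disequilibrium coefficient
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    r2 = d * d / denom
    # D' scales D by its maximum possible magnitude given allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max
```

For example, `ld_stats([0, 0, 0, 1], [0, 0, 1, 1])` gives r² of about 0.33 but |D′| of 1, because only three of the four possible haplotypes are present. This behavior is one reason D′ can remain high over distances where r² has already decayed.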
We also compared high-frequency derived and intermediate-frequency alleles using Fay and Wu’s H statistic (Fay and Wu 2000; Zeng et al. 2006). A strongly negative value of H indicates a higher than expected number of high-frequency derived alleles, making it a powerful test for detecting positive selection and the initial stages of balancing selection (Zeng et al. 2006). We used the D. simulans genome sequence as an outgroup to polarize SNP alleles in D. melanogaster. As with Tajima’s D, the strongest negative values of Fay and Wu’s H are found near the 3′ end of the bab2 transcript, with no comparable signature in the paralogous bab1 region (Fig. 3c). This pattern provides additional evidence for directional selection acting on the region near the 3′ end of bab2. In addition, the H statistic shows strongly negative values in a region of the large intron of bab1, suggesting an additional region under selection that was not detected with the D statistic.
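As an illustration of how outgroup polarization feeds into H, the following Python sketch computes the unnormalized statistic of Fay and Wu (2000), H = π − θH. It is a simplified example, not the pipeline used here: it assumes an ungapped alignment with no missing data, treats the outgroup base as ancestral, and skips sites that cannot be polarized. The normalized version of Zeng et al. (2006) is not implemented, and the function names are hypothetical.

```python
def derived_counts(alignment, outgroup):
    """Return the derived-allele count at each polarizable biallelic site.

    alignment -- list of equal-length ingroup sequences (strings)
    outgroup  -- outgroup sequence aligned to the ingroup; its base is
                 assumed to be the ancestral state
    """
    counts = []
    for column, ancestral in zip(zip(*alignment), outgroup):
        alleles = set(column)
        # Skip monomorphic, multiallelic, or unpolarizable sites
        if len(alleles) != 2 or ancestral not in alleles:
            continue
        counts.append(sum(1 for base in column if base != ancestral))
    return counts


def fay_wu_h(n, counts):
    """Unnormalized Fay and Wu's H = pi - theta_H for n sequences.

    counts -- derived-allele count (1..n-1) at each segregating site
    """
    pairs = n * (n - 1)
    pi = sum(2.0 * k * (n - k) for k in counts) / pairs
    theta_h = sum(2.0 * k * k for k in counts) / pairs  # weights high-frequency derived alleles
    return pi - theta_h
```

With four sequences, a single site carrying the derived allele in three of them gives H = −1, whereas a derived singleton gives a positive H. This shows why an excess of high-frequency derived alleles pulls the statistic negative.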
Fu and Li’s D and Fu’s F statistics compare the frequencies of derived and ancestral alleles to detect deviations from the neutral expectation (Fu and Li 1993; Fu 1997). Negative values of D and F indicate an excess of derived mutations (an excess of external branches in the gene tree), while positive values show a deficiency of derived alleles (excess of internal branches). Fu and Li’s D is particularly sensitive to background selection—a reduction of diversity at a neutral locus due to selection against linked deleterious mutations (Charlesworth et al. 1993). We find negative values of D and F in the large introns of both bab1 and bab2, with peak values in the 10 kb windows centered near the 28,000 and 122,000 bp marks (Fig. 3d). These regions overlap with the locations of repetitive sequences and transposable element insertions in the reference genome sequence (Fig. 1). The same pattern remains if we repeat the analysis with these repetitive regions masked. Repetitive sequences are often a source of frequently occurring deleterious mutations, and the low values of D and F may arise when these mutations are removed by background selection.
Summary statistics comparing the W1 and W3 sequence samples across the bab region
McDonald–Kreitman test for genes in the bab region
If the bab genes are functionally redundant, it is possible that deleterious alleles in one gene are compensated by a functional allele of the other gene. We tested for compensatory evolution between the bab1 and bab2 transcripts. We found no correlation between the number of low-frequency (likely deleterious) alleles in the bab1 and bab2 coding regions (P > 0.05 for synonymous, non-synonymous, and total changes), nor did we find long-range LD between polymorphisms in the bab1 and bab2 transcripts. This suggests that there is no compensation between the bab1 and bab2 alleles.
Duplicate genes persist in the genome due to the acquisition of new functions or the subdivision of the ancestral role. Over time, paralogous proteins may acquire subtle functional changes or gain entirely different biological activities (Hirth et al. 2001; Zhang et al. 2004). Alternatively, the proteins may share similar specificity while gene expression patterns diverge due to cis-regulatory changes, leading to the acquisition of different functional roles (Greer et al. 2000). The two mechanisms are not mutually exclusive, and both can operate on the same pair of paralogs. At the bab locus, the two duplicated genes have similar but non-identical expression patterns (Couderc et al. 2002) despite sharing at least some CREs (Williams et al. 2008). This suggests that some functions of these paralogs may experience shared constraints, while others may evolve independently.
Numerous studies have shown that duplicated genes diverge rapidly in expression (Gu et al. 2002; Makova and Li 2003; Gu et al. 2005), and that the rate of expression divergence is highest immediately after gene duplication and slows down over time (Jordan et al. 2004; Gu et al. 2005). This pattern is consistent with either directional selection (neo-functionalization model) or the relaxation of purifying selection (sub-functionalization model) acting during the early stages of gene divergence, and the relative contributions of these forces continue to be debated (Yu et al. 2003; Castillo-Davis et al. 2004; Jordan et al. 2004; Kondrashov and Kondrashov 2006). Generally, paralogous genes are more likely to lose ancestral expression domains than to acquire new ones, indicating that sub-functionalization is probably more common than neo-functionalization (Oakley et al. 2006). Both models predict that, once duplicate genes acquire non-identical functions, they may come under different selective regimes.
In this study, we used a population genetic approach to assess the evolutionary forces acting on the bab paralogs. The patterns of sequence variation suggest that selective pressures vary across the bab locus. Two regions show indications of selection. First, a region near the 3′ end of bab2 (which includes bab2 3′ exons, introns, and intergenic region) appears to experience directional selection (Fig. 3b, c). Furthermore, selection in this region may vary over time, as indicated by the difference between population samples collected in different years. Surprisingly, no CREs have been found in this region (Williams et al. 2008), although it remains possible that it contains regulatory elements that modulate transcriptional activity but cannot function independently in transgenic assays. Future analysis is required to determine whether the coding or non-coding DNA is driving this signature of selection. The second region that appears to be under selection is located in the large intron of bab1 (Fig. 3c). This region contains the CRE that controls female specific expression of bab in the abdominal epidermis (Williams et al. 2008), suggesting that selection may be acting on the sexually dimorphic pigmentation of D. melanogaster. Furthermore, a recent study found that this same region was differentiated between northern and southern D. melanogaster populations in North America and Australia (Turner et al. 2008). In the coding regions, bab1 exhibits stronger selective constraint than bab2. One possible explanation is that the two proteins have somewhat different functional activities despite being expressed in largely overlapping patterns.
Given our data, it seems that the bab homologs are most likely maintained due to sub-functionalization. Previous work on the bab locus has shown that both bab genes are expressed in the same tissues during development (Couderc et al. 2002). This suggests that both genes probably retain functions similar to those of the ancestral bab gene. We have found that the coding and non-coding DNA show differences in sequence evolution. Thus, within the ancestral functions it is likely that the bab genes have divided their roles such that both are indispensable and thus maintained.
Several recent studies have used comparative genomic approaches to examine the role of selection in the evolution of duplicate genes. Such analyses are based on variation in the rate of expression divergence over time (Jordan et al. 2004; Gu et al. 2005), across phylogenetic lineages (Shiu et al. 2006), or on the correlation between the rate of expression and sequence divergence (Yu et al. 2003; Castillo-Davis et al. 2004). However, these long-term evolutionary patterns are consistent with either selective or neutral explanations (Castillo-Davis et al. 2004; Jordan et al. 2004; Kondrashov and Kondrashov 2006), and are best suited for detecting selection at the genome-wide level rather than individual loci. A population-genetic approach brings an alternative perspective to this question, since it is explicitly designed to test for selection acting on specific DNA sequences. As genome-wide analyses of intraspecific variation become possible (Begun et al. 2007), an integration of population-genetic and comparative-genomic approaches will shed new light on the relative importance of positive selection and neutral changes in the maintenance and evolution of paralogous genes.
We would like to thank Anna Ustaszewska and Danielle Tufts for technical help, and Tina Hu and Jennifer Brisson for comments on the manuscript. This work was supported by NSF grant DEB-0548991 to AK and SN.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.