Cloning and sequencing of the breakpoint regions of inversion 5g fixed in Drosophila buzzatii
- Cite this article as:
- Prazeres da Costa, O., González, J. & Ruiz, A. Chromosoma (2009) 118: 349. doi:10.1007/s00412-008-0201-5
Chromosomal inversions are ubiquitous in Drosophila both as intraspecific polymorphisms and interspecific differences. Many gaps still remain in our understanding of the mechanisms that generate them. Previous work has shown that in Drosophila buzzatii, three polymorphic inversions were generated by ectopic recombination between copies of the transposon Galileo. In this study, we have characterized the breakpoint regions of inversion 5g, fixed in D. buzzatii and absent in Drosophila koepferae and other closely related species. A novel approach comprising four experimental steps was used. First, D. buzzatii BAC clones encompassing the breakpoints were identified and their ends sequenced. Then, breakpoint regions were mapped at high resolution in the Drosophila mojavensis genome sequence. Finally, breakpoint regions were isolated by polymerase chain reaction in D. buzzatii and D. koepferae and sequenced. Our aim was to shed light on the mechanism that generated inversion 5g and specifically to test for an implication of the transposon Galileo. No evidence implicates Galileo or other transposable elements in the origin of inversion 5g, which was most likely generated by two independent breaks and non-homologous end-joining repair. Our results show that different inversion-generating mechanisms may coexist within the same lineage and suggest a hypothesis for the evolutionary time and mode of their operation.
Gross rearrangements are large-scale changes in chromosome structure that can be found as polymorphisms within species or as fixed differences between species. The occurrence of rearrangements in evolution has been known for a long time and, using cytological methods, the karyotypic evolution of many groups of plants and animals has been documented (Stebbins 1971; White 1973). The interest in chromosomal evolution has revived in recent years thanks to physical mapping and whole genome sequencing projects that allow us to compare the genomes of different species with an unprecedented resolution. The results of such comparative genomic analyses have shown an unexpectedly high rate of rearrangement fixation in many lineages and have demonstrated the remarkable flexibility of the eukaryotic genome (Ranz et al. 2001; Coghlan and Wolfe 2002; Eichler and Sankoff 2003; Coghlan et al. 2005; Nakatani et al. 2007; Bhutkar et al. 2008). How chromosomal rearrangements are generated and what their functional consequences are remain long-standing but still controversial questions (Casals and Navarro 2007; Hurles et al. 2008).
Naturally occurring chromosomal inversions were first detected by Sturtevant (1917) as crossover suppressors in strains of Drosophila melanogaster. They were later found to be ubiquitous in Drosophila both as intraspecific polymorphisms (Sperlich and Pfriem 1986) as well as interspecific differences (Stone 1962; Powell 1997; Bhutkar et al. 2008). It is generally assumed that the origin and fixation of an inversion is a unique process and that two species bearing the same inversion share a common ancestor (Krimbas and Powell 1992; Wasserman 1992). Relying on this assumption, detailed inversion phylogenies have been elaborated in many species groups. The phylogeny produced by Wasserman (1992) for the Drosophila repleta species group comprises 70 species and includes nearly 300 inversions. Drosophila chromosomal elements often show contrasting patterns of chromosomal evolution (González et al. 2002). Most of the inversions (70.2%) in the repleta group are located on the dynamic chromosome 2 (that represents 23% of the euchromatin), whereas other chromosomes, such as chromosome 5 (containing 20.3% of the euchromatin), are remarkably conservative (6.4% of all inversions; Wasserman 1992).
The origin of Drosophila polymorphic inversions has been investigated in detail in a limited number of cases by isolating and sequencing the inversion breakpoint regions (see Ranz et al. 2007 for a review). Unequivocal evidence for the implication of transposable elements (TEs) has been found in three Drosophila buzzatii inversions: 2j (Cáceres et al. 1999, 2001), 2q7 (Casals et al. 2003), and 2z3 (Delprat et al., in preparation). These three inversions were generated by non-allelic homologous recombination (or ectopic recombination) between copies of the transposon Galileo (Marzo et al. 2008) inserted in opposite orientation at two distant chromosomal sites. A polymorphic inversion of Drosophila pseudoobscura, Arrowhead, was also generated by ectopic recombination between 128- and 315-bp repeats, yet the nature of these repeats and their possible relation to an unidentified TE are obscure (Richards et al. 2005). TE copies have also been found at the breakpoints of two Anopheles gambiae inversions: 2Rd′ and 2La. However, the implication of the TEs in the origin of these inversions is ambiguous (Mathiopoulos et al. 1998; Sharakhov et al. 2006). Another A. gambiae inversion, 2Rj, seemingly arose by ectopic recombination between segmental duplications without the involvement of TEs (Coulibaly et al. 2007). Finally, no TEs or repeats of any kind are seemingly involved in the origin of three D. melanogaster inversions, In(3L)Payne (Wesley and Eanes 1994), In(2L)t (Andolfatto and Kreitman 2000), and In(3R)Payne (Matzkin et al. 2005). These inversions might have been generated by a mechanism of chromosomal breakage and repair by non-homologous end-joining (NHEJ; Ranz et al. 2007).
A number of inversions fixed between Drosophila species have also been investigated trying to elucidate how these rearrangements were generated (Cirera et al. 1995; Bergman et al. 2002; Richards et al. 2005; Ranz et al. 2007; Cirulli and Noor 2007; Runcie and Noor 2009; Bhutkar et al. 2008). Most of these studies did not detect any TEs at the inversion breakpoint regions, and only in a few cases were inverted repetitive sequences present at both co-occurrent breakpoint regions found (Richards et al. 2005; Ranz et al. 2007). In 18 out of 29 inversions fixed between D. melanogaster and Drosophila yakuba, breakpoint regions are associated with duplications of genes or other non-repetitive sequences, suggesting that these inversions arose by staggered breaks and NHEJ repair (Ranz et al. 2007). In the most comprehensive comparative analysis carried out so far using the 12 sequenced Drosophila genomes, Bhutkar et al. (2008) corroborated the high rate of inversion fixation in this genus but did not observe enrichment for repeat sequences in reused breakpoints. The absence of evidence for TE implication in the generation of fixed inversions contrasts with the results obtained analyzing polymorphic inversions. One explanation is that inversions generated by certain mechanisms are more likely to become fixed than those generated by other mechanisms. A second hypothesis is that breakpoint regions of fixed inversions, which are relatively old compared to polymorphic inversions, have been altered after the generation of the inversion so that the footprints of the generation mechanisms have been wiped out.
Materials and methods
Stocks of three Drosophila species were used in this study: D. mojavensis (stock 15081-1352.22 from Catalina Island, California), D. koepferae (stock KO-2 from Sierra San Luis, Argentina), and D. buzzatii (stock st-1; González et al. 2005).
In situ hybridization
BAC clones from the D. buzzatii CHORI-225 library (González et al. 2005) and polymerase chain reaction (PCR) products generated in this work were used as probes for in situ hybridization. Polytene chromosome squashes, hybridization, and detection were carried out as previously described by Montgomery et al. (1987) and Ranz et al. (1997). All probes were labeled with biotin-16-dUTP (Roche) by random primed labeling, and detection was carried out with the ABC-Elite Vector Laboratories kit. Heterologous (interspecific) hybridizations were performed at 25°C, and homologous (intraspecific) hybridizations were completed at 37°C. Hybridization signals were localized on the polytene chromosomes using the cytological maps of D. repleta (Wharton 1942), D. buzzatii (Ruiz and Wasserman 1993; González et al. 2005), and D. mojavensis (Schaeffer et al. 2008). The chromosome maps of D. buzzatii are cut-and-paste reconstructions of the D. repleta maps according to the sequence of inversions proposed for their respective phylogenies (González et al. 2005).
Primers used in this work for PCR amplification
DNA sequencing and sequence analysis
PCR products were cloned into the pGEM-T Easy vector (pGEM-T Easy Vector System I; Promega, Madison, WI, USA) and sequenced with T7 and SP6 primers. For subcloning, Bluescript II SK (Stratagene) was used as a vector and sequencing was performed using M13 universal forward and reverse primers. Alternatively, PCR products were gel-purified using the QIAquick gel extraction kit (Qiagen) and directly sequenced with the same primers used for amplification. Sequencing results were assembled using the CAP3 Sequence Assembly program (Huang and Madan 1999). The inverted and standard arrangement breakpoint sequences were aligned using ClustalW (Chenna et al. 2003) and BLAST 2 sequences (Tatusova and Madden 1999). Annotation of tRNA genes in D. mojavensis was taken from the data produced by C. Bergman and D. Ardell using the combined evidence of two de novo tRNA gene prediction methods: tRNAscan-SE and Aragorn (http://www.bioinf.manchester.ac.uk/bergman/data/ncRNA/tRNA/). Sequence comparison to the D. mojavensis genome was supported by DroSpeGe BLAST and GBrowse (http://rana.lbl.gov/drosophila/mojavensis.html). The breakpoint sequences were analyzed with REPuter (Kurtz et al. 2001), Einverted, Palindrome, Equicktandem (http://bioweb2.pasteur.fr/docs/EMBOSS/), and RepeatMasker (http://www.repeatmasker.org/) to identify repetitive sequences.
Identification of BAC clones encompassing the breakpoints
BAC end sequencing
[Table: Results of BLAST searches carried out with D. buzzatii BAC end sequences against the D. mojavensis genome. Column: Best hit coordinates]
High-resolution mapping of the breakpoint regions in D. mojavensis
[Table: Mapping of PCR probes to pinpoint the 5g inversion breakpoints on the D. mojavensis genome. Columns: D. mojavensis Scaffold_6496 coordinates; Hybridization to D. buzzatii chromosome 5]
The intergenic region between Sox15 and CG8394 is short enough for a PCR-based isolation of the breakpoints (see below). However, the intergenic region between CG30081 and CG15121 is relatively long (∼7.5 kb). In order to pinpoint the CD breakpoint within this region, a methodology based on PCR with primers anchored in conserved sequences was applied. PCR probe 11 (CG30081) could be amplified using DNA from BAC clones 16F24 and 22C13 (AC region in D. buzzatii) as template, but not with that of BAC clones 20J13 and 20J14 (BD region in D. buzzatii). Thus, this PCR probe was assigned to region C. The opposite amplification pattern was observed for PCR probe 12: Amplification was successful with clones 20J14 and 20J13 (BD), but not with clones 22C13 and 16F24 (AC). This probe was therefore assigned to region D. Hence, the location of the proximal breakpoint was narrowed down to a 3.5-kb region (Table 3).
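The assignment logic described above can be sketched in a few lines of Python. The clone names, probe numbers, and region labels come from the text; the function name `assign_probe_region` and the dictionary layout are illustrative assumptions and not part of the original analysis.

```python
# Illustrative sketch (data layout and names are assumptions):
# assign a PCR probe from the CD intergenic region to region C or D
# according to which D. buzzatii BAC clones support amplification.

# BAC clones and the D. buzzatii breakpoint region each one encompasses.
CLONE_REGION = {
    "16F24": "AC",
    "22C13": "AC",
    "20J13": "BD",
    "20J14": "BD",
}

def assign_probe_region(amplified):
    """Return 'C', 'D', or None given {clone_name: True/False} results."""
    positive = {CLONE_REGION[clone] for clone, ok in amplified.items() if ok}
    if positive == {"AC"}:
        return "C"   # amplifies only from AC clones, so the probe lies in C
    if positive == {"BD"}:
        return "D"   # amplifies only from BD clones, so the probe lies in D
    return None      # no amplification, or an ambiguous pattern

# Amplification patterns as reported in the text.
probe_11 = {"16F24": True, "22C13": True, "20J13": False, "20J14": False}
probe_12 = {"16F24": False, "22C13": False, "20J13": True, "20J14": True}

print("probe 11 ->", assign_probe_region(probe_11))
print("probe 12 ->", assign_probe_region(probe_12))
```

A probe that amplified from both clone sets, or from neither, is returned as `None` in this sketch because such a pattern would not discriminate between the two regions.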
Isolation of the breakpoint regions in D. buzzatii and in D. koepferae
For PCR amplification of the distal breakpoint in D. buzzatii (AC), primers A2 and C2, designed based on the D. mojavensis genome sequence, were combined and DNA of BAC clone 16F24 encompassing this breakpoint was used as template. The PCR product was cloned and sequenced (3,662-bp; GenBank accession number FJ534379).
The first attempt to amplify the proximal breakpoint in D. buzzatii (BD) by PCR was unsuccessful. Primers could not be anchored in conserved gene regions because the breakpoint falls in a relatively large intergenic region which is likely to be only partially conserved. Therefore, we amplified region B in D. koepferae with primers B1 and B2 designed in gene CG8394 of D. mojavensis and the 1,354-bp product was sequenced. Then, a new primer (B3) was designed in region B of D. koepferae. We expected this primer to work in D. buzzatii because D. buzzatii and D. koepferae are close relatives (see Fig. 1) and the probability of nucleotide changes in the primer sequence is lower. We combined this primer with a primer of region D of D. mojavensis (D2) to amplify the breakpoint BD in D. buzzatii. BAC clone 20J14 of D. buzzatii was used as template DNA for PCR amplification. The resulting PCR product was cloned and sequenced (4,831-bp; GenBank accession number FJ534380).
We also successfully sequenced the breakpoint regions in D. koepferae. Breakpoint region AB was amplified with primer A1, designed in the D. mojavensis genome sequence, and primer B4 designed in region B of D. buzzatii. Genomic DNA of D. koepferae was used as template and the 2,267-bp product was sequenced. In order to assemble the AB region with the B region previously sequenced in this species, we designed one primer in the AB region, primer B5, and another one in the B region, primer B6. The 465-bp amplification product was sequenced and the three fragments were assembled to produce a 3,948-bp sequence (GenBank accession number FJ534377). Finally, we amplified the proximal breakpoint region in D. koepferae (CD). We first amplified a 2,386-bp segment using primers C1 and D1, both of them designed in the D. mojavensis genome sequence. We then used this sequence to design another primer, primer C3. Primer C3 was combined with primer C4 designed in region C of D. buzzatii to amplify a fragment of 499 bp. Both fragments were assembled to produce a 2,865-bp sequence (GenBank accession number FJ534378).
Breakpoints sequences: annotation and analysis
We identified and isolated the two 5g breakpoint sequences in three species: D. mojavensis, D. koepferae (both representing the standard, non-inverted, arrangement), and D. buzzatii (bearing the inverted chromosome). These sequences were annotated with the aid of the DroSpeGe Browser for D. mojavensis annotation and also by similarity searches using BLAST against the D. mojavensis and D. melanogaster genomes. Other bioinformatic tools to uncover repeats and TEs were also used (see “Materials and methods”). Figure 4 depicts the molecular organization of the 5g breakpoint regions in the standard and inverted arrangements.
The distal breakpoint in the non-inverted arrangement (AB) falls in the intergenic region between genes Sox15 and CG8394. The size of this intergenic region (from the Sox15 STOP codon to the initial ATG codon of CG8394) is 1,633 bp in D. mojavensis and 1,652 bp in D. koepferae. In the latter species, this intergenic region contains four small blocks of AT-rich sequence (190 AT nucleotides out of 199 in total) and a (CCA)11 imperfect microsatellite. A small block of AT-rich sequence is also found in the homologous region in D. mojavensis.
The proximal breakpoint in the non-inverted arrangement (CD) was located in the intergenic region between genes CG30081 and CG15121 which is 7,362 bp long in D. mojavensis. Two tRNA genes have been annotated in D. mojavensis within this region on the minus (−) strand, D.moj_His_GTG_14000067 and D.moj_His_GTG_14000068 (http://www.bioinf.manchester.ac.uk/bergman/data/ncRNA/tRNA/). Henceforth, we will refer to them as tRNA-1 and tRNA-2, respectively (Fig. 4). In addition, three TE fragments have been annotated using ReAS: Dmoj_28 (105 bp), Dmoj_122 (41 bp), and Dmoj_36 (389 bp). Using probe 12, the breakpoint was further mapped to the region between CG30081 and tRNA-2. This 2,171-bp region contains immediately downstream of the tRNA-2 gene three small blocks of AT-rich sequence (totaling 156 bp with 146 AT nucleotides).
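As a quick check on the AT-richness figures quoted for the distal region of D. koepferae (190 of 199 nucleotides) and the proximal region of D. mojavensis (146 of 156 nucleotides), the following minimal Python sketch converts the counts into percentages. The function names and the toy sequence are hypothetical and serve only as illustration.

```python
def at_percent(sequence):
    """Percentage of A/T bases in a DNA string (case-insensitive)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "AT" for base in seq) / len(seq)

def at_percent_from_counts(at_bases, total_bases):
    """Percentage of A/T bases given pre-counted totals."""
    return 100.0 * at_bases / total_bases

# Counts taken from the text.
koepferae_ab = at_percent_from_counts(190, 199)   # four blocks, distal region
mojavensis_cd = at_percent_from_counts(146, 156)  # three blocks, proximal region

print(f"D. koepferae AB blocks: {koepferae_ab:.1f}% AT")
print(f"D. mojavensis CD blocks: {mojavensis_cd:.1f}% AT")

# Hypothetical toy sequence, only to show the string-based helper.
print(f"toy sequence: {at_percent('ATTTAGAT'):.1f}% AT")
```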
In D. koepferae, we sequenced 2,865 bp from the CD breakpoint. This sequence includes the beginning of CG30081 (306 bp) and two tRNA genes putatively orthologous to those present in the homologous region of D. mojavensis and located in the same orientation (− strand). The intergenic region between CG30081 and tRNA-2 is 2,477 bp long and contains three AT-rich blocks of sequence, one of them 452 bp downstream of tRNA-1 and the other two immediately downstream of tRNA-2 (Fig. 4).
We sequenced 3,662 bp from the AC breakpoint of D. buzzatii. This sequence contains the end of Sox15 coding region (positions 1–789) and the beginning of CG30081 (position 2,800–3,662). A his-tRNA, presumably orthologous to tRNA-1 in the D. koepferae CD sequence, is also present in the + strand (position 1,372–1,487). Alignment of D. buzzatii AC and D. koepferae AB sequences showed significant similarity reaching position 1,349. This observation locates the breakpoint in a 22-bp segment between positions 1,350 and 1,371 because the similarity with the sequence surrounding the tRNA that presumably belongs to C starts at site 1,372. A 53-bp segment with similarity to Helitron-1N1_Dvir is found 250 bp downstream of the breakpoint (Fig. 4). Two microsatellites, (CATA)6 and (TATG)5, and three small blocks of AT-rich sequence are also found in the Sox15-CG30081 intergenic region (Fig. 4).
The BD sequence in D. buzzatii (4,662-bp) contains the beginning of CG8394 (positions 1–353) and a his-tRNA gene in the − strand (position 4,238–4,309), presumably orthologous to tRNA-2 in the D. koepferae CD sequence. In addition, two TE-related sequences were annotated: a 77-bp fragment of a LINE-like element (TART-DV) and an ISBu2-like element of D. buzzatii (positions 2,241–2,990). When the D. buzzatii BD sequence was aligned to the D. koepferae AB sequence, the similarity extended well beyond the CG8394 coding sequence until position 1,351. Likewise, alignment with D. koepferae CD showed a small block of similarity (positions 1,451–1,607) and a larger one at the end of the BD sequence (positions 3,797–4,313) that includes the his-tRNA. These observations place the breakpoint in a 99-bp segment (positions 1,352–1,450) that includes the LINE-like fragment. The BD breakpoint region also contains two microsatellites, (TGG)16 and (CAA)15, and four small blocks of AT-rich sequence (Fig. 4).
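The two breakpoint intervals reported above (22 bp in AC and 99 bp in BD) follow directly from the alignment coordinates. A minimal sketch of that arithmetic is given below; the function name `breakpoint_interval` is an assumption introduced for illustration, and coordinates are treated as 1-based and inclusive, as in the text.

```python
def breakpoint_interval(last_pos_left_alignment, first_pos_right_alignment):
    """Return (start, end, length) of the unaligned segment lying between
    the end of one alignment and the start of the next (1-based, inclusive)."""
    start = last_pos_left_alignment + 1
    end = first_pos_right_alignment - 1
    if end < start:
        raise ValueError("alignments overlap or abut; no interval remains")
    return start, end, end - start + 1

# AC in D. buzzatii: similarity to D. koepferae AB ends at 1,349;
# similarity to region C (tRNA and surrounding sequence) starts at 1,372.
ac = breakpoint_interval(1349, 1372)

# BD in D. buzzatii: similarity to D. koepferae AB ends at 1,351;
# similarity to D. koepferae CD starts at 1,451.
bd = breakpoint_interval(1351, 1451)

print("AC breakpoint interval:", ac)
print("BD breakpoint interval:", bd)
```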
Analysis of the breakpoint regions in the 12 Drosophila species sequenced
We analyzed whether the genes flanking the 5g inversion breakpoints were syntenic in the 12 Drosophila species sequenced. The distal breakpoint is located in the intergenic region between genes Sox15 and CG8394 (Fig. 4). These two genes are closely linked in 11 of the 12 Drosophila species sequenced according to Flybase (www.flybase.org). The only apparent exception is Drosophila simulans where these two genes have not been annotated. However, a careful inspection of the region suggests that they are indeed present, although a 3-kb-long assembly gap in this genomic region of D. simulans obscures their detection. We conclude that the region between Sox15 and CG8394 has been conserved across the evolution of the 12 Drosophila species sequenced.
The proximal breakpoint of the 5g inversion is located between two tRNA genes that are flanked by genes CG30081 and CG15121 (Fig. 4). We have determined the gene order of this region in the 12 Drosophila genomes (Electronic supplementary material Fig. S1). In the nine species of the Sophophora subgenus, the organization is CG11007-tRNA-CG15121-CG15122 with the tRNA missing in Drosophila ananassae. The gene order in D. mojavensis, CG30296-CG30081-tRNA-tRNA-CG15121-CG15122 (Fig. 4; Electronic supplementary material Fig. S1), reveals three alterations in comparison to the organization found in the species of the Sophophora subgenus. First, the gene CG30081 seems to have transposed into the region (in D. melanogaster CG30081 is nested within CG8092 in a distant region of chromosome 2R). This transposition is shared by the three species in the Drosophila subgenus and thus must be old. Second, in D. mojavensis, there are two tRNA genes instead of only one as in all other Drosophila genomes (except D. ananassae). The new tRNA may have arisen by a relatively recent duplication or transposition event as it is exclusive of D. mojavensis. Finally, D. mojavensis CG30081 is not flanked by CG11007 but by CG30296, indicating the presence of a chromosomal rearrangement breakpoint (Electronic supplementary material Fig. S1).
No evidence for the implication of the transposon Galileo in the generation of the 5g inversion
Several TE families have been shown to induce chromosomal rearrangements in laboratory populations of Drosophila (Lim and Simmons 1994). Among these families, the P element stands out as one of those especially prone to induce rearrangements (Engels and Preston 1984). However, the evidence for an implication of TEs in the origin of natural Drosophila inversions is minimal and appears to be restricted to some of the polymorphic inversions still segregating in natural populations (see “Introduction”). So far, no positive evidence for generation by TEs has been obtained for any fixed inversion. Three polymorphic inversions of D. buzzatii, 2j, 2q7, and 2z3, have been generated by the transposon Galileo (Cáceres et al. 1999, 2001; Casals et al. 2003; Delprat et al., in preparation), a relative of the D. melanogaster P and 1360 elements recently classified within the P superfamily (Marzo et al. 2008). In each case, Galileo copies were found at both inversion breakpoint junctions in all chromosomes with the inverted arrangement, and the pattern of target site duplications flanking the inversion indicated that it was generated by an ectopic recombination event. In all three cases, other TE copies were also found inserted in the breakpoint regions within or near the Galileo copies. These TEs are secondary colonizers of the breakpoints that are inserted after the generation of the inversion and accumulate in these regions due to the reduction of recombination in the heterokaryotypes. This consistent pattern provides a benchmark for testing the implication of TEs in the origin of other inversions.
Here, we have isolated and sequenced the breakpoints of the fixed paracentric inversion 5g distinguishing D. buzzatii from its close relative D. koepferae (Fig. 1). Our aim was to shed light on the mechanism that generated this inversion and specifically to test for an implication of Galileo. The results clearly show that Galileo was not involved. No Galileo copies or even fragments with similarity to Galileo were observed at either of the two breakpoints of inversion 5g. The possibility that Galileo did in fact generate inversion 5g but the responsible Galileo copies were deleted from the inverted chromosomes afterwards seems very unlikely. There is a high intrinsic rate of nonfunctional DNA loss in Drosophila compared to mammals (Petrov et al. 1996; Petrov and Hartl 1998; Singh and Petrov 2004), but the half-life of such DNA (i.e., the expected time until 50% of the sequence has been eliminated by deletion) is still 12–14 myr (Petrov et al. 1996, 2000; Petrov 2002). Because the 5g inversion must be relatively young (<4 myr, Fig. 1), we expected to find at least partial or defective Galileo copies in the breakpoints of this inversion if this element was responsible for its generation. Thus, we must reject a role for Galileo in the generation of inversion 5g.
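The expectation of finding at least partial Galileo copies can be made concrete with a rough calculation. The sketch below assumes simple exponential decay of nonfunctional DNA, which is our reading of the half-life figure and not a model stated in the cited work, and it uses the 4-myr upper bound on the age of the inversion. The function name is hypothetical.

```python
def fraction_remaining(age_myr, half_life_myr):
    """Expected fraction of a nonfunctional sequence still present after
    age_myr, assuming exponential decay with the given half-life."""
    return 0.5 ** (age_myr / half_life_myr)

MAX_AGE_MYR = 4  # upper bound on the age of inversion 5g given in the text

for half_life in (12, 14):  # range of half-life estimates quoted in the text
    kept = fraction_remaining(MAX_AGE_MYR, half_life)
    print(f"half-life {half_life} myr: about {kept:.0%} of the sequence expected to remain")
```

Under these assumptions, roughly four fifths of a Galileo copy would still be expected at each breakpoint even if the inversion were as old as 4 myr, which is why complete loss of the element seems improbable.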
5g inversion was most likely generated by staggered breaks and NHEJ
We then looked for the presence of other repetitive sequences that could have acted as substrates for ectopic recombination. We did find a few copies of TEs other than Galileo inserted in the inversion breakpoints of the D. buzzatii 5g chromosome. A small segment (53 bp) with similarity to Helitron-1N1_Dvir (Kapitonov and Jurka 2007a) was found ∼250 bp downstream of the distal breakpoint, and a 750-bp segment with similarity to ISBu2 (Cáceres et al. 2001) was observed ∼800 bp downstream of the proximal breakpoint. These two elements are Helitrons (Kapitonov and Jurka 2007b; Yang and Barbash 2008), a subclass of DNA transposons that replicate using a rolling-circle mechanism (Wicker et al. 2007) and are extremely abundant in Drosophila species (up to 6,000 copies per haploid genome; Yang and Barbash 2008). The two sequences are 86% identical over the aligned region (53 bp) and are inserted in the same orientation. Neither their localization (Fig. 4) nor their orientation (see below) supports a role for these TEs in the generation of inversion 5g. ISBu copies have often been found inserted in the breakpoint regions of D. buzzatii polymorphic inversions as a result of secondary colonization of the breakpoints (Cáceres et al. 2001; Casals et al. 2003; Delprat et al., in preparation). Thus, this seems the most plausible interpretation for the presence of these TEs in the 5g breakpoint regions. A similar explanation may apply to the small segment (77 bp) found in the proximal breakpoint with similarity to the non-LTR retrotransposon TART-DV (Casacuberta and Pardue 2003). TART along with other non-LTR retrotransposons is a normal constituent of Drosophila telomeres (Pardue et al. 2005; Villasante et al. 2007) and so far has never been implicated in the origin of chromosomal rearrangements. Although this insertion seems to be right in the proximal breakpoint junction, no traces of a similar copy were found at the distal breakpoint region, and thus, there is no evidence to implicate this element in the generation of inversion 5g.
Two highly similar (94% identical) tRNA copies were found in opposite orientation in the D. buzzatii 5g breakpoints, one at each breakpoint. At first glance, this observation might suggest that the 5g inversion was generated by ectopic recombination between these two tRNA copies. tRNA genes have been previously implicated in the origin of chromosomal rearrangements by ectopic recombination in yeast (Szankasi et al. 1986; Kellis et al. 2003). Ectopic recombination requires the presence of homologous sequences in opposite orientation at two sites in the parental chromosome (Petes and Hill 1988). The arrangement of the two tRNA copies in the parental non-inverted chromosome is inconsistent with this hypothesis. In chromosome 5 of both D. koepferae and D. mojavensis, two tRNA genes are found in the proximal breakpoint region (in the minus strand), but none is observed at the distal breakpoint region (Fig. 4). The presence of one of these tRNA genes in the distal breakpoint of D. buzzatii (in the plus strand) indicates that the proximal breakpoint falls right between the two tRNA genes (Fig. 4) and that these genes are not responsible for the generation of the 5g inversion.
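The argument above can be illustrated with a small simulation of the inversion on a simplified marker map. This is a sketch under stated assumptions: only the six markers named in the text are included (the many genes lying between CG8394 and CG30081 are omitted), strands are recorded only for the two tRNA genes because these are the only orientations given, and the function `invert` is a hypothetical helper.

```python
FLIP = {"+": "-", "-": "+", None: None}  # None marks a strand not given in the text

def invert(chromosome, start, end):
    """Invert chromosome[start:end]: reverse marker order and flip strands."""
    segment = [(name, FLIP[strand]) for name, strand in reversed(chromosome[start:end])]
    return chromosome[:start] + segment + chromosome[end:]

# Standard (non-inverted) arrangement, as in D. koepferae and D. mojavensis.
standard = [
    ("Sox15", None),    # region A
    ("CG8394", None),   # region B
    ("CG30081", None),  # region C
    ("tRNA-1", "-"),    # region C
    ("tRNA-2", "-"),    # region D
    ("CG15121", None),  # region D
]

# Both tRNA genes sit at the same site and in the same orientation, so they
# are not inverted repeats located at the two breakpoints.
print("same orientation in parent:", standard[3][1] == standard[4][1])

# Breaks between Sox15|CG8394 (AB) and between tRNA-1|tRNA-2 (CD).
inverted = invert(standard, 1, 4)
for name, strand in inverted:
    print(name, strand or "")
```

In the resulting order, tRNA-1 lies next to Sox15 on the plus strand (junction AC) and tRNA-2 remains on the minus strand next to CG8394 (junction BD), matching the arrangement observed in D. buzzatii. The opposite orientation of the two tRNA copies is therefore a consequence of the inversion and not its cause.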
Overall, we did not find evidence for inverted repetitive sequences in the breakpoint regions, suggesting that a mechanism other than ectopic recombination may be responsible for the generation of this inversion. Using genomic sequences, Ranz et al. (2007) analyzed the breakpoint regions of 29 inversions fixed between D. melanogaster and D. yakuba. They found that 18 of them (~62%) were associated with duplications of genes or intergenic regions at both co-occurrent breakpoint regions. Sequences from both breakpoints were duplicated in six of the inversions, whereas in the remaining 12 inversions, only sequences from one of the two breakpoints were duplicated. They proposed a model of staggered breaks (either isochromatid or chromatid) and repair by NHEJ as the most likely mechanism for inversion generation. The variation in the size of the duplications would be explained by the variable distance between the staggered breaks. Those cases in which sequences from only one of the two breakpoints were duplicated could be caused by staggered breaks in only one of the breakpoints and a single break in the other. This model of breakage (either staggered or not) and NHEJ is, at this point, the most likely hypothesis to explain how inversion 5g was generated. The absence of duplications of gene or intergenic sequences suggests that either a single break occurred at each breakpoint or the short distance between staggered breaks coupled with subsequent nucleotide evolution made the small duplications undetectable. The susceptibility of DNA to breakage is known to depend on its base composition. AT-rich sequences show an increased probability of breaks, in particular when they are palindromic and thus capable of forming hairpin or cruciform secondary structures (Schwartz et al. 2006; Zhang and Freudenreich 2007; Durkin and Glover 2007; Lukusa and Fryns 2008). We found several AT-rich small blocks of sequence in the breakpoints of the 5g inversion (Fig. 4), and it is possible that these AT-rich blocks could have enhanced the susceptibility of these breakpoint regions to breakage. In addition, it must be recalled that the 5g proximal breakpoint seems to coincide with a particularly dynamic region in the D. mojavensis genome (Electronic supplementary material Fig. S1). The region contains a translocated gene (CG30081), a recently duplicated or translocated tRNA gene and a rearrangement breakpoint besides the 5g breakpoint. Thus, this chromosomal region may be considered as “fragile.”
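The relation between staggered breaks and flanking duplications discussed above can be pictured with a toy string model. The sketch below is our own simplification and not the procedure of Ranz et al. (2007): each breakpoint is represented by two cut positions, the flanks retain the sequence up to the inner cut, and the inverted segment spans the outer cuts, so the sequence between staggered cuts ends up present twice. All names, coordinates, and the example sequence are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def invert_with_staggered_breaks(seq, p1, p2, q1, q2):
    """Toy model of inversion by breakage and end-joining.

    Cuts at p1 <= p2 (first breakpoint) and q1 <= q2 (second breakpoint).
    seq[p1:p2] and seq[q1:q2] are each duplicated, with the two copies in
    inverted orientation at opposite ends of the inversion.
    Setting p1 == p2 and q1 == q2 models single breaks without duplication.
    """
    if not 0 <= p1 <= p2 <= q1 <= q2 <= len(seq):
        raise ValueError("cut positions must be ordered and within the sequence")
    return seq[:p2] + revcomp(seq[p1:q2]) + seq[q1:]

seq = "ACGGTCATTGCAGGCTAACGTTCA"  # hypothetical 24-bp sequence

staggered = invert_with_staggered_breaks(seq, 3, 6, 18, 20)
single = invert_with_staggered_breaks(seq, 6, 6, 18, 18)

print(len(seq), len(staggered), len(single))
```

In this model the inverted chromosome grows by the summed distance between staggered cuts, whereas single breaks leave the length unchanged. The latter case corresponds to the situation inferred for inversion 5g, where no duplications were detected.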
Coexistence of different inversion-generating mechanisms within the same lineage and its implications
The variety of molecular mechanisms for the generation of Drosophila inversions in nature is striking and raises questions about their relative contribution and the evolutionary time and mode of their operation. The apparent differences in the responsible mechanism between polymorphic and fixed inversions (see “Introduction”) are intriguing, although it is probably too soon to draw any firm conclusion. The results presented here on the fixed 5g inversion, most likely produced by breakage and NHEJ repair, and previous results on the three polymorphic D. buzzatii inversions generated by the transposon Galileo show that different mechanisms can operate within a single lineage. Why such a contrast between fixed and polymorphic inversions? One hypothesis is that inversions generated by breakage and NHEJ repair have a higher probability of fixation than those generated by ectopic recombination. We consider this explanation unlikely, although more information is needed to reject it. The fact that the three polymorphic inversions occurred on the dynamic chromosome 2 whereas the inversion analyzed here occurred in the more conservative chromosome 5 suggests another more likely hypothesis. We can explain these observations by assuming that the breakage and NHEJ repair mechanism generates inversions at a basal or background rate in all lineages and at all times. Double-strand breaks are produced in several ways in all cells, and the machinery necessary to deal with these lesions is conserved from yeasts to vertebrates (Pastink et al. 2001; Sonoda et al. 2006). This mechanism would explain most of the inversions present in the repleta group in conservative chromosomes, e.g., inversion 5g analyzed here or inversion Xe fixed in D. mojavensis (Cirulli and Noor 2007; Runcie and Noor 2009). On the other hand, TE activity would explain, for instance by means of the ectopic recombination mechanism, a local or temporary increase in the rate of inversion occurrence. This mechanism would be responsible for the generation of most of the chromosome 2 inversions in the repleta group, including the three polymorphic inversions of D. buzzatii. TE activity is likely to vary considerably between lineages and over time, and it may also vary between chromosomes due to the accumulation of TE copies in the inverted segments of polymorphic chromosomes, as has been observed for Galileo (Casals et al. 2005) and another six transposon families (Casals et al. 2006) in D. buzzatii. This hypothesis may be tested by characterizing the breakpoints of fixed inversions in different chromosomes in D. buzzatii and other species.
Functional consequences of the 5g inversion
Inversions are considered to play a role in the adaptation of species to their environments and in reproductive isolation between species (Noor et al. 2001; Coghlan et al. 2005; Hoffmann and Rieseberg 2008). However, the molecular mechanisms by which inversions could affect fitness are still unclear. One possibility is that the localization of the inversion breakpoints near or inside genes could affect their function or expression profile. The analysis of the breakpoint regions of inversion 5g showed that both breakpoints are located in intergenic regions and therefore do not disrupt the coding region of any of the flanking genes. The same scenario was found when the breakpoint regions of the other three inversions sequenced in D. buzzatii were analyzed (Cáceres et al. 1999, 2001; Casals et al. 2003; Delprat et al., in preparation). For the proximal breakpoint of one of them, inversion 2j, it was shown that the expression level of the gene located immediately outside the inversion was reduced in strains carrying the inversion (Puig et al. 2004). This silencing effect was not caused by the inversion itself, but by one of the TEs inserted at the breakpoint junctions. This particular kind of position effect is not likely to be acting in the case of the 5g inversion, since no TEs were found close to the inversion breakpoints. However, the 5g inversion may still be affecting the expression of the neighboring genes, for example by disrupting or changing the location of cis-regulatory elements. The availability of the sequence of inversion breakpoint regions, as described in this paper, will allow the study of the position effects of natural inversions, which was previously hindered by the lack of molecular studies.
We thank Oriol Calvete, Alejandra Delprat, Barbara Negre and Marta Puig for technical support and comments on a previous version of the manuscript. Dmitri Petrov generously lent us his lab to carry out the final part of this project. This work was supported by a PIF fellowship from the UAB awarded to O. P. da Costa and grant BFU2005-02237 from the Dirección General de Investigación (Ministerio de Educación y Ciencia, Spain) awarded to A. Ruiz.