Patterns of molecular evolution of the germ line specification gene oskar suggest that a novel domain may contribute to functional divergence in Drosophila
In several metazoans including flies of the genus Drosophila, germ line specification occurs through the inheritance of maternally deposited cytoplasmic determinants, collectively called germ plasm. The novel insect gene oskar is at the top of the Drosophila germ line specification pathway, and also plays an important role in posterior patterning. A novel N-terminal domain of oskar (the Long Oskar domain) evolved in Drosophilids, but the role of this domain in oskar functional evolution is unknown. Trans-species transgenesis experiments have shown that oskar orthologs from different Drosophila species have functionally diverged, but the underlying selective pressures and molecular changes have not been investigated. As a first step toward understanding how Oskar function could have evolved, we applied molecular evolution analysis to oskar sequences from the completely sequenced genomes of 16 Drosophila species from the Sophophora subgenus, Drosophila virilis and Drosophila immigrans. We show that overall, this gene is subject to purifying selection, but that individual predicted structural and functional domains are subject to heterogeneous selection pressures. Specifically, two domains, the Drosophila-specific Long Osk domain and the region that interacts with the germ plasm protein Lasp, are evolving at a faster rate than other regions of oskar. Further, we provide evidence that positive selection may have acted on specific sites within these two domains on the D. virilis branch. Our domain-based analysis suggests that changes in the Long Osk and Lasp-binding domains are strong candidates for the molecular basis of functional divergence between the Oskar proteins of D. melanogaster and D. virilis. This molecular evolutionary analysis thus represents an important step towards understanding the role of an evolutionarily and developmentally critical gene in germ plasm evolution and assembly.
Keywords: Drosophila; Positive selection; Oskar; Germ line specification; Germ plasm; Novelty
The advent of a dedicated germ line is a major evolutionary transition associated with the origin of multicellularity (Michod 2005). In all sexually reproducing animals, the specification of the germ line early in embryogenesis is a critical developmental event. Two modes of germ line specification have been identified in metazoans: inheritance of maternally synthesized cytoplasmic germ line determinants (germ plasm), and the induction of germ cell fate by signals from neighboring somatic cells (Extavour and Akam 2003). Phylogenetic analyses of these developmental patterns suggest that the inductive mode may be ancestral in metazoans, with germ plasm-driven mechanisms having evolved independently in multiple lineages (Extavour and Akam 2003; Blackstone and Jasker 2003). In Drosophila melanogaster, germ cells are specified by the inheritance mode, and germ plasm is assembled during oogenesis by the products of the oskar gene. Oskar protein physically interacts with and recruits germ plasm components, including Valois, Lasp, and Vasa proteins and nanos mRNA (Ephrussi et al. 1991; Breitwieser et al. 1996; Suyama et al. 2009; Cavey et al. 2005). oskar is necessary and sufficient for germ plasm assembly (Ephrussi and Lehmann 1992), and because it localizes the posterior determinant nanos, it is also required for posterior body patterning (Ephrussi et al. 1991).
Surprisingly, in contrast to most other critical metazoan germ line genes, oskar is not highly conserved across animals. It is instead a novel gene that evolved in the lineage leading to insects (Ewen-Campen et al. 2012), and may have facilitated the evolution of germ plasm in holometabolous insects (those that undergo true metamorphosis) (Lynch et al. 2011). oskar orthologs have been identified to date only from flies and mosquitoes (Diptera, the furthest derived clade of holometabolous insects) (Goltsev et al. 2004; Juhn and James 2006; Juhn et al. 2008), ants and wasps (Hymenoptera, the most basally branching clade of the Holometabola) (Lynch et al. 2011), and a basally branching hemimetabolous insect, the cricket Gryllus bimaculatus (Ewen-Campen et al. 2012). oskar is absent from the genomes of multiple holometabolous insects that lack germ plasm, including the silk moth Bombyx mori, the beetle Tribolium castaneum, and the honeybee Apis mellifera, indicating that this gene has been secondarily lost several times in holometabolous evolution (Lynch et al. 2011).
In contrast, oskar from the equally distantly related species Drosophila virilis (virosk) does not show functional conservation of its germ plasm role in a D. melanogaster context. In D. virilis, virosk transcript is localized to the posterior pole of oocytes and embryos like its homolog in D. melanogaster (Webster et al. 1994). D. virilis embryos also form posterior germ plasm and subsequently pole cells (Webster et al. 1994), suggesting that the germ plasm function of oskar is conserved in D. virilis, as in D. immigrans. In transgenic D. melanogaster oskar loss of function mutants carrying a virosk transgene, Virosk appears to recruit sufficient D. melanogaster nanos mRNA to rescue posterior patterning in these mutants (Webster et al. 1994). However, in D. melanogaster, Virosk is unable to assemble functional germ plasm, and thus unable to direct germ cell formation (Webster et al. 1994). This suggests that although Virosk may retain some ability to interact with D. melanogaster nanos mRNA, its interactions with other D. melanogaster germ plasm components may be too divergent to permit assembly of functional pole plasm in a D. melanogaster context, indicating essential functional divergence. Given that oskar’s role in functional germ plasm assembly appears conserved even in Hymenoptera (Lynch et al. 2011), it is likely that virosk and immosk direct functional germ plasm assembly in D. virilis and D. immigrans respectively, via interactions with the germ plasm component orthologs in these species. In this paper, we focus on the functional divergence within the genus Drosophila that prevents Virosk’s fruitful interaction with D. melanogaster germ plasm gene products, despite the high level of conservation of most other germ plasm genes (Ewen-Campen et al. 2010).
Although oskar plays an indispensable role in Drosophilid germ cell specification, the nature of the selective pressures and molecular changes responsible for its functional divergence within the genus Drosophila is unknown. To gain insight into the molecular evolution of this novel and critical gene, we took advantage of the completely sequenced genomes of 16 Drosophila species from the Sophophora subgenus, the D. virilis genome sequence and the sequenced oskar locus from D. immigrans. The goal of this study is to assess patterns of change in the oskar nucleotide sequence to evaluate potential variation in the evolutionary rate of distinct functional protein domains. We test the hypothesis that positive selection drives the evolution of Drosophila oskar, and identify regions that are likely to underlie functional divergence between D. virilis and D. melanogaster oskar, providing candidates for future study of the evolutionary changes that prevent virosk from specifying germ plasm in a D. melanogaster background.
Annotated oskar orthologs from D. melanogaster [GenBank NM_169248.1; FlyBase CG10901-PA], Drosophila simulans [GenBank XM_002104160.1; FlyBase GD18580-PA], Drosophila sechellia [GenBank XM_002031933.1; FlyBase GM23770-PA], Drosophila yakuba [GenBank XM_002096839.1; FlyBase GE25914-PA], Drosophila erecta [GenBank XM_001980858.1; FlyBase GG13545-PA], Drosophila ananassae [GenBank XM_001953262.1; FlyBase GF17692-PA], Drosophila pseudoobscura [GenBank XM_001359471.2; FlyBase GA10627], Drosophila persimilis [GenBank XM_002017349.1; FlyBase GL21554-PA], Drosophila willistoni [GenBank XM_002070244.1; FlyBase GK11117-PA], and D. virilis [GenBank XM_002053233.1; FlyBase GJ23790-PA] were obtained from FlyBase (www.flybase.org). Coding sequence of D. immigrans oskar was obtained from GenBank [DQ823084.1]. We also identified oskar orthologs from the following recently sequenced species whose genomes have not been annotated: Drosophila eugracilis [genomic scaffold: GenBank JH402624.1], Drosophila ficusphila [genomic scaffold: GenBank GL987928.1], Drosophila biarmipes [genomic scaffold: GenBank JH400370.1], Drosophila takahashii [genomic scaffold: GenBank JH112313.1], Drosophila elegans [genomic scaffold: GenBank JH110107.1], Drosophila rhopaloa [genomic scaffold: GenBank JH406433.1], Drosophila kikkawai [genomic scaffold: GenBank JH111367.1], Drosophila bipectinata [genomic scaffold: GenBank JH401929.1]. Genome sequences for these species were accessed via the Drosophila Species Stock Center at https://stockcenter.ucsd.edu/info/welcome.php. We conducted a tBLASTn search of each Drosophilid genome with D. melanogaster Oskar protein as the query to uncover the most similar coding sequence with respect to amino acid conservation. As oskar is a single-copy gene in all genomes examined to date, and shares no significant overall sequence similarity with non-oskar genes (Lynch et al. 2011), the top tBLASTn hit was considered to be the ortholog. 
To extract the coding sequence from genomic scaffold sequences we used the Augustus gene predictor (Keller et al. 2011), and manually curated the resulting sequences to mask stop codons and frame shift mutations. Reciprocal BLAST of the top predicted coding sequences to D. melanogaster was performed to confirm orthology. Nucleotide sequences of all oskar orthologs analysed in this study are provided in Online Resource 1.
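The orthology check described above (top tBLASTn hit, confirmed by reciprocal BLAST) corresponds to a reciprocal-best-hit test. The following is a minimal sketch of that logic, not the pipeline actually used in this study. It assumes that BLAST results have already been reduced to (query, subject, bit score) records, and all sequence identifiers and scores shown are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# One BLAST hit: (query_id, subject_id, bit_score). Hypothetical record layout.
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Return the highest-scoring subject for each query."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hit(forward: Iterable[Hit],
                        reverse: Iterable[Hit],
                        query: str) -> Optional[str]:
    """Return the subject that is the top forward hit of `query` and whose
    own top hit in the reverse search is `query`; otherwise return None."""
    candidate = best_hits(forward).get(query)
    if candidate is not None and best_hits(reverse).get(candidate) == query:
        return candidate
    return None


# Hypothetical example: reference protein searched against a new genome,
# then the predicted coding sequence searched back against the reference.
forward = [("Dmel_osk", "scaffold_A_gene1", 512.0),
           ("Dmel_osk", "scaffold_B_gene7", 48.5)]
reverse = [("scaffold_A_gene1", "Dmel_osk", 498.0),
           ("scaffold_A_gene1", "Dmel_other", 40.0)]

ortholog = reciprocal_best_hit(forward, reverse, "Dmel_osk")
```

In this toy case the candidate is accepted because its best reverse hit is the original query; if the reverse search had favoured a different gene, the function would return None and the candidate would need manual inspection.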
As the results of PAML analyses can be sensitive to the alignment methods used (Blackburne and Whelan 2013), we generated multiple sequence alignments using two different methods, and performed analyses on the results of both MSAs. The sequences of the 18 Drosophila species were multiply aligned using MUSCLE implemented in TranslatorX (Abascal et al. 2010), or using PRANK (Löytynoja and Goldman 2008). We did not include the D. willistoni sequence as its predicted length was less than 60 % that of D. melanogaster oskar (not shown). There is no evidence for unusual divergence of D. willistoni oskar; this is therefore likely due to sequencing or annotation error.
Predicted structural and interaction domains were manually extracted from the whole gene alignment using the known amino acid residues corresponding to each domain in D. melanogaster (Fig. 1) (Breitwieser et al. 1996; Suyama et al. 2009; Anne 2010). Sequences from Valois-interacting regions were concatenated for analysis. The PRANK alignment was subjected to GBlocks analysis using the default options (minimum number of sequences for a conserved position: 10; minimum number of sequences for a flanking position: 10; maximum number of contiguous positions: 8; minimum length of a block: 5).
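Extracting a domain from a multiple alignment by reference residue numbers requires mapping ungapped residue positions of the reference sequence onto alignment columns. The sketch below illustrates one way to do this for a codon alignment. It is an illustration only (the extraction in this study was done manually) and assumes that gaps occur in whole codons; the sequence names, toy sequences, and residue ranges are hypothetical.

```python
from typing import Dict, Tuple


def residue_range_to_columns(aligned_ref: str,
                             first_res: int,
                             last_res: int) -> Tuple[int, int]:
    """Map a 1-based, inclusive residue range of the reference sequence onto
    0-based, end-exclusive nucleotide columns of a codon alignment."""
    if len(aligned_ref) % 3 != 0:
        raise ValueError("codon alignment length must be a multiple of 3")
    if first_res < 1 or last_res < first_res:
        raise ValueError("invalid residue range")
    residue = 0
    start = end = None
    for col in range(0, len(aligned_ref), 3):
        codon = aligned_ref[col:col + 3]
        if codon == "---":
            continue  # gap codon: no reference residue in this column
        if "-" in codon:
            raise ValueError("partial gap inside a codon")
        residue += 1
        if residue == first_res:
            start = col
        if residue == last_res:
            end = col + 3
            break
    if start is None or end is None:
        raise ValueError("residue range lies outside the reference sequence")
    return start, end


def extract_domain(alignment: Dict[str, str], ref_name: str,
                   first_res: int, last_res: int) -> Dict[str, str]:
    """Slice every aligned sequence at the columns spanned by the domain."""
    start, end = residue_range_to_columns(alignment[ref_name],
                                          first_res, last_res)
    return {name: seq[start:end] for name, seq in alignment.items()}


# Toy codon alignment; "ref" residues are M(1), A(2), K(3), F(4).
alignment = {
    "ref": "ATG---GCTAAATTT",
    "sp2": "ATGCCCGCT---TTT",
}
domain = extract_domain(alignment, "ref", 2, 3)
```

Columns in which the reference has a gap but that lie between the first and last residue of the domain are retained, so insertions in other species within the domain are not lost.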
We repeated the branch site test for domains that showed a consistent signature for positive selection using the reported cDNA sequence of D. virilis oskar (GenBank L22556.1) that was used in the experiments that suggested functional divergence of D. virilis and D. melanogaster oskar with respect to germ plasm assembly (Webster et al. 1994) (Online Resource 1). We also used Sanger sequencing to confirm the entire nucleotide sequence of this D. virilis allele (see Online Resource 1).
Results and Discussion
The complete oskar coding region is under purifying selection
We obtained oskar coding sequences from the Drosophilid genome sequences, as well as the sequenced D. immigrans oskar coding region (Fig. 2; Online Resource 1), and generated multiple sequence alignments using two different multiple sequence alignment (MSA) tools, the similarity-based MSA MUSCLE (Edgar 2004) (Online Resource 2, Figure S1) and the evolutionarily informed MSA PRANK (Löytynoja and Goldman 2008) (Online Resource 3, Figure S2). Because the choice of MSA can influence the outcome of such analyses (Blackburne and Whelan 2013), we performed evolutionary rate analyses using both MSA outputs. We conducted maximum likelihood analyses with codeml implemented in PAML v4 (Yang 2007) to estimate non-synonymous (dN) and synonymous (dS) substitution rates, and their ratio (ω = dN/dS). We then used the Likelihood Ratio Test to compare the fit of different evolutionary models to the data (Yang 2007; Yang et al. 2000b).
We first applied the simplest model M0, which sets all branches and sites to evolve at the same rate, to obtain a single global ω estimate for the entire oskar coding region alignment (Yang et al. 2000a). The log likelihood for this model was −4,397.07 using the MUSCLE alignment, and −16,563.12 using the PRANK alignment, with ω estimates of 0.32 and 0.16, respectively, indicating overall purifying selection of full-length oskar.
Distinct oskar domains are evolving at different rates
Next, to estimate ω for each domain separately and test if these domains were under different selective constraints, we applied different fixed site models (Yang and Swanson 2002). The highest log likelihood was obtained using model E (MG4), which assumes different ω, κ (transition/transversion ratio), π (equilibrium codon frequencies), and rs (proportional branch lengths) between oskar domains. Using either the MUSCLE MSA (Online Resource 2, Table S1) or the PRANK MSA (Online Resource 3, Table S4), the fit of this model was significantly better than that of model B, which assumes identical κ, ω, and π but different rs (P < 0.001). This shows that the strength of selection varies among distinct oskar domains. Estimates of ω were less than 1 for all domains, indicating overall purifying selection on each domain (Fig. 1b). However, the SGNH, LOTUS, and Valois-interacting domains are the most constrained, while the Lasp-binding and Long Osk domains are evolving approximately four to five and two to three times faster, respectively, than the former three domains. This suggests that evolutionary changes in oskar function may not be due to changes within the SGNH, LOTUS, or Valois-interacting domains. Moreover, it is consistent with the relatively poor conservation of the Lasp-binding region across insects (Lynch et al. 2011) and the recent evolution of Long Osk in the Drosophila lineage. As selective pressures on individual oskar domains are heterogeneous, further analyses were conducted separately for each domain.
Evidence for positive selection on specific oskar domains
Table: Likelihood Ratio Test statistics for Oskar predicted structural and interaction domains: 18 species analyses. Columns (given twice in the original layout): M1 vs M2; M7 vs M8; MA vs MAfix.
The results of the analyses thus far show that when using site models to consider the evolution of Oskar orthologs from these 18 Drosophila lineages, no specific domain appears to be under positive selection. However, if a gene evolves under purifying selection most of the time, but is occasionally subject to episodes of adaptive change, site models may not yield a significant dN/dS value. Given the inability of D. virilis oskar to substitute for D. melanogaster oskar, we wished to test the hypothesis that positive selection might be acting on the D. virilis branch alone. We therefore assessed differences in ω on that specific branch using branch site models (Zhang et al. 2005). We used the modified model A, or test 2, because it is more conservative, yields fewer false positives, and is better able to distinguish relaxed selective constraint from positive selection (Zhang et al. 2005). Setting the D. virilis branch as the foreground branch, we tested whether model MA (which assumes three site classes: 0 < ω0 < 1, ω1 = 1, and ω2 > 1 only for the foreground branch) fit the data better than the null model MAfix (which fixes ω2 at 1 in foreground branches). Using either the MUSCLE (Online Resource 2, Table S3) or the PRANK (Online Resource 3, Table S6) alignment, we found that comparisons between MA and MAfix were not significant for the LOTUS or SGNH predicted structural domains, or for the Valois interaction domain (Table 1). Under the MUSCLE MSA analysis of the Lasp-binding domain, the log likelihood of the MA model (−2,350.39) was greater than that of the MAfix model (−2,352.01) (χ2 = 3.24, P = 0.07) (Table 1; Online Resource 2, Table S3), but the difference was not significant. We note here that PAML documentation recommends the use of a critical value of χ2 (3.84 at 5 % and 5.99 at 1 %) to guard against violation of model assumptions for branch site models (Yang 2007).
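The likelihood ratio test used for these comparisons takes twice the difference in log likelihood between the alternative and null models and compares it with a χ2 distribution. As a worked illustration, the sketch below recomputes the Lasp-binding MUSCLE comparison from the two log likelihoods reported above, assuming one degree of freedom for MA versus MAfix. The function names are ours, and only the Python standard library is used; for one degree of freedom the χ2 upper-tail probability equals erfc(sqrt(x/2)).

```python
import math


def lrt_statistic(lnl_alt: float, lnl_null: float) -> float:
    """Likelihood ratio test statistic: 2 * (lnL_alternative - lnL_null)."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square distribution with 1 degree of
    freedom, computed as erfc(sqrt(x / 2))."""
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


# Log likelihoods reported for the Lasp-binding domain, MUSCLE alignment.
lnl_ma = -2350.39      # model MA (omega2 free on the foreground branch)
lnl_mafix = -2352.01   # model MAfix (omega2 fixed at 1)

stat = lrt_statistic(lnl_ma, lnl_mafix)
p_value = chi2_sf_df1(stat)
print(f"2*delta(lnL) = {stat:.2f}, P = {p_value:.2f}")
```

Run as written, this gives a statistic of 3.24 and P of about 0.07, in agreement with the values quoted in the text and below the 3.84 threshold for significance at the 5 % level.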
In contrast, the PRANK MSA analysis did reveal a statistically significant signature of positive selection for the Lasp-binding domain (χ2 = 7.9; P < 0.01) on the D. virilis branch (Table 1; Online Resource 3, Table S6). The PRANK MSA analysis also identified significant signatures of positive selection for the Vasa-interacting domain (χ2 = 5.1; P < 0.05) (Table 1; Online Resource 3, Table S6). This suggests some residues within these domains could be evolving under positive selection and may contribute to oskar functional evolution. Strikingly, we found that under both the MUSCLE (χ2 = 14.7; P < 0.001) and PRANK (χ2 = 5.43; P < 0.05) MSA analyses, the MA model fit the data significantly better than MAfix for the Long Osk domain (Table 1), suggesting that non-synonymous changes at some Long Osk codons may additionally have been subject to positive selection on the D. virilis branch.
Table: Likelihood ratio test statistics for MA vs MAfix model for Long Osk, Lasp-binding and Vasa-interacting domains aligned with PRANK MSA. Column groups: Conserved sequence blocks; Seven species alignment.
Candidate sites for functional divergence between D. virilis and D. melanogaster oskar
To identify candidates for specific sites that could be evolving under positive selection in our 18 species analysis, we applied the Bayes Empirical Bayes (BEB) approach to all alignments that showed a statistically significant signature for adaptive evolution on the D. virilis branch, namely (1) the MUSCLE alignment of the Long Osk domain extracted from the original alignment of full-length Oskar (Online Resource 2, Figure S1); (2) PRANK alignments of the entire Long Osk, Lasp-binding, and Vasa-interacting domains extracted from the original alignment of full-length Oskar (Online Resource 3, Figure S2); and (3) the conserved sequence blocks of the Long Osk and Lasp-binding domains obtained by trimming poorly aligned regions from the original full-length PRANK MSA alignment (Online Resource 4, Figure S3). To be conservative we focused only on sites with a BEB posterior probability of being under positive selection greater than 0.9. Using these criteria, we identified five residues, two in the Long Osk domain and three in the Lasp-binding domain that may be evolving under positive selection on the D. virilis branch.
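The site-selection criterion described above (a BEB posterior probability greater than 0.9) is a simple threshold filter. The sketch below shows that step in isolation. The record layout is an assumption made for illustration and does not reproduce the codeml output format, and the domain names, residue numbers, and posterior values are invented and are not results from this study.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BebSite:
    domain: str       # domain in which the site lies (hypothetical label)
    residue: int      # residue number in the reference protein
    posterior: float  # BEB posterior probability of positive selection


def candidate_sites(sites: List[BebSite],
                    threshold: float = 0.9) -> List[BebSite]:
    """Keep sites whose posterior probability is strictly greater than the
    threshold, ordered by domain and then residue number."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie between 0 and 1")
    kept = [site for site in sites if site.posterior > threshold]
    return sorted(kept, key=lambda site: (site.domain, site.residue))


# Invented example values, for illustration only.
sites = [
    BebSite("domain_A", 12, 0.97),
    BebSite("domain_A", 40, 0.62),
    BebSite("domain_B", 7, 0.90),   # excluded: not strictly above 0.9
    BebSite("domain_B", 3, 0.93),
]
candidates = candidate_sites(sites)
```

Using a strict inequality mirrors the wording "greater than 0.9"; a site with a posterior of exactly 0.9 is therefore not retained.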
Analysis of the Long Osk domain extracted from the MUSCLE MSA of full-length Oskar identified one residue with signatures of positive selection (D. melanogaster F51 (residue numbers throughout refer to the position in the primary amino acid sequence of D. melanogaster Long Oskar); Online Resource 2, Figure S1). In addition, analyses of the Long Osk domain extracted from the PRANK alignment of full-length Oskar, and of the conserved sequence blocks of Oskar, both supported the hypothesis of positive selection at a second residue (D. melanogaster R65; Online Resource 3, Figure S2; Online Resource 4, Figure S3).
In the Lasp-binding domain, analysis of this domain extracted from the PRANK MSA of full-length Oskar (Online Resource 3, Figure S2) identified two residues (D. melanogaster E306 and P353) as being under positive selection. These two residues were also identified as being under positive selection by analysis of the Vasa-interaction domain extracted from the full-length PRANK MSA (Online Resource 3). Finally, a third residue (D. melanogaster Y339) was identified by analysis of the Lasp-binding domain derived from the conserved sequence block PRANK MSA (Online Resource 4, Figure S3).
Table: Chemical properties of amino acids under positive selection in the branch leading to D. virilis. Columns: PP value of positive selection; AA position in D. melanogaster; D. virilis AA; D. melanogaster AA.
Row groups: Long Oskar domain, MUSCLE MSA (supported by full-length genomic and Macdonald allele); Long Oskar domain, PRANK MSA (supported by analysis of full-length and conserved sequence blocks alignments of genomic allele and full-length Macdonald allele); Lasp-binding domain, PRANK MSA (supported by full-length alignment); Lasp-binding domain, PRANK MSA (supported by genomic allele conserved sequence blocks alignment and full-length Macdonald allele).
Analysis of D. melanogaster subgroup species
A second issue related to the high divergence time between our species of interest is that the saturation of dS in highly divergent sequences is expected to inflate false positive rates of likelihood ratio tests for positive selection (Gharib and Robinson-Rechavi 2013). Determining the optimal analytical design for these studies is challenged, however, by the fact that the phylogenetic boundaries for points of saturation, and the extent to which the branch site test is affected by divergence time, are not clear (Gharib and Robinson-Rechavi 2013). Nonetheless, we undertook two additional analyses as a further step towards gaining some insight into the effect of divergence on our inferences of positive selection.
First, we assessed the relative rates of evolution of predicted structural and functional domains using a reduced set of sequences with relatively low divergence. We generated MUSCLE alignments of the five species of the D. melanogaster subgroup (Online Resource 5, Figure S4), which are often considered to be at an appropriate level of divergence for PAML analyses (Clark et al. 2007; Larracuente et al. 2008), as they diverged from a last common ancestor only approximately 3–4 Mya (Tamura et al. 2004). Analyses of only these sequences should thus avoid the potential problem of saturation of dS. The global ω estimate for oskar in these five species using model M0 was 0.25, with a log likelihood of −4,086.41, indicating overall purifying selection as observed in the analysis of 18 species. Next, we used this alignment to assess rates of evolution of the component domains using the MG4 model, and again obtained the same results as those obtained with the alignment containing 18 species (Fig. 1b). ω values for each domain showed that SGNH was the most conserved domain, while the Long Osk and Lasp-binding domains were the two most rapidly evolving (Online Resource 5, Table S8). However, in contrast to our original 18 species analysis, neither partition nor site tests detected significant positive selection for any domain except LOTUS, which was significant with P = 0.04 (Online Resource 5, Table S9, Table S10).
Second, to test for positive selection on the D. virilis branch, we generated another set of PRANK alignments using a subset of sequenced Drosophila species comprising those in the D. melanogaster subgroup, plus D. immigrans and D. virilis, the only two non-melanogaster species for which data on oskar function are available (Webster et al. 1994; Jones and Macdonald 2007) (Online Resource 6, Figure S5). These alignments, like those of the melanogaster subgroup sequences, should be less likely than the full 18 species analysis to suffer from problems with saturated synonymous sites and inaccurate alignments. We used these seven species alignments to assess rates of evolution of the component domains using the MG4 model, and obtained the same results as those obtained with the alignment containing 18 species (Fig. 1b). ω values for each domain showed that SGNH was the most conserved domain, while the Long Osk and Lasp-binding domains were the two most rapidly evolving (Online Resource 6, Table S11). However, branch site test comparison of MA with MAfix was not significant for any of the Lasp-binding (χ2 = 0; P = 1), Long Osk (χ2 = 0; P = 1) or Vasa-interacting domains (P = 0.51) (Table 2; Online Resource 6, Table S12).
Assessment of putative polymorphisms in D. virilis oskar
All analyses described above used the oskar nucleotide sequence annotated from the D. virilis genome (Clark et al. 2007) (NCBI accession XM_002053233.1; henceforth designated “genomic allele”). However, we noticed that the translation of this sequence differs from the reported D. virilis oskar cDNA translation (Webster et al. 1994) (NCBI accession L22556.1; henceforth designated “Macdonald allele”) by two amino acids in the Long Osk region (QQ66-67 indel), and three amino acids in the Lasp-binding region (A341P, H348 indel, P383R) (Online Resource 7, Figure S6). We used Sanger sequencing to confirm that the sequence of the Macdonald allele was identical to that reported in NCBI accession L22556.1, except for one residue (A341P) that matched the genomic allele rather than the Macdonald allele (Online Resource 7). Because the Macdonald allele was used for the transgenic experiments that suggested functional divergence between D. virilis and D. melanogaster oskar (Webster et al. 1994), we repeated the branch site test for 18 species using the Macdonald allele sequence, and focused our analysis on those protein domains and residues that showed a consistent signal for positive selection. Specifically, these were the Long Oskar domain aligned with the MUSCLE MSA, and the Long Oskar and Lasp-binding domains aligned with the PRANK MSA. We found that the same domains and residues were predicted to be under positive selection regardless of whether we used the genomic or Macdonald alleles for analysis. Specifically, we found a significant signal of positive selection on the Long Osk domain using the MUSCLE alignment of the Macdonald allele (χ2 = 8.44; P = 0.0037) (Online Resource 8, Table S13). Within the Long Osk domain, the same residue (D. melanogaster F51) was identified as being under positive selection in analyses of both alleles. 
Similarly, we found a significant signal of positive selection on the Long Osk domain using the PRANK alignment of the Macdonald allele (χ2 = 5.76; P = 0.016) (Online Resource 9, Table S14), and within this domain, the same residue (D. melanogaster R65) was identified as being under positive selection in analyses of both alleles. We also detected a significant signature of positive selection for the Lasp-binding region (χ2 = 5.5; P = 0.019) on the D. virilis branch using the PRANK alignment and the Macdonald allele. This analysis identified two residues: D. melanogaster E306, which was also identified in the analysis of the full-length PRANK alignment of the genomic allele (Online Resource 3), and D. melanogaster Y339, which was also identified in the PRANK analysis of conserved sequence blocks of the genomic allele (Online Resource 4).
Thus, the minor sequence differences between the Macdonald and genomic alleles do not change our detection of signatures of positive selection on the same domains and residues of Oskar. The specific residues thus identified differ as a function of the alignment method used, but not due to the D. virilis oskar sequence used. The only exception is a single residue, D. melanogaster P353, which was identified as being under positive selection in the PRANK analysis of the genomic allele but not of the Macdonald allele. This is likely due to the fact that this residue is different in the Macdonald allele (A341P). We therefore cannot conclude that this particular residue is a candidate for functional divergence between the two species. Overall, we conclude that while there are a small number of polymorphisms in sequenced D. virilis oskar alleles, our results are generally robust to these minor sequence differences.
In summary, all analyses clearly support heterogeneous rates of evolution on distinct Oskar domains, with Long Osk and Lasp-binding being the fastest evolving. However, we cannot exclude the possibility that signatures of positive selection on these domains, which were detected only with the 18 species analyses, may be false positives. Alternatively, the lack of a significant signal for positive selection in the seven species analysis could be due to the decrease in power that results from using fewer sequences. In support of the positive selection result obtained with the 18 species analyses, simulations have shown that the branch site model is more robust to high sequence divergence than might have been expected (Gharib and Robinson-Rechavi 2013; Studer et al. 2008), and that high dS values may be more of a concern for false negatives than false positives (Gharib and Robinson-Rechavi 2013).
Overall, the effects of sequence divergence and dS saturation are complex and difficult to fully resolve. Nonetheless, the consistent pattern of higher rates of evolution of Long Osk and Lasp-binding domains seen in all analyses of five species, seven species and 18 species alignments makes changes in these domains strong candidates for functional divergence between Drosophila Oskar proteins. We further suggest that the specific sites within these domains exhibiting changes in physicochemical properties between D. virilis and D. melanogaster are strong candidates for changes underlying the functional divergence of Oskar between D. melanogaster and D. virilis. However, we note that caution is warranted in the inference of positive selection on these sites along the D. virilis branch.
The roles of oskar in the evolution of germ plasm morphology
To date the functional characteristics of oskar with respect to germ plasm assembly have been tested for the orthologs of only three Drosophila species: D. melanogaster, D. immigrans (immosk), and D. virilis (virosk). All of these species possess germ plasm, but the germ plasm morphology is distinct in each species (Mahowald 1962, 1968; Counce 1963). Mapping germ plasm morphology on to a phylogeny of Drosophilids (Online Resource 10) shows that germ plasm morphology displays some clade-specific patterns. We hypothesize that changes in germ plasm morphology may be the result of evolutionary changes in oskar function. In support of this hypothesis, as noted above immosk can assemble a functional germ plasm that has a D. immigrans morphology in a D. melanogaster context (Jones and Macdonald 2007). In contrast, virosk cannot assemble functional germ plasm in a D. melanogaster context (Webster et al. 1994). This indicates that the characteristics of germ plasm morphology and the assembly of germ plasm itself are both processes directed by oskar. At present, oskar orthologs have not been identified for most species with well-studied germ plasm morphology (Online Resource 10), and germ plasm morphology has not been characterized for most of the species whose oskar sequence has been determined. Further studies that fill these knowledge gaps will be useful in determining the specific changes in oskar that accompanied and/or led to evolutionary changes in germ plasm morphology and function within Drosophilids.
Possible significance of the evolution of the Drosophila-specific Long Oskar domain
The Long Oskar domain was identified by both MUSCLE and PRANK MSA analyses as possibly being under positive selection in the lineage leading to D. virilis. Known oskar loss of function alleles in D. melanogaster do not contain lesions in the Long Oskar domain, and therefore do not shed light on the potential functions of the positively selected amino acids identified in this study. However, evidence from studies of D. melanogaster suggests that this domain may play a role in stabilizing assembled germ plasm. During oogenesis, osk mRNA is synthesized and transported to the posterior pole of the oocyte (Rongo et al. 1995), where it is translated into the Long Osk and Short Osk isoforms from two alternative start codons (Markussen et al. 1995). The Long Osk isoform includes all four protein domains described above, whereas the Short Osk isoform excludes the Long Oskar domain and thus consists only of the LOTUS, Lasp-binding and SGNH domains. The two isoforms exhibit distinct molecular affinities for germ plasm components (Breitwieser et al. 1996; Suyama et al. 2009; Babu et al. 2004) and localize to distinct subcellular compartments (Vanzo et al. 2007), but both are required for effective germ plasm assembly. Short Osk alone is able to assemble enough germ plasm to yield a low frequency of germ cell formation (Markussen et al. 1995), but cannot efficiently maintain oskar mRNA or protein at the posterior pole (Vanzo and Ephrussi 2002). In contrast, Long Osk can maintain both oskar mRNA and Oskar protein at the posterior pole, but no germ cells form when only Long Osk is expressed, suggesting that it cannot efficiently assemble stable germ plasm. The current model for Osk-mediated germ plasm assembly in D. melanogaster is thus one in which Long Osk anchors Short Osk to the posterior oocyte cortex, and Short Osk in turn localizes multiple germ cell determinants. 
The fact that Nasonia vitripennis (Hymenoptera) oskar lacks the Long Osk domain but nonetheless appears to act as a functional germ plasm determinant (Lynch et al. 2011) raises the question of what the functional significance of the evolution of the Long Osk domain might be.
One possibility is that the evolution of the Long Oskar domain was linked to the emergence of additional mRNA localization mechanisms that increased the stability of oskar mRNA at the oocyte posterior. This could have been beneficial in improving the stability of germ plasm and increasing the robustness of oskar-directed germ cell specification. Consistent with this hypothesis, although oskar mRNA is localized to embryonic germ plasm in all studied holometabolous insects, its localization to the oocyte posterior cortical cytoplasm prior to germ cell formation is variable, and correlates with the presence or absence of the Long Oskar domain. In D. melanogaster, oskar mRNA (as visualized by in situ hybridization) is very tightly localized in a thin crescent (Ephrussi et al. 1991), but in D. virilis, comparable gene expression methods show that only some oskar mRNA is localized to a small spot at the posterior cortex, while additional transcripts are localized in a diffuse cloud in an apparent posterior to anterior gradient (Webster et al. 1994). In holometabolous insects without a Long Osk isoform (the wasp Nasonia vitripennis, the ant Messor pergandei (Lynch et al. 2011), and the mosquitoes Aedes aegypti (Juhn and James 2006) and Culex quinquefasciatus (Juhn et al. 2008)), posterior oskar mRNA localization also appears more diffuse than in D. melanogaster. Finally, the only oskar ortholog identified to date in a hemimetabolous insect (the cricket Gryllus bimaculatus) also lacks a Long Oskar domain, and its mRNA is distributed ubiquitously in oocytes and embryos, showing no asymmetric localization at all (Ewen-Campen et al. 2012).
While D. melanogaster oskar function has been well characterized at the genetic level, the specific molecular mechanisms by which it functions remain unknown. Previous comparative approaches using the entire gene have shown functional divergence of Drosophila oskar (Jones and Macdonald 2007; Webster et al. 1994) but have not identified the specific regions or selective pressures involved. Our molecular evolution analysis of oskar orthologs from 18 species shows that the two fastest evolving domains, Long Osk and Lasp-binding, also show a statistically significant signature of positive selection on the D. virilis branch. Specific putatively positively selected sites within these domains also exhibit major differences in the physicochemical properties of amino acids between D. melanogaster and D. virilis. However, the long divergence time between D. melanogaster, D. immigrans, and D. virilis means that we can only cautiously infer positive selection at these sites. Further polymorphism-based tests of positive selection will be required to elucidate the selection pressures involved. Nonetheless, based on their faster rate of evolution, we suggest that changes in the Long Oskar and Lasp-binding domains underlie functional differences between the Oskar proteins. Functional verification of the roles of these domains and specific sites will be required to evaluate the contributions of each of these candidates to the evolution of oskar function. The Long Osk and Lasp-binding domains are the first candidates for oskar functional evolution identified using an evolutionary approach, and provide specific hypotheses that can be tested in future in vivo studies. This analysis thus represents an important step towards understanding the role of Oskar in germ plasm evolution and assembly.
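As an illustration of the kind of polymorphism-based test referred to above, the following is a minimal sketch of a McDonald-Kreitman-style comparison, which contrasts the ratio of nonsynonymous to synonymous fixed differences between species with the same ratio for polymorphisms within a species. It was not part of the present analysis. The function names (`mk_summary`, `fisher_exact_two_sided`) and all site counts are hypothetical and chosen only for illustration; no oskar polymorphism data are implied.

```python
from math import comb


def mk_summary(dn, ds, pn, ps):
    """Return (neutrality index, alpha) from four site counts.

    dn, ds: fixed nonsynonymous / synonymous differences between species
    pn, ps: nonsynonymous / synonymous polymorphisms within one species
    All counts must be positive so that both ratios are defined.
    """
    if min(dn, ds, pn, ps) <= 0:
        raise ValueError("all four counts must be positive")
    ni = (pn / ps) / (dn / ds)  # NI < 1 suggests an excess of fixed nonsynonymous changes
    alpha = 1.0 - ni            # rough estimate of the adaptive fraction of substitutions
    return ni, alpha


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts, for illustration only
ni, alpha = mk_summary(dn=20, ds=10, pn=5, ps=10)
p_value = fisher_exact_two_sided(20, 10, 5, 10)
```

With these made-up counts the neutrality index is 0.25 and alpha is 0.75. A test of this kind requires within-species polymorphism data for the domain of interest, which is why it is complementary to, rather than a replacement for, the divergence-based analyses described here.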
Thanks to Paul Macdonald for the plasmid containing the D. virilis oskar cDNA, to John Srouji for Sanger sequencing and discussion of the results, to Victor Zeng and Amit Indap for assistance with preliminary analyses, and to Extavour lab members for discussion of the manuscript. This work was partly supported by NSF grant IOS-0817678 to CGE and funds from Harvard University.
The authors declare that they have no competing interests.
AA conceived of the study, created alignments, and performed evolutionary rate analyses. CGE assisted with study design and performed analyses of amino acid physicochemical properties and phylogenetic distribution of germ plasm morphology. Both authors wrote and approved the final manuscript.
- Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow TA, Kaufman TC, Kellis M, Gelbart W, Iyer VN, Pollard DA, Sackton TB, Larracuente AM, Singh ND, Abad JP, Abt DN, Adryan B, Aguade M, Akashi H, Anderson WW, Aquadro CF, Ardell DH, Arguello R, Artieri CG, Barbash DA, Barker D, Barsanti P, Batterham P, Batzoglou S, Begun D, Bhutkar A, Blanco E, Bosak SA, Bradley RK, Brand AD, Brent MR, Brooks AN, Brown RH, Butlin RK, Caggese C, Calvi BR, Bernardo de Carvalho A, Caspi A, Castrezana S, Celniker SE, Chang JL, Chapple C, Chatterji S, Chinwalla A, Civetta A, Clifton SW, Comeron JM, Costello JC, Coyne JA, Daub J, David RG, Delcher AL, Delehaunty K, Do CB, Ebling H, Edwards K, Eickbush T, Evans JD, Filipski A, Findeiss S, Freyhult E, Fulton L, Fulton R, Garcia ACL, Gardiner A, Garfield DA, Garvin BE, Gibson G, Gilbert D, Gnerre S, Godfrey J, Good R, Gotea V, Gravely B, Greenberg AJ, Griffiths-Jones S, Gross S, Guigo R, Gustafson EA, Haerty W, Hahn MW, Halligan DL, Halpern AL, Halter GM, Han MV, Heger A, Hillier L, Hinrichs AS, Holmes I, Hoskins RA, Hubisz MJ, Hultmark D, Huntley MA, Jagadeeshan S, Jeck WR, Johnson J, Jones CD, Jordan WC, Karpen GH, Kataoka E, Keightley PD, Kheradpour P, Kirkness EF, Koerich LB, Kristiansen K, Kudrna D, Kulathinal RJ, Kumar S, Kwok R, Lander E, Langley CH, Lapoint R, Lazzaro BP, Lee S-J, Levesque L, Li R, Lin C-F, Lin MF, Lindblad-Toh K, Llopart A, Long M, Low L, Lozovsky E, Lu J, Luo M, Machado CA, Makalowski W, Marzo M, Matsuda M, Matzkin L, McAllister B, McBride CS, McKernan B, McKernan K, Mendez-Lago M, Minx P, Mollenhauer MU, Montooth K, Mount SM, Mu X, Myers E, Negre B, Newfeld S, Nielsen R, Noor MAF, O’Grady P, Pachter L, Papaceit M, Parisi MJ, Parisi M, Parts L, Pedersen JS, Pesole G, Phillippy AM, Ponting CP, Pop M, Porcelli D, Powell JR, Prohaska S, Pruitt K, Puig M, Quesneville H, Ram KR, Rand D, Rasmussen MD, Reed LK, Reenan R, Reily A, Remington KA, Rieger TT, Ritchie MG, Robin C, Rogers Y-H, Rohde C, Rozas J, 
Rubenfield MJ, Ruiz A, Russo S, Salzberg SL, Sanchez-Gracia A, Saranga DJ, Sato H, Schaeffer SW, Schatz MC, Schlenke T, Schwartz R, Segarra C, Singh RS, Sirot L, Sirota M, Sisneros NB, Smith CD, Smith TF, Spieth J, Stage DE, Stark A, Stephan W, Strausberg RL, Strempel S, Sturgill D, Sutton G, Sutton GG, Tao W, Teichmann S, Tobari YN, Tomimura Y, Tsolas JM, Valente VLS, Venter E, Venter JC, Vicario S, Vieira FG, Vilella AJ, Villasante A, Walenz B, Wang J, Wasserman M, Watts T, Wilson D, Wilson RK, Wing RA, Wolfner MF, Wong A, Wong GK-S, Wu C-I, Wu G, Yamamoto D, Yang H-P, Yang S-P, Yorke JA, Yoshida K, Zdobnov E, Zhang P, Zhang Y, Zimin AV, Baldwin J, Abdouelleil A, Abdulkadir J, Abebe A, Abera B, Abreu J, Acer SC, Aftuck L, Alexander A, An P, Anderson E, Anderson S, Arachi H, Azer M, Bachantsang P, Barry A, Bayul T, Berlin A, Bessette D, Bloom T, Blye J, Boguslavskiy L, Bonnet C, Boukhgalter B, Bourzgui I, Brown A, Cahill P, Channer S, Cheshatsang Y, Chuda L, Citroen M, Collymore A, Cooke P, Costello M, D’Aco K, Daza R, De Haan G, DeGray S, DeMaso C, Dhargay N, Dooley K, Dooley E, Doricent M, Dorje P, Dorjee K, Dupes A, Elong R, Falk J, Farina A, Faro S, Ferguson D, Fisher S, Foley CD, Franke A, Friedrich D, Gadbois L, Gearin G, Gearin CR, Giannoukos G, Goode T, Graham J, Grandbois E, Grewal S, Gyaltsen K, Hafez N, Hagos B, Hall J, Henson C, Hollinger A, Honan T, Huard MD, Hughes L, Hurhula B, Husby ME, Kamat A, Kanga B, Kashin S, Khazanovich D, Kisner P, Lance K, Lara M, Lee W, Lennon N, Letendre F, LeVine R, Lipovsky A, Liu X, Liu J, Liu S, Lokyitsang T, Lokyitsang Y, Lubonja R, Lui A, MacDonald P, Magnisalis V, Maru K, Matthews C, McCusker W, McDonough S, Mehta T, Meldrim J, Meneus L, Mihai O, Mihalev A, Mihova T, Mittelman R, Mlenga V, Montmayeur A, Mulrain L, Navidi A, Naylor J, Negash T, Nguyen T, Nguyen N, Nicol R, Norbu C, Norbu N, Novod N, O’Neill B, Osman S, Markiewicz E, Oyono OL, Patti C, Phunkhang P, Pierre F, Priest M, Raghuraman S, Rege F, Reyes R, 
Rise C, Rogov P, Ross K, Ryan E, Settipalli S, Shea T, Sherpa N, Shi L, Shih D, Sparrow T, Spaulding J, Stalker J, Stange-Thomann N, Stavropoulos S, Stone C, Strader C, Tesfaye S, Thomson T, Thoulutsang Y, Thoulutsang D, Topham K, Topping I, Tsamla T, Vassiliev H, Vo A, Wangchuk T, Wangdi T, Weiand M, Wilkinson J, Wilson A, Yadav S, Young G, Yu Q, Zembek L, Zhong D, Zimmer A, Zwirko Z, Jaffe DB, Alvarez P, Brockman W, Butler J, Chin C, Grabherr M, Kleber M, Mauceli E, MacCallum I (2007) Evolution of genes and genomes on the Drosophila phylogeny. Nature 450(7167):203–218. doi:10.1038/nature06341
- Lynch JA, Özüak O, Khila A, Abouheif E, Desplan C, Roth S (2011) The phylogenetic origin of oskar coincided with the origin of maternally provisioned germ plasm and pole cells at the base of the holometabola. PLoS Genet 7(4):e1002029. doi:10.1371/journal.pgen.1002029
- Oliveira DCSG, Almeida FC, O’Grady PM, Armella MA, DeSalle R, Etges WJ (2012) Monophyly, divergence times, and evolution of host plant use inferred from a revised phylogeny of the Drosophila repleta species group. Mol Phylogenet Evol 64(3):533–544. doi:10.1016/j.ympev.2012.05.012
- Robe LJ, Valente VLS, Budnik M, Loreto ELS (2005) Molecular phylogeny of the subgenus Drosophila (Diptera, Drosophilidae) with an emphasis on Neotropical species and groups: a nuclear versus mitochondrial gene approach. Mol Phylogenet Evol 36(3):623–640. doi:10.1016/j.ympev.2005.05.005
- Yang Y, Hou Z-C, Qian Y-H, Kang H, Zeng Q-T (2012) Increasing the data size to accurately reconstruct the phylogenetic relationships between nine subgroups of the Drosophila melanogaster species group (Drosophilidae, Diptera). Mol Phylogenet Evol 62(1):214–223. doi:10.1016/j.ympev.2011.09.018