Genomic analysis of the four ecologically distinct cactus host populations of Drosophila mojavensis
Relationships between an organism and its environment can be fundamental to understanding how populations change over time and how species arise. Local ecological conditions can shape variation at multiple levels, including the evolutionary history and trajectories of coding genes. This study examines the rate of molecular evolution at protein-coding genes throughout the genome in response to host adaptation in the cactophilic Drosophila mojavensis. These insects are intimately associated with cactus necroses, developing as larvae and feeding as adults in these necrotic tissues. Drosophila mojavensis is composed of four isolated populations across the deserts of western North America, and each population has adapted to utilize different cacti that are chemically, nutritionally, and structurally distinct.
High-coverage Illumina sequencing was performed on three previously unsequenced populations of D. mojavensis. Genomes were assembled using the previously sequenced genome of D. mojavensis from Santa Catalina Island (USA) as a template. Protein-coding genes were aligned across all four populations and rates of protein evolution were determined for all loci using several approaches.
Loci that exhibited elevated rates of molecular evolution tended to be shorter, have fewer exons, have low expression, be transcriptionally responsive to cactus host use and have fixed expression differences across the four cactus host populations. Fast-evolving genes were involved in metabolism, detoxification, chemosensory reception, reproduction and behavior. Results of this study give insight into the process and the genomic consequences of local ecological adaptation.
Keywords: Genome evolution, Adaptation, Drosophila, Ecological genomics, Genome sequencing, Genome assembly, Drosophila mojavensis
Abbreviations
2L: Left arm of 2nd chromosome in D. melanogaster
2R: Right arm of 2nd chromosome in D. melanogaster
3L: Left arm of 3rd chromosome in D. melanogaster
3R: Right arm of 3rd chromosome in D. melanogaster
ACP: Accessory gland protein
ANOVA: Analysis of Variance
BAM: Binary Alignment Map
EMBOSS: European Molecular Biology Open Software Suite
FDR: False Discovery Rate
Ka: Number of nonsynonymous substitutions per nonsynonymous site
KEGG: Kyoto Encyclopedia of Genes and Genomes
Ks: Number of synonymous substitutions per synonymous site
MEGA: Molecular Evolutionary Genetics Analysis software
PAML: Phylogenetic Analysis of Maximum Likelihood program
PAML-FDR: PAML significant loci post-FDR correction
PHYLIP: Phylogeny Inference Package
RPKM: Reads Per Kilobase per Million mapped reads
TOP10: Loci with ω values in the top 10% of the distribution
Increasing availability of whole-genome sequencing data provides new insights into the complex relationship between an organism and its environment. By examining changes in the genetic code both at the level of individual genes and at the whole-genome level, it is possible to gain a better understanding of how local ecological conditions can shape the pattern of variation within and between ecologically distinct populations [1, 2]. A comprehensive integrative approach combining genomic and phenotypic data has been identified as the gold standard in understanding the adaptation process [3, 4]. Yet an examination of the genomic divergence of ecologically distinct populations can yield valuable insight into the adaptation process, especially when the genomic data are placed in an ecological context. This latter approach can identify genomic regions and loci that exhibit a pattern of variation and evolution suggesting their role in local ecological adaptation. Furthermore, the fixation of ecologically relevant variants has been implicated in the evolution of barriers to gene flow and potentially the origins of reproductively isolated populations, i.e. species [6, 7].
While it has long been accepted that natural selection is a primary driver of change within species as a response to environmental pressures, the mechanism by which this selection leads to speciation remains unclear [8, 9]. More recently the idea of ecological speciation, where various mechanisms work to prevent gene flow between populations causing reproductive isolation and eventually speciation, has more directly shown how selection to local ecological conditions may affect the process of speciation [6, 7]. Reproductive isolation interrupts gene flow between populations and may potentially lead to the formation of new species. When different populations of a species inhabit and/or utilize distinct resources, this opens many possibilities for local differentiation that can lead to obstacles to gene flow, as these populations are likely to experience differing environmental pressures [6, 7]. For example, in the leaf beetle Neochlamisus bebbianae, different populations have distinct host preferences and larvae perform significantly worse when growing on alternative host species. Host preferences and performance in this system facilitate the genetic and genomic isolation observed between the host populations, as each prefers a different microenvironment and likely does not interact and hybridize with members of the other population [11, 12].
Comparative genomic studies in mammals have shown clear evidence of positive selection both among humans, mice, and chimpanzees and between human populations [13, 14, 15, 16]. Genes involved in the immune system, gamete development, sensory perception, metabolism, cell motility, and cancer were those found to have signatures of positive selection. In Drosophila, a genome-level analysis of 12 species provided insight into the evolution of an ecologically, morphologically, physiologically and behaviorally diverse genus. Findings were relatively consistent with previous studies in other taxa, with genes involved in defense, chemosensory perception, and metabolism shown to be under positive selection [6, 13, 16, 18]. Since the Drosophila 12 genome project, several population genomics studies in D. melanogaster have examined variation within a single population, between clinal populations and between ancestral (African) and cosmopolitan populations to assess the consequence of population subdivision, the evolution of quantitative trait variation and the adaptation to local ecological conditions [19, 20, 21, 22, 23, 24]. These genome-level analyses have been extended to other D. melanogaster species group flies with distinct life history and ecological strategies, such as the Morinda citrifolia specialist D. sechellia and the invasive agricultural pest D. suzukii.
Population genetic analyses of individual candidate host adaptation genes in D. mojavensis have shown evidence for positive selection in loci involved with xenobiotic metabolism. In addition, transcriptome-wide differences have been observed in D. mojavensis in response to host shifts [43, 44], as have fixed expression differences between the host populations. Among the loci that are differentially expressed or constitutively fixed between populations, many are involved in detoxification, metabolism, chemosensory perception and behavior, supporting the role of the local necrotic cactus conditions in shaping transcriptional variation [43, 44, 45]. Taking into consideration the breadth of ecological information on D. mojavensis, this study highlights how selection pressures caused by local ecological environments differentially shape patterns of genomic variation across the host populations and provides further insight into how selection acts on organisms and its genome-level consequences.
Number of cleaned reads and assembled reads for each population
Characteristics and patterns of divergence of D. mojavensis loci
To describe the characteristics of loci whose evolutionary trajectory could have been shaped by the adaptation of D. mojavensis populations to their respective ecological conditions, we examined loci with ω values in the top 10% of the distribution, hereafter referred to as TOP10 loci. Furthermore, using codeml we performed a series of gene-wide tests of positive selection for each individual locus. Via a likelihood ratio test (model 7 vs. model 8) we identified 912 loci that exhibited a pattern of adaptive protein evolution. We used a smaller set of 244 loci, those remaining significant following an FDR correction, for all subsequent analyses, hereafter referred to as PAML-FDR loci. The sets of TOP10 loci, PAML significant loci and those significant after FDR correction (PAML-FDR) can be found in Additional file 1: Table S1. The distribution of both the PAML-FDR and TOP10 loci was uniform across the D. mojavensis chromosomes (Additional file 2: Figures S3 and S4), with the exception that significantly fewer PAML-FDR genes were present in Muller E (Fisher’s Exact test, P = 0.02).
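As an illustration of the FDR step described above, the following is a minimal sketch of a Benjamini-Hochberg adjustment applied to per-locus P values. The text does not state which FDR procedure or threshold was used, so the choice of Benjamini-Hochberg, the 0.05 cutoff, and the locus names and P values are all assumptions made for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P values from a model 7 vs. model 8 test (not from the study).
p_by_locus = {"locusA": 0.0001, "locusB": 0.004, "locusC": 0.03, "locusD": 0.20}
q_values = benjamini_hochberg(list(p_by_locus.values()))
# Assumed significance threshold of 0.05 on the adjusted values.
significant = [name for name, q in zip(p_by_locus, q_values) if q < 0.05]
```

Loci retained in `significant` would correspond to a PAML-FDR style subset under these assumptions.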
Relationship between expression and rate of molecular evolution
We also integrated our genomic data with two prior ecological transcriptional studies. We compared rates of molecular evolution of loci that are differentially expressed in response to cactus host utilization as well as those loci that exhibit fixed significant expression differences between the four host populations in the absence of cactus compounds (i.e. constitutive differences). To avoid the potential confounding effect of loci that show a pattern of positive selection, we removed those loci from the subsequent expression analysis. For both datasets, loci that are either differentially expressed in response to necrotic cactus (P < 0.001 post FDR correction) or that show constitutive differences between the populations (P < 0.001 post FDR correction) have a significantly greater value of ω (ANOVA, P < 0.001, for both comparisons) (Additional file 2: Figure S15, Table S7).
Functional gene groups analysis
In this study we sequenced, assembled and analyzed the genomes of each of the four cactus host populations of D. mojavensis for the purpose of assessing the genomic consequences of the adaptation to local ecological conditions. Overall, we were able to analyze the sequence, pattern of divergence and structure of 9087 genes. Although the four genomes examined diverged relatively recently [30, 31, 32, 33, 34], for several loci a sufficient number of substitutions occurred for us to begin to assess the changes associated with cactus host adaptation.
Unlike in D. melanogaster, D. mojavensis chromosomes are all acrocentric and its karyotype is composed of six Muller elements. In D. melanogaster element A is the X chromosome and elements B/C and D/E form large metacentric chromosomes (2L/2R and 3L/3R, respectively), while the F element or dot chromosome is reduced in size and highly heterochromatic [50, 51]. In D. mojavensis we observed the highest rate of molecular evolution in the small F element, followed by elements B and E, and then the remaining autosomal elements and the X chromosome (Fig. 2).
Selection on the X chromosome has been examined in a number of studies with somewhat variable results. Analysis of several melanogaster group species has shown significantly elevated ω values for genes on the X chromosome. From population genetics theory it is generally predicted that the X chromosome would show elevated rates of evolution due to its reduced population size and level of recombination. A subsequent genomic analysis of the X chromosome across more distant Drosophila species (D. melanogaster, D. pseudoobscura, D. miranda and D. yakuba) failed to find evidence of increased protein evolution on the X chromosome. It is difficult to draw any conclusions about the lack of a pattern of accelerated X chromosome evolution found here; it may be that there has not been enough divergence time between these populations for factors such as effective population size to have a measurable effect. The greatest ω values were present in the dot chromosome, which in D. mojavensis is heterochromatic and has a highly reduced level of recombination, which would make it highly susceptible to sweeps and hence higher rates of molecular evolution.
Within D. mojavensis there are polymorphic inversions in Muller elements B and E, both of which exhibited overall higher chromosome-wide levels of ω (Fig. 3). Lower levels of recombination and higher divergence rates have been known to occur around the inversion breakpoint regions in Drosophila. One possible explanation for the elevated rates of molecular evolution in these chromosomes is the distinct karyotypes of the sequenced lines (Additional file 2: Table S9). One consequence of a template-based assembly, as performed in this study, is that chromosomal structural differences can be largely wiped away. A more detailed analysis of the consequence of chromosomal inversion on the evolutionary trajectories of associated loci will be performed in future analyses of de novo assemblies of D. mojavensis genomes from all host populations as well as from sibling species (D. arizonae and D. navojoa) (unpublished data, Matzkin). Furthermore, these new chromosome-level genome assemblies of D. mojavensis and related species will allow us to determine the fraction of loci with high ω that are de novo and unique to the D. mojavensis lineage.
Genes across the genome, as well as those with evidence of positive selection or in the top 10% of ω values, were assessed for a number of characteristics. Genome-wide, loci exhibiting greater ω values tended to be shorter, have fewer exons (three or fewer), have low expression, be differentially expressed in response to cactus host use and have fixed expression differences across the four cactus host populations of D. mojavensis (Fig. 3; Additional file 2: Figures S7, S12, S15). Overall this pattern of divergence was similar when examining the TOP10 or PAML-FDR loci. Previous genomic analyses in D. melanogaster and related species have observed similar characteristics of loci with elevated ω values. This indicates that although the phylogenetic scale of the present study is limited (within D. mojavensis), the forces shaping genome evolution between diverged species can also be observed between recently isolated populations within species.
The first comparative genomic study within the D. melanogaster group species observed an association between coding length and ω, which they partially attributed to a positive correlation between Ks and protein length. Longer genes have more of these mutations and this may explain in part why genes with high ω values are likely to be shorter. In this study we did not observe such a correlation; in fact the relationship is negative (P < 0.001), but it explains very little of the variation in Ks (r2 = 0.004) (Additional file 2: Figure S21). Therefore, it is difficult to infer the effect of the association between Ks and protein length, and the lack of a positive correlation might be a function of the close relationship between the genomes studied here. The negative association between intron number and rate of molecular evolution has been previously suggested to be due to the presence of exonic splice site enhancers, which help in the correct removal of introns from the transcribed sequence. These regions are more likely to be conserved, as changes here could cause an intron not to be removed or part of an exon to be removed instead. The link between intron presence and ω values may also help explain why TOP10 genes tend to be shorter, as long genes are more likely to have introns. The correlation between gene length and rate of molecular evolution could also be explained as a result of the increased level of interactions between sites of larger exons. In this study a negative correlation between ω and exon length (r2 = 0.08, P < 0.001) was observed (Additional file 2: Figure S22). These interactions between residues of a protein, commonly referred to as Hill-Robertson interference, have a tendency to buffer against the accumulation of amino acid substitutions and can explain a significant portion of the pattern of molecular evolution in genomes.
Highly expressed genes tend to have a higher level of constraint, as indicated by their tendency to have lower rates of molecular evolution. This has been previously explained as being a result of selection against mutations that alter transcriptional and translational efficiency as well as selection for the maintenance of correct folding (translational robustness) [58, 64, 65, 66, 67, 68]. Given our coarse transcription data we were not able to tease apart which of the above-mentioned forces might more strongly shape the rate of molecular evolution in these genomes. Nonetheless we observed a clear negative relationship across the four D. mojavensis genomes between transcriptional level and ω. In addition to overall expression, both tissue- and sex-biased expression have been known to shape the evolutionary trajectories of genes [63, 69, 70, 71]. Male-expressed, or more specifically testes-expressed, genes have been associated with elevated rates of molecular evolution in Drosophila and across many taxa. Many of these loci are believed to be under strong sexual selection, which would explain their accelerated rate of molecular evolution. As predicted, we observed an overall higher rate of molecular evolution in male-biased genes. Even female-biased loci exhibited a significantly greater ω than unbiased genes. Previous behavioral and molecular studies in D. mojavensis have shown that this species experiences strong and recurrent bouts of sexual selection [29, 73, 74, 75, 76, 77, 78, 79].
Loci indicating a pattern of positive selection and those with elevated ω appear to be associated with a wide range of metabolic processes. These changes are likely a result of the distinct nutritional and xenobiotic environment the different D. mojavensis populations experience. The chemical composition of the cacti and the species of yeast found in each rot vary [35, 36, 37, 38, 39, 40, 41, 42], and thus the populations have likely needed to optimize the recognition, avoidance and processing of these necrosis-specific compounds through changes in metabolism, physiology and behavior.
One aspect of metabolism that has likely been shaped by cactus host adaptation is the detoxification of cactus compounds, as the distinct cactus hosts have different chemical compositions. Expression studies have shown that genes involved in detoxification are enriched when flies develop in an alternative necrotic cactus species. Fitness costs of living on the alternative cactus have also been shown to be quite high, with those flies having low viability (< 40%) [44, 80, 81]. Out of all GO terms examined in this study, the only ones that were consistently overrepresented were those associated with serine-type endopeptidase activity. These types of proteins perform a number of functions within organisms, among them the targeting of organophosphorus toxins. These compounds are often used in pesticides and are found to inhibit serine hydrolase function in both insects and vertebrates. While the apparent positive selection on these genes could be a response to pesticides the flies might experience in the field, it is more likely that they are evolving in response to the effects of the toxic or nutritional compounds found in cactus rots.
Cactophilic Drosophila have been shown to deploy a number of enzymatic strategies to ameliorate the deleterious consequences of ingesting cactus necrosis-derived compounds. Many of the previously identified proteins playing a role in detoxification in cactophiles (Glutathione S-transferases, Cytochrome P450s, Esterases and UDP-glycosyltransferases) have been associated with detoxification in a broad number of taxa [83, 84, 85, 86, 87]. In fact, in recent comparative genomic analyses of the cactophilic D. buzzatii and D. aldrichi, a number of metabolic genes, including those associated with detoxification, were shown to be under positive selection. In the present genomic analysis of the D. mojavensis genome we observed that the largest functional cluster (Fig. 5) was composed of several genes belonging to known detoxification protein families, such as Cytochrome P450 and Glutathione S-transferases (Gst). Furthermore, previous transcriptional studies have indicated that these same categories of detoxification loci are differentially expressed when D. mojavensis are utilizing necrotic cactus tissues [43, 44]. A population genetics analysis of GstD1 has indicated a pattern of adaptive amino acid evolution at this locus in the Sonora and Baja California populations. The location of the residue fixed in the lineages leading to these two populations indicated potential functional consequences, and a recent kinetic analysis of these proteins has supported this prediction (Matzkin, unpublished data).
The diversity of bacterial species found on each necrotic cactus provides, directly or indirectly, nutritional resources for the fly populations, but also includes potentially distinct pathogenic organisms [90, 91]. A number of genes with elevated rates of molecular evolution in this study are linked to a range of processes involved with the immune response. As each population is faced with a different composition of threats, the evolutionary arms race between flies and their pathogens creates further divergence between the populations as they face different pathogenic landscapes. Studies in other species, such as D. simulans, have found that genes with immune-related functions have higher rates of positive selection than the genome average. Exposure to bacterial pathogens in D. mojavensis could occur while utilizing the necrotic cactus substrate, but also, as has been previously suggested, via sexual transmission.
A number of the TOP10 loci in this study perform functions associated with sensory perception and behavior (Fig. 6). Drosophila mojavensis larvae actively seek out patches of preferred yeast species, and across the four host populations there are distinct larval foraging strategies. More specifically, genes involved in chemosensory behavior were observed to have elevated ω values in these genomes. Across Drosophilids, there have been a number of studies indicating links between the evolution of chemosensory genes and host specialization [96, 97, 98]. D. sechellia, a specialist species, was found to be losing olfactory receptor genes at a faster rate than its sibling generalist species D. simulans. In D. mojavensis each cactus species rot contains different compounds and thus has a distinct set of volatiles emanating from the necroses [40, 41]. These chemical differences have shaped the feeding and oviposition behavior of flies, as has been shown by the exposure of adults to cactus volatiles [100, 101, 102]. Recent analysis of population differentiation in odorant and gustatory receptors has shown that, unlike what might be initially predicted, a number of the changes in these receptors suggest effects at the level of signal transduction in addition to odorant recognition. Further functional analysis is needed to better understand the evolution and functional changes of chemosensory pathways associated with the adaptation to necrotic cacti.
In addition to their role in xenobiotic metabolism, serine proteases have been shown to be involved in the network of proteins associated with reproductive interactions in several taxa. In D. melanogaster accessory gland proteins (ACP), such as sex peptide, are found to perform a wide range of functions ranging from stimulating ovulation and reducing a female’s remating rate to helping to defend against infections [104, 105, 106]. Knockouts of serine proteases have been shown to interfere with the behavioral and physiological effects of the male-derived sex peptide. In D. mojavensis and its sister species D. arizonae a large number of proteases are expressed in female reproductive tracts and several have been shown to be under strong positive selection [76, 107, 108, 109]. In addition to ACPs being transferred via the ejaculate, gene transcripts have been found to be deposited by males into females during copulation. Some of these male-derived transcripts could alter the female’s transcriptional response, while others may potentially be translated within females. Furthermore, the loci of several of these male-transferred transcripts show a pattern of strong and continuous positive selection, likely as the result of persistent sexual selection. While there seem to be no postzygotic effects of sexual isolation within the D. mojavensis populations, there is some evidence of prezygotic isolation, where certain populations prefer to mate with members of their own population. The pattern of positive selection and/or elevated rate of molecular evolution for proteases and reproductive loci in the present study may highlight the continuing genomic consequence of sexual selection in this species.
Local ecological adaptation can shape the pattern of variation at multiple levels (life history, behavior and physiology), and the imprint of this multifaceted selection can be observed at the genomic level. In this first ever genome-wide analysis of the pattern of molecular evolution across the four ecologically distinct populations of D. mojavensis, we have begun to describe the genomic consequences of the adaptation of these cactophilic Drosophila to their respective environments. Given that across the four populations there are known differences in cactus host use, which encompass differences in both toxic and nutritional compounds, as well as in necrotic host density, temperature, exposure to desiccation and likely pathogens and predators, it was expected that a number of functional classes of loci might be under selection. Among genes with elevated rates of change are those involved in detoxification, metabolism, chemosensory perception, immunity, behavior and reproduction. We observed general patterns of variation across the genomes indicating that loci with elevated rates of molecular evolution tended to be shorter, have fewer exons and have low overall expression. Furthermore, fast-evolving loci were also more likely to be differentially expressed in response to cactus host use and to have fixed inter-population expression differences, indicating that both transcriptional and coding sequence changes have been involved in the local ecological adaptation of D. mojavensis.
Drosophila mojavensis lines and sample preparation
Fly lines MJBC 155 collected in La Paz, Baja California in February 2001, MJ 122 collected in Guaymas, Sonora in 1998, and MJANZA 402–8 collected in ANZA-Borrego Park, California in April 2002 were used as the source lines for the sequencing of three D. mojavensis populations. These lines were highly inbred to reduce the heterozygosity of their DNA. A summary of the karyotype of each of the lines sequenced, as well as of the Catalina Island template genome stock (15081–1352.00), can be found in Additional file 2: Table S9. The flies were grown for two generations in banana molasses media supplemented with ampicillin (125 μg/ml) and tetracycline (12.5 μg/ml) to prevent bacterial DNA from being isolated along with the flies’ DNA. DNA was extracted from homogenized whole male flies using a combination of phenol/chloroform DNA extraction and Qiagen DNeasy spin-columns to achieve the required amount of DNA material. RNase A was used to reduce RNA contamination. Gel electrophoresis was run on each sample to check the quality of the extraction. Any samples with RNA contamination were run through a Qiagen QIAquick PCR Purification Kit spin column to filter out contaminants. Extracted DNA was sent to the HudsonAlpha Institute for Biotechnology Genomic Services Lab (Huntsville, Alabama) for sequencing. One hundred base pair paired-end and mate pair sequencing was done on an Illumina HiSeq 2000 with one lane for each.
Paired-end and mate pair Illumina reads were filtered and trimmed using step one of the A5 Pipeline. This step uses SGA and TagDust with the quality scores from the Illumina FASTQ files to reduce the number of low quality reads. A5 was run on the Dense Memory Cluster of the Alabama Super Computer Center with four processing cores and 64 gigabytes of memory allocated for each run. With the reads cleaned they were assembled to the template genome. The reference genome of the Catalina Island population of D. mojavensis was assembled as part of the Drosophila 12 Genomes Consortium. Version 1.04 of the reference genome was retrieved from FlyBase version FB2015_02. From the reference sequence, genome scaffolds containing the protein-coding genes previously mapped to a chromosome were extracted for use as a template for the assembly; these scaffolds are detailed in Additional file 2: Table S10. The reference templates as well as the Illumina reads were imported into Geneious 8.1. Assembly was done separately for paired-end and mate pair data. Using Geneious 8.1 and its Map to Reference feature the cleaned reads were assembled to each of the template scaffolds. BAM files were exported for each paired-end and mate pair assembly. SAMtools was used to merge BAM files to create an assembly with both types of reads. This merged BAM file was imported into Geneious 8.1 where consensus sequences were determined for each scaffold using majority calling to limit the number of ambiguities. GTF files for each scaffold used were retrieved from FlyBase version FB2015_02. These annotations were transferred to each of the new genomes by aligning each assembled genome scaffold to the reference genome scaffold using Mauve Genome Alignment with default settings except for selecting assume collinear genomes. After alignment, annotations were transferred from the reference to the new assembly. The resulting scaffolds were exported in GenBank format.
Using the EMBOSS program extractfeat, CDS sequences were extracted from the assembled scaffolds. Sequence files for each gene were concatenated and then aligned using the default settings of the aligner Muscle 3.8.31. Only the longest transcript for each gene was used, as some genes have multiple splice variants.
Molecular evolution analysis
To generate substitution counts for filtering, the software KaKs Calculator 1.2 was used. Files of aligned genes were converted to AXT format using the Perl script parseFastaIntoAXT.pl included in the package. After conversion each gene was run through the software using the NG method. The output files for each locus were concatenated and then imported into JMP 10 for filtering.
Values for ω were calculated using codeml, part of the PAML 4.9 package. Aligned genes were converted to PHYLIP format using BioPerl. As PAML requires a phylogenetic tree to be provided for its calculations, a neighbor joining tree was constructed in MEGA 5. This was done by concatenating all exons from each population and then aligning them using Mauve Genome Alignment. The alignment was converted to MEG format using MEGA and a neighbor joining tree was built using the default settings. The tree was exported in newick format for use by PAML. Genes were removed from analysis if their length was not divisible by three. These genes were manually screened and, if alignment errors appeared to be the cause, the errors were manually corrected. Sequences were screened for stop codons by translating the DNA sequence to protein sequence with Transeq, part of the EMBOSS package, and any genes with internal stop codons were removed.
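The two screening rules above (length divisible by three, no internal stop codon) can be sketched as a small check. This is only an illustration, not the Transeq-based procedure used in the study; it assumes an ungapped coding sequence read in frame from its first base and the standard genetic code, and the function name is hypothetical.

```python
# Stop codons of the standard genetic code (assumed here).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def passes_codon_checks(cds):
    """Return True if a CDS has whole codons and no internal stop codon."""
    seq = cds.upper()
    # Reject empty sequences and lengths that are not a multiple of three.
    if len(seq) == 0 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # A stop codon is acceptable only as the final codon.
    return not any(codon in STOP_CODONS for codon in codons[:-1])
```

A sequence failing the length check would be a candidate for manual inspection of the alignment rather than immediate removal.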
Using the BioPython PAML module, control files were built for each gene alignment with default values taken, except that codon frequency was set to F3x4. Site-class models 0, 7, and 8 were used to calculate the ω values [123, 124, 125]. Model 0 estimates a single ω ratio for the entire gene. Model 7 is a null model with 10 classes, which does not allow for positive selection, while model 8 adds an additional class that allows for positive selection. Both the ω values and log likelihood values were extracted from each output file and the data were organized in Microsoft Excel. If model 8 fits the data significantly better, this is evidence of positive selection. Significance values were found by taking the difference between the log likelihood values of the two outputs and multiplying it by two. This value was then compared to a chi-square distribution to find P values for each gene. Genes with fewer than five total substitutions as determined by KaKs Calculator were filtered out and not considered. This was done to help deal with the low power of these methods when there are very few changes between the populations. Genes with few changes are more likely to cause the software to either return an undefined result or reach the maximum ω the software allows. In addition, genes with either no nonsynonymous or no synonymous changes were also removed. This yielded a total of 9087 genes that were used in the analysis. Histograms of a log2 transformation of the ω values were produced using JMP 10. A comparison between the log2 transformations of the NG Ka/Ks and the omega value from model 0 of codeml was generated with JMP 10.
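The likelihood ratio calculation described above can be sketched as follows. The text does not state the degrees of freedom used; this sketch assumes 2, the value conventionally used for the model 7 vs. model 8 comparison, in which case the chi-square tail probability has the closed form exp(-x/2). The function name and the log likelihood values are hypothetical.

```python
import math


def lrt_p_value(lnl_null, lnl_alt):
    """Return (statistic, p) for a likelihood ratio test with 2 df (assumed)."""
    # Twice the difference in log likelihood between the nested models.
    statistic = 2.0 * (lnl_alt - lnl_null)
    # The nested alternative should not fit worse; clamp optimizer noise.
    statistic = max(statistic, 0.0)
    # For 2 degrees of freedom the chi-square survival function is exp(-x/2).
    return statistic, math.exp(-statistic / 2.0)


# Hypothetical log likelihoods for one gene under model 7 and model 8.
stat, p = lrt_p_value(lnl_null=-1234.56, lnl_alt=-1229.10)
```

Restricting the sketch to 2 degrees of freedom keeps it free of external dependencies; a general chi-square routine from a statistics library would be needed for other comparisons.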
The length of each gene’s coding sequence was extracted from the PHYLIP sequence headers to determine whether longer genes have significantly different omega values. Genes were binned by length, and an ANOVA with post-hoc Tukey test in JMP 10 was used to compare the length bins. Intron data were extracted from the reference genome annotation using Geneious 8.1, and genes were then binned by number of exons. An ANOVA with post-hoc Tukey test in JMP 10 was used to test for significant differences in omega among the exon-number bins. To determine whether omega differed significantly between genes on each Muller element, an ANOVA with post-hoc Tukey test in JMP 10 was used to compare the distribution of omega values on each element.
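The binning and comparison steps above can be sketched in Python as follows. The analysis itself was run in JMP 10; this sketch only shows the logic of assigning genes to length bins and computing a one-way ANOVA F statistic, and it does not implement the Tukey post-hoc test. The bin edges are hypothetical, since the text does not list the bins used.

```python
# Hypothetical bin edges in base pairs; the bins used in the study are not given.
BIN_EDGES = [500, 1000, 2000]


def length_bin(cds_length):
    """Return the index of the length bin for one coding sequence."""
    for i, edge in enumerate(BIN_EDGES):
        if cds_length < edge:
            return i
    return len(BIN_EDGES)


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA over a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def omega_by_length_bin(genes):
    """Group omega values by length bin.

    genes: iterable of (cds_length, omega) pairs.
    Returns a list of omega lists, one per non-empty bin, in bin order.
    """
    bins = {}
    for cds_length, omega in genes:
        bins.setdefault(length_bin(cds_length), []).append(omega)
    return [bins[k] for k in sorted(bins)]
```

The same grouping logic would apply to exon-number bins or Muller elements by swapping the binning function.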
Previous transcriptional studies provided differential expression data for cactus host shifts and between populations. Loci that were found to be significant with codeml models 7 and 8 were removed from this analysis. The model 0 omega for loci with a FDR significance greater than 0.001 for third-instar larvae from the D. mojavensis Sonora population that were raised on agria cactus rot was compared to that of non-significant loci using ANOVA in JMP 10. Model 0 omega was also compared between FDR-significant and non-significant loci for differential expression between third-instar larvae of the four host populations, again using ANOVA in JMP 10.
To explore the relationship between omega and gene expression level, RNAseq data were retrieved for whole male and female D. mojavensis flies as aligned BAM files. Differential expression was calculated using edgeR to look for genes with significantly higher male or female expression. Model 0 omega values for genes with significantly male- or female-biased expression, and for genes without sex-biased expression, were displayed as box plots and compared using ANOVA with post-hoc Tukey test in JMP 10. Average adjusted (+ 0.25) log2 RPKM of non-sex-biased genes was plotted against log2 omega model 0, and linear regression was performed on the data with JMP 10.
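The regression step above can be sketched in Python with an ordinary least-squares fit. The analysis itself was done in JMP 10. The sketch assumes that "adjusted (+ 0.25)" means 0.25 was added to each RPKM value before the log2 transformation, and that expression is the predictor and log2 omega the response; both readings are assumptions, and the function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def expression_vs_omega(rpkm, omega, offset=0.25):
    """Regress log2 omega on log2(RPKM + offset).

    rpkm: average RPKM per gene (non-sex-biased genes only).
    omega: model 0 omega per gene, in the same order; values must be > 0.
    offset: constant added before the log (assumed to be the +0.25 adjustment).
    """
    xs = [math.log2(r + offset) for r in rpkm]
    ys = [math.log2(w) for w in omega]
    return fit_line(xs, ys)
```

A negative slope from such a fit would correspond to lower omega in more highly expressed genes.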
Gene ontology terms analysis
Network graphs were generated using Cytoscape 3.2.1 with the add-on app ClueGO 2.2.5. The GO term and KEGG pathway data used were from the June 2016 release. A custom D. melanogaster reference set was used for the analysis, consisting of D. melanogaster genes with a D. mojavensis ortholog present in the unfiltered dataset, as retrieved from FlyBase version FB2017_06. Both the TOP10 and PAML-FDR gene sets were run on biological process, molecular function and KEGG terms. Data for the GO term summary tables were retrieved from FlyBase version FB2017_06, D. melanogaster release 6.19. For each D. mojavensis gene with a D. melanogaster ortholog, GO term summaries were parsed from the FlyBase GO Summary Ribbons for molecular function and biological process. Clustering was done with JMP 10 using the Ward method with 15 groups allowed.
The authors gratefully acknowledge the work of Laurel Brandsmeier in this project.
CWA performed the assembly and analysis of the genomic data and was involved in the writing of the manuscript. LMM conceived of and designed the study and was involved in the analysis and the writing of the manuscript. All authors read and approved the final manuscript.
This work was supported by a Junior Faculty Distinguished Research award from the University of Alabama in Huntsville and partly supported by grants from the National Science Foundation (DEB-1219387 and IOS-1557697) to LMM.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
- 19. Pool JE, Corbett-Detig RB, Sugino RP, Stevens KA, Cardeno CM, Crepeau MW, Duchen P, Emerson JJ, Saelao P, Begun DJ, et al. Population genomics of sub-saharan Drosophila melanogaster: African diversity and non-African admixture. PLoS Genet. 2012;8(12):e1003080.
- 23. Grenier JK, Arguello JR, Moreira MC, Gottipati S, Mohammed J, Hackett SR, Boughton R, Greenberg AJ, Clark AG. Global diversity lines-a five-continent reference panel of sequenced Drosophila melanogaster strains. G3-Genes Genom Genet. 2015;5(4):593–603.
- 26. Chiu JC, Jiang XT, Zhao L, Hamm CA, Cridland JM, Saelao P, Hamby KA, Lee EK, Kwok RS, Zhang GJ, et al. Genome of Drosophila suzukii, the spotted wing Drosophila. G3-Genes Genom Genet. 2013;3(12):2257–71.
- 28. Heed WB. Ecology and genetics of Sonoran desert Drosophila. In: Brussard PF, editor. Ecological genetics: the interface. New York: Springer-Verlag; 1978. p. 109–26.
- 36. Starmer WT. Associations and Interactions Among Yeasts, Drosophila and their habitats. In: Barker JSF, Starmer WT, editors. Ecological genetics and evolution: the cactus-yeast-Drosophila model system. New York: Academic Press; 1982. p. 159–74.
- 38. Starmer WT, Lachance MA, Phaff HJ, Heed WB. The biogeography of yeasts associated with decaying cactus tissue in North America, the Caribbean, and Northern Venezuela. Evol Biol. 1990;24:253–96.
- 40. Kircher HW. Chemical composition of cacti and its relationship to Sonoran Desert Drosophila. In: Barker JSF, Starmer WT, editors. Ecological genetics and evolution: the cactus-yeast-Drosophila model system. New York: Academic Press; 1982. p. 143–58.
- 42. Fogleman JC, Danielson PB. Chemical interactions in the cactus-microorganism-Drosophila model system of the Sonoran Desert. Am Zool. 2001;41(4):877–89.
- 45. Matzkin LM, Markow TA. Transcriptional differentiation across the four cactus host races of Drosophila mojavensis. In: Michalak P, editor. Speciation: natural processes, genetics and biodiversity. Hauppauge: Nova Science Publishers Inc.; 2013. p. 119–36.
- 57. Jaworski CC, Allan CW, Matzkin LM. Chromosome-level hybrid de novo genome assemblies as an attainable option for non-model organisms. bioRxiv. 2019. https://doi.org/10.1101/748228.
- 86. Feyereisen R. Insect cytochrome P450. In: Gilbert LI, Iatrou K, Gill SS, editors. Comprehensive Molecular Insect Science, vol. 4. Amsterdam: Elsevier; 2005. p. 1–77.
- 90. Foster JLM, Fogleman JC. Identification and ecology of bacterial communities associated with Necroses of 3 Cactus species. Appl Environ Microb. 1993;59(1):1–6.
- 91. Foster J, Fogleman J. Bacterial succession in necrotic tissue of Agria cactus (Stenocereus gummosus). Appl Environ Microb. 1994;60(2):619–25.
- 98. Arguello JR, Cardoso-Moreira M, Grenier JK, Gottipati S, Clark AG, Benton R. Extensive local adaptation within the chemosensory system following Drosophila melanogaster’s global expansion. Nat Commun. 2016;7. https://doi.org/10.1038/ncomms11855.
- 114. Schaeffer SW, Bhutkar A, McAllister BF, Matsuda M, Matzkin LM, O'Grady PM, Rohde C, Valente VLS, Aguade M, Anderson WW, et al. Polytene chromosomal maps of 11 Drosophila species: the order of genomic scaffolds inferred from genetic and physical maps. Genetics. 2008;179(3):1601–55.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.