Background

Two major classes of methods are currently used to detect natural selection. Population methods analyze the nature and frequency of allelic diversity within a species, and codon analysis methods compare patterns of synonymous and non-synonymous changes in protein-coding sequences. Substantial private and public efforts have been undertaken to characterize SNPs, which are tightly associated with genetic diversity. Because SNPs can be identified in ESTs, these polymorphisms can be used directly to map functional, expressed genes [1]. The majority of SNPs in coding regions (cSNPs) are single-base substitutions, which may or may not result in amino acid changes. However, some SNPs alter a functionally important amino acid residue, and these are of interest for their potential links with phenotypes [2]. If the phenotypic effect influences survival and reproduction, natural selection acts on the SNP alleles [3]. Evolutionary pressures of various kinds have often been hypothesized to cause active and rapid evolutionary change. Positive selection is the form of natural selection that drives new advantageous genetic variants to sweep through populations. Although advantageous mutations are of great interest, they are difficult to detect and analyze because neutral and deleterious mutations are far more frequent. In contrast, purifying selection is expected to act against deleterious mutations that change functionally important amino acid residues or alter the regulation of gene expression [4]. Because SNPs are almost always bi-allelic, relatively low gene diversity at a given SNP site is equivalent to a low frequency of the less frequent (minor) allele.
Gene diversity is reduced at SNP sites that alter functionally important residues, compared with SNPs in the same genes that do not affect protein structure. This reduction provides evidence that purifying selection has lowered the population frequencies of deleterious SNP alleles [5]. A classic measure of selective pressure on protein-coding genes is the dN/dS (Ka/Ks) ratio, which is the rate of non-synonymous substitution divided by the rate of synonymous substitution. A value of dN/dS < 1 suggests that most amino acid substitutions have been eliminated by purifying selection, whereas dN/dS > 1 indicates positive selection [6].
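As a worked illustration of the ratio, the sketch below converts counts of observed differences and of synonymous and non-synonymous sites into Jukes-Cantor-corrected dN and dS, following the Nei-Gojobori counting approach. It then reads off the conventional interpretation. The counts and function names are illustrative assumptions, not values from this study. In practice, tools such as MEGA or codeml estimate these quantities together with their uncertainty.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (dN, dS, omega) from difference and site counts."""
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    if d_s == 0.0:
        # No synonymous change: omega is infinite if dN > 0, undefined otherwise.
        omega = math.inf if d_n > 0.0 else math.nan
    else:
        omega = d_n / d_s
    return d_n, d_s, omega


def interpret(omega: float, tol: float = 1e-9) -> str:
    """Conventional reading of omega = dN/dS (no significance test involved)."""
    if math.isnan(omega):
        return "undefined (no substitutions)"
    if omega > 1.0 + tol:
        return "positive selection"
    if omega < 1.0 - tol:
        return "purifying selection"
    return "neutral evolution"


# Hypothetical counts for one pair of 456 bp sequences (sites sum to 456).
d_n, d_s, omega = dn_ds(nonsyn_diffs=4, nonsyn_sites=340, syn_diffs=6, syn_sites=116)
print(f"dN={d_n:.4f} dS={d_s:.4f} omega={omega:.3f} -> {interpret(omega)}")
```

A point estimate of ω does not by itself establish significance. The Z-test, codeml, and FEL analyses described in this study address that question.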

Wild emmer wheat (Triticum dicoccoides) presumably diversified adaptively from northeastern Israel and Syria into the Near East Fertile Crescent, where it harbors rich genetic diversity and resources [7]. Previous studies in cereals have shown significant, nonrandom, adaptive molecular genetic differentiation at single- and multilocus levels among micro-ecological environments [8, 9]. Wild emmer wheat populations show both regional and local patterns of genetic differentiation, with sharp differentiation over short distances [10].

Alpha-amylase inhibitors are attractive candidates for the control of seed weevils, as these insects are highly dependent on starch as an energy source [11]. In vitro and in vivo trials of α-amylase inhibitors, including field trials, have confirmed their potential for increasing yields by controlling insect populations [11]. In cereal seeds, inhibitor proteins of 120-130 amino acids, including both trypsin inhibitors and α-amylase inhibitors, can be grouped into one large family on the basis of amino acid sequence homology [12]. The bulk of wheat albumins is known to consist of a few amylase isoinhibitor families that are very likely phylogenetically related and encoded by a small number of parental genes [13]. The monomeric, homodimeric, and heterotetrameric α-amylase inhibitors (WMAI, WDAI, and WTAI) are the main members of a family of wheat kernel proteinaceous inhibitors that are active on exogenous α-amylases of various origins. WMAI-0.28, WMAI-0.39, WDAI-0.19, and WDAI-0.53 have been extensively investigated [14, 15]. WMAI is a 12 kDa proteinaceous inhibitor, and WDAI is a 24 kDa protein formed by two 12 kDa subunits. WTAI is a complex of about 60 kDa composed of one WTAI-CM2, two WTAI-CM3, and one WTAI-CM16 subunit, none of which is active on its own [14, 16]. It is well established that the members of each family are closely related, with largely identical amino acid sequences and conformational structures, and it has been suggested that the inhibitors derive from common ancestral genes [17]. In co-evolving plant-insect systems, plants synthesize a variety of toxic proteinaceous and non-proteinaceous molecules to protect themselves against insects [18, 19]. Proteinase inhibitors are therefore a potential model system for studying basic evolutionary processes such as functional diversification [20].
The structure and diversity of WDAI genes in wild emmer wheat from Israel have been described, and the relationship between these genes and ecological factors has been elucidated with 16 specific SNP markers [21]. Wild emmer wheat populations showed a wide range of WDAI diversity, both between and within populations.

In this study, we examined SNP diversity in the wheat monomeric α-amylase inhibitor (WMAI) gene across 14 natural wild emmer wheat populations from Israel and Golan. We assessed natural selection on the gene in two ways. Population analysis included the ecological characteristics of the collection sites, and codon analysis compared patterns of synonymous and non-synonymous changes in the coding sequences of all accessions. The results provide further insight into the relationship between proteinaceous plant defense inhibitors and environmental stress.

Results

Characterization of monomeric α-amylase inhibitors

Genomic PCR amplification was conducted with WMAI-specific cloning primers, and the expected PCR products were detected in accessions of wild emmer wheat. A total of 348 novel WMAI gene sequences were obtained and submitted to NCBI [GenBank: FJ874277-FJ874629]. Each sequence is 456 bp long, including a 90 bp region encoding the signal peptide. Twenty-eight cSNPs were detected in the coding sequence, a frequency of one SNP per 16.3 bases. Some SNPs were population-specific; for example, 96G was found only in six sequences from the Daliyya (29) population. All 31 sequences from the Amirim (24) population carried 114G, 228C, 246G, 282C, 288G, 297A, 315A, 339A, and 438G, whereas sequences from other populations were polymorphic at these sites (Table 1).
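Population-specific alleles such as 96G in Daliyya (29) can be found by recording, for every aligned position and base, which populations carry it. The sketch below assumes the sequences are already aligned to a common length. The function name and toy data are hypothetical, and gap characters, if present, are treated like bases.

```python
from collections import defaultdict


def private_alleles(pop_seqs):
    """Return {population: ['posBase', ...]} for alleles seen in only one population.

    pop_seqs maps a population name to a list of aligned, equal-length
    sequences. Positions are reported 1-based, as in '96G'.
    """
    carriers = defaultdict(set)  # (position, base) -> populations carrying it
    for pop, seqs in pop_seqs.items():
        for seq in seqs:
            for pos, base in enumerate(seq, start=1):
                carriers[(pos, base)].add(pop)
    private = defaultdict(list)
    for (pos, base), pops in sorted(carriers.items()):  # sorted by position
        if len(pops) == 1:
            (pop,) = pops
            private[pop].append(f"{pos}{base}")
    return dict(private)


# Hypothetical toy alignment for three populations.
toy = {
    "Daliyya": ["ACGTA", "ACGTA"],
    "Amirim": ["ACTTA", "ACTTA"],
    "Qazrin": ["ACTTA", "ACTTC"],
}
print(private_alleles(toy))
```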

Table 1 SNP mining and haplotype classification of WMAI from wild emmer wheat.

Sequence alignment of the monomeric α-amylase inhibitor sequences from wild emmer wheat revealed 33 haplotypes (Table 1). Median-joining network analysis clearly separated the haplotypes into at least three groups (Figure 1). Each haplogroup had a primary haplotype (H02, H16, or H30). H16 was the most common haplotype overall, occurring in 158 WMAI sequences, followed by H10, H30, and H02. Within each group, haplotypes differed from the primary haplotype by fewer than five mutations (SNPs) (Figure 1). The three haplogroups were not equally distributed among populations (Figure 2). Apart from haplotypes represented by a single sequence, some haplotypes showed a geographic pattern. The 15 sequences belonging to H15 came from the Qazrin (5), Gamla (8), Rosh-Pinna (9), Mt. Gilboa (16), and Amirim (24) populations, all located at northern sites in Israel and Golan.
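Two operations in this paragraph can be sketched directly. The first collapses identical sequences into haplotypes and ranks them by frequency. The second counts SNP differences between each haplotype and the primary one. The labels H01, H02, ... in the sketch below are assigned by frequency purely for illustration and do not correspond to the study's haplotype names. The toy alignment is hypothetical.

```python
from collections import Counter


def collapse_haplotypes(seqs):
    """Group identical aligned sequences into haplotypes.

    Returns (label, sequence, count) tuples from most to least frequent;
    labels are hypothetical and assigned in that order.
    """
    ranked = Counter(seqs).most_common()
    return [(f"H{i:02d}", hap, n) for i, (hap, n) in enumerate(ranked, start=1)]


def snp_differences(a: str, b: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy alignment: eight sequences, three distinct haplotypes.
seqs = ["ACGT"] * 5 + ["ACGA"] * 2 + ["TCGA"]
haps = collapse_haplotypes(seqs)
primary = haps[0][1]  # most frequent haplotype
for label, hap, count in haps:
    print(label, hap, count, snp_differences(hap, primary))
```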

Figure 1

Median-joining networks of the haplotypes of monomeric α-amylase inhibitor genes.

Figure 2

Geographic distribution of the wild emmer wheat populations and of the three WMAI haplogroups. Details on numbered populations can be found in Table 7.

Some SNPs in the α-amylase inhibitor gene sequences caused amino acid substitutions in the encoded proteins (Table 2). Only 11 SNPs in the WMAI nucleotide sequence, affecting nine amino acid positions, resulted in amino acid variation. Two of these SNPs were at the third codon position, and, as expected, most SNPs that changed amino acids were at the first or second position (Table 2). Amino acid sequence alignments indicated that the inhibitors were highly homologous and belonged to the monomeric α-amylase inhibitor family that includes WMAI-0.28 and WMAI-0.39 (data not shown). Several haplotypes shared the same deduced protein sequence. The amino acid substitutions probably produce charge differences. Such differences would account for the different relative electrophoretic mobilities and differential inhibitory activities of WMAI-0.28, WMAI-0.32, WMAI-0.35, WMAI-0.39, and WMAI-0.48, which have similar molecular weights [13]. Most of the deduced WMAI proteins contained ten Cys residues, as do all cereal-type α-amylase inhibitors. Only seven sequences, belonging to haplotypes H09, H21, H22, and H23, had Arg128 in place of Cys128 (Table 2). Interestingly, all seven came from populations in central Israel: Mt. Gilboa (16), Beit-Oren (28), and Bat-Shelomo (30).
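Whether a SNP is synonymous, and at which codon position it falls, follows from translating the reference and variant codons with the standard genetic code. The sketch below does this for a hypothetical in-frame toy sequence; the real WMAI sequences are not used, and the function names are ours. The example illustrates a general fact about the genetic code. A single-nucleotide Cys-to-Arg change, such as the one at residue 128 if it arose by one substitution, can only be a T-to-C change at the first codon position (TGT to CGT or TGC to CGC).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def residue_span(residue: int) -> tuple:
    """1-based nucleotide positions of the codon encoding a 1-based residue."""
    return 3 * residue - 2, 3 * residue


def classify_snp(cds: str, pos: int, alt: str):
    """Classify a substitution at 1-based position `pos` of an in-frame CDS.

    Returns (codon_position, ref_aa, alt_aa, kind).
    """
    cds, alt = cds.upper(), alt.upper()
    idx = pos - 1
    offset = idx % 3  # 0, 1, 2 -> first, second, third codon position
    start = idx - offset
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return offset + 1, ref_aa, alt_aa, kind


# Hypothetical toy CDS: ATG TGT GGA = Met-Cys-Gly.
toy = "ATGTGTGGA"
print(classify_snp(toy, 4, "C"))  # (1, 'C', 'R', 'non-synonymous')
print(classify_snp(toy, 9, "T"))  # (3, 'G', 'G', 'synonymous')
print(residue_span(128))          # (382, 384)
```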

Table 2 Variation of amino acids caused by nucleotide changes in genes

Adaptive evolution analysis

The dN/dS ratio is widely used as an indicator of natural selection on gene sequences. An excess of non-synonymous relative to synonymous substitutions is a strong indication of positive selection. A deficit of non-synonymous relative to synonymous changes suggests negative (purifying) selection imposed by functional constraint. To analyze natural selection on wild emmer wheat monomeric α-amylase inhibitors, we performed dN/dS codon analyses testing for purifying and positive selection.

First, PAL2NAL was used to calculate the dN/dS ratio between each haplotype and the main haplotype, which served as the reference (Table 3). The site-specific models implemented in codeml indicated that both purifying and positive selection had occurred, but purifying selection predominated. Because the main haplotype, rather than an ancestral sequence, was used as the reference, additional methods were applied to clarify selection on this protein.

Table 3 Synonymous and non-synonymous substitution rates of α-amylase inhibitor genes between the main haplotype and each other haplotype, calculated by PAL2NAL with the PAML codeml program.

Next, a codon-based Z-test of selection was carried out in MEGA 3.1. To determine whether natural selection contributes to WMAI diversity, the ratio of non-synonymous to synonymous substitutions was evaluated for each pair of haplotypes. Tajima's neutrality test gave pS (segregating sites per site) = 0.06, π (nucleotide diversity) = 0.02, and D = 0.82. The Z-test revealed a high proportion of sequences under selection across populations. Nine haplotype sequences showed the dN/dS ratio expected under the hypothesis of selection (dN ≠ dS, p < 0.05), and each of these nine differed significantly from more than 70% of the other 32 haplotypes. Separate tests of the purifying and positive selection hypotheses also indicated that both forms of selection have contributed to WMAI sequences (data not shown).
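Tajima's D compares two estimates of genetic variation. One is the mean number of pairwise differences (π, which MEGA reports per site). The other is the number of segregating sites scaled by a1 (Watterson's estimator). Positive values, as reported here, indicate more intermediate-frequency variation than expected under neutrality, and negative values indicate an excess of rare variants. The sketch below follows the standard formula of Tajima (1989) for aligned, equal-length sequences. It is not claimed to reproduce MEGA's output for these data, and the toy samples are hypothetical.

```python
import math
from collections import Counter


def tajimas_d(seqs):
    """Tajima's D for aligned, equal-length sequences."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    total_pairs = n * (n - 1) / 2.0
    s = 0             # number of segregating (polymorphic) sites
    differing = 0.0   # summed count of differing sequence pairs over all sites
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) > 1:
            s += 1
        same = sum(c * (c - 1) / 2.0 for c in counts.values())
        differing += total_pairs - same
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")
    k = differing / total_pairs  # mean pairwise differences (pi times length)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical samples: a rare singleton variant vs. an intermediate-frequency one.
print(tajimas_d(["A"] * 9 + ["T"]))      # negative
print(tajimas_d(["A"] * 5 + ["T"] * 5))  # positive
```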

Finally, we assessed positive selection with the fixed effects likelihood (FEL) routine in HYPHY [22]. The sequence alignments and NJ tree were used to calculate the dN/dS (ω) ratio for each site (Figure 3). The HYPHY analyses confirmed the occurrence of selection in WMAI sequences (Table 4). Site-by-site likelihood ratio tests (LRTs) indicated that most of the selected amino acid residues were under purifying selection. The ω value for the whole sequence was also substantially lower than 1, indicating that the inhibitors were under strong purifying selection. However, a few amino acid residues at the C-terminus were positively selected, suggesting that this part of the protein was subject to directional or diversifying selection.
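At each codon, FEL fits two models. In the first, the synonymous (α) and non-synonymous (β) rates are estimated separately; the null model constrains β = α. The two fits are compared with a likelihood ratio test. The sketch below shows only that final decision step and assumes the two log-likelihoods have already been obtained, which in this study would come from HYPHY. It uses the asymptotic χ² distribution with one degree of freedom, for which P(X > x) = erfc(√(x/2)). All site numbers and values are hypothetical.

```python
import math


def chi2_1df_pvalue(lr: float) -> float:
    """P(X > lr) for a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(max(lr, 0.0) / 2.0))


def fel_site_call(lnl_null, lnl_alt, alpha, beta, level=0.05):
    """Classify one codon site from two fitted log-likelihoods.

    lnl_null: log-likelihood with beta constrained to equal alpha.
    lnl_alt:  log-likelihood with alpha and beta estimated freely.
    alpha, beta: per-site synonymous and non-synonymous rate estimates.
    """
    lr = 2.0 * (lnl_alt - lnl_null)
    p = chi2_1df_pvalue(lr)
    if p >= level:
        return "not significant", p
    return ("positive" if beta > alpha else "negative"), p


# Hypothetical per-site fits: (lnL null, lnL alternative, alpha, beta).
sites = {
    130: (-100.0, -96.0, 0.5, 2.0),
    60: (-80.0, -75.0, 1.8, 0.1),
    90: (-50.0, -49.5, 1.0, 1.2),
}
for site, fit in sites.items():
    call, p = fel_site_call(*fit)
    print(site, call, round(p, 4))
```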

Figure 3

The NJ phylogenetic tree of the haplotypes, used to calculate the dN/dS (ω) ratio for each site.

Table 4 Selection analysis of emmer wheat monomeric α-amylase inhibitor genes by HYPHY.

Genetic diversity and distance of α-amylase inhibitor genes

The proportion of polymorphic loci P (5%), the expected heterozygosity He, and Shannon's information index I of the 14 wild emmer wheat populations are summarized in Table 5. Diversity clearly differed among populations. He ranged from 0.094 to 0.353. Beit-Oren (28) had the highest value (0.353), followed by Nahef (25), and Kokhav Hashahar (19) had the lowest (0.094). Nei's genetic distances (D) between all population pairs were calculated from the WMAI gene sequences (Table 6). Low D values (< 0.050) were observed between some populations from different areas, and D was independent of geographic distance. Large genetic distances, that is, sharp genetic differentiation, were found over short geographic distances.
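For bi-allelic SNP loci, the three diversity indices and Nei's standard genetic distance can be computed from allele frequencies alone. Nei's gene diversity at a two-allele site reduces to He = 1 - (p² + q²) = 2pq. This is also why, as noted in the Background, low diversity at a SNP site corresponds to a rare minor allele. The sketch below makes several assumptions. It applies no sample-size correction (the POPGENE settings used are not stated), it treats a locus as polymorphic when its most common allele has frequency ≤ 0.95, and it uses hypothetical frequencies.

```python
import math


def diversity_indices(freqs, criterion=0.95):
    """P, He, and I for one population from bi-allelic SNP frequencies.

    freqs holds, for each SNP locus, the frequency of one of its two alleles.
    """
    n = len(freqs)
    polymorphic = sum(max(p, 1.0 - p) <= criterion for p in freqs)
    # Nei's gene diversity per locus: 1 - (p^2 + q^2), which equals 2pq.
    he = sum(1.0 - (p * p + (1.0 - p) ** 2) for p in freqs) / n
    # Shannon's index per locus: -(p ln p + q ln q), skipping zero frequencies.
    shannon = sum(-sum(x * math.log(x) for x in (p, 1.0 - p) if x > 0.0) for p in freqs) / n
    return polymorphic / n, he, shannon


def nei_distance(fx, fy):
    """Nei's standard genetic distance D = -ln(Jxy / sqrt(Jx * Jy))."""
    jx = sum(p * p + (1.0 - p) ** 2 for p in fx) / len(fx)
    jy = sum(q * q + (1.0 - q) ** 2 for q in fy) / len(fy)
    jxy = sum(p * q + (1.0 - p) * (1.0 - q) for p, q in zip(fx, fy)) / len(fx)
    if jxy == 0.0:
        return math.inf  # the populations share no alleles at any locus
    return -math.log(jxy / math.sqrt(jx * jy))


# Hypothetical allele frequencies at three SNP loci in two populations.
pop_a = [0.50, 1.00, 0.80]
pop_b = [0.10, 0.90, 0.80]
print(diversity_indices(pop_a))
print(nei_distance(pop_a, pop_b))
```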

Table 5 Genetic diversity of wheat α-amylase inhibitor genes based on SNPs in populations of wild emmer wheat.
Table 6 Nei's genetic distance of inhibitor genes in 14 populations.

Multiple regression and Spearman rank correlations between environmental variables and SNPs

Principal component analysis (PCA) was carried out using the ecological factors listed in Table 7 as variables, except Sz, Ma, and So. The main ecological factors were selected for multiple regression analysis as described in our previous paper [20]. Multiple regression analysis was then used to investigate the relationship between environmental variables and SNPs. Geographic, temperature, water, and solar radiation factors, singly or in combination, explained a significant proportion of the SNP diversity (Table 8). The best predictors of He and I were two temperature factors, Tdd and Trd. The combination of geographic, temperature, and water availability factors was significant (p < 0.05) for genetic diversity (Table 8). SNPs in this gene could be classified into several categories associated with water, temperature, and geographic factors. The best single-variable predictors of SNP allele frequencies were (1) water factors (Dw, Rr, Rv, Hu-14, and Th); (2) temperature factors (Tdd, Ta, and Sh); and (3) geographic factors (Ln, Lt, and Al) (Table 8). Overall, water and temperature factors were the best predictors, singly or in combination with other ecological factors (Table 8).
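Stepwise multiple regression, as run in STATISTICA, begins by entering the single factor that explains the most variance in the response. It then adds further factors while they remain significant. The sketch below implements only that first step, ranking factors by the R² of a simple linear regression, which corresponds to the notion of a best single-variable predictor. No significance test is performed. The factor values and SNP frequencies are hypothetical, not taken from Table 7 or Table 8.

```python
def r_squared(x, y):
    """Coefficient of determination of a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant variable explains nothing
    return sxy * sxy / (sxx * syy)


def best_single_predictor(factors, response):
    """Return (name, R^2) of the factor explaining most variance in response.

    factors maps a factor name to one value per population.
    """
    scores = {name: r_squared(values, response) for name, values in factors.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical data for five populations.
factors = {
    "Tdd": [10.0, 12.0, 14.0, 15.0, 18.0],
    "Rr": [550.0, 400.0, 700.0, 300.0, 600.0],
}
snp_freq = [0.12, 0.20, 0.28, 0.31, 0.45]
print(best_single_predictor(factors, snp_freq))
```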

Table 7 The ecogeographical background of populations studied.
Table 8 Coefficients of multiple regression of genetic diversity and allele frequencies on environmental variables (independent variables) in 14 populations of wild emmer wheat.

Spearman rank correlations between ecological factors and the genetic diversity of each SNP site are shown in Table 9. Only one SNP was correlated with a geographical factor: WMAI-451G was negatively correlated with Ln (rs = -0.521). Seven SNPs were positively or negatively correlated with the temperature factors Tm, Ta, Td, Sh, and Tdd, and another three SNPs were correlated with water availability factors (Table 9).
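Spearman's rs is the Pearson correlation computed on ranks, with tied values assigned the mean of the ranks they span. The sketch below implements that definition. It does not compute the p-values behind the p < 0.10 threshold in Table 9, and the example data are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation undefined for a constant variable")
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rs: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical longitude (Ln) values and SNP allele frequencies for six populations.
ln = [35.2, 35.5, 35.7, 35.4, 35.9, 35.6]
freq = [0.40, 0.25, 0.20, 0.35, 0.10, 0.30]
print(spearman(ln, freq))
```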

Table 9 Spearman rank correlations of ecological factors and genetic diversity of each SNP site (p < 0.10 level).

Discussion

Genetic polymorphism of α-amylase inhibitor sequences

The present study analyzed the extent of genetic polymorphism and the effect of diversifying selection on SNPs in wild emmer wheat monomeric α-amylase inhibitor sequences from Israeli and Golan Heights populations. This is the first time that a functional protein gene has been sequenced from large numbers of emmer wheat accessions and that natural selection has been depicted on a modeled structure. SNP markers had previously been used to assay polymorphism of the dimeric inhibitors, but only 20% of SNP sites could be analyzed successfully with those markers [21]. We obtained 348 WMAI gene sequences from 14 populations, belonging to 33 haplotypes, and detected 28 SNPs (Table 1).

Amino acid differences arising from polymorphic sites can significantly affect the structure, charge, and function of the inhibitors. Charge differences could account for differences in relative electrophoretic mobility and in inhibitory activity among α-amylase inhibitors [23]. Earlier investigations identified at least five inhibitor proteins of the WMAI family (inhibitors 0.28, 0.32, 0.35, 0.39, and 0.48), which have similar molecular weights but different inhibitory activities [13]. The relationship between nucleotide polymorphisms and amino acid changes in WMAI is summarized in Table 2. Only 11 SNPs in the WMAI nucleotide sequence resulted in amino acid variations. All cereal-type α-amylase inhibitors have 10 Cys residues, forming five disulfide bonds. Both WMAI-0.28 and WMAI-0.39, like WDAI-0.19, are readily inactivated by treatments that break disulfide bonds, indicating that their stability depends mainly on the integrity of their disulfide bridges [24]. By combining FAB-MS and automated sequencing, the five disulfide bonds of the wheat kernel α-amylase inhibitor 0.28 were assigned as Cys7-Cys54, Cys21-Cys42, Cys29-Cys82, Cys43-Cys98, and Cys56-Cys113 [25]. The monomeric α-amylase inhibitors from Israeli and Golan Heights populations had 10 Cys residues (except for seven sequences), and the positions of the 10 Cys were conserved. Most polymorphic sites did not occur at highly conserved, functionally important positions. This would allow the α-amylase inhibitors to keep the correct 3D structure for binding α-amylase.
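The mature-protein numbering of the disulfide bonds can be linked to the full-length numbering used elsewhere in this section, but only under one assumption. That assumption is that the mature numbering in [25] equals the full-length numbering minus the 30-residue signal peptide. This is consistent with the positions quoted in this section: Cys37 corresponds to Cys7, and the CRC motif at 84-86 spans Cys54 and Cys56. Under that assumption, the sketch below maps a full-length position to its disulfide partner. It suggests that Arg128 would remove the Cys98 partner of the Cys43-Cys98 bond. The functions and the placeholder sequence are hypothetical; no real WMAI sequence is used.

```python
SIGNAL_PEPTIDE_LEN = 30  # assumption: mature numbering = full-length numbering - 30
# Disulfide bonds of inhibitor 0.28 in mature-protein numbering [25].
MATURE_BONDS = [(7, 54), (21, 42), (29, 82), (43, 98), (56, 113)]


def to_full_length(mature_pos: int) -> int:
    return mature_pos + SIGNAL_PEPTIDE_LEN


def to_mature(full_pos: int) -> int:
    return full_pos - SIGNAL_PEPTIDE_LEN


def disulfide_partner(full_pos: int):
    """Full-length position of the Cys bonded to full_pos, or None if unbonded."""
    m = to_mature(full_pos)
    for a, b in MATURE_BONDS:
        if m == a:
            return to_full_length(b)
        if m == b:
            return to_full_length(a)
    return None


def broken_bonds(protein: str):
    """Bonds (full-length numbering) in which either partner is not Cys."""
    broken = []
    for a, b in MATURE_BONDS:
        fa, fb = to_full_length(a), to_full_length(b)
        if protein[fa - 1] != "C" or protein[fb - 1] != "C":
            broken.append((fa, fb))
    return broken


# Hypothetical placeholder protein: Cys only at the ten bonded positions,
# then Cys128 replaced by Arg as in haplotypes H09, H21, H22, and H23.
residues = ["A"] * 150
for a, b in MATURE_BONDS:
    residues[to_full_length(a) - 1] = "C"
    residues[to_full_length(b) - 1] = "C"
residues[128 - 1] = "R"
print(disulfide_partner(128), broken_bonds("".join(residues)))
```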

The mutants obtained by García-Maroto et al. (1991) identified two regions of the molecule that are critical for inhibition [26]. The first is the N-terminal sequence at positions 31-36, before the first Cys (Cys37); the signal peptide comprises residues 1-30. The second is the sequence after the seventh Cys, which ends the CRC motif at positions 84-86. According to the alignment of the deduced amino acid sequences of emmer wheat WMAI, no amino acids in these regions were changed by nucleotide mutations (Table 2). Most amino acid changes occurred at the C-terminus, whereas most amino acids in the middle domain were conserved, ensuring the stability of WMAI (Table 2).

Although the most important positions were conserved, site-by-site analysis of non-synonymous and synonymous substitution rates provided strong evidence for natural selection acting on WMAI. According to the PAL2NAL results, dN differed significantly from dS. The whole-sequence dN/dS < 1 indicates that the inhibitors are under strong purifying selection, possibly reflecting a structural requirement. In contrast, amino acids at the C-terminus were positively selected. In other words, amino acid-altering substitutions at these sites may offer fitness advantages that generate WMAI diversity.

Ecological genetics and evolution of WMAI

Experimental populations evolving under natural selection are a valuable tool for studying the genetic basis of adaptation [27]. An ecological genetic analysis was carried out to investigate the evolutionary mechanism of WMAI in wild emmer wheat, and it revealed the diversity of WMAI gene sequences in Israeli and Golan Heights populations. SNPs distinguished populations, even closely related populations originating from nearby locations. Our results demonstrate that polymorphism of monomeric α-amylase inhibitor genes in wild emmer wheat is correlated with the ecogeographic distribution of the accessions. These observations are consistent with previous marker-based results on other seed proteins such as HMW-GS and WDAI [21, 28-30].

Central populations in this study were collected in warm, semi-humid environments on the Golan Plateau and near the Sea of Galilee. Marginal steppic populations were collected across a wide geographic area along the northern, eastern, and southern borders of the wild emmer distribution, covering hot, cold, and xeric peripheries. Marginal mesic (Mediterranean) populations were collected from the western border of the distribution [31]. The 14 populations studied here came from different sites in Israel and Golan and covered a wide range of ecogeographical conditions across the species' distribution range. Specific SNP positions in WMAI were highly effective in distinguishing genotypes and populations of wild emmer wheat from diverse ecogeographic sites in Israel and Golan. We found high proportions of polymorphic loci (P), high expected heterozygosity (He) and Shannon's information index (Table 5), and high genetic distances between populations (Table 6). These results suggest that genetic variation at these SNP positions in WMAI is at least partly ecologically determined in these populations.

The relationship between SNP-based genetic distance and geographic distance was investigated, and the estimates of genetic distance (D) were found to be geographically independent. Sharp genetic divergence (large D) occurred over very short geographic distances, and small divergence (small D) occurred over large distances. The same pattern was previously found with allozyme, RAPD, SSR, and SNP markers [21, 32, 33]. For example, the Tabigha (11) and Amirim (24) populations are located only about 10 km apart, yet their genetic distance (D = 0.1819) was 25 times greater than that between the Qazrin (5) and Bat-Shelomo (30) populations, which are separated by 80 km (D = 0.0072) (Figure 2).
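The text does not state which test underlies the conclusion that D is geographically independent. A Mantel permutation test is one standard way to assess the correlation between a genetic distance matrix (such as Table 6) and a geographic distance matrix. The sketch below is a minimal two-sided version; it is offered as a swapped-in illustration, not as the authors' procedure. It permutes the rows and columns of the genetic matrix together, and the distance matrices are hypothetical.

```python
import random


def upper_triangle(matrix):
    """Entries above the diagonal of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(gen, geo, permutations=999, seed=0):
    """Two-sided Mantel test of correlation between two distance matrices.

    Returns (r, p); rows and columns of `gen` are permuted jointly.
    """
    geo_vec = upper_triangle(geo)
    r_obs = pearson(upper_triangle(gen), geo_vec)
    rng = random.Random(seed)
    order = list(range(len(gen)))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[gen[i][j] for j in order] for i in order]
        if abs(pearson(upper_triangle(permuted), geo_vec)) >= abs(r_obs) - 1e-12:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Hypothetical: six sites on a line, genetic distance proportional to geography.
coords = [0.0, 1.0, 3.0, 7.0, 12.0, 20.0]
geo = [[abs(a - b) for b in coords] for a in coords]
gen = [[0.01 * d for d in row] for row in geo]
print(mantel(gen, geo))
```

A small p together with a large r would indicate isolation by distance. The pattern reported here, with large D over short distances and small D over long ones, would instead be expected to give a weak r.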

Collection sites of wild emmer differed in altitude, longitude, latitude, and several other environmental factors (Table 7) [31]. Notably, SNPs in WMAI were correlated with ecological factors in both the multiple regressions and the Spearman rank correlation matrix. Of the 28 SNPs, 11 led to amino acid changes at nine positions (Table 2). These SNPs were significantly more correlated with water availability factors (Rv and Dw), temperature factors (Ta, Sh, and Tdd), and geographical factors (Ln, Al, and Lt) than with the other factors (Table 8). Ecogeographic, temperature, and water availability factors, singly or in combination, explained a significant proportion of the SNP diversity in α-amylase inhibitor genes. Diversity could be further explained by variation in ecological factors. These include altitude (Al) and the sharp north-south climatic gradient in Israel and Golan, with temperatures increasing and water availability decreasing toward the semiarid zones of southern Israel. However, the ecological factors considered in this study do not represent all possible components of the actual climate [32, 33].

Environmental stress can greatly influence plant susceptibility to herbivores and pathogens, and drought stress can promote outbreaks of fungal diseases and plant-eating insects [34, 35]. Herbivorous insects and the level of herbivore pressure may vary with ecological factors. Different herbivore-related selection pressures at these locations may therefore influence polymorphism of insect-resistance loci in wild emmer wheat [21]. The environmental pressures at each site relate directly to climate, whereas WMAI expression responds to environmental factors only indirectly. Several evolutionary mechanisms may underlie the relationship between α-amylase inhibitor diversity and ecological factors. These findings suggest that the variation in WMAI genetic diversity between populations is a product of selective forces.

The genetic structure of wild emmer wheat populations in Israel is mosaic [30], and this patchy distribution appears to reflect the underlying ecological heterogeneity at micro- and macro-scales [32, 33, 36, 37]. Thus, the high polymorphism and genetic variation of WMAI within and between populations could be explained as adaptive complexes generated by natural selection and co-evolution with insects.

Conclusions

Alpha-amylase inhibitors are attractive candidates for the control of seed weevils, as these insects are highly dependent on starch as an energy source. A total of 348 wheat monomeric α-amylase inhibitor (WMAI) gene sequences were obtained. Twenty-eight SNPs were detected in the coding sequence, a frequency of one SNP per 16.3 bases. Great diversity at WMAI loci, both between and within populations, was detected in wild emmer wheat from Israel and the Golan Heights. The dN/dS ratios indicated that WMAI has been under natural selection across populations. Tests of the purifying and positive selection hypotheses (p < 0.05) showed that WMAI sequences have been shaped by both natural selection and co-evolution. Together, these processes maintain the conserved function while enabling inhibition of a variety of insect amylases. Ecological factors, singly or in combination, explained a significant proportion of the SNP variation. The discordance between genetic divergence and geographic distance also suggests that SNPs in WMAI are subject to natural selection and that ecological factors have played an important evolutionary role in differentiation at this locus. Taken together, the population and codon analyses suggest that α-amylase inhibitors are adaptively selected under different environments.

Methods

Plant material and ecological background of wild emmer wheat

Wild emmer wheat (T. dicoccoides), the wild progenitor of modern tetraploid and hexaploid wheat, is a predominantly self-pollinating species distributed over the Near East Fertile Crescent [38]. A center of distribution and diversity of emmer wheat lies in the catchment area of the upper Jordan Valley in Israel and its vicinity [7]. The material comprised 114 wild emmer wheat accessions from 14 populations collected at various locations in Israel and Golan. These populations represent a wide range of ecological conditions, such as soil, temperature, altitude, and water availability. Individual plants were collected at random, at least 1 m apart, from populations differing in major ecological properties. These collection sites and populations have been described in detail elsewhere [7, 31]. The genotypes used in the present study are maintained in the cereal gene bank of the Institute of Evolution, University of Haifa. The populations, with their geographic origins and climatic conditions, are listed in Table 7.

DNA isolation and PCR amplification

Ten seeds of each accession were germinated in the dark at room temperature. Genomic DNA was extracted from the leaves of approximately two-week-old seedlings using a modified CTAB protocol as described by Murray and Thompson [39]. Specific primers for WMAI (F: ATGTGGATGAAGACCGKGTT; R1: GACTAGRYGTCCGKATACGC; R2: CACGCACCGCACCATTACTT) were used to amplify the gene coding sequences [40]. PCR amplification was performed on a PTC-240 thermal cycler (Bio-Rad) in a 50 μL volume. The reaction contained about 100 ng of genomic DNA, 100 μM of each dNTP, 1 μM of each primer, 1 U of Taq polymerase, and 1.5 mM Mg²⁺ in 1× PCR buffer. Cycling conditions were an initial denaturation at 95°C for 5 min, followed by 35 cycles of 95°C for 1 min, 55°C for 30 s, and 72°C for 1 min, with a final extension at 72°C for 5 min.
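Two of the primers contain IUPAC degenerate bases (K = G/T, R = A/G, Y = C/T), so each is in effect a mixture of specific oligonucleotides. The short sketch below enumerates those variants. The helper name is ours; the primer sequences are copied from the text above.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

PRIMERS = {
    "F": "ATGTGGATGAAGACCGKGTT",
    "R1": "GACTAGRYGTCCGKATACGC",
    "R2": "CACGCACCGCACCATTACTT",
}


def expand_primer(primer: str):
    """All specific sequences represented by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer.upper()))]


for name, seq in PRIMERS.items():
    variants = expand_primer(seq)
    print(name, len(variants), variants)
```

F therefore represents 2 oligonucleotides, R1 represents 8, and R2 is a single non-degenerate sequence.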

SNPs mining and haplotype identification

Amplification products were separated on 2% agarose gels. Because WMAI is encoded by multiple genes, the desired DNA fragments were ligated into the pBluescript SK(+) T-vector plasmid (Stratagene), and five positive clones were screened and sequenced.

Sequence alignment and SNP assessment were carried out with the multiple-sequence alignment programs Clustal W http://www.ebi.ac.uk/clustalw and DNAman 5.2.2 http://www.lynnon.com. The α-amylase inhibitor ORFs were translated into amino acid sequences with the ORF Finder program at NCBI http://www.ncbi.nlm.nih.gov. Polymorphic positions were identified with MEGA version 3.1 [41]. Only these positions, rather than all mutated positions, were used in subsequent analyses, and positions with a change observed only once in the dataset were removed.
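The filtering rule above, which drops positions where a change is seen only once, amounts to keeping sites at which at least two nucleotides each occur at least twice. This matches the usual definition of a parsimony-informative site. The sketch below assumes that gaps and ambiguous characters are ignored, which the text does not specify. It uses a hypothetical toy alignment and also reports SNP density in the same form as the Results (one SNP per N bases).

```python
from collections import Counter


def informative_snp_positions(seqs, min_count=2):
    """1-based positions where at least two nucleotides each occur >= min_count times.

    Gaps and ambiguous characters are ignored; singleton changes are dropped.
    """
    positions = []
    for pos, column in enumerate(zip(*seqs), start=1):
        counts = Counter(b.upper() for b in column if b.upper() in "ACGT")
        if sum(1 for c in counts.values() if c >= min_count) >= 2:
            positions.append(pos)
    return positions


alignment = ["ACGTA", "ACGTA", "ATGTC", "ATGAC"]  # hypothetical
snps = informative_snp_positions(alignment)
print(snps)
print(f"one SNP per {len(alignment[0]) / len(snps):.1f} bases")
```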

No phylogenetic study of α-amylase inhibitor gene types had previously been performed. We therefore analyzed the WMAI genes from wild emmer wheat with the median-joining (MJ) network method [42], which has proved suitable for analyzing sequences from wheat and Aegilops species [43]. The MJ network was constructed from the sequence alignment of haplotypes, and the phylogenetic clusters were displayed with the program Network 4.5.1.0 http://www.fluxus-engineering.com/sharenet.htm.
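An MJ network starts from the minimum spanning relationships among haplotypes, measured as pairwise SNP differences. It then adds inferred median vectors (unsampled intermediate sequences), as described for the MJ method [42]. The sketch below implements only the simpler first ingredient, a minimum spanning tree built with Prim's algorithm on Hamming distances. It does not infer median vectors, so it should not be read as an implementation of MJ or of the Network program. The haplotypes are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of SNP differences between two aligned haplotypes."""
    if len(a) != len(b):
        raise ValueError("haplotypes must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(haplotypes):
    """Prim's algorithm on pairwise SNP distances; returns (i, j, distance) edges."""
    n = len(haplotypes)
    if n == 0:
        return []
    in_tree = [0]
    remaining = set(range(1, n))
    edges = []
    while remaining:
        # Cheapest edge joining the tree to a haplotype not yet in it;
        # ties go to the first candidate in iteration order (deterministic).
        i, j, d = min(
            ((a, b, hamming(haplotypes[a], haplotypes[b]))
             for a in in_tree for b in sorted(remaining)),
            key=lambda edge: edge[2],
        )
        in_tree.append(j)
        remaining.remove(j)
        edges.append((i, j, d))
    return edges


haps = ["AAAA", "AAAT", "AATT", "TAAA"]  # hypothetical haplotypes
print(minimum_spanning_tree(haps))
```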

Selective pressure analysis

The codon-based Z-test of selection (p < 0.05) was carried out in MEGA version 3.1 [41]. Synonymous and non-synonymous distances were estimated by the Nei-Gojobori method, with standard errors determined from 1000 bootstrap replicates. Ratios of non-synonymous substitutions per non-synonymous site to synonymous substitutions per synonymous site (dN/dS) were computed for WMAI haplotypes with PAL2NAL http://www.bork.embl.de/pal2nal/. PAL2NAL used the codon-based maximum likelihood methods of the codeml program in the PAML package [44]. A value of dN/dS > 1 indicates positive selection, dN/dS < 1 purifying selection, and dN/dS = 1 neutral evolution. We also analyzed selection on α-amylase inhibitor sequences with HYPHY http://www.hyphy.org version 1.002 beta [22], applying FEL analyses; all analyses employed the MG94 model of codon substitution. Results were considered significant at P < 0.05. The sequence alignments and NJ tree were used to calculate the dN/dS (ω) ratio for each site.
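The codon-based Z-test compares dN and dS for a sequence pair using Z = (dN - dS) / SE(dN - dS), with the standard error estimated by bootstrap resampling of codons. The sketch below illustrates the same idea but is not MEGA's exact implementation. It assumes that the per-codon Nei-Gojobori counts have already been computed for the pair: synonymous sites, non-synonymous sites, synonymous differences, and non-synonymous differences. It applies a Jukes-Cantor correction and reports a two-sided normal p-value. The counts and helper names are hypothetical.

```python
import math
import random
import statistics


def _jc(p):
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def _dn_minus_ds(codons):
    """dN - dS from per-codon tuples (S sites, N sites, S diffs, N diffs)."""
    syn_sites = sum(c[0] for c in codons)
    nonsyn_sites = sum(c[1] for c in codons)
    syn_diffs = sum(c[2] for c in codons)
    nonsyn_diffs = sum(c[3] for c in codons)
    return _jc(nonsyn_diffs / nonsyn_sites) - _jc(syn_diffs / syn_sites)


def codon_z_test(codons, replicates=1000, seed=1):
    """Return (dN - dS, Z, two-sided p) with a bootstrap standard error."""
    observed = _dn_minus_ds(codons)
    rng = random.Random(seed)
    boot = []
    for _ in range(replicates):
        sample = [rng.choice(codons) for _ in codons]  # resample codons
        try:
            boot.append(_dn_minus_ds(sample))
        except (ValueError, ZeroDivisionError):
            continue  # skip resamples where a distance is undefined
    se = statistics.stdev(boot)
    z = observed / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return observed, z, p


# Hypothetical pair: 100 codons, 20 synonymous and 2 non-synonymous differences.
codons = ([(1.0, 2.0, 1, 0)] * 20 + [(1.0, 2.0, 0, 1)] * 2
          + [(1.0, 2.0, 0, 0)] * 78)
print(codon_z_test(codons))
```

A strongly negative Z, as in this toy case, points to purifying selection (dN < dS). A one-sided variant would be used to test dN > dS specifically.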

Ecological genetics analysis

POPGENE 1.32 and STATISTICA 6.0 were used for the ecological genetics analysis. POPGENE 1.32 [45] was used to calculate the proportion of polymorphic loci (P), expected heterozygosity (Nei's gene diversity, He), and Shannon's information index (I) for each SNP position and population. STATISTICA version 6.0 http://www.statsoft.com/textbook/stathome.html was used for PCA, stepwise multiple regression (MR), and Spearman rank correlation analyses. Multiple regression analysis was conducted to identify the best predictors at each polymorphic SNP locus, using SNP frequencies as dependent variables and the ecogeographic factors as independent variables. Spearman rank correlation coefficients were used to assess associations between the genetic indices (P, He, and I) and climatic variables across the 14 populations.