Structural characterization of the DRF1 gene of Aegilops speltoides and comparison of its sequence with those of B and other Triticeae genomes

The genus Aegilops L. has been intensively investigated due to its close relationship with wheat (Triticum L.) as contributor of B and D subgenomes. Because of their vast genetic diversity, Aegilops species represent a rich source of alleles of agronomic interest, which could be used to widen the wheat gene pool and improve tolerance to diseases, pests, drought, cold and other environmental stresses. We report the isolation and characterization of the Dehydration Responsive Factor 1 (DRF1) gene in three accessions of Ae. speltoides coming from different regions of the Fertile Crescent. The DRF1 gene belongs to the DREB gene family and encodes transcription factors which play a key role in plant response to water stress. As in other cereals, the DRF1 gene in Aegilops speltoides consists of four exons and three introns and undergoes alternative splicing. A processed pseudogene was also identified and compared with the sequence of an actual mRNA transcript, breaking new ground in the understanding of the complex regulation mechanism of this gene. The genetic diversity was evaluated by comparison of inter- and intra-species variation among some Aegilops and Triticeae, by considering both the whole gene and exon 4 sequences. The phylogenetic analyses were able to cluster the sequences in well-supported clades attributable to the genomes analysed. The overall results suggest that there is a high similarity between the B and S genome copies of the DRF1 gene but also features indicating that the two genomes have evolved independently.


Introduction
The genus Aegilops L. has been intensively investigated due to its close relationship with wheat (Triticum L.), of which it forms the largest part of the so-called secondary gene pool (GP-2) (Harlan and de Wet 1971). Following the commonly accepted system suggested by Van Slageren (1994), the genus comprises 22 species arranged in five sections (Aegilops, Comopyrum, Cylindropyrum, Sitopsis and Vertebrata) and includes diploid, tetraploid and hexaploid genomes. The Sitopsis section is particularly interesting because its five species (Ae. bicornis, SbSb; Ae. longissima, SlSl; Ae. searsii, SsSs; Ae. sharonensis, SshSsh; and Ae. speltoides, SS) share the S genome, found to be closely similar to the B genome in T. durum (Kerby and Kuspira 1987; Haider 2013; Ruban and Badaeva 2018), whereas Ae. tauschii (Vertebrata section) has been involved in the evolution of the wheat D genome (Marcussen et al. 2014; Pont et al. 2019). The Aegilops genus thus represents a natural laboratory for the study of the effects of the evolution and domestication of cultivated wheat.
Aegilops species are naturally distributed in an area ranging from the Mediterranean to Western and Central Asia. From the Transcaucasus, thought to be their centre of origin, diploid species migrated west and east, the centre of diversity being in the Fertile Crescent (modern day Iraq and parts of Turkey, Syria and Iran), which contains the majority of Aegilops species (Van Slageren 1994). Such a wide spread implies a great adaptability to different environments and a strong resistance to biotic and abiotic stresses. Indeed, many Aegilops species have a strong resistance to disease and insect pests, high drought and salt tolerance and/or good grain quality, thus representing a significant reservoir of new alleles with valuable traits for broadening the genetic diversity of wheat (Schneider et al. 2008;Brozynska et al. 2016;Zhang et al. 2017;Kumar et al. 2019). A number of studies have analysed and compared the nucleotide diversity of a single gene inside the Triticeae tribe; for instance Adderley and Sun (2013) analysed the nuclear phosphoglycerate kinase (pgk1) gene, Liu et al. (2016) studied the Wcor15 gene and Salse et al. (2008) investigated the storage protein activator (SPA) genomic region. In this work, we have analysed the Dehydration Responsive Factor 1 (DRF1) gene in some Aegilops species, with a focus on Aegilops speltoides. DRF1 belongs to the Dehydration Responsive Element Binding (DREB2) gene family and encodes important transcription factors (TFs) involved in plant response to drought. It is characterized by the presence of the APETALA2 (AP2)/Ethylene-Responsive element binding Factor (ERF) motif, through which the TFs bind to the drought-responsive element (DRE)/C repeat (CRT) element in the promoters of several stress responsive genes, thus modulating their expression (Agarwal et al. 2006). 
DREB2 genes were first identified in Arabidopsis thaliana by Yamaguchi-Shinozaki and Shinozaki (1994), and later on in various cereals, such as Hordeum vulgare (HvDRF1) (Xue and Loveridge 2004), Triticum aestivum (Wdreb2) (Egawa et al. 2006) and Triticum durum (TdDRF1) (Latini et al. 2007). In these cereals, the gene is structurally organized in four exons and three introns, and produces three transcripts through the alternative splicing mechanism (AS). Two of the three transcripts encode putative TF proteins endowed with the AP2 domain, while the other, the DRF1.2 transcript (exon 1, exon 2 and exon 4), encodes an abortive protein, due to the presence of a premature termination codon (PTC), caused by a frameshift mutation.
Here, the whole DRF1 gene was amplified, cloned and sequenced from three accessions of Aegilops speltoides, its nucleotide diversity was analysed and compared with the B subgenome copy of the DRF1 gene in T. durum (Cantale et al. 2018). Furthermore, the nucleotide diversity of the DRF1 exon 4, containing the AP2 domain, was analysed in several accessions of Aegilops speltoides and other diploid and polyploid Aegilops species and compared with available corresponding Triticeae sequences to investigate the reciprocal relationships.

Plant material
The Aegilops accessions used in the present study are listed in Table 1. Seeds were provided by ICARDA (International Center for Agricultural Research in the Dry Areas), except one accession provided by the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK, DE).
Ten seeds from each accession were germinated in Petri dishes; the plantlets were then transferred into small pots and grown in a greenhouse under controlled standard light and temperature conditions. After 25-30 days, sufficient leaf material (about 100 mg) was harvested for the molecular analyses.

DNA/RNA extraction and PCR amplification
Genomic DNA was extracted from the leaves using the CTAB method (Murray and Thompson 1980), including a treatment with RNase I to avoid RNA contamination. Total RNA was extracted from the leaves of plants (Ae. speltoides ssp. speltoides, accession IG 47280) grown under normal conditions, using the RNAfast-II Isolation System (Molecular Systems, San Diego, CA, USA). The quality and integrity of the DNA/RNA extracts were checked using agarose gel electrophoresis and their final concentration was measured using a spectrophotometer (OD 260, OD 280 and the 260/280 ratio). Primer design was based on the TdDRF1 gene sequence (NCBI ID: EU089819). The whole gene sequence was isolated by amplifying two gene fragments in optimized specific PCR reactions. The first amplicon was a ~2.5 kb gene fragment obtained using the primer pair drebfor1 (5'-CATGACGGTAGATCGGAA-3') and drebrev1 (5'-TGCGCAGGGAAGTTGGTA-3'). This ~2.5 kb fragment spanned from exon 1, including the ATG start codon, to part of exon 4 (experimental conditions: annealing temperature 55°C for 1 min, extension temperature 72°C for 5 min). The second amplicon was a ~1 kb gene fragment, obtained using the primer pair drebfor2 (5'-CATGATCCACAGGGTGCAA-3') and E4rev-down (5'-GGTCCACCATTTGATCTTCATT-3'). This ~1 kb fragment overlapped the previous one and completed exon 4, including the 3'-terminal of the gene (experimental conditions: annealing temperature 58°C for 1 min, extension temperature 72°C for 2 min). This latter pair of primers was also used to amplify the exon 4 fragment from 42 Aegilops accessions (see Table 1). The PCR reactions were performed in an overall volume of 25 µl, containing 12.5 µl REDTaq® ReadyMix™ PCR reaction mix (20 mM Tris-HCl, pH 8.3, 100 mM KCl, 3 mM MgCl2, 0.002% gelatin, 0.4 mM dNTP mix, stabilizers and 0.06 unit/ml of Taq DNA polymerase), 20 pmol of each primer and 200 ng of gDNA.
The PCR thermal cycle conditions included an initial denaturation at 95°C for 5 min, then 35 cycles, with denaturation at 94°C for 1 min, followed by the annealing and extension steps as reported above for each pair of primers, ending at +4°C.
Furthermore, the primer pair drebfor1 and E4rev-down was used to amplify transcripts from the RNA extract, following the procedure of Latini et al. (2007).
The annealing regions of the forward and reverse primers are shown in Online Resource 1.
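As a quick plausibility check on the primers and annealing temperatures listed above, the following minimal Python sketch computes primer length, GC content and a rule-of-thumb melting temperature. The 2(A+T) + 4(G+C) rule is only a rough estimate for short oligonucleotides and is not the method used in the original work; the function names are illustrative, and the hyphens that appear inside two of the printed primer sequences are assumed to be line-break artifacts.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb melting temperature (degrees C): 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Primer sequences as printed in this section. Hyphens inside a sequence are
# ASSUMED to be line-break artifacts and are stripped before any calculation.
PRIMERS = {
    "drebfor1": "CATGACGG-TAGATCGGAA",
    "drebrev1": "TGCGCAGG-GAAGTTGGTA",
    "drebfor2": "CATGATCCACAGGGTGCAA",
    "E4rev-down": "GGTCCACCATTTGATCTTCATT",
}


def summarize(primers):
    """Return {name: (length, GC fraction, rough Tm)} for each primer."""
    out = {}
    for name, raw in primers.items():
        seq = raw.replace("-", "")
        out[name] = (len(seq), round(gc_content(seq), 2), wallace_tm(seq))
    return out


if __name__ == "__main__":
    for name, (length, gc, tm) in summarize(PRIMERS).items():
        print(f"{name}: {length} nt, GC={gc:.2f}, rough Tm={tm} C")
```

Such estimates only indicate whether two primers of a pair are reasonably matched; the annealing temperatures reported above were optimized experimentally.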

PCR product purification and bacterial transformation
The PCR products were purified using the Gel Band Purification kit (Amersham, USA), followed by Sepharose CL-6B affinity chromatography (Pharmacia Corp., NYC, NY, USA), and then cloned into a pCR®II-TOPO® vector (Invitrogen, USA). After transformation of E. coli cells, positive clones were screened on agar plates containing ampicillin (50 µl/ml), X-Gal (40 mg/ml) and IPTG (40 µl of a 100 mM isopropyl β-D-1-thiogalactopyranoside solution). Plasmids were extracted using the SDS plasmid isolation protocol (Sambrook et al. 1990). In order to confirm the presence of the expected cloned fragment, plasmid DNA from the clones was digested with the EcoRI restriction enzyme and visualized on agarose gel before sequencing.

Sequences from public databases
Various relevant DRF1 sequences, belonging to the Aegilops and Triticum genera, were downloaded from NCBI (https://www.ncbi.nlm.nih.gov/) and other data sources. A list of these sequences is provided in Table 2.

Data analyses
Electropherograms were analyzed using FinchTV (Geospiza Inc., USA) and sequences were aligned using Clustal Omega at EMBL-EBI (Sievers et al. 2011) and assembled using CAP3 (Huang and Madan 1999). FGENESH (by Softberry) at http://www.softberry.com/berry.phtml?topic=fgenesh&group=programs&subgroup=gfind (Soloviev et al. 2006) was used to identify the gene structure (parameter settings: organism = triticum; suboptimal exon cutoff = 0.1). DnaSP v6 (Rozas et al. 2017) was used to assess the diversity and genetic relationships among sequences. Analyses were carried out on alignments that had been manually edited to improve their accuracy, excluding gaps and missing data.
Phylogenetic analyses were carried out using MEGA 7 (Kumar et al. 2016). The Neighbor-Joining (NJ) method was applied and the evolutionary distances were computed using the Maximum Composite Likelihood model. A bootstrap analysis based on 1000 re-samplings was used to determine the confidence values of the tree branches. All positions containing gaps and missing data were eliminated. For completeness, a separate analysis was also conducted by applying the Maximum Likelihood method and the PHYML algorithm at http://www.atgc-montpellier.fr/phyml/ (Guindon and Gascuel 2003; Lefort et al. 2017).
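To make the diversity statistic concrete, the sketch below computes nucleotide diversity (Pi) as the average number of pairwise differences per site, after removing every alignment column that contains a gap or missing-data symbol. It is an illustration of the definition only, not DnaSP's implementation; the function names, the set of symbols treated as missing data and the toy alignment are assumptions.

```python
from itertools import combinations

# Symbols treated as gap or missing data (an assumption for this sketch).
MISSING = set("-?N")


def drop_incomplete_columns(aln):
    """Keep only the alignment columns that have no gap/missing symbol."""
    seqs = [s.upper() for s in aln]
    kept = [col for col in zip(*seqs) if not (set(col) & MISSING)]
    if not kept:
        return ["" for _ in seqs]
    return ["".join(chars) for chars in zip(*kept)]


def nucleotide_diversity(aln):
    """Average pairwise differences per site, using complete deletion."""
    clean = drop_incomplete_columns(aln)
    n_sites = len(clean[0])
    pairs = list(combinations(clean, 2))
    if n_sites == 0 or not pairs:
        return 0.0
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * n_sites)


if __name__ == "__main__":
    # Toy alignment (hypothetical data): column 7 is dropped because of the gap.
    toy = ["ACGTAC-T", "ACGTTCGT", "ACCTACGT"]
    print(round(nucleotide_diversity(toy), 4))
```

In the toy alignment, 7 sites remain after column removal and the three pairs differ at 1, 1 and 2 sites, giving 4 / (3 x 7).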
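The bootstrap step can be illustrated with a short Python sketch: alignment columns are resampled with replacement, and a grouping of interest is re-evaluated in each replicate. To stay short, the sketch uses the simple proportion of differing sites (p-distance) and asks only how often two sequences form the closest pair; this is a simplified stand-in for clade support and does not reproduce the Neighbor-Joining algorithm or the Maximum Composite Likelihood distances used in MEGA 7. All names and data are hypothetical.

```python
import random


def p_distance(s1, s2):
    """Proportion of differing sites between two equal-length sequences."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def bootstrap_alignment(aln, rng):
    """Resample alignment columns with replacement, keeping the same width."""
    n = len(aln[0])
    cols = [rng.randrange(n) for _ in range(n)]
    return ["".join(seq[i] for i in cols) for seq in aln]


def support_for_pair(aln, i, j, replicates=1000, seed=0):
    """Share of replicates in which sequences i and j (i < j) are the closest pair.

    Ties are broken in favour of the pair with the lowest indices.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        rep = bootstrap_alignment(aln, rng)
        best_pair = min(
            (p_distance(rep[a], rep[b]), (a, b))
            for a in range(len(rep))
            for b in range(a + 1, len(rep))
        )[1]
        hits += best_pair == (i, j)
    return hits / replicates


if __name__ == "__main__":
    # Toy alignment (hypothetical data), already free of gaps.
    toy = ["ACGTACGTAC", "ACGTACGTAT", "TCGAACTTGC", "TCGAACTTGA"]
    print(support_for_pair(toy, 0, 1, replicates=200))
```

The fixed seed makes the run repeatable; in a real analysis the resampled alignments would each be passed to the tree-building method.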

Results
The DRF1 gene in Aegilops speltoides and comparison with the B subgenome copy of T. durum
The whole DRF1 gene was isolated and sequenced from three Ae. speltoides accessions. Representative PCR amplification and restriction results after cloning of the gene fragments are shown in Fig. 1. The three gene sequences were submitted to GenBank (AeslDRF1-FJ843102, AeslDRF1-FJ858187 and AessDRF1-FJ858188). The FGENESH gene finder tool, used for the prediction of the gene structure of the Ae. speltoides sequences, recognized the presence of four exons and three introns with high confidence, showing higher scores for exons 1 and 4 than for exons 2 and 3. The same structure had already been observed in wheat and barley. As an example, the prediction obtained using the AeslDRF1-FJ843102 sequence as the template is shown in Fig. 2. An analogous result was obtained when using the other two sequences as templates.
The three Ae. speltoides sequences were aligned and the genetic diversity quantification is summarized in Table 3. A total of 3210 sites were analysed and 20 out of 30 identified polymorphisms were located in exon 4, with an overall nucleotide diversity of Pi = 0.006. A total of 4 indels (mutations due to the insertion or deletion of nucleotides) were observed. The lack of a triplet in the simple sequence repeat (SSR) region of exon 1 in AeslDRF1-FJ858187 was particularly interesting, as it caused a shorter alanine stretch in the putative protein, whereas the other three indels were located in non-coding regions. It is worth noting that the AessDRF1-FJ858188 sequence showed a nonsynonymous mutation at nucleotide position 2431, within the AP2 domain coding region, resulting in the replacement of a Thr (ACG) with an Ala (GCG). All the above-reported polymorphisms are highlighted in the alignment shown in Online Resource 1.
Fig. 1 ... ligustica IG 47948. c EcoRI restriction analysis of clones containing the ~500 bp fragments. The samples in lanes 2 and 4 (Ae. speltoides ssp. speltoides IG 126225 and IG 116071 accessions, respectively) represented two positive clones, whose sequences were deposited (lateral arms of the vector were included in the bands, thus justifying the 600 bp size), while lanes 1 and 3 were found to be empty plasmids. d 1 kb fragment of exon 4 amplified using the primer pair drebfor2 and E4rev-down, from various accessions of Ae. speltoides. Lane 1: IG 117905, seed 1; lane 2: IG 117905, seed 2; lane 3: IG 116079; lane 4: IG 135337. MM stands for the 1 kb molecular size marker
Afterwards, the Ae. speltoides sequences were aligned with two different B subgenome copies of T. durum, namely KM520370 and KM504245, recently published (Cantale et al. 2018). A very high similarity (more than 93% identity) was observed among the sequences. The analysis involved 2989 sites and identified 5 haplotypes and 135 polymorphic sites, the overall nucleotide diversity being Pi = 0.025 (Table 3). Focusing on the coding regions, the SSR region in exon 1 of the Aegilops group is shorter and characterized by the GCG triplet, while the B genome copies of T. durum showed the distinctive GCC triplet (both triplets coding for alanine). Furthermore, two in-frame deletions, each 15 bp long, were present in exon 4 of the B genome copies of T. durum. As regards the non-coding regions, two different insertions at alignment position 325 bp (intron 1) distinguished the Aegilops and Triticum groups, the former with a length of 65 bp and the latter of 25 bp, characterized by completely different nucleotide sequences. There were also a few other differences among the indels of the two groups (SSR and indels are highlighted in light grey in Online Resource 1).
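The codon-level statements in this section (GCG and GCC both encode alanine; ACG to GCG replaces Thr with Ala) can be checked against the standard genetic code with a few lines of Python. This is a minimal sketch; the helper names are illustrative, and the 21 bp test sequence is simply a stretch of alanine codons built for the example, not a sequence taken from the alignments.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first position varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string codon by codon ('*' marks a stop codon)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def is_synonymous(codon_a, codon_b):
    """True if both codons encode the same amino acid."""
    return CODON_TABLE[codon_a.upper()] == CODON_TABLE[codon_b.upper()]


if __name__ == "__main__":
    print(is_synonymous("GCG", "GCC"))               # both alanine codons
    print(CODON_TABLE["ACG"], CODON_TABLE["GCG"])    # Thr versus Ala
    print(translate("GCG" * 5 + "GCCGCA"))           # an all-alanine stretch
```

A change between synonymous codons, such as GCG and GCC, leaves the protein unchanged, which is why the two SSR patterns can act as genome signatures without altering the alanine stretch itself.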

Transcription of the DRF1 gene in Aegilops speltoides
The activity of the DRF1 gene in Aegilops speltoides was ascertained with RNA extraction and reverse transcription PCR (RT-PCR), which revealed the presence of the DRF1.2 transcript (Online Resource 2). Its exonic combination (exon 1, exon 2 and exon 4) generated a frameshift causing a premature termination codon (PTC). The sequence of this AesDRF1.2 transcript, isolated from accession IG 47280, was deposited at GenBank (NCBI ID: EU197054) and used as the template in a BLAST search. High identity values were observed between this and DRF1 transcripts of different varieties of T. durum, such as Atil (NCBI ID EU089824, 95.43%), Ciccio (NCBI ID FJ560492, 93.78%) and Karalis (NCBI ID FJ560496, 93.77%), T. aestivum, such as TaDREB3C (NCBI ID AY781351, 95.34%), TaDREB4C (NCBI ID AY781356, 95.16%) and TaDREB5C (NCBI ID AY781357, 95.52%), and also the T. dicoccoides DRE-binding transcription factor (NCBI ID HM015904, 95.43%). The DRF1.2 transcript was the only one detectable since plants were grown in normal conditions, as reported in our previous work (Latini et al. 2013).
Fig. 2 The FGENESH prediction of gene structure using AeslDRF1-FJ843102 as template. CDSf stands for first exon including start codon, CDSi stands for internal exon, CDSl stands for last coding segment, ending with stop codon, CDSo stands for genes with only one CDS (starting with start codon and ending with stop codon), TSS stands for the position of transcription start, PolA stands for polyA segment
Table 3 Quantification of genetic variation in the sequences analysed
The genomic DNA amplification using the primer pair drebfor1 and drebrev1 always amplified shorter fragments, ~500 bp, in addition to the expected ~2.5 kb gene fragment (Fig. 1a). These amplicons were cloned (Fig. 1c) and sequenced. For each sample, some sequences turned out to be unspecific genomic fragments, but one corresponded to a DRF1 gene copy lacking introns.
The analysis showed that this genomic sequence joined exons 1, 2 and 4, thus looking like the mature DRF1.2 transcript. This structure pointed to the presence in the Ae. speltoides genome of a pseudogene linked to its cognate DRF1 gene. Two sequences, obtained from the accessions IG126225 and IG116071, were deposited at GenBank (NCBI IDs MN639698 and MN639699, respectively). The sequence alignment of these two pseudogenes, the above-described mRNA of DRF1.2 of Ae. speltoides and a DRF1.2 transcript of the B genome of T. durum is shown in Fig. 3.
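The way in which joining exons 1, 2 and 4 can produce a premature termination codon is sketched below with toy exons. The exon sequences are entirely hypothetical and were chosen only so that the skipped exon has a length that is not a multiple of three; they are not DRF1 sequences, and the helper names are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_until_stop(cds):
    """Translate from the first base and stop at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# HYPOTHETICAL toy exons: exon 3 is 4 nt long, so skipping it shifts the frame.
EXONS = {
    1: "ATGGCGGCG",
    2: "GAAAAG",
    3: "CGTC",
    4: "GGATGACTTCCGTATAA",
}


def splice(exon_numbers):
    """Concatenate the chosen exons in the given order."""
    return "".join(EXONS[n] for n in exon_numbers)


if __name__ == "__main__":
    full = translate_until_stop(splice([1, 2, 3, 4]))
    skipped = translate_until_stop(splice([1, 2, 4]))
    print("all four exons:", full)
    print("exons 1, 2, 4: ", skipped)
```

With these toy exons the full transcript reads through to the terminal stop codon, whereas the exon 1-2-4 combination meets a stop codon two codons into exon 4 and yields a truncated product.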

Nucleotide diversity analysis of exon 4 in Aegilops
A genomic DNA fragment of about 1 kb, mapping in exon 4 (Fig. 1d), was isolated from 42 Aegilops accessions, obtaining a total of 95 cleaned sequences; then, after removing identical sequences obtained from different seeds of the same accession, 81 unique sequences were submitted to GenBank (see Table 1 for Aegilops accessions and the GenBank accession numbers). Three other sequences of Aegilops mapping DRF1 exon 4 were downloaded from public databases (see Table 2) and added to the set. The nucleotide diversity of these 84 sequences was analysed and the results are reported in Table 4. Sequences were analysed as a sole group (bulked group) and also split into subgroups, in accordance with their taxon (Ae. speltoides ssp. speltoides, Ae. speltoides ssp. ligustica, Ae. speltoides and other Aegilops). A further analysis was carried out by merging all Ae. speltoides sequences, irrespective of subspecies (Ae. speltoides merged group). Interestingly, the nucleotide diversity of diploid Aegilops speltoides sequences was Pi = 0.018, when considering both the merged and individual subgroups, and it was 4.5 times higher than the nucleotide variation observed when analysing the same region in various Triticum durum varieties (nucleotide diversity Pi = 0.0040, data not shown). Most sequences represented a single, non-shared haplotype (71 haplotypes out of 84 sequences) (Table 4), except a few cases shared inside and between taxa, with Haplo 4 being the most common one (Table 5). Furthermore, a comparison between the Ae. speltoides ssp. ligustica and Ae. speltoides ssp. speltoides subspecies highlighted the presence of several SNPs, even if most of them were synonymous and did not cause replacements (data not shown). A phylogenetic analysis was carried out based on these 84 sequences using both the Neighbor-Joining (NJ) and Maximum Likelihood (ML) methods and the inferred NJ tree is shown in Fig. 4. The inferred ML tree showed the same topology (data not shown).
Aegilops speltoides and the other Aegilops sequences formed two well-supported clusters and no noticeable separation could be observed between Ae. speltoides ssp. speltoides and Ae. speltoides ssp. ligustica subspecies sequences.
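The haplotype figures reported above rest on a simple operation: identical aligned sequences are grouped, and each distinct sequence counts as one haplotype. The following Python sketch shows that grouping on toy data and reproduces the 4.5-fold ratio between the two Pi values quoted in the text. The function names and the toy sequences are hypothetical, and DnaSP may handle gaps and ambiguous sites differently.

```python
from collections import Counter


def haplotype_counts(seqs):
    """Group identical aligned sequences; each distinct one is a haplotype."""
    return Counter(s.upper() for s in seqs)


def summarize_haplotypes(seqs):
    """Number of sequences, haplotypes, unshared haplotypes and top frequency."""
    counts = haplotype_counts(seqs)
    return {
        "n_sequences": len(seqs),
        "n_haplotypes": len(counts),
        "n_singletons": sum(1 for n in counts.values() if n == 1),
        "most_common_count": counts.most_common(1)[0][1],
    }


def fold_difference(pi_a, pi_b):
    """Ratio between two nucleotide diversity values."""
    return pi_a / pi_b


if __name__ == "__main__":
    toy = ["ACGT", "ACGT", "ACGA", "TCGT"]  # hypothetical aligned sequences
    print(summarize_haplotypes(toy))
    # Pi values quoted in the text: 0.018 (Ae. speltoides) and 0.0040 (T. durum).
    print(round(fold_difference(0.018, 0.0040), 2))
```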
To investigate the relationships among DRF1 exon 4 sequences in Aegilops and Triticum, 16 DRF1 exon 4 sequences from the Triticeae tribe (7 out of 16 were isolated, sequenced and submitted to GenBank by our laboratory) were added to the dataset (see Table 2), together with two outgroup sequences mapping DRF1 exon 4, from Psathyrostachys juncea and Brachypodium distachyon. The resulting 102 sequences were aligned and analysed. The NJ phylogenetic tree inferred is represented in Fig. 5. The 102 sequences formed distinct clades: all Ae. speltoides sequences clustered in a single clade (S genome), which was graphically collapsed to better highlight the relationships with the other taxa; four sequences grouped together with three B copies of the gene in T. durum, forming a well-supported clade (96%, B genome), sister to the S one; four other sequences grouped together with two A copies of the gene in T. durum, forming a well-supported clade (99%, A genome); two sequences grouped together with three Ae. tauschii sequences, forming the D genome clade (99%); and finally all other Aegilops sequences, sharing the U genome copy, formed a single clade (92%, U genome) distinct from the Ae. cylindrica sequence.

Discussion
The study of gene variability in the Aegilops species has recently attracted considerable attention as wild relatives represent a valuable source of gene alleles that can be exploited for wheat improvement. In this context, the DRF1 gene, involved in the drought response in plants, was studied in Aegilops speltoides and in other Aegilops species to investigate its various features. The gene was isolated and sequenced in three accessions of Ae. speltoides and its structure fully corresponded to that previously observed in barley and wheat (Xue and Loveridge 2004; Egawa et al. 2006; Latini et al. 2007), as deduced by sequence similarity and gene structure prediction (Fig. 2).
Fig. 3 Alignment of two about 500 bp long fragments with two DRF1.2 transcripts. MN639698 and MN639699 are the NCBI IDs of two sequences, isolated from Ae. speltoides accessions IG126225 and IG116071, respectively; EU197054 is the sequence of the AesDRF1.2 transcript (accession IG 47280); KM520370 is the sequence of a TdDRF1.2 transcript from the B subgenome copy of T. durum. Exon 1 is shown in bold, exon 2 using white letters and black background and exon 4 using white letters and grey background. The SSR region, consisting of 15 bp, GCG pattern, plus further 6 bp (all coding for alanine) and the premature stop codon are highlighted in boxes
The detection of the DRF1.2 transcript in Ae. speltoides suggested the occurrence of transcription involving the alternative splicing mechanism as this transcript arises by joining exons 1, 2 and 4. It was the only transcript isolated, as plants were grown in normal conditions and it is reasonable to speculate that the other two transcripts, DRF1.1 and DRF1.3, would also be expressed and detectable in plants when affected by drought stress, as previously demonstrated in Triticeae (Xue and Loveridge 2004;Egawa et al. 2006;Latini et al. 2007). The sequence of this transcript was highly similar to the same transcripts in wheat and barley and does not encode for a transcription factor, but for an abortive short protein whose function is still unknown. Such a kind of transcript variant can often be found in alternatively spliced genes and is reported to play an important role in gene regulation through its modulated expression and degradation by nonsense-mediated mRNA decay (Dubrovina et al. 2013) or other RNA surveillance mechanisms (Reddy et al. 2012;Bedre et al. 2019).
The isolation of a pseudogene suggests an increasingly complex scenario for DRF1 gene regulation in Ae. speltoides. Indeed, during the isolation of the whole gene, an unexpected 500 bp genomic fragment was always amplified and its sequence corresponded to the genomic variant of the mature DRF1.2 transcript, thus fully satisfying the criteria for representing a processed pseudogene (retropseudogene) arising by reverse transcription and reinsertion into the genome (Zou et al. 2009; Tutar 2012). Because they are derived from mature transcripts, retropseudogenes lack upstream promoters, introns and other regulatory elements and were previously considered transcriptionally silent, dead-on-arrival (Vanin 1985). However, more recently, there has been increasing evidence that they are still active and often involved in the regulation of their parental genes, particularly at the post-transcriptional level, through various mechanisms including RNA interference and RNA antisense (Vinckenbosh et al. 2006; Wen et al. 2012; Sen and Ghosh 2013; Xie et al. 2019). To the best of our knowledge, this is the first reported case of a pseudogene belonging to the DREB gene family in Triticeae.
More targeted research is necessary to investigate the potential roles of both the DRF1.2 transcript and pseudogene in DRF1 gene regulation in Ae. speltoides.
The analysis of DNA polymorphisms of the nucleotide sequences of the three Ae. speltoides whole DRF1 gene accessions revealed a relatively low total nucleotide diversity (Pi = 0.006), in line with the observation reported that TFs and particularly those containing specific domains, like the AP2 domain, are more conserved than other gene families (Tatarinova et al. 2016). Interestingly, the nucleotide diversity turned out to be lower in introns than in exons, which was somewhat unexpected, as introns are considered freer to accumulate mutations than exons, which are constrained by their function. This genetic feature (exons more variable than introns) has been observed in animals, in specific gene families involved with the immune system or associated with predator-prey responses, and its role in adaptive evolution has been hypothesised (Nakashima et al. 1995;Samonte and Eichler 2002;Caporale 2003). Accordingly, one can speculate that the molecular evolution of the DRF1 gene, involved in response to abiotic stress, has undergone a similar genetic mechanism, even if more investigation is necessary to confirm this result.
The DNA sequence polymorphism analyses highlighted that there were no differences between the two Ae. speltoides subspecies, ssp. ligustica and ssp. speltoides, when considering either the whole gene or the exon 4 sequences (Table 4). Indeed, the three groups, Ae. speltoides ssp. speltoides, Ae. speltoides ssp. ligustica and Ae. speltoides, showed very similar Pi values and the phylogenetic analysis further confirmed the above observation. In the NJ tree (Fig. 4), clades are well-supported (bootstrap value ≥ 93%) and all Ae. speltoides sequences clustered in a single clade, in which ssp. speltoides and ssp. ligustica were scattered and mixed, but definitively separated from the other Aegilops species. Our results are in line with the conclusions of Belyayev and Raskina (2013) showing that ssp. ligustica and ssp. speltoides were a case of population dimorphism in fruit structure and did not in fact represent two different subspecies, as initially proposed by Zhukovsky (1928).
In the phylogenetic tree (Fig. 4) the sequences of Aegilops species were arranged according to their specific known genome, even though the sampling of the various diploid and polyploid Aegilops species was somewhat limited and a little unbalanced for experimental reasons. It is worth noting that the two sequences of Ae. sharonensis from the Sitopsis section were found to be sisters of the Ae. speltoides clade and the two sequences of Ae. tauschii clustered together in a separate and well-supported branch. Furthermore, Ae. umbellulata, the only other diploid Aegilops species included in this analysis and donor of the U genome to some of the tetraploid species included, was able to form the highly supported (93%) U genome clade. In this regard, it is reported that the U genome occurring in some polyploid species is very well conserved and similar to the parental diploid one, whereas the second genomes evolved rapidly from their ancestors (Badaeva et al. 2004). Interestingly, the Ae. cylindrica sequence, which has the genomic formula CcCcDcDc (Linc et al. 1999), stood apart and was placed in a single branch, leading us to suppose that this sequence proceeded from the Cc genome, since the D genome of Ae. cylindrica is closely related to the D genomes of both Ae. tauschii and T. aestivum (Caldwell et al. 2004).
Fig. 4 The optimal Neighbor Joining tree based on 84 sequences of DRF1 exon 4 from Aegilops species (sum of branch length = 0.4504). There was a total of 812 positions in the final dataset. The bootstrap support values (> 50%) are shown close to the node. The species, the NCBI IDs and the simplified genome codes are reported on each branch. Genome clusters are indicated in bold to the right of the tree. Asterisks denote branches shared with the tree inferred by the Maximum-Likelihood method
Due to the close genetic relationship between the S genome of Aegilops speltoides and the B subgenome of wheat (Haider 2013; Marcussen et al. 2014), the three DRF1 gene sequences isolated from Ae. speltoides (AesDRF1 gene) were aligned with the two DRF1 gene sequences from the B subgenome of T. durum (see the alignment in Online Resource 1) and the comparison appeared very interesting, in particular when considering exon 1, containing an SSR region coding for a variable number of alanines (Di Bianco et al. 2011). We recently found that this region consists of the repetition of two different patterns, both encoding alanine, namely GCG and GCC, peculiar to the A and B subgenome copies, respectively, thus representing a specific subgenome signature in T. durum (Cantale et al. 2018). The length of the SSR region was also variable and we found it was 21 bp long in the A subgenome and 18 bp in the B subgenome copies, respectively. In Ae. speltoides, the SSR consisted of 15 nucleotides in AeslDRF1-FJ843102 and AessDRF1-FJ858188, and 12 nucleotides in AeslDRF1-FJ858187, always displaying the GCG triplet. It is worth noting that the SSR regions of the two pseudogene sequences, isolated from different accessions, showed the GCG triplet for a stretch of 15 nucleotides, but also two further downstream triplets (GCC GCA), both encoding alanine, with sequences identical to those observed in the B subgenome copies (see Fig. 3).
The presence of a variable repeat number, the GCG triplet and the GCC GCA motif places the Ae. speltoides SSR sequence in an intermediate position between the A and B subgenomes, but the few sequences containing this region currently available do not make it possible to generalize. On the other hand, it is reported that SSRs show an extremely high rate of reversible variation in the repeat number, which can be functionally significant (Gemayel et al. 2010). Recent literature provides evidence of an evolutionary role for SSRs in transcribed regions as important sources of adaptive genetic variation, as they are involved in a number of effects such as the regulation of transcription and translation of their own genes, organization of chromatin and morphological variations (Kashi and King 2006; Viera et al. 2016). All things considered, further research into the SSR region of the DRF1 gene in Ae. speltoides looks particularly interesting, as it may represent another possible player involved in the gene regulation.
Fig. 5 The optimal Neighbor Joining tree based on 102 DRF1 exon 4 sequences from Aegilops and Triticum species (sum of branch length = 0.9089). For clarity, the Aegilops speltoides group of 68 sequences was graphically collapsed. There was a total of 730 positions in the final dataset. The bootstrap support values (> 50%) are shown close to the node. The species, the NCBI IDs and a single capital letter indicating the genome type are reported on each branch. When the genome type was unknown, the simplified genome code was indicated: AB for T. durum and T. dicoccoides or ABD for T. aestivum. Genome clusters are indicated in bold on the right of the tree. Asterisks denote branches shared with the tree inferred by the Maximum-Likelihood method
To widen the comparison with the Triticum genus, the exon 4 sequences from Aegilops, T. aestivum and T. durum were aligned and a phylogenetic tree was inferred (Fig. 5). Variability of a single gene has been used to investigate the phylogenetic relationship among species (Adderley and Sun 2013;Liu et al. 2016), although this task appears particularly challenging in this tribe, and sometimes large incongruities among the trees built from different genes could be observed, due to the complex nature of allopolyploidy, possible introgressions and the involvement of mechanisms such as gene hybridization, gene flow or horizontal gene transfer (Degnan and Rosenberg 2006; Liberles and Dittmar 2008;Escobar et al. 2011). In spite of the above, and also considering that the number of accessions analysed among the Aegilops and Triticum genera was not balanced, the phylogenetic tree inferred showed well-supported clades attributable to the current genomes (S, A, B, D and U genomes). Furthermore, those sequences of which the subgenome was initially not known placed themselves correctly with respect to a genomic designation consistent with their species, so that the association enriches information about them. The topology observed among the A, B and D clades could be symbolized by B (A, D) as the A clade is closely related to the D clade and both are distant from the B clade. Marcussen et al. (2014) found that the above topology is the most frequent in bread wheat when analysing both genes in the whole genome and a large number of trees from different genes. Furthermore, with coalescent-based genome divergence analyses they estimated that Ae. speltoides is a sister to the B genome, T. urartu is a sister to the A genome and Ae. sharonensis and Ae. tauschii are successive sisters to the D genome (Marcussen et al. 2014).
In conclusion, we report a study focused on the isolation and characterization of the DRF1 gene in Aegilops speltoides and other Aegilops species. The overall results suggest that there is a high similarity between the B and S genome copies of the DRF1 gene, but also some features indicating that both genomes have evolved independently, in line with other observations reporting that B is the most variable among the three subgenomes of current polyploid wheats (Wendel 2000;Petersen et al. 2006). As DRF1 is a key gene in drought response, we intend to continue our studies: the possible involvement of the pseudogene, SSR variability and the DRF1.2 transcript in the gene regulation mechanisms is very interesting and deserves further research. Furthermore, the sequencing of the whole gene in a larger taxon sampling would increase knowledge of DRF1 gene diversity in the Triticeae tribe. The advancement of genomics and the growing availability of wild relative genomes, coupled with the development of new statistical methods for comparative genomics, will make it possible for molecular data to be supplemented by other information that is underutilized in phylogenetic analyses, such as protein structural constraints in specific residues, chromosomal information and population genetics, leading in the end to a better understanding of the evolutionary history of Triticeae. Looking further ahead, the increase in knowledge should provide an important contribution to the widening of crop genetic diversity in support of global food security.