Allelic variation of gliadin-encoding genes in a collection of tetraploid wheat genotypes

Wheat is one of the main crops bred worldwide. Durum wheat, specifically, is a key element of the Mediterranean diet, representing an élite crop grown in Italy. Durum wheat nutritional and technological values are largely due to the grain protein content (GPC), a complex genetic trait strongly affected by environmental factors and management practices. In the last decades, several breeding programs have been focused on improving GPC by both traditional and innovative approaches. Among seed storage proteins, prolamins, including both gliadins and glutenins, represent the major component. These two classes of proteins are indeed responsible of gluten formation and confer the extensibility and elasticity to the dough. Besides being of crucial importance for both technological properties and rheological characteristics, prolamins, and especially gliadins, have been found to be major triggers for human health, as involved in a number of wheat consumption-related conditions, such as the celiac disease, non-celiac gluten sensitivity, defined as the onset of a variety of manifestations related to wheat, rye and barley ingestion, and wheat allergies, both due to wheat ingestion or inhalation (of flour or pollen). The identification of loci responsible for the gliadin expression, and particularly of polymorphism in the aforementioned genes, which could result in a lower immunogenic/toxic potential, could be of great importance in breeding programs. For this purpose, we screened a collection of tetraploid wheat genotypes for allelic variants of annotated gliadin genes in the durum wheat genome, in order to identify genetic resources available to breeders to improve wheat nutritional and technological properties. Phylogenetic analysis among different species of Triticum genus and an in silico expression data analysis may also be useful in the exploitation of the complex scenario of gliadin–glutenin interaction and gluten role in the adverse reactions due to wheat consumption.


Introduction
Wheat is one of the most widely grown crop worldwide, providing about 20% of the human daily protein intake. The majority of cultivated wheats belong either to the hexaploid Triticum aestivum L. (genomes AABBDD, bread wheat) or to the tetraploid T. turgidum L. var. durum (genomes AABB, durum wheat) species, the latter being an élite crop in drylands, such as the Mediterranean basin.
Grain protein content is one of the most important agronomic traits in wheat, as strictly related to both its nutritional and end-use properties. Breeding programs have been focusing on its improvement for decades, and a number of studies have been trying to better define its genetic control (Gadaleta et al. 2011;Nigro et al. 2014aNigro et al. , 2016Nigro et al. , 2019. Prolamins, which include both gliadins and glutenins, account for up to 80% of the total grain proteins. These two classes of proteins, which interaction implies the gluten formation, determine the technological quality of flour and semolina, conferring extensibility and elasticity to the dough and allowing it to be processed in a variety of products (Shewry et al. 2002;Shewry 2009).
Despite their importance in contributing to specific technological properties, wheat proteins, and especially gliadins, represent a critical trigger for human health. A number of conditions have been related to wheat consumption, such as the celiac disease (CD), an autoimmune disorder which impacts the 5% of human population, but also wheat allergies (WA), and non-celiac gluten sensitivity (NCGS) (Sollid et al. 2012;Valenti et al. 2017). Recent investigations have been carried out on a set of tetraploid wheats by both genetic and proteomic/peptidomic approaches in order to identify wheat genotypes with reduced gluten content and low immunogenic/toxic potential (Pilolli et al. 2019a, b). Moreover, a first attempt to identify allelic variants for allergenic epitopes in a collection of durum wheat genotypes was reported by Nigro et al. (2014b).
As reported by Huo et al. (2018a), genomic localization, as well as gene structure, have been well characterized for both glutenins and gliadins genes, which are mainly located in three genomic regions, Glu-3, Gli-1 and Gli-2 (D'Ovidio and Masci 2004;Gu et al. 2004;Dong et al. 2016). Gliadins are subdivided into α, γ, ω and δ-gliadins, with the α-gliadins accounting for 15-30% of the total seed storage proteins in the wheat grain (Altenbach et al. 2011). Several studies focused on sequencing gliadin loci in bread wheat, wild emmer and progenitors Triticum urartu and Aegilops tauschii (Huo et al. 2017(Huo et al. , 2018b(Huo et al. , 2019Zhang et al. 2015), suggested that gliadin gene families organization might actually be less complex than reported in the early studies (Sabelli and Shewry 1991;Anderson et al. 1997).
The recently released bread (IWGS 2018) and durum wheat genomes (Maccaferri et al. 2019) might be greatly helpful tools in unveiling the complexity of the gluten protein gene families, especially gliadins, as well as their allelic variations, which can strongly affect wheat functional properties and its immunogenic potential and have a huge impact on the development of healthier wheat foods.
In this perspective, we analyzed a collection of 240 tetraploid wheat genotypes for SNPs in gliadin-encoding genes in order to find natural allelic variants. The protein sequences of the identified gliadin genes in durum wheat were then included in a phylogenetic analysis along with their paralogues of bread wheat, wild emmer, T. urartu and A. tauschii. Also, we evaluated the gene expression in orthologues bread wheat Chinese Spring cv gliadins by analyzing in silico data of the considered genes. The identification of wheat genotypes with lower amounts of toxic epitopes, as well as the less expressed gliadin genes, could be valuable information to be used in new breeding programs and in the development of new detoxification strategies.

SNP discovery in gliadins genes in tetraploid wheat
Bread wheat gliadin gene sequences (Juhász et al. 2018;Huo et al. 2018a, b) were used as queries and launched against the publicly available T. turgidum L. var. durum genome browser (https ://d-gbrow se.inter omics .eu/), and only regions annotated as coding genes were considered for further analysis. The retrieved gene sequences were blasted against the available dataset of SNP marker sequences (Akhunov et al. 2009;Wang et al. 2014), as previously reported in other studies (Nigro et al. 2017a(Nigro et al. , b, 2019, and the ones aligned with at least 90% identity were considered as markers within the coding sequences of the gliadin genes. Additionally, SNPs markers were investigated and confirmed to locate within gene sequences by searching the variants table section reported for each gene in Ensembl Plants database (http://plant s.ensem bl.org/). The identified SNP markers were then screened in a collection of 240 accessions of tetraploid wheat genotypes (Triticum turgidum L., 2n = 4x = 28; AABB genome) (Laidò et al. 2013). The collection was recently phenotyped for grain protein content (GPC) and genotyped for SNPs with the wheat 90 K Infinium iSelect array containing 81,587 gene-associated SNP markers, as reported by Nigro et al. (2017aNigro et al. ( , b, 2019).

Phylogenetic analysis of gliadin genes
Gliadins' aminoacidic sequences retrieved for bread wheat, durum wheat, wild emmer (Triticum turgidum ssp. dicoccoides), A. tauschii and T. urartu genomes (http://plant s.ensem bl.org/), were aligned by using the ClustalW method via Mega7 software. Phylogenetic analysis was carried out on a total of 111 sequences using the maximum likelihood method. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then selecting the topology with superior log likelihood value. The tree was drawn to scale, with branch lengths measured in the number of substitutions per site. The tree was generated with Mega7 (Kumar et al. 2016) (http:// www.ebi.ac.uk/Tools /phylo geny/) and modified with the FigTree program (http://tree.bio.ed.ac.uk/softw are/figtr ee/).

Alpha-gliadins genes in silico expression analysis
Bread wheat gliadin genes identified by Juhász et al. (2018) and Huo et al. (2018a, b) were investigated for their in silico expression, in order to identify an eventual differential expression among all of them.
Each gene was used as a query in the Expression Atlas website (https ://www.ebi.ac.uk/gxa/home). All experiment were screened, and specifically: four experiments reporting data of RNA-seq by organism part (Pfeifer et al. 2014); three RNA-seq experiments based on different developmental stage (Gillies et al. 2012;Pingault et al. 2015) and one RNAseq experiment on different grain developmental stages (Li et al. 2013).

SNPs identification in gliadin-encoding genes
A total of 36 gliadin genes were retrieved from the durum wheat genome browser (https ://d-gbrow se.inter omics .eu/), 25 of which were annotated as high confidence genes. In Table 1, gene IDs, functional annotation, physical map position, level of confidence as well as the strand of each gene are reported. As showed, 19 sequences belonged to the A genome, 8 to the B genome and the remaining ones were unmapped. Also, only genes within alpha-and gamma-gliadins groups were annotated as high confidence, while all the omega gliadins genes were annotated as low confidence ones. Alpha-gliadin genes were the most represented ones. Out of 25 identified 25 genes, 13 mapped on 6a chromosome, 8 of which as high confidence genes, while 4 were located on homoeologues 6B, one of which as low confidence gene. Nine additionally alpha-gliadin genes were identified and unmapped, even though only 4 of them were HC genes. All the 36 gene sequences were used as queries in a BLASTn with the whole set of wheat SNPs of the 90 K iSelect Illumina chip, allowing the identification of 30 SNPs markers within gene sequences. Specifically, 13 SNPs were identified in the genomic sequence of gliadin genes on the short arm of homoeologous chromosome group 6, while the remaining SNPs are located on the short arm of homoeologous chromosome group 1. Sixteen of them were monomorphic among the 240 genotypes, one resulted to be failed, while the other thirteen showed different levels of polymorphism (Table 2). The minor allele frequency (MAF) ranged from 1.25% for IWB11263 to 32.92% for IWB65373, both located in gamma-gliadin genes. The 30 SNPs markers sequences aligned with a total of 59 genomic regions on 1S and 6S homoeologous chromosomes groups on durum wheat genome, probably due to the gene evolution process and the presence of a high number of genes in the Gli loci, as well as the high number of transcript for each gene. In Table S1, all the aligned regions and information retrieved from Svevo browser (https ://d-gbrow se.inter omics .eu/) are reported. Sixteen of the identified 59 regions corresponded to coding regions, while the remaining were located in introns. Interestingly, SNP marker IWB63758 aligned with TRITD6Bv1G014610 gene in four different positions, three of which corresponding to exon regions. SNP IWB63758-6B.1 was found to align to 16 different alternative mRNAs of gene TRITD6Bv1G014610 (TRITD6Bv1G014610.5-TRIT-D6Bv1G014610.21), which shows quite a complex structure, as 25 different splicing form were identified. SNP IWB63758-6B.3 was found to align to 2 other different alternative mRNAs of gene TRITD6Bv1G014610 (TRIT-D6Bv1G014610.24 and TRITD6Bv1G014610.25). The single nucleotide polymorphism determined a glutamine (Q)-lysine (K) aminoacid change at position 46 of the protein sequence. SNP IWB63758-6B.4 aligned to gene TRITD-6Bv1G020780, which also shows 9 different transcripts, and also in this gene, a non-synonymous glutamine-lysine aminoacid change was observed due to the SNP. Three SNP markers located in gamma-gliadin genes blasted with TRIT-D1Av1G002230 identifying three different point mutations. Specifically, while IWB11262 corresponded to a synonymous substitution, two others, IWB11263 and IWB11264, caused two non-synonymous changes, a serine (S)-proline (P) and isoleucine (I)-valine (V), respectively.

Phylogenetic analysis
Orthologues gliadin gene sequences were retrieved for bread wheat (42 sequences), wild emmer (20), A. tauschii (21) and T. urartu (14) genomes from the Ensembl Plants database (http://plant s.ensem bl.org/). These identified protein sequence, along with the 23 HC previously identified in durum wheat (Table 1), were checked for sequence structure and similarity. A total of 111 gene sequences located on the short arms of 1 and 6 chromosome groups were retained to build a phylogenetic tree comprising the five considered species, which were firstly aligned by using the ClustalW method via Mega7 software. Nine sequences were discarded, probably subjected to some evolutionary polymorphism such as TE insertion or deletion, and multiple alternative splicing events resulting in a number of different transcripts for the same gene. The evolutionary history was inferred by using the maximum likelihood method based on the JTT matrix-based model. The tree with the highest log likelihood (− 26,965.5799) is shown in Fig. 1. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The analysis involved 111 aminoacid sequences. There were a total of 609 positions in the final dataset.
Two main clades were generated: the first one, grouped 19 sequences belonging to the short arm of chromosome group 6, annotated as alpha-gliadins. Therefore, their protein sequence was quite different from the other groups of gliadins reported in the tree. The same situation was observed for the second clade, which showed several differentiation nodes. As expected, all gliadins sequences belonging to 1S chromosome groups, corresponding to gamma-gliadin genes grouped together, as well as the highest number of 6S genes (29) which shared the highest sequence similarity. The phylogenetic result reflected the complexity of gliadin gene family, showing the high number of speciation and duplication events the gene family underwent.

Differential expression of gliadin genes
Despite being the main components of seed storage proteins, alpha-gliadins are also among the main triggers for celiac disease and other wheat-related pathologies. As no similar experiments have been yet carried out on durum wheat, we exploited the close relation and syntheny with bread wheat in order to detect eventual differential expression among known alpha-gliadins gene in seeds and in different experimental condition. To this purpose, an in silico analysis was carried out on 21 alpha-gliadin gene sequences belonging either to A and B genomes (reported in Juhász et al. 2018;Huo et al. 2018a, b) through by the Expression Atlas website (https ://www.ebi.ac.uk/gxa/home). Expression level were reported in TPM (transcripts per million), a normalized measure of the proportion of transcripts in an RNA sample. TPM were calculated from the number of reads that mapped to each particular gene sequence taking into account the gene length (more reads might be produced from longer genes) and the sequencing depth (more reads might be produced from the sample that has been sequenced to a greater depth). All the experiment carried out on Triticum aestivum and available in the Atlas were considered (see Materials and method section). The first experiment (Pfeifer et al. 2014) considered the transcriptional profile in different tissues 10, 20 and 30 days post-anthesis (Fig. 2). Results showed that gene expression in highly variable among genes and cell tissues, but basically the alpha-gliadin genes are more expressed in the endosperm during all the seed development. Indeed, 10 days post-anthesis, expression was detectable only in the endosperm, with TraesCS6A02G049400, TraesCS6A02G049100 and TraesCS6B02G065856 showing the highest expression levels (Fig. 2a). At 20 and 30 days post-anthesis, the maximum gene expression was detected in endosperm, with lower expression was detected in the aleurone layer (Fig. 2b, c). This was not unexpected, considering that alpha-gliadins are among the most abundant seed storage proteins.
The second experiment reports RNA-seq data from different phenological phases. Gillies et al. (2012) reported RNAseq analysis on central endosperm during seed development 6, 9 and 14 days post-anthesis. Interestingly, and in accordance with previous data, gliadin gene expression increased notably 9 days after anthesis and reached their highest values at 14 daa. TraesCS6A02G049100 gene showed the highest expression ( Figure S1a). Also, Pingault et al. (2015) analyzed fruit transcription profiling by high throughput sequencing of five different organs of wheat (leaf, shoot, root, spike and grain) at three developmental stages each ( Figure S1b). The highest gene expression was observed when plants were between 70% of their fruit formation to final size, and, also in this case, TraesCS6A02G049100 gene showed the highest expression. Before and after this phenological phase, no expression at all was detected for none of the considered gliadin genes.
The last considered experiment was carried out by Li et al. (2013), who run an RNA-seq experiment on grains at different developmental stages ( Figure S1c). The highest gene expression was found in samples collected 12 days after pollination, and particularly, two genes showed very high expression level, both belonging to A genome gliadins, TraesCS6A02G049100, which has been shown the highest expression in all experiments, and TRAESCS6A02G049800 gene. By comparison of the three experiments, two genes, TraesCS6B02G066001 and TraesCSU02G265913, were never expressed, suggesting these two genes might have undergone to pseudogenization events.

Discussion
Studies on gliadin genes have been carried out for bread wheat, wild emmer and the two progenitors of A and D genome, T. urartu and A. tauschii, respectively (Zhang et al. 2015;Huo et al. 2017Huo et al. , 2018bHuo et al. , 2019. Despite being among the most effective seed storage protein on food processing properties, wheat gliadins are also a main trigger for health-related conditions associated with the consumption of its derived products (Shewry et al. 2002). So far, understanding gliadins structure, evolution and expression pattern, as well as allelic variation within their coding sequence, might be extremely important in order to focus on new breeding programs to develop wheat varieties with a reduced immunogenic potential.
In this study, we exploited the recent genome sequencing data of tetraploid durum wheat (Maccaferri et al. 1 3 2019), in order to assess the number of gliadin genes annotated in the genome, as well as SNPs polymorphism in a collection of tetraploid wheat genotypes. SNPs are nucleotide variations in the DNA sequence of individuals in a population and represent the class of molecular markers more frequent in the genome. Indeed, SNPs are widely distributed along the genomes, may vary in frequency and distribution among species and are usually prevalently but not exclusively located in the noncoding regions of the genome (Akhunov et al. 2009;Wang et al. 2014).
This work provides a new insight into the genetic diversity and allelic variation at gliadin genes in durum wheat. Indeed, a collection of 240 tetraploid wheat genotypes was screened for polymorphic SNPs in the gliadin gene sequences retrieved in the durum wheat genome browser, which allowed us to identify several SNPs within their genomic sequence. As expected, most of them were localized in intronic region, but nine of them were found to be located in exon regions. Moreover, four of them were also found to be polymorphic within the screened collection, resulting in one silent mutation and three non-synonymous aminoacidic change. Despite none of them seems to be located in known major CD epitopes, the approach we followed showed that the combination of recent genome sequencing and annotation data with SNP polymorphism analysis it could be a valid tool to quickly screen and identify gliadins allelic variation, as previously reported for other allergenic wheat epitopes (Nigro et al. 2014b). Indeed, more high throughput data needs to be generated and analyzed in order to detect useful information about point mutations for the development of new wheat varieties.
The analysis of in silico expression data run in different tissues and different phenological phases showed how highly variable the gliadin gene expression is. Due to high sequence similarity, it is quite complex to characterize each member of a large gene family, as well as their gene expression. Another important fact to be considered about gliadin gene family is the high pseudogenization rate in wheat. As we reported, some of the identified genes showed a low expression level in all the considered situations, probably due to the instability of their transcript, as also identified by Huo et al. (2018bHuo et al. ( , 2019 in other related species. A deeper analysis of all the identified durum wheat gene sequences will be required in order to establish the pseudogenization rate also in durum wheat, and identify the ones actually expressed. These information could be useful in further silencing and genome editing experiments to reduce their toxicity.