9.1 Introduction

Wheat holds a central position among major food crops, providing 20% of the total caloric requirements of humans around the world. Common wheat (Triticum aestivum L.) is an allohexaploid (2n = 6x = 42; AABBDD) crop successfully cultivated all over the world, covering an area of approximately 220 million ha. Genetic improvement in wheat productivity, resilience to climate extremes, and quality are challenges to be met in continuing to feed the global population, mitigate the effects of climate change, and fulfill end-user quality preferences. Since the expansion of wheat production area will not be possible due to the continuous shrinking of arable land, increasing grain yield through improved agronomic practices and breeding is the feasible approach. It has been recognized that conventional crop breeding approaches are not able to deliver the target of a 70% increase in crop productivity by the end of 2050 (Tester and Langridge 2010). Innovation is required in all breeding components, including selection accuracy, selection intensity, deployment of new genetic variation, and shortening of breeding cycles in cultivar development (Li et al. 2018).

Conventional plant breeding has relied heavily on the selection of key yield-related phenotypes such as harvest index in wheat (Lopes et al. 2012), and it seems impossible to further improve harvest index using conventional breeding. In addition, phenotype-based selection is labor intensive and time consuming, and offspring can only be selected at a homozygous generation and at later growth stages. The concept of genomics-assisted breeding (GAB) was proposed as an alternative to overcome the selection challenges associated with conventional breeding (Varshney et al. 2005). Marker-assisted selection dominated breeding programs in which diagnostic markers for genes with major phenotypic effects were developed and successfully used for selection (Liu et al. 2012). However, many complex traits such as yield and adaptability to stressed environments are controlled by many genes or quantitative trait loci (QTL) with minor effects, which further interact with the environment (Gao et al. 2015). Their individual effects are too small to be efficiently captured by one or a few markers (Bernardo and Yu 2007). Therefore, a transition from marker-based to genome-based breeding is indispensable to achieve the productivity targets (Rasheed and Xia 2019).

Next-generation sequencing (NGS) has revolutionized plant genomics and resulted in the development of techniques and resources amenable to plant breeding (Bevan et al. 2017). The ever-growing plant genomic resources have provided a plethora of SNP information distributed throughout plant genomes, which has made SNPs the markers of choice for a variety of research applications, especially in breeding and genetics research. Reference genome sequences are now available for most crop species, including wheat, while pan-genome sequences are increasing at a rapid pace (see Chap. 14). Characterization of the pan-genome can rapidly identify variations within candidate genes, which have a direct application in breeding. In this chapter, we discuss different genome-informed scenarios being pursued to discover genes underpinning important phenotypes (Blake et al. 2016). We also provide a framework of functional genes of wheat in the context of the recent reference genome sequence assembly and discuss database resources necessary to reduce redundancy in research.

9.2 Wheat Reference Genome Sequence and Other Genomic Resources

9.2.1 The Reference Genome Sequence of cv. CHINESE SPRING

Wheat has a history of being used as a model plant for understanding cytogenetics, for the physical mapping of genes, and for facilitating pre-breeding to introduce inter-specific and intergeneric diversity. For example, the array of wheat aneuploid stocks, unequaled in any other crop, was developed by Sears (1954). All these genetic stocks were developed using wheat cv. CHINESE SPRING (Sears and Sears 1978). Such aneuploids include all the possible chromosome addition or deletion lines in the form of nullisomics, trisomics, monosomics, and tetrasomics. These cytogenetic stocks greatly facilitated genetic studies that were not possible in many higher organisms at that time. They were used to identify major genes controlling important traits and to physically map their positions along chromosomes, including the genes related to waxiness, maturity, endosperm proteins, and vernalization (Driscoll and Jensen 1964; Shepherd 1968; Halloran and Boydell 1967; Law 1966). Later, these efforts provided the basis for starting a ‘Catalogue of Gene Symbols for Wheat’ (McIntosh 1973). Since a wide array of genetic stocks was available in the CHINESE SPRING background, this cultivar was selected to develop the first reference genome sequence in wheat. The International Wheat Genome Sequencing Consortium (IWGSC) was established in 2005, and 13 years later, in 2018, the high-quality reference sequence was released (IWGSC 2018).

9.2.2 Other Genomic Resources in Wheat

All genome sequence resources available in wheat to date are provided in Table 9.1 and include population-level whole-genome resequencing, exome sequencing, and, to a lesser extent, some SNP genotyping resources. The analysis of the CHINESE SPRING reference genome is now complemented by de novo sequences of ten important wheat cultivars from global breeding programs, which has allowed the documentation of breeding histories, wild introgressions in cultivated wheat, and chromosomal structural rearrangements that facilitated wheat breeding (Walkowiak et al. 2020; Jayakodi et al. 2021). Apart from the sequencing efforts in cultivated wheat, the genome sequences of diploid and tetraploid progenitors of bread wheat, including Ae. tauschii (Zhao et al. 2017), T. monococcum (Ling et al. 2018), Ae. speltoides (Avni et al. 2022), and T. dicoccoides (Avni et al. 2017), are available. Recently, a population-level genome sequence resource of global Ae. tauschii accessions was provided for use in trait discovery and functional genetic validation of D-genome introgressions in bread wheat (Gaurav et al. 2022). The shared utility of all such resources is that they underpin the assignment of functional attributes to genes through association genetics or the detection of selective sweeps. For example, 120 Chinese wheat cultivars and landraces were resequenced, and it was identified that the D-subgenome of modern cultivars is mostly derived from landraces, while the A- and B-subgenomes were mainly derived from European landraces (Chen et al. 2019). Strong signals of selective sweeps were restricted to 48 high-confidence (HC) genes selected during modern wheat breeding. The strongest signals were for the genes TaNPF6.1-6B, TaNAC24, and TaRVE3, which are associated with nitrogen use efficiency, drought and heat stress tolerance, and flowering time, respectively (Chen et al. 2019).

Table 9.1 Wheat genomic resources post-reference genome sequence

Exome capture of more than 500 global wheat accessions was conducted to identify the genes underpinning the selection and adaptation of modern-day bread wheat during the last 10,000 years (Pont et al. 2019). The authors concluded that the dispersion of wheat and human migration patterns were consistent with an origin out of the Fertile Crescent and Egypt to the Maghreb (Northern Africa) along a coastal route. The major driving forces in wheat adaptation were the vernalization requirement, historical groupings, and geographic origins (Europe, Asia, Africa, and America), which resulted in the partitioning of genetic diversity in wheat. Furthermore, based on the resequencing of 890 bread and durum wheat accessions and the identification of introgressions from wild species favoring global wheat adaptation, a total of 168 Mb of genome regions on different chromosomes contained selective sweeps that were identical between the Asian and European germplasm, even though European wheats had more frequent introgressions than wheats from Eastern Asia (He et al. 2019; Zhou et al. 2020). Another globally important genomic resource is the DArTseq database of 44,624 wheat accessions from the International Maize and Wheat Improvement Center (CIMMYT) gene bank (Juliana et al. 2019). The DArTseq data were used to conduct genome-wide association studies (GWAS) for 50 different traits of breeding interest and identified important loci for end-use quality and for biotic and abiotic stress resistance. These studies provide deep insight into genetic diversity and the genetic regions in wheat under artificial and natural selection, and will continue to provide important resources for the use of such information in breeding.

9.3 Wheat Functional Genes Discovery: Strategies and Inventory

Quantitative trait loci (QTL) mapping and GWAS have dominated wheat genomics research to date. These studies identify favorable alleles and their diagnostic markers, which can then be used in wheat breeding to introgress important QTL or genes (Rasheed and Xia 2019). In Table 9.2, we provide a near-complete framework of the functional genes discovered so far by such approaches. However, such genetic dissection, especially in the case of GWAS, can be ambiguous due to the confounding effects of population structure or low-accuracy genotype calls at some loci (Browning and Yu 2009), or due to small population size (Finno et al. 2014). It is, therefore, necessary to further validate the phenotypic effects of such loci in biparental mapping populations or other genetic backgrounds, as well as by other biological means such as genetic transformation, gene silencing or gene knockout, and gene editing. Population-level whole-genome resequencing and exome capture data have facilitated the discovery of several genes for economically important traits. From the resequencing data of 145 Chinese wheat accessions, Hao et al. (2020) identified that the TaFRK2-7A gene contained three non-synonymous mutations compared to the CHINESE SPRING (CS) allele and was strongly associated with starch and amylose contents in mature seeds. The exome sequences of 287 wheat accessions identified the causal variations in TaARF12, encoding an auxin response factor, and TaDEP1, encoding the G-protein γ-subunit, which pleiotropically regulate both plant height and grain weight in wheat (Li et al. 2022a, b).

Table 9.2 Framework of functional genes characterized in wheat with positions in wheat genome and associated traits

In recent years, several loci were identified simultaneously by GWAS and biparental mapping strategies. Liu et al. (2017) identified marker-trait associations for black point resistance. Loci underpinning flour color (Zhai et al. 2016), kernel number per spike (Shi et al. 2017), and thousand grain weight (Sehgal et al. 2020; Wang et al. 2021) were also identified following a similar strategy. A functional gene associated with flour color, TaRPP13L1, was identified by GWAS in wheat cultivars from China, and its influence on flour color was validated in two KRONOS wheat mutants carrying premature stop codons in TaRPP13L1 (Chen et al. 2019).

Another gene discovery approach that is now widely used is bulked segregant analysis (BSA), where DNA from individuals of a population showing contrasting, extreme phenotypes is pooled and then RNAseq, exome sequencing, or whole-genome resequencing is applied (Zou et al. 2016). This is a rapid method to identify consistent polymorphic regions between contrasting pools of wheat lines. In addition to the discovery of SNPs between contrasting pools, differentially expressed genes can also be identified in the case of RNAseq analysis of tissues. Using this approach, a QTL interval with four candidate genes was discovered on chr4A underpinning resistance against orange wheat blossom midge (OWBM), which affects wheat production in many countries (Hao et al. 2019). Likewise, resistance to yellow rust in wheat cultivar ZHOUMAI 22 was delimited to a physical interval of 4 Mb using a BSA and RNAseq approach (Wang et al. 2017a). Other studies where this approach has been effective in discovering candidate genes include the nitrogen-dependent lesion mimic gene Ndhrl1 (Li et al. 2016), powdery mildew resistance gene Pm4b (Wu et al. 2018), leaf senescence gene els1 (Li et al. 2018), stripe rust resistance genes Yr26 (Wu et al. 2018), YrMM58, and YrHY1 (Wang et al. 2018a, b), dwarfing gene Rht12 (Sun et al. 2019), and Pm61 (Hu et al. 2019). It is likely that this approach will receive more attention because it replaces the genotyping of complete populations (Zou et al. 2016).
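
The pooling idea can be illustrated with a minimal sketch. It assumes that read counts for the reference and alternate allele are already available for each pool, computes the alternate-allele fraction (often called the SNP-index) per pool, and flags sites where the two pools diverge. The function names, the input layout, the depth filter, the 0.5 threshold, and all counts are hypothetical and are not taken from the studies cited above.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (chromosome, position) -> (ref_reads, alt_reads).
# All counts below are made up for illustration only.
Counts = Dict[Tuple[str, int], Tuple[int, int]]


def snp_index(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads carrying the alternate allele at one SNP."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads at this position")
    return alt_reads / total


def delta_snp_index(bulk_a: Counts, bulk_b: Counts,
                    min_depth: int = 10) -> Dict[Tuple[str, int], float]:
    """SNP-index of bulk_a minus bulk_b at sites covered in both bulks."""
    deltas = {}
    for site in sorted(set(bulk_a) & set(bulk_b)):
        ref_a, alt_a = bulk_a[site]
        ref_b, alt_b = bulk_b[site]
        # Skip sites where either pool is too shallow to estimate a frequency.
        if ref_a + alt_a < min_depth or ref_b + alt_b < min_depth:
            continue
        deltas[site] = snp_index(ref_a, alt_a) - snp_index(ref_b, alt_b)
    return deltas


def candidate_sites(deltas: Dict[Tuple[str, int], float],
                    threshold: float = 0.5) -> List[Tuple[str, int]]:
    """Sites whose allele frequency differs strongly between the two bulks."""
    return [site for site, d in deltas.items() if abs(d) >= threshold]


# Toy pools: one site diverges, one does not, one is too shallow in bulk A.
resistant_bulk: Counts = {("4A", 100): (2, 38), ("4A", 200): (20, 20), ("4A", 300): (3, 2)}
susceptible_bulk: Counts = {("4A", 100): (36, 4), ("4A", 200): (19, 21), ("4A", 300): (30, 10)}

deltas = delta_snp_index(resistant_bulk, susceptible_bulk)
hits = candidate_sites(deltas)
```

In practice, sliding-window averaging and a statistical confidence interval would normally replace the fixed threshold used here.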

Very few genes in wheat have been discovered using the traditional map-based cloning approach, and most genes have been identified by comparative genomics between wheat and related grass species, owing to the high collinearity and conserved genetic organization among grass genomes (Rasheed and Xia 2019; Chen et al. 2020). According to a recent literature search, almost 33 genes related to grain morphology have been isolated by homology-based cloning, and functional markers have been developed for use in breeding (Table 9.1). Likewise, genes related to other morphological and phenological traits have been isolated, including TaPRR73 (Zhang et al. 2016) and TaZIM-A1 (Liu et al. 2018) underpinning flowering time; TaPPH-7A (Wang et al. 2018a, b) underpinning morphological traits; TaARF4 (Wang et al. 2019b) controlling root growth and plant height; and TaSnRK2.9-5A (Ur Rehman et al. 2019) controlling drought tolerance.

9.3.1 Functional Genomics and Map-based Cloning in Wheat

The continuous development of new genomic resources in wheat, including new reference genomes, transcriptome resources, wheat TILLING mutants with exome sequencing data, and high-density SNP databases, facilitates map-based cloning to discover new genes in wheat. A QTL for head length and spikelet number was identified and then fine mapped to an interval of 0.2 cM (Yao et al. 2019). Map-based cloning identified the gene controlling head length and spikelet number, designated Head Length 2 (HL2). Zhang et al. (2018) fine mapped a heading time gene, TaHdm605, in an EMS mutant line. Spike architecture is an important yield-related attribute, and three genes controlling spike architecture, TaTFL1-2D, TaHOX2-2B, and TaAGLG1-5A, were discovered by analyzing large-scale transcriptome data of 90 wheat lines (Wang et al. 2017b). The effects of these genes were validated by transgenic assays. Another approach used for gene discovery was the screening of a yeast cDNA library constructed from the heat- and drought-tolerant wheat cv. HANXUAN 10. Using this approach, TaPR-1-1, which encodes a member of the pathogenesis-related (PR) protein family and is associated with abiotic stress tolerance, was identified (Wang et al. 2019a).

The development of male sterile lines is an important component of hybrid wheat breeding programs. Two studies simultaneously cloned the Male Sterile 2 (Ms2) gene underpinning male sterility in wheat (Ni et al. 2017; Xia et al. 2016). The causal mutation was identified as a terminal-repeat retrotransposon in miniature (TRIM) element in the promoter of Ms2. The TRIM element is involved in activation of the gene and causes male sterility. Liu et al. (2019) cloned the TaSPL8 gene, which controls leaf angle, is an important component of the auxin and brassinosteroid pathways, and is associated with cell elongation. The knockout mutants of TaSPL8 had erect leaves due to the loss of the lamina joint, compact architecture, and increased spike number. Pm21 is a durable disease resistance gene derived from Haynaldia villosa that confers resistance against powdery mildew, and wheat cultivars with Pm21 are currently cultivated on 4 million ha in China (Cao et al. 2011). Two complementary studies cloned Pm21 and identified that it encodes a typical CC-NBS-LRR protein involved in broad-spectrum resistance to powdery mildew (He et al. 2019).

Fusarium head blight (FHB) is one of the most important yield and quality limiting factors in wheat globally. There are very few resources providing durable resistance to FHB in wheat including some landraces from China like SUMAI 3, which is known to carry Fhb1 gene. Rawat et al. (2016) used multiple approaches including positional cloning, development of overexpression lines, and gene silencing to report that a pore-forming toxin-like (PFT) gene was the candidate for Fhb1. However, it was later found that several FHB susceptible cultivars also carry PFT and its candidacy was doubted. Two new studies further established that a histidine-rich calcium-binding (TaHRC or His) gene adjacent to PFT is the actual Fhb1 and was identified as a susceptibility factor (Su et al. 2019). In contrast, Li et al. (2019) concluded that Fhb1 is a gain-of-function gene and that the newly generated protein acts as a regulator of host immunity.

9.3.2 Functional Genes and Their Diagnostic Markers

All the above examples show the discovery of genes following different strategies and include various validation approaches. Once a gene is discovered and its phenotypic effect is validated, it becomes important to identify and select the favorable alleles of that gene in breeding using functional markers (FMs). FMs are PCR-based diagnostic markers designed to identify the causal polymorphism underpinning phenotypic differences. FMs are routinely used in crop breeding programs to identify and select the desirable allelic variations of specific functional genes (Liu et al. 2012; Rasheed et al. 2017; Rasheed and Xia 2019; Rouse et al. 2019). Owing to their high diagnostic value, FMs are ideal markers for use in breeding to identify and pyramid different genes in marker-assisted recurrent selection. FMs are also used in genomic selection to improve selection accuracy. Rasheed et al. (2016) converted a collection of 72 FMs to the Kompetitive allele-specific PCR (KASP) format for use on high-throughput platforms. This effort now includes 157 KASP markers to diagnose alleles of traits of breeding interest. These KASP markers have been used by various breeding programs, and a recent estimate based on citations indicated that more than 35 wheat breeding and genetics programs around the world have used these markers. For example, CIMMYT elite lines were tagged with the TaGS3-D1, TaTGW6, and TaSus1 genes using these KASP markers (Sehgal et al. 2019). Zhao et al. (2019) screened 1152 diverse global wheat germplasm lines with KASP markers for 47 functional genes underpinning a number of important traits of breeding interest. Favorable alleles of more than 39 genes of breeding importance were also identified in East African wheat germplasm using the aforementioned KASP markers (Wamalwa et al. 2020).

Several commercial alternatives to the KASP master mix are now available, which have made SNP genotyping more cost effective. Apart from these commercial alternatives to the KASP technology, some open-source SNP genotyping methods are also available. Two examples are the semi-thermal asymmetric reverse PCR (STARP) (Long et al. 2017) and Amplifluor (Jatayev et al. 2017) methods, which can be used with a wide range of commercial master mixes. Several SNP markers were converted to the STARP format to further reduce the cost of genotyping (Wu et al. 2020).

9.4 Mining Gene Networks Using Database Resources

We have outlined many genome sequencing projects carried out to generate genome variation data in wheat populations (Table 9.1). The sheer amount of genome sequencing data being generated in wheat can hinder scientists from translating complex and sometimes contradictory information into biological understanding and discoveries. Apart from its use in investigating genetic diversity, population-level genomic variation data provides a valuable resource and great opportunities for identifying trait-related genes, designing markers, constructing gene trees, exploring evolutionary history, and assisting the design of molecular breeding. Mining the relevant information from extensive genome variation datasets is a time-consuming and error-prone process if proper tools are not used to explore the genes in question. New tools are therefore needed to explain how genes and gene networks might be implicated in a complex trait or disease. Another limitation is that tapping large and complex genome variation datasets requires computational skills exceeding those of most crop breeders. In a nutshell, the reuse of genomic variation data plays an important role in driving current plant science research. We provide an overview of the various genome variation tools and resources for quick analysis of genes and gene networks (Table 9.3).

Table 9.3 Genomics database in wheat for genome-informed characterization of wheat genes

9.4.1 Gene–gene Synteny Using PRETZEL

In defining a genetic framework at the genome level, similarity searches with transcripts and proteins are of primary importance. In this context, features of genome structure such as sequence/gene repetition affect the capacity to identify the correct gene for detailed analysis. Sequence alignments underpin all such studies. Visualizing genome features, such as uneven repetition between loci aligned across several genomes (Fig. 9.1), helps anticipate complications that arise when gene alignments are carried out without this prior knowledge.

Fig. 9.1

Comparative analysis of the 7AS fructan locus. In a, the arrows indicate the location of the locus within the entire chromosome, and b and c are the images resulting from zooming into the locus. The marker genes TraesCS7A02G009100 and TraesCS7A02G009200 through TraesCS7A02G010200 indicate the array of GH32 fructosyltransferases located at the locus in a ca. 750 kb region (c). Scaffold columns to the right side of the PRETZEL maps are important for checking aberrations in colinearity (based on sequence similarity of 70% over 70% of the length of the sequence), as discussed in the text in terms of relating the boundaries of inverted regions to the boundaries of scaffolds in the assembly. In the region illustrated for MACE (b, c), the chance of the inverted region being an assembly error is reduced because the inversion is well within the respective scaffold

PRETZEL (https://plantinformatics.io; Keeble-Gagnère et al. 2019) is an online, interactive, and real-time visualization tool for analyzing and integrating genetic and genomic datasets. In Fig. 9.1, the alignments of the fructosyltransferase genes at the fructan synthesis locus on 7AS for the wheat cv. LANCER, cv. CHINESE SPRING, and cv. MACE are shown as a complex example. The IWGSC 7A-LANCER 7A alignment of the array of GH32 genes is fully syntenic between gene models within the LANCER and CS loci. In contrast, the IWGSC 7A-MACE 7A alignment is evidently ambiguous as a result of small genome rearrangements, possibly due to assembly errors. PRETZEL enables any locus of interest to be analyzed and potential issues to be identified.

Variant fructosyltransferase genes on chromosomes 7A, 4A, 7D, 6A, 6B, and 6D are candidates in QTL for fructan content in wheat grain and thus relate to quality/nutritional attributes of the grain (Zhang et al. 2008; Huynh et al. 2012; Langridge and Fleury 2012). The component fructosyltransferase genes in the 4A and 7D loci showed good alignment across LANCER, CS, and MACE, except for an inversion relative to CS in the MACE locus similar to that shown for the 7A locus (Fig. 9.1). The 6B and 6D loci carried the component fructosyltransferase genes referred to as fructan 1-exohydrolase (1-FEH) in Zhang et al. (2008) and showed good alignment across LANCER, CS, and MACE. The 6A locus showed an inversion in MACE relative to CS and an absence of the locus in LANCER, consistent with the presence/absence polymorphism among wheat varieties for the 6A locus reported by Zhang et al. (2008).
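
As a rough illustration of how an inversion of the kind described above could be flagged outside a visualization tool, the sketch below walks through genes in reference order and reports runs whose positions in a second assembly are strictly decreasing. This is not how PRETZEL works internally; the function name, the minimum run length, and the gene labels and coordinates are all hypothetical placeholders.

```python
from typing import Dict, List


def inverted_runs(ref_order: List[str], query_pos: Dict[str, int],
                  min_genes: int = 3) -> List[List[str]]:
    """Return runs of genes (in reference order) whose query positions decrease.

    ref_order : gene IDs sorted by position in the reference assembly
    query_pos : gene ID -> start position in the query assembly
    Genes missing from the query (presence/absence variation) are ignored.
    """
    shared = [g for g in ref_order if g in query_pos]
    runs: List[List[str]] = []
    start = 0
    for i in range(1, len(shared) + 1):
        descending = (i < len(shared)
                      and query_pos[shared[i]] < query_pos[shared[i - 1]])
        if not descending:
            # The current run ends at index i - 1; keep it if long enough.
            if i - start >= min_genes:
                runs.append(shared[start:i])
            start = i
    return runs


# Placeholder gene labels and coordinates, invented for this example.
reference_order = ["g1", "g2", "g3", "g4", "g5", "g6"]
query_positions = {"g1": 100, "g2": 200, "g3": 500, "g4": 400, "g5": 300, "g6": 600}

inversions = inverted_runs(reference_order, query_positions)
```

A candidate run found this way would still need to be checked against scaffold boundaries, as discussed for Fig. 9.1, before it is treated as a real rearrangement rather than an assembly error.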

In contrast to the locus carrying the fructosyltransferases, the wheat-APO1 (WAPO-A1) locus on the long arm of 7A shows unambiguous alignments across the varieties examined (Fig. 9.2a, left-hand panel for entire chromosomes and right panel for the WAPO-A1 locus region). Variation at the structural level is therefore not a significant factor to consider when gene functions are examined at this locus. Interestingly, the h1 and h2 haplotypes at this locus (Fig. 9.2b), identified by Voss-Fels et al. (2019) using SNP variation in the genome sequence, indicate striking sequence-level divergence in the WAPO-A1 gene region that is not reflected at the gene–gene syntenic level shown in Fig. 9.2a.

Fig. 9.2

a PRETZEL view of the chr7A region showing several genes including WAPO-A1 (Voss-Fels et al. 2019; Kuzay et al. 2019, 2022); structural changes in the WAPO-A1 gene across the three cvs. LANCER, CS, and MACE can be visualized at high resolution (right panel). b Genome viewer from DAWN (Watson-Haigh et al. 2018) showing variation relative to the CHINESE SPRING refseq 2.1 as a reference, using cv. LANCER and cv. MACE from the wheat 10Xgenome sequence dataset, and cv. XIAOYAN 54 and cv. WESTONIA from whole-genome shotgun (WGS) resequencing data

The genome viewer in Fig. 9.2b is from DAWN (Watson-Haigh et al. 2018) and shows SNP positions (colored drops) relative to the CHINESE SPRING refseq 2.1 as a reference, using cv. LANCER and cv. MACE from the wheat 10Xgenome sequence dataset, and cv. XIAOYAN 54 and cv. WESTONIA from whole-genome shotgun (WGS) resequencing data. In field trials under rain-fed conditions, the SNP-based haplotype h2 was found to be significantly associated with increased grain yield, conferring a 24% yield advantage relative to all other haplotypes, especially h1, which was the other prominent haplotype in the field trial (Voss-Fels et al. 2019).

PRETZEL aims to solve alignment problems and structural changes in cultivar sequences by providing an interactive, online environment for data visualization and analysis which, when loaded with appropriately curated data, can enable researchers with no bioinformatics training to exploit the latest genomic resources (Keeble-Gagnère et al. 2019). Apart from the visualization, PRETZEL can be used to retrieve the genome information (features including markers, genes, annotations, etc.) as dataset files of any selected chromosomal region for further downstream analysis.

9.4.2 Knowledge Graphs

Knowledge graphs (KGs) are now extensively used to make search and information discovery more efficient. KnetMiner is a data integration platform for visualizing biological knowledge networks in an interactive web application (Hassani-Pak and Rawlings 2017). The data integration approach used to build KGs can capture complex biological relationships between genes, traits, diseases, and many more information types derived from curated or predicted information sources. For example, Rht24 is a recently discovered gene associated with the semi-dwarf phenotype in wheat and is located on chr6A. KnetMiner identified the gene network of Rht24, which is partially shown in Fig. 9.3 for clarity. The Traes IDs of both the chr6B and chr6D homeologs are shown as interacting genes, and another gene, TraesCS5B02G265400, strongly interacts with Rht24. It can also be seen that the gene interacts with the bHLH27 transcription factor and physiologically influences the Gibberellin 20 pathway. Another feature is the identification of any stop-gain mutations in the CADENZA TILLING population; mutant names and SNP positions can also be visualized.
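
To make the idea of a knowledge graph concrete, the sketch below stores a handful of the relationships described above as subject-relation-object triples and looks up the neighbors of one gene. It does not use the KnetMiner API; the relation names and the placeholder labels for the 6B and 6D homeologs are assumptions made for this example, and only the relationships named in the text are included.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Relationships paraphrased from the text; relation names are hypothetical,
# and the homeolog labels are placeholders because their Traes IDs are not given.
TRIPLES: List[Triple] = [
    ("Rht24", "homeolog_of", "Rht24 homeolog (chr6B)"),
    ("Rht24", "homeolog_of", "Rht24 homeolog (chr6D)"),
    ("Rht24", "interacts_with", "TraesCS5B02G265400"),
    ("Rht24", "interacts_with", "bHLH27"),
    ("Rht24", "influences", "Gibberellin 20 pathway"),
    ("Rht24", "associated_with", "semi-dwarf phenotype"),
]


def build_index(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Index triples by node so that edges can be followed in both directions."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject, relation, obj in triples:
        index[subject].append((relation, obj))
        index[obj].append((relation, subject))
    return index


def neighbours(index: Dict[str, List[Tuple[str, str]]], node: str,
               relation: Optional[str] = None) -> List[str]:
    """Nodes linked to `node`, optionally restricted to one relation type."""
    return sorted(other for rel, other in index.get(node, [])
                  if relation is None or rel == relation)


graph = build_index(TRIPLES)
interactors = neighbours(graph, "Rht24", "interacts_with")
linked_to_bhlh27 = neighbours(graph, "bHLH27")
```

Storing edges in both directions keeps the lookup symmetric, so a query that starts from the transcription factor finds Rht24 just as a query from Rht24 finds the transcription factor.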

Fig. 9.3

KnetMiner network depicting connections with Rht24 on chr6A in wheat. The wheat reduced height gene Rht24, its homeologs on the B- and D-genomes, other genes in cross-talk such as TraesCS5B02G265400, associated transcription factors, and the mutations in the wheat TILLING population (e.g., two mutations in the CADENZA TILLING population) can be visualized. Not all connections present in the KnetMiner network are depicted in the figure; only a subset is shown for clarity

The causal mutation of Rht24 on chr6A was identified in the exome capture data of the global hexaploid wheat collection (He et al. 2019). The frequencies of the wild-type and alternate alleles of the target SNP among global wheat accessions were plotted using the SnpHub portal (Fig. 9.4).

Fig. 9.4

SnpHub-based global haplotype map of the non-synonymous mutation in Rht24, plotted from the global exome sequencing data. In each pie chart, the red proportion represents the frequency of the wild-type allele, while the blue proportion represents the frequency of the non-synonymous mutation associated with reduced height

9.4.3 SnpHub Portal for Global Overview of Functional Gene Frequencies

The SnpHub portal is a convenient way to identify mutations in wheat genomes and then plot the frequency of the SNPs by country in the global wheat population (Wang et al. 2020). It is a Shiny/R-based platform for mining and visualizing large genome variation data in wheat. Genome variation data in the form of .vcf files and genome annotation files can be queried by the chromosomal interval of a specific gene (Traes ID) to visualize genomic variation as heatmaps, phylogenetic trees, haplotype networks, and haplotype geographic maps.
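
The country-wise frequency summary behind a haplotype geographic map can be sketched in a few lines. The example below is not SnpHub code (SnpHub itself is R-based); it assumes genotypes for a single SNP have already been extracted from a VCF file as GT strings and paired with a country of origin. The function name, the record layout, and the accessions, countries, and genotypes are all invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical records for one SNP: (accession, country of origin, VCF GT field).
Record = Tuple[str, str, str]


def alt_allele_frequency_by_country(records: List[Record]) -> Dict[str, float]:
    """Alternate-allele frequency per country, ignoring missing calls ('.')."""
    counts = defaultdict(lambda: [0, 0])  # country -> [alt alleles, called alleles]
    for _accession, country, genotype in records:
        # GT fields separate alleles with '/' (unphased) or '|' (phased).
        alleles = genotype.replace("|", "/").split("/")
        called = [a for a in alleles if a in ("0", "1")]
        counts[country][0] += called.count("1")
        counts[country][1] += len(called)
    return {country: alt / total
            for country, (alt, total) in counts.items() if total > 0}


# Invented example data; not taken from any published dataset.
example_records: List[Record] = [
    ("acc1", "Country A", "1/1"),
    ("acc2", "Country A", "0/0"),
    ("acc3", "Country A", "./."),
    ("acc4", "Country B", "0|1"),
    ("acc5", "Country B", "1/1"),
]

frequencies = alt_allele_frequency_by_country(example_records)
```

Each per-country value corresponds to one pie chart in a map such as Fig. 9.4, with the remainder being the reference-allele frequency.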

Apart from these platforms, several others can be used interactively to mine genome variation and analyze gene expression (Table 9.3). expVIP is an excellent resource for gene expression studies across various tissues and experiments, where the expression of selected genes can be visualized as heatmaps or downloaded as data files for further analysis. Similarly, WheatOmics (Ma et al. 2021) provides several features for the analysis of genes, including JBrowse with distinct tracks for several SNP genotyping and exome sequencing resources, a TraesID converter, and a sequence retriever. Last but not least, a wheat QTL database has been released recently, which is an important resource for aligning QTL information with the IWGSC reference sequence (Singh et al. 2021).

9.5 Conclusion and Prospects

The complete annotation of functional genes in wheat is a challenge at multiple levels. In a detailed analysis of the wheat genome space, Choulet et al. (this volume, Chap. 4) emphasized that an important intrinsic feature of eukaryote gene structure that impacts annotation is the degree of fragmentation, measured as the number of exons per gene. As a CDS is fragmented into several exons, the difficulty in predicting the correct intron/exon structure increases, although in wheat (RefSeq Annotation v2.1) the average number of exons per CDS is only 4, and some genes (up to 10%) can have up to 17 exons. In this chapter, we have assigned genes and QTL to the reference genome and utilized available annotations to significantly improve the value of the outputs as reference documentation to be used in wheat breeding. The alignment of traits to annotated genes in the reference genome provides their positions and TraesIDs, defining a framework for establishing more informative markers for selecting lines to be deployed in crosses as well as for tracking targeted traits in segregating progeny from crosses.

Integration of a range of datasets has been emphasized in this chapter in order to deal with the complexity of the wheat genome and to generate robust associations between genome haplotypes and agronomic traits for selecting parents for crossing and accurately tracking progeny from crosses. Since only 17% of genes are single copy, most key agronomic traits are likely to be the product of gene network interactions involving genes/gene families distributed across the chromosomes of the A-, B-, and D-subgenomes and genome signatures (haplotypes).

The sequencing data generated from cultivated and wild wheats, natural and breeding populations, and mutants are enabling the discovery of genes underpinning important traits of breeding interest. This information is useful for further developing and deploying diagnostic markers in wheat breeding. Wheat genome variation is very complex for downstream analysis; therefore, data analytics platforms have been developed to visualize genome variation and expression as heatmaps, haplotype and geographic maps, and gene networks. We have provided an overview of the gene frameworks discovered so far in wheat, and these need to be integrated with the thousands of QTL that have been discovered in wheat in different mapping populations and with many different marker platforms. The integration of wheat QTL information with genome visualization platforms for better understanding of gene networks and trait discovery is a key challenge.