
3.1 Development of Wheat Chromosome Genomics

The development of a DNA sequencing technique by Sanger et al. (1977) marked the beginning of genomics, with the prospect of obtaining complete genome sequences and studying entire genomes. The progress in DNA sequencing and genome assembly technologies, which followed the pioneering projects on small bacterial genomes (Fleischmann et al. 1995; Fraser et al. 1995), made it possible to deliver the first genome of a plant, Arabidopsis thaliana (Arabidopsis Genome Initiative 2000), followed by Oryza sativa (International Rice Genome Sequencing Project 2005). Together with the progress in human genome sequencing (Lander et al. 2001), these achievements stimulated interest in producing a genome sequence of hexaploid bread wheat (Triticum aestivum, 2n = 6x = 42), one of the three most important crops worldwide. This was a daunting task at that time given its genome size exceeding 15 Gb (IWGSC 2018), the presence of three homoeologous genomes and the high repeat content.

Despite the difficulties foreseen, participants of the workshop on wheat genome sequencing held in Washington DC in 2003 agreed on the need for a bread wheat genome sequence (Gill et al. 2004). Among the available strategies, it was decided to explore the use of DNA libraries prepared from individual chromosomes and chromosome arms for the assembly of a global physical map and chromosome sequencing. As individual chromosomes and chromosome arms represent only about 4–6% and 1–3% of the bread wheat genome, respectively, dissecting the genome into chromosomes or even chromosome arms offered a dramatic and lossless reduction in DNA sample complexity to facilitate targeted development of DNA markers, gene mapping and cloning as well as genome sequencing. The chromosome-based approach avoided problems due to the presence of homoeologous DNA sequences and enabled a division of labor so that different groups could work on physical mapping and sequencing of different chromosomes simultaneously (Gill et al. 2004). A principal condition for the application of this approach was the ability to purify particular chromosomes and chromosome arms in sufficient numbers (~10³–10⁶) so that enough DNA could be obtained. To date, the only method suitable for this task is flow-cytometric sorting.

3.1.1 Flow Cytogenetics

Unlike microscopy, flow cytometry analyzes condensed mitotic metaphase chromosomes during their movement, one after another, in a narrow liquid stream. To distinguish this approach from microscopic analysis, the term flow cytogenetics has been coined. Prior to flow cytometry, chromosomes are stained with a DNA fluorochrome so that they can be classified according to relative DNA content. The analysis can be performed at rates of ~10³ chromosomes per second, so that large numbers of chromosomes can be interrogated to obtain statistically accurate data and potentially discriminate individual chromosomes. A histogram of DNA content thus obtained is termed a flow karyotype, and ideally, each chromosome is represented by a well-discriminated peak. In fact, the extent to which a chromosome peak is discriminated from the peaks of other chromosomes determines the purity of the sorted fraction, that is, the frequency of contaminating chromosomes in it. Not all flow cytometers are equipped with a sorting module, and only some are designed to physically separate (sort) microscopic particles with particular optical parameters. Gray et al. (1975a, b), Stubblefield et al. (1975) and Carrano et al. (1976) were the first to confirm that flow cytometry can be used not only to classify mammalian chromosomes according to DNA content, but also to sort them. These experiments paved the way to the use of flow-sorted chromosomes during the initial phases of human genome sequencing (Van Dilla and Deaven 1990).

The samples for flow cytometry must take the form of a concentrated suspension of intact chromosomes. In contrast to animals and humans, their preparation in plants is hampered by the low frequency of dividing mitotic cells and by the presence of a rigid cell wall. A successful approach has been to artificially induce cell cycle synchrony in root tips of hydroponically grown seedlings, accumulate dividing cells at mitotic metaphase and release intact chromosomes from formaldehyde-fixed root tips by mechanical homogenization. This high-yielding procedure was developed for faba bean (Doležel et al. 1992), and by optimizing it for wheat, Vrána et al. (2000) laid the foundation for using flow-sorted chromosomes in wheat genomics (Figs. 3.1 and 3.2).

Fig. 3.1 Major developments in wheat chromosomal genomics

Fig. 3.2 Applications of wheat chromosomal resources. Depending on the downstream application, flow-sorted chromosomes can be processed by two distinct approaches. For applications with high demand on DNA amount and contiguity, i.e., BAC libraries, optical mapping and TArgeted Chromosome-based Cloning via long-range Assembly (TACCA), high molecular weight (HMW) DNA is prepared by purifying chromosomes embedded in agarose plugs. Low molecular weight (LMW) DNA, to be used for short-read sequencing or DArT marker development (DNA microarrays), is obtained after treating chromosomal DNA in solution

3.1.2 Chromosome Sorting in Wheat

The study of Vrána and co-workers (Vrána et al. 2000) revealed that out of the 21 chromosomes of bread wheat, only chromosome 3B could be discriminated from other chromosomes and sorted at high purity (Fig. 3.3a). The remaining chromosomes formed three composite peaks on a flow karyotype, each of them representing three to ten chromosomes, which could only be sorted as groups. In order to determine the chromosome content of the flow-sorted fractions, samples of ~10³ chromosomes were sorted onto a microscope slide and microscopically identified after fluorescence in situ hybridization with probes giving chromosome-specific labeling patterns (Fig. 3.3e; Kubaláková et al. 2002). The study of Vrána et al. (2000) indicated the suitability of chromosomal stocks with altered chromosome sizes for the purification of chromosomes other than 3B. In two cultivars of wheat, the authors identified and sorted the translocation chromosome 5BL.7BL, which is larger than chromosome 3B (Fig. 3.3c). A subsequent study of Kubaláková et al. (2002) confirmed the potential of cytogenetic stocks. The most important observation concerned the ability to sort any single chromosome arm, either in the form of a telosome or an isochromosome. As almost all telosomic lines were developed in the background of cv. CHINESE SPRING (Sears and Sears 1978), their use offered the possibility to analyze the wheat genome chromosome-by-chromosome. In 13 double-ditelosomic lines, both chromosome arms could be discriminated and sorted simultaneously (Fig. 3.3b), reducing the time needed to collect DNA from both arms (Doležel et al. 2012).

Fig. 3.3 Flow karyotyping of bread wheat. Histograms of relative DAPI fluorescence intensities representing chromosomes of varying sizes are termed flow karyotypes. a Flow karyotype of cv. CHINESE SPRING consists of three composite peaks, harboring 3, 7 and 10 chromosomes, respectively, and a standalone peak representing the largest wheat chromosome 3B. b Flow karyotype of 7D double ditelosomic line, where both the long and the short arm of chromosome 7D are discriminated and can be sorted simultaneously. c The translocated chromosome 5BL.7BL, present in cv. ARINA and some other cultivars, is the largest one in the karyotype and can be sorted with a high purity. d Standard monoparametric flow karyotype of cultivar CERTO, where three chromosomes from composite peak III (2A, 2B and 6B) form a defined but still unresolvable sub-population. e Bivariate flow karyotype of the same cultivar, where the difference in relative abundance of GAA repeat motif allows further discrimination of these chromosomes and results in well-defined populations containing a single chromosome type each. The chromosome 2B, shown in the inset, can be sorted with purity exceeding 85%. For the purity check, FISH was done with probes for GAA (green) and Afa repeats (red)

While this advance made chromosome flow sorting technology ready to support various genomics analyses in bread wheat (Fig. 3.2), including genome sequencing, its dependence on cytogenetic stocks limited its potential for marker development and gene cloning in other wheat genotypes. To overcome this obstacle, Giorgi et al. (2013) developed a protocol for fluorescently labeling repetitive DNA on chromosomes using fluorescence in situ hybridization in suspension (FISHIS). Chromosome classification based on two fluorescence parameters, DNA content (after staining with a DNA fluorochrome) and the fluorescence of FITC-labeled regions containing DNA repeats (typically GAA microsatellites), enabled discrimination of chromosomes with the same or very similar DNA content from each other. Depending on genotype, bivariate flow karyotyping after FISHIS typically allows discrimination of ~13 out of 21 wheat chromosomes (Fig. 3.3d, e) and provides to date the most powerful approach to dissect the wheat genome into single chromosomes.

If the FISHIS procedure of Giorgi et al. (2013) is not compatible with a downstream application of sorted chromosomes and, at the same time, appropriate cytogenetic stocks are not available, the option is to partition composite peaks as observed on monovariate flow karyotypes (Fig. 3.3a) (Vrána et al. 2015). Although this approach does not allow discrimination and sorting of single chromosomes, it is suitable for obtaining sub-genomic fractions comprising only a few chromosomes, with one of them being more abundant. Vrána et al. (2015) calculated a so-called enrichment factor, defined as the ratio of the proportion of a chromosome's DNA in the sorted fraction to its proportion in the wheat genome, and found that a fivefold enrichment was obtained for 17 out of 21 wheat chromosomes. Importantly, subgenomic fractions for 15 out of the 21 chromosomes were not contaminated by homoeologs.
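The enrichment factor can be illustrated with a short calculation. The sketch below is not taken from Vrána et al. (2015); it assumes the factor is read as the chromosome's share of DNA in the sorted fraction divided by its share in the whole genome, so that values above 1 indicate enrichment. The function name and the example shares are hypothetical.

```python
def enrichment_factor(share_in_fraction: float, share_in_genome: float) -> float:
    """Share of a chromosome's DNA in the sorted fraction divided by its share in the genome.

    Both arguments are proportions between 0 and 1 (assumed reading of the definition).
    """
    if not 0 < share_in_genome <= 1:
        raise ValueError("share_in_genome must be in (0, 1]")
    if not 0 <= share_in_fraction <= 1:
        raise ValueError("share_in_fraction must be in [0, 1]")
    return share_in_fraction / share_in_genome


# Hypothetical example: a chromosome that makes up 5% of the genome
# accounts for 30% of the DNA in a partitioned composite peak.
print(round(enrichment_factor(0.30, 0.05), 1))  # 6.0
```

Under this reading, an unsorted sample has a factor of 1, and a chromosome that represents about 5% of the genome can reach at most a roughly 20-fold enrichment in a pure fraction.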

3.1.3 Sorting Chromosomes of Wild Wheat Relatives

The method for flow-cytometric chromosome analysis and sorting, originally developed for hexaploid bread wheat and subsequently modified for tetraploid durum wheat (Triticum turgidum Desf. var. durum, 2n = 4x = 28; Kubaláková et al. 2005), was also found to be suitable for sorting chromosomes from their wild relatives. Two options were explored. One involved sorting chromosomes from alien chromosome introgression lines of wheat. The samples are prepared from synchronized wheat root tips and, if the alien chromosome can be discriminated on a flow karyotype, it may be sorted (Molnár et al. 2011, 2015; Zwyrtková et al. 2022). In a similar manner, wheat chromosomes carrying introgressions from wild relatives can be purified (Tiwari et al. 2014; Janáková et al. 2019; Bansal et al. 2020). The second, more straightforward option is to sort chromosomes directly from wild relatives. Thus, the protocol of Vrána et al. (2000) for wheat has been optimized for a variety of species from the Aegilops, Agropyron and Haynaldia (Dasypyrum) genera (summarized in Doležel et al. 2021). While in some of them (like Aegilops comosa) all chromosomes may be discriminated and sorted (Said et al. 2021), in the majority of species (including Aegilops geniculata, Aegilops biuncialis, Aegilops cylindrica, Haynaldia villosa, Agropyron cristatum and others) chromosomes can only be sorted in groups of two to five (Molnár et al. 2011, 2015; Grosso et al. 2012; Said et al. 2019). As in the case of wheat, fluorescent labeling of chromosomes by FISHIS prior to flow cytometry increased the number of chromosomes that could be discriminated and sorted. The availability of separated chromosomes of the relatives enabled comparative studies with the bread wheat genome (Molnár et al. 2014, 2016) and has been applied to support cloning of genes from the tertiary gene pool (see Sect. 3.5.1).

3.2 Toward Bread Wheat Reference Genome

The need for a high-quality bread wheat genome sequence that would provide access to the complete gene catalogue, an unlimited number of molecular markers to support genome-based selection of new varieties and a framework for the efficient exploitation of natural and induced genetic diversity (Choulet et al. 2014a) stimulated the establishment of the International Wheat Genome Sequencing Consortium, a collaborative platform launched in 2005 (https://www.wheatgenome.org). At that time, the proven strategy for obtaining high-quality reference sequences of large genomes was the clone-by-clone approach, i.e., sequencing clones from large-insert DNA libraries ordered in physical maps. These constituted a technology-neutral resource for accessing complex genomes, enabling possible resequencing of the ordered clones by more advanced technologies. Considering the ability to dissect the wheat genome into individual chromosomes or chromosome arms (Vrána et al. 2000; Kubaláková et al. 2002), and after confirming the feasibility of constructing large-insert DNA libraries from the flow-sorted chromosomes (Šafář et al. 2004; Janda et al. 2004), the Consortium settled on coupling chromosome purification with the clone-by-clone strategy and producing clone-based physical maps of individual wheat chromosomes, which would allow the engagement of multiple teams in the challenging sequencing effort.

3.2.1 Generation of Chromosomal BAC Resources

The prerequisite of the proposed strategy was the ability to separate each bread wheat chromosome or chromosome arm by flow sorting. This was only possible in cultivar CHINESE SPRING (CS), for which a complete set of telosomic lines, essential to sort the chromosome arms, was available (Sears and Sears 1978), predestining the cultivar to become the reference genome of bread wheat. The primary resource needed to construct a clone-based physical map is a large-insert genomic DNA library, commonly cloned in the bacterial artificial chromosome (BAC) vector, typically bearing inserts of 100–200 kb. To generate a library with these parameters, several micrograms of high molecular weight (HMW) DNA are needed. Achieving this from the flow-sorted material involved the elaboration of a customized protocol (Šimková et al. 2003) including DNA preparation in agarose plugs (Fig. 3.2), which enabled the pooling of samples from multiple sorting days. Based on this advance, Šafář et al. (2004) constructed the first-ever chromosome-specific BAC library in a eukaryotic organism. The library, prepared from two million 3B chromosomes flow-sorted over 18 working days, comprised 67,968 clones with a 103 kb average insert size, representing 6.2 equivalents of chromosome 3B, whose molecular size is close to one gigabase. Further improvements in the procedure permitted the construction of BAC libraries with chromosome coverage up to 18× and average insert size exceeding 120 kb (https://olomouc.ueb.cas.cz/en/resources/dna-libraries; Šafář et al. 2010; Table 3.1 and references therein). The effort toward preparing the full set of CS libraries for the chromosomal physical maps lasted over ten years and was completed at the end of 2013 (Fig. 3.1). Individual clones and BAC libraries used to construct chromosome-specific physical maps are publicly available and can be obtained at https://cnrgv.toulouse.inrae.fr/en/Library/Wheat.

Besides the ‘CHINESE SPRING’ BAC libraries generated for the reference genome project, several customized chromosomal libraries from other cultivars were created for gene cloning projects, including a 3B-specific library from cv. HOPE (Mago et al. 2014) and a BAC library from the 4AL arm of cv. TÄHTI, bearing an introgressed segment of Triticum militinae (Janáková et al. 2019) (Table 3.1).
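The figures quoted for the first 3B library can be checked with back-of-the-envelope arithmetic. The sketch below is an illustration only: it rounds the size of chromosome 3B to 1 Gb, uses the widely used conversion of about 978 Mbp per picogram of DNA, and assumes that each sorted metaphase chromosome consists of two chromatids. The `target_fraction` parameter is a hypothetical correction for clones that do not derive from the target chromosome; the text does not state how the reported coverage was calculated.

```python
BP_PER_PG = 0.978e9  # common approximation: 1 pg of DNA corresponds to ~978 Mbp


def library_coverage(n_clones, mean_insert_bp, target_size_bp, target_fraction=1.0):
    """Chromosome equivalents represented by a BAC library.

    target_fraction is a hypothetical share of clones derived from the target chromosome.
    """
    return n_clones * mean_insert_bp * target_fraction / target_size_bp


def dna_mass_ug(n_chromosomes, chromosome_size_bp, chromatids=2):
    """Approximate DNA mass (micrograms) in a batch of sorted metaphase chromosomes."""
    picograms = n_chromosomes * chromatids * chromosome_size_bp / BP_PER_PG
    return picograms / 1e6


# 67,968 clones with a 103 kb mean insert, chromosome size rounded to 1 Gb (assumption)
naive = library_coverage(67_968, 103_000, 1.0e9)
print(f"naive coverage: {naive:.1f}x")  # naive coverage: 7.0x

# Two million sorted 3B chromosomes, two chromatids each (assumption)
print(f"DNA mass: {dna_mass_ug(2_000_000, 1.0e9):.1f} ug")  # DNA mass: 4.1 ug
```

The naive estimate of about 7 equivalents is somewhat higher than the reported 6.2, which would be consistent with a correction of the kind modeled by `target_fraction`, although this is a guess. The mass estimate of roughly 4 µg agrees with the statement that several micrograms of HMW DNA are needed.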

Table 3.1 Wheat chromosomal BAC resources

Upon their construction, the CS libraries were distributed among national teams engaged in the IWGSC effort, who embarked on constructing physical maps. In a proof-of-concept experiment, Paux and co-workers (2008) generated the first chromosomal physical map from chromosome 3B, employing SNaPShot-based High Information Content Fingerprinting (HICF) technology (Luo et al. 2003) to generate fingerprints and FingerPrinted Contig (FPC) software to assemble the physical map and select a minimal tiling path (MTP) for sequencing. This achievement validated the feasibility of constructing sequence-ready physical maps of hexaploid wheat by the chromosome-by-chromosome approach, and the strategy was subsequently followed for other chromosome arms (Table 3.1; IWGSC 2018). As alternative procedures, Whole Genome Profiling (WGP, van Oeveren et al. 2011) was applied for BAC fingerprinting in several projects and Linear Topological Contig (LTC, Frenkel et al. 2010) software was developed and utilized for map assembly and validation. Procedures applied for individual chromosomes/arms are summarized in IWGSC (2018). The resulting chromosomal physical maps are available at https://urgi.versailles.inra.fr/download/iwgsc/Physical_maps/ and displayable at https://urgi.versailles.inra.fr/gb2/gbrowse/wheat_phys_pub/. In addition to the construction of physical maps for several chromosomes, the WGP technology was utilized to profile MTP clones identified from chromosome physical maps constructed previously by the HICF procedure. The WGP tags thus generated for all 21 wheat chromosomes were used to support the assembly of the IWGSC RefSeq v1.0 genome and are available for download from IWGSC-BayerCropScience WGP™ tags https://urgi.versailles.inra.fr/download/iwgsc/IWGSC_BayerCropScience_WGPTM_tags.
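The idea of a minimal tiling path can be sketched as a greedy interval cover. This is not the algorithm implemented in FPC or LTC, which work on fingerprint overlaps rather than base-pair coordinates; the example below assumes, for illustration only, that each clone already has hypothetical start and end coordinates within a contig.

```python
from typing import List, Tuple

Clone = Tuple[str, int, int]  # (clone name, start, end) in hypothetical contig coordinates


def minimal_tiling_path(clones: List[Clone]) -> List[Clone]:
    """Greedy cover: among clones starting at or before the covered edge,
    repeatedly pick the one that extends furthest."""
    if not clones:
        return []
    ordered = sorted(clones, key=lambda c: (c[1], -c[2]))
    contig_end = max(c[2] for c in ordered)
    covered_to = ordered[0][1]
    path: List[Clone] = []
    i = 0
    while covered_to < contig_end:
        best = None
        # Scan all not-yet-examined clones that overlap or touch the covered region
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best)
        covered_to = best[2]
    return path


# Hypothetical contig of five overlapping clones (coordinates in kb)
clones = [("A", 0, 120), ("B", 30, 150), ("C", 100, 230), ("D", 140, 260), ("E", 220, 340)]
print([name for name, _, _ in minimal_tiling_path(clones)])  # ['A', 'C', 'E']
```

A practical MTP would additionally require a minimum overlap between neighboring clones so that adjacent clone sequences can be joined reliably; the sketch accepts any overlap, including clones that merely touch.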

3.2.2 BAC Clone Sequencing

The availability of BAC clones ordered in chromosomal physical maps opened an avenue to systematic analyses of the bread wheat genome and its selected parts. The early studies, based on sequencing the ends of BAC clones by Sanger technology, provided first insights into the gene and repeat content of particular chromosomes, enabled comparative analyses of homoeologous chromosomes and delivered information for targeted marker development (Paux et al. 2006; Sehgal et al. 2012; Lucas et al. 2012).

Later studies, employing next-generation sequencing of whole BAC contigs, provided more comprehensive information about the organization of genes and transposable elements (TEs). Choulet et al. (2010) sequenced and annotated 13 BAC contigs, totaling 18 Mb of sequence, selected from different regions of chromosome 3B and revealed that genes were present along the entire chromosome and clustered mainly into numerous small islands of 3–4 genes separated by large blocks of repetitive elements. They observed that wheat genome expansion had occurred homogeneously along the chromosome through specific bursts of TEs. Bartoš et al. (2012), after sequencing a megabase-sized region from wheat arm 3DS and comparing it with the homoeologous region on wheat chromosome 3B, revealed similar rates of non-collinear gene insertion in the wheat B and D subgenomes, with a majority of gene duplications occurring before their divergence. Li et al. (2013) provided valuable information about the structure of wheat centromeres. Analyzing a 1.1-Mb region from the centromere of chromosome 3B, they revealed that 96% of the DNA consisted of TEs. The youngest elements, CRW and Quinta, were targeted by the centromere-specific histone H3 variant CENH3, the marker of the functional centromere. In contrast to the TEs, long arrays of satellite repeats found in the region were not associated with CENH3. Several other studies employing sequencing of BAC contigs focused on the analysis of narrow regions comprising their genes of interest (Breen et al. 2010; Mago et al. 2014; Janáková et al. 2019; Tulpová et al. 2019b).

Although these studies markedly advanced the knowledge of the bread wheat genome, the major breakthrough came only with the generation of chromosome-scale sequence assemblies. Choulet and co-workers (2014b) produced a BAC-based reference sequence of the largest bread wheat chromosome, 3B. After sequencing 8452 BAC clones, representing the 3B MTP, the authors assembled a sequence of 833 Mb split into 2808 scaffolds, 1358 of which, containing 774 Mb of sequence, had a known position on the chromosome. The assembly comprised 5326 protein-coding genes and 1938 pseudogenes, and transposable elements made up 85% of the sequence. Most interestingly, the distribution of structural and functional features along the chromosome revealed partitioning correlated with meiotic recombination. Comparative analyses with other grass genomes indicated high wheat-specific inter- and intrachromosomal gene duplication activities that were postulated to be sources of variability for adaptation. As a contribution to the IWGSC sequencing effort, sequence assemblies of BAC clones representing complete or partial MTPs of seven chromosomes and two chromosome arms were produced (Table 3.1 and references therein; IWGSC 2018) and are publicly available at https://urgi.versailles.inrae.fr/download/iwgsc/BAC_Assemblies/. These assemblies, complemented by information from chromosomal physical maps and, for group 7 chromosomes, also chromosomal optical maps, were applied to support the assembly of the bread wheat reference genome, IWGSC RefSeq v1.0 (IWGSC 2018), as described in Chap. 2.

It is clear nowadays that whole-genome shotgun sequencing has become the predominant approach, even for large polyploid genomes. Still, the wheat chromosomal physical maps and the BAC clones integrated therein remain a valuable genomic resource for bread wheat, enabling fast access to and detailed analysis of a region of interest. The availability of BAC clones with a known genomic position has facilitated focused and affordable resequencing of a region of interest with long-read technologies, revealing discrepancies and missing segments in the previously generated bread wheat assemblies (Kapustová et al. 2019; Tulpová et al. 2019b).

3.3 Chromosome Survey Sequencing

While the generation of the full set of chromosomal libraries, physical maps and BAC clone sequences proved to be a long-distance run, the demand for homoeolog-resolved wheat genome information was increasing over time. This demand could be met by low-pass chromosome sequencing, which would provide approximate information about the genic component of individual chromosomes. The separation of each bread wheat chromosome or chromosome arm was, in principle, feasible, but the yield of flow-sorted chromosomes, typically 1–2 × 10⁵ per sorting day, did not meet the DNA input requirements of the early sequencing technologies, which were in the microgram range. Coupling chromosome flow sorting with multiple-displacement amplification (MDA) of the chromosomal DNA, originally developed for physical mapping on DNA microarrays (Šimková et al. 2008), opened the door to shotgun sequencing of cereal chromosomes one by one. Wheat genome researchers adopted the strategy of chromosome survey sequencing (CSS) developed for barley (Mayer et al. 2009, 2011). In barley, low-coverage (1–3×) chromosomal data, obtained by 454 sequencing, were compared with the reference genomes of rice, sorghum and Brachypodium, and with EST or full-length-cDNA datasets, which led to an estimate of the gene content of each barley chromosome. Moreover, integrating the shotgun sequence information with the collinear gene order of orthologous rice, sorghum and Brachypodium genes allowed virtual gene order maps of individual chromosomes to be proposed. This syntenic integration, known as the genome zipper, resolved gene order in regions with limited genetic resolution, such as genetic centromeres, which were intractable to genetic mapping.
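The core idea of the syntenic ordering can be shown in a few lines. The sketch below is a strong simplification and not the genome zipper pipeline of Mayer et al.: it uses a single hypothetical reference genome, whereas the published approach integrates several reference genomes together with genetic marker information. All gene and read identifiers are invented for illustration.

```python
from typing import Dict, List, Tuple


def virtual_gene_order(
    reference_order: List[str], hits: Dict[str, List[str]]
) -> List[Tuple[str, List[str]]]:
    """Order survey-sequence reads by the position of their orthologs in a reference.

    reference_order: reference gene IDs in their chromosomal order.
    hits: reference gene ID -> IDs of chromosome survey reads matching that gene.
    Reference genes without any matching read are skipped.
    """
    return [(gene, sorted(hits[gene])) for gene in reference_order if gene in hits]


# Hypothetical reference segment and read-to-gene matches
reference_order = ["ref_g1", "ref_g2", "ref_g3", "ref_g4"]
hits = {"ref_g3": ["read_017"], "ref_g1": ["read_102", "read_005"]}

for gene, reads in virtual_gene_order(reference_order, hits):
    print(gene, reads)
# ref_g1 ['read_005', 'read_102']
# ref_g3 ['read_017']
```

Because the order is inherited from the reference, it remains a hypothesis wherever collinearity between the species is broken.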

The first CSS experiments in bread wheat were done to compare chromosome arms of homoeologous group 1 (Wicker et al. 2011) and methodologically followed the barley model, employing low-pass 454 sequencing. The study revealed that all three wheat subgenomes had similar sets of genes that were syntenic with the model grass genomes, but genic sequences in non-syntenic positions outnumbered the syntenic ones. Further analysis indicated that a large proportion of the genes found in only one of the three homoeologous wheat chromosomes were most probably pseudogenes resulting from transposon activity and double-strand break repair. These findings were supported by a study of Akhunov et al. (2013) who, working with CSSs of both arms of chromosome 3A, found that ~35% of genes had experienced structural rearrangements leading to a variety of mis-sense and non-sense mutations, a finding concordant with other studies indicating ongoing pseudogenization of the bread wheat genome. Another focus of the CSS studies was the evolutionary rearrangement of wheat chromosomes. Hernandez et al. (2012) analyzed bread wheat chromosome 4A, which has undergone a major series of evolutionary rearrangements. Using the genome zipper approach, the authors produced an ordered gene map of chromosome 4A, embracing ~85% of its total gene content, which enabled precise localization of the various translocation and inversion breakpoints on chromosome 4A that differentiate it from its progenitor chromosome in the A-subgenome diploid donor.

In contrast to the above studies, Berkman and co-workers, aiming to shotgun sequence the wheat 7DS arm, favored the more cost-efficient Illumina technology and compensated for its short reads (75–100 bp) by higher sequencing coverage (34×), which allowed a partial assembly of the reads and capture of ~40% of the sequence content of the chromosome arm (Berkman et al. 2011). Using the same technology, the team proceeded with sequencing the 7BS arm (Berkman et al. 2012) and supplemented the 4A study by delimiting the 7BS segment that was involved in the reciprocal translocation that gave rise to the modern 4A chromosome. After extending the sequencing effort to all group 7 homoeologs (Berkman et al. 2013), the team compared the sequences and concluded that there had been more gene loss in chromosomes 7A and 7B than in 7D. Chromosome survey sequences of additional chromosomes/arms followed and were mostly utilized in estimating the gene and repeat content of particular chromosomes (Vitulo et al. 2011; Tanaka et al. 2014; Sergeeva et al. 2014; Helguera et al. 2015; Garbus et al. 2015; Kaur et al. 2019), synteny-based ordering of arising clone-based physical maps (Lucas et al. 2013), identifying miRNA-coding sequences (Vitulo et al. 2011; Kantar et al. 2012; Deng et al. 2014; Tanaka et al. 2014) and delimiting lineage-specific translocations (Lucas et al. 2014). Utilization of chromosome sequencing for gene mapping and cloning is described further in Sect. 3.5.1.
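The trade-off between read length and sequencing coverage can be made concrete with the idealized Lander-Waterman model, in which reads are assumed to sample the target uniformly at random. The sketch below is an illustration under that assumption only; it ignores repeats, amplification bias and assembly, and the 100 Mb target size in the example is hypothetical.

```python
import math


def reads_needed(depth: float, target_size_bp: int, read_length_bp: int) -> int:
    """Number of reads required to reach a given mean sequencing depth."""
    return math.ceil(depth * target_size_bp / read_length_bp)


def expected_fraction_covered(depth: float) -> float:
    """Expected fraction of bases covered by at least one read (idealized model)."""
    return 1.0 - math.exp(-depth)


# Hypothetical 100 Mb target sequenced to 34x with 100 bp reads
print(reads_needed(34, 100_000_000, 100))  # 34000000

for depth in (1, 3, 34):
    print(depth, round(expected_fraction_covered(depth), 3))
# 1 0.632
# 3 0.95
# 34 1.0
```

The model describes only how much of the target is sampled by reads. How much of it can be assembled from short reads is a separate question that depends heavily on repeat content.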

Chromosome survey sequencing in bread wheat culminated in a joint effort coordinated by the IWGSC, which exploited the existing Illumina-based CSSs and complemented them with newly produced Illumina data for the remaining chromosomes. The sequences were used to generate draft assemblies and genome zippers for all wheat chromosomes (IWGSC 2014). As a result, a total of 124,201 gene loci were annotated and more than 75,000 genes were positioned along chromosomes. The IWGSC team anchored more than 3.6 million marker loci to chromosome sequences, uncovered the molecular organization of the three subgenomes and described patterns in gene expression across the subgenomes. The study also provided new insights into the phylogeny of hexaploid bread wheat, which was elaborated in detail in an accompanying study of Marcussen et al. (2014). Moreover, this new wheat genome information was used as a reference to analyze the cell type-specific expression of homoeologous genes in the developing wheat grain (Pfeifer et al. 2014).

The technique of chromosome survey sequencing soon expanded beyond the cultivated crop and was successfully applied to explore individual chromosomes or whole genomes of close wheat relatives, such as Aegilops tauschii (Akpinar et al. 2015a) and Triticum dicoccoides (Akpinar et al. 2015c; 2018), and even species from the tertiary gene pool, including Ae. geniculata (Tiwari et al. 2015), H. villosa (Xiao et al. 2017), Ae. comosa, Aegilops umbellulata (Said et al. 2021) and A. cristatum (Zwyrtková et al. 2022). These studies provided information on chromosome gene content and organization, enabling comparative studies important for gene transfer from the wild species to the crop, and identified sequences suitable for developing markers to trace introgressions in wheat. Specific examples are provided in Sect. 3.5.1 and Table 3.2.

Table 3.2 Leveraging wheat chromosomal resources in gene mapping and cloning

3.4 Optical Mapping

Extensive experience with preparing quality HMW DNA from flow-sorted chromosomes paved the way to establishing a new branch of wheat chromosomal genomics: chromosome optical mapping (OM). The OM technology, commercialized by Bionano Genomics and therefore also known as Bionano genome mapping, is a physical mapping technique based on labeling and imaging short sequence motifs along DNA molecules 150 kb to 1 Mb long (Lam et al. 2012). The resulting restriction maps, assembled from high-coverage single-molecule data, are composed of contigs that can exceed 100 Mb in size and are instrumental in the finishing steps of genome assemblies by enabling contig scaffolding, gap sizing and assembly validation. Optical maps also provide a high-resolution and cost-effective tool for comparative structural genomics.

Staňková et al. (2016) demonstrated the feasibility of generating optical maps from DNA of flow-sorted chromosomes and constructed the first-ever optical map for the bread wheat genome. Using 1.6 million flow-sorted 7DS chromosome arms and the first-generation platform of Bionano Genomics, the authors prepared a map consisting of 371 contigs with an N50 of 1.3 Mb, which supported a physical map and a BAC-based sequence assembly of the chromosome arm (Tulpová et al. 2019a). Applied in a gene cloning project, OM served as a targeted tool for sequence validation and analysis of structural variability in a region of interest (Tulpová et al. 2019b). Similar maps have been constructed for other group-7 chromosome arms and were used in the process of assembling the wheat reference genome (IWGSC 2018), as well as a complementary BAC-based assembly of chromosome 7A (Keeble-Gagnère et al. 2018).
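The N50 statistic quoted for the 7DS optical map is the contig length at which the contigs of that length or longer contain at least half of the total map length. A minimal sketch of the calculation follows; the contig lengths in the example are hypothetical and unrelated to the published map.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L sum to at least half the total."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            return length
    return ordered[-1]  # not reached for positive lengths


# Hypothetical contig lengths in Mb: total is 15, and 5 + 4 = 9 passes the halfway mark
print(n50([5, 4, 3, 2, 1]))  # 4
```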

Another set of chromosomal optical maps was prepared from chromosome arms 1AS, 1BS, 6BS and 5DS, the last being generated on the second-generation platform of Bionano Genomics, with the aim to position and characterize 45S rDNA loci located on those arms. The chromosome-based approach applied in the rDNA project enabled analyzing the loci one-by-one and provided more comprehensive information about individual loci than achieved in long-read bread wheat assemblies (Tulpová et al. 2022).

3.5 Gene Mapping and Cloning

In parallel with the chromosome sequencing efforts, the wheat community started exploiting flow-sorted chromosomes for targeted marker development, aiming to generate a high-density map in a region of interest and, possibly, clone a gene by a map-based approach. This conventional strategy was later complemented by new methods of ‘rapid gene cloning’ (reviewed in Bettgenhaeuser and Krattinger 2019). Some of these still capitalize on the complexity reduction achieved by chromosome flow sorting, but they avoid the lengthy step of marker development and map saturation, employing instead mutation genetics and comprehensive sequencing techniques to assemble a highly contiguous sequence for the chromosome of interest.

3.5.1 Marker Development and Map-Based Gene Cloning

The first effort toward massive marker development from a selected chromosome or chromosome arm was linked to the microarray platform of Diversity Arrays Technology (DArT), which is able to identify and utilize polymorphic DNA markers without knowledge of the underlying sequence (Jaccoud et al. 2001). Wenzl et al. (2010) demonstrated that a chromosome-enriched DArT array could be developed from only a few nanograms of chromosomal DNA. Of 711 polymorphic markers derived from non-amplified DNA of bread wheat chromosome 3B, 553 (78%) mapped to the chromosome, and even higher efficiency (87%) was observed for the short arm of bread wheat chromosome 1B (1BS).

Before the availability of wheat chromosomal survey sequences, researchers aiming to develop new markers for their locus of interest mined data from sequenced genomes of model grasses, mainly rice, Brachypodium and sorghum. The efficiency of this synteny-based approach was compromised by the difficulty of designing gene-derived primers specific enough to distinguish homoeologous genes in polyploid wheat. Using amplified DNA from individual wheat chromosome arms as a template for locus-specific PCR and subsequent amplicon sequencing significantly increased the efficiency of the procedure and facilitated targeted generation of gene-associated SNP markers in a time- and cost-effective manner (Jakobson et al. 2012; Michalak de Jimenez et al. 2013; Terracciano et al. 2013; Staňková et al. 2015). Additionally, DNA of particular chromosome arms was used as a PCR template to validate the specificity of newly designed markers (Staňková et al. 2015; Janáková et al. 2019).

Marker development advanced further with the release of ‘CHINESE SPRING’ CSSs and genome zippers, which informed about putative gene content and order in the region of interest in the reference genome. Nevertheless, studies comparing shotgun sequences of CS chromosomes with those of other wheat accessions revealed extensive intra- and interchromosomal rearrangements in CS (Ma et al. 2014, 2015; Liu et al. 2016), implying limitations in the transferability of data from the wheat reference to other genomes. Moreover, it became obvious that agronomically important traits were frequently controlled by rare, genotype-specific alleles or had even been introgressed into wheat from its relatives. Under such a scenario, genetic maps had to be created from a mapping population derived from a donor of the trait, and sequence information from the donor was essential for marker development. As a proof-of-concept experiment, Shatalina et al. (2013) generated tenfold coverage of Illumina data from chromosome 3B isolated from wheat cultivars ARINA and FORNO, the parents of their mapping population. Relying on synteny with the Brachypodium genome, they identified sequences close to coding regions and used them to develop 70 SNP markers, which were found dispersed over the entire 3B chromosome and contributed to a fourfold increase in the number of available markers. The new markers were utilized for mapping a QTL conferring resistance to Stagonospora nodorum glume blotch located on 3BS (Shatalina et al. 2014). Chromosome sequencing was then applied by other groups to fine-map Yellow Early Senescence 1 (Harrington et al. 2019), leaf rust resistance gene Lr49 (Nsabiyera et al. 2020) and powdery mildew resistance gene Pm1 (Hewitt et al. 2021).

The procedure was also adopted to develop markers in species from the wheat tertiary gene pool, such as Ae. geniculata (Tiwari et al. 2014) and H. villosa (Wang et al. 2017; Zhang et al. 2021), with the aim of tracing the alien chromatin in the wheat background. For this purpose, the method was refined by Abrouk et al. (2017), who developed an in silico pipeline termed Rearrangement Identification and Characterization (RICh). To delimit a segment transferred from T. militinae to the long arm of chromosome 4A of bread wheat cv. TÄHTI, the authors generated a virtual gene order of ‘TÄHTI’ chromosome 4A. Comparison of homoeologous gene density between the 4AL arm of CS and the arm with the introgression, which harbored the powdery mildew resistance locus QPm.tut-4A, identified alien chromatin with 169 putative genes originating from T. militinae. A similar approach was used by Bansal et al. (2020) to fine-map leaf rust and stripe rust resistance genes Lr76 and Yr70 introduced from Ae. umbellulata. The authors sequenced flow-sorted chromosomes 5U from Ae. umbellulata, 5D from a bread wheat-Ae. umbellulata introgression line and 5D from the recurrent parent. The sequencing reads were mined for introgression-specific SNP markers, whose projection onto the IWGSC RefSeq v1.0 sequence (IWGSC 2018) delimited the introgression to a 9.47 Mb region, in which candidates for the Lr76 and Yr70 genes were identified. Konkin et al. (2022), striving to identify genes for resistance to several fungal pathogens, including fusarium head blight, sequenced the 7EL telosome, which originates from Thinopyrum elongatum and is present as an addition in CS wheat. They thus built a reference for comparative transcriptome analysis between CS and the CS-7EL addition line, which resulted in a list of candidate genes for the resistance.

Alongside the wheat chromosomal survey sequences, emerging BAC assemblies from individual chromosomes of ‘CHINESE SPRING’, as well as customized chromosomal BAC libraries from other cultivars, proved instrumental in gene cloning projects. Šimková et al. (2011) demonstrated that BAC libraries constructed from chromosome arms 7DS and 7DL, consisting of tens of thousands of BAC clones, were highly representative and easy to screen, which facilitated fast chromosome walking in the region of the greenbug resistance gene Gb3 on 7DL. The 7DS BAC library was screened for markers tightly linked to the Russian wheat aphid resistance locus Dn2401 (Staňková et al. 2015), and a BAC contig spanning the locus was identified in the 7DS physical map (Tulpová et al. 2019a). BAC clones from a 0.83 cM interval, delimited by Dn2401-flanking markers, were sequenced by a combination of short Illumina and long nanopore reads, and the resulting sequence assembly, validated by optical mapping of the 7DS arm (Staňková et al. 2016), revealed six high-confidence genes. Comparison of 7DS-specific optical maps prepared from susceptible cv. CHINESE SPRING and resistant line CI2401 revealed structural variation in the proximity of Epoxide hydrolase 2, which supported this gene as the most likely Dn2401 candidate (Tulpová et al. 2019b). Similarly, a BAC library and physical map of the CS 4A chromosome were used to approach and analyze the pre-harvest sprouting resistance locus Phs-A1, which revealed a causal role of TaMKK3-A for the trait (Shorinola et al. 2017). Customized BAC libraries constructed from the 3B chromosome of cv. HOPE and from the 4AL telosome bearing an introgressed segment of T. militinae were utilized to clone the stem rust resistance gene Sr2 (Mago et al. 2014) and to approach the powdery mildew resistance locus QPm.tut-4A (Janáková et al. 2019), respectively.

3.5.2 Contemporary Approaches

The completion and release of the ‘CHINESE SPRING’ reference genome (IWGSC 2018), hand in hand with rapid technological advancements that allow resequencing and large-scale pan-genome projects even in a crop with a complex polyploid genome, revolutionized strategies of gene cloning in bread wheat. Whole-genome long-read sequencing, resulting in high-quality sequence with resolved gene duplications, became realistic for wheat, but the challenges of producing, handling and analyzing the big data still appear too high for the majority of wheat gene cloning projects. Apart from the WGS and pan-genome efforts, several approaches to rapid gene cloning have been developed (Bettgenhaeuser and Krattinger 2019, and Chap. 10 of this book), including several that utilize the complexity reduction by chromosome flow sorting. Among them, Mutant Chromosome Sequencing (MutChromSeq; Sánchez-Martín et al. 2016) and TArgeted Chromosome-based Cloning via long-range Assembly (TACCA; Thind et al. 2017) have been used most widely. As indicated by the acronym, the former method couples chromosome flow sorting and sequencing with reference-free forward genetics. A chromosome bearing the gene of interest is Illumina-sequenced from both the wild type and several independent ethyl methanesulfonate (EMS) mutants, and the sequences are compared. A candidate gene is identified as a genic region that carries mutations in multiple independent mutants. The feasibility and efficiency of the method were first demonstrated by re-cloning the barley Eceriferum-q gene and by de novo cloning the wheat powdery mildew resistance gene Pm2 (Sánchez-Martín et al. 2016). This speedy, cost-efficient approach to gene cloning generated a lot of interest in both the wheat and barley communities (reviewed in Steuernagel et al. 2017). It was successfully applied to identify the semi-dwarfism locus Rht18 in T. durum (Ford et al. 2018) and the SuSr-D1 gene that suppresses resistance to stem rust in bread wheat (Hiebert et al. 2020). Moreover, it contributed to cloning the race-specific leaf rust resistance gene Lr14a (Kolodziej et al. 2021) and the powdery mildew resistance gene Pm4 (Sánchez-Martín et al. 2021) from hexaploid wheat.
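
The candidate-identification logic of MutChromSeq can be sketched in a few lines of Python. This is a simplified illustration and not the published pipeline: it assumes that variants have already been called for each mutant against the wild-type chromosome assembly and assigned to genic regions, whereas the actual method works reference-free on sequence contigs. The function name, the input format and the toy variant calls are hypothetical. The optional filter reflects the fact that EMS predominantly induces G>A and C>T transitions.

```python
from collections import defaultdict

# EMS mutagenesis predominantly causes G>A and C>T transitions.
EMS_TRANSITIONS = {("G", "A"), ("C", "T")}


def candidate_genes(mutations, min_mutants=2, ems_only=True):
    """Return genes mutated in at least `min_mutants` independent mutants.

    `mutations` is an iterable of (mutant_id, gene_id, ref_base, alt_base)
    tuples (a hypothetical input format used only for this sketch).
    """
    hit_by = defaultdict(set)
    for mutant_id, gene_id, ref, alt in mutations:
        # Optionally discard changes that are not typical of EMS.
        if ems_only and (ref.upper(), alt.upper()) not in EMS_TRANSITIONS:
            continue
        hit_by[gene_id].add(mutant_id)
    return {
        gene: sorted(mutants)
        for gene, mutants in hit_by.items()
        if len(mutants) >= min_mutants
    }


# Made-up variant calls from three independent mutants (illustrative only).
toy_calls = [
    ("m1", "geneA", "G", "A"),
    ("m2", "geneA", "C", "T"),
    ("m3", "geneA", "G", "A"),
    ("m1", "geneB", "G", "A"),
    ("m2", "geneC", "A", "T"),  # not an EMS-type change, filtered out
    ("m3", "geneC", "C", "T"),
]
print(candidate_genes(toy_calls, min_mutants=3))
# {'geneA': ['m1', 'm2', 'm3']}
```

Only the hypothetical geneA carries EMS-type mutations in all three mutants, so it is the sole candidate. Raising `min_mutants` makes the call stricter at the cost of requiring more independent mutants.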

MutChromSeq is a method of choice for traits with a strong phenotype, for which the production of independent mutants is feasible. As an alternative, suitable for any phenotype, Thind et al. (2017) proposed a procedure based on producing a high-quality de novo assembly of the gene-bearing chromosome and named it TACCA. The procedure utilized the so-called Chicago mapping technique (Putnam et al. 2016) developed by Dovetail Genomics. To clone the leaf rust resistance gene Lr22a, the authors flow-sorted and Illumina-sequenced wheat chromosome 2D from the resistant line CH CAMPALA Lr22a. The resulting sequences were scaffolded with Chicago long-range linkage. The assembly comprised 10,344 scaffolds with an N50 of 9.76 Mb and a longest scaffold of 36.4 Mb. The high contiguity of the chromosomal assembly significantly reduced the number of markers needed to delimit the gene to a narrow interval and, complemented by information from EMS mutants, allowed rapid cloning of this broad-spectrum resistance gene. The TACCA approach was also applied by Xing et al. (2018) to clone the powdery mildew resistance gene Pm21, introduced to bread wheat from H. villosa chromosome 6V. In addition, the high-quality chromosomal assemblies generated by long-range linkage were used for comparative analyses with chromosomes of the wheat reference genome (Thind et al. 2018; Xing et al. 2021).

3.6 Conclusions and Perspectives

Since its establishment in 2000, flow-cytometric chromosome sorting has contributed to major achievements in bread wheat genomics, including the generation of the wheat reference genome. Due to the rapid advancements in next-generation sequencing technologies, the reduction of genome complexity is no longer essential for whole-genome sequencing, but it remains beneficial in gene cloning projects that call for a high-quality sequence from a narrow region of the genome. This demand was met by coupling chromosome sorting with the long-range linkage method, which resulted in contiguous chromosome assemblies. Since Dovetail Genomics discontinued the Chicago method, other approaches need to be developed to satisfy the demand of the wheat community. Long-read sequencing technologies, such as PacBio or nanopore sequencing, appear to be the logical tools for achieving this goal. To make them compatible with flow-sorted material, however, challenges arising from inherent features of the flow-sorting technique, namely formaldehyde fixation and the laborious production of large DNA amounts, still need to be resolved. Low-input protocols being developed by the sequencing companies are a step toward meeting this demand.