Background

The number and morphology of chromosomes describe a karyotype, which is a fundamental characteristic of all organisms [1]. Karyotype analysis, an important technique for chromosome examination and genetic background screening, is widely used in prenatal diagnosis [2], species evolution analysis [3], chromosome engineering [4], etc. The traditional method of chromosome identification compares chromosome morphology, including length, arm ratio, and secondary constriction position [5]. However, chromosomal morphology may vary depending on material processing methods or cell cycle, with accurate results not being obtained [6]. Researchers resorted to staining the chromosomes with chemicals to identify them by their constant striation features using the chromosomal banding technique [7]. However, due to the tight superhelix structure of chromosomes in some plants, the bands were not obvious or absent after staining, making this method not applicable universally [8].

The development of the fluorescence in situ hybridization (FISH) technique marked the transition from the classical cytogenetics era to the modern molecular cytogenetics era [9]. FISH hybridizes fluorescent-labeled nucleic acid fragments (probes) to denatured genomic DNA based on the principle of complementary base pairing, and identifies chromosomes by detecting the number and arrangement of signals on them using a fluorescence microscope. The widely used techniques include (1) genome in situ hybridization (GISH), which uses whole genome sequences as probes to distinguish different chromosomal sets [10, 11], (2) Multi-color FISH, which uses the characteristics of high copies and special distribution on chromosomes of repeated sequences combined with polychromatic probes to identify chromosomes [12], (3) BAC (bacterial artificial chromosome)-FISH, where large DNA fragment cloning vectors marked as probes are used to identify chromosomes [13, 14], and (4) Oligo-FISH, where chromosome-specific oligonucleotides are first designed based on a reference genome. Oligo-FISH requires no extensive library screening, is flexible in design, and is also not limited to special regions of chromosomes (such as telomeres, centromeres, and rDNA sites). It also offers several advantages compared to traditionally prepared probes, including consistent probe quality and less time for probe preparation [15, 16]. Notably, whole-chromosome oligo-FISH paints using synthetic oligonucleotide libraries can be applied to visualization of simple or complex chromosomal aberrations, establishment of chromosomal domains, illustration of mitotic and meiosis behavior, and providing of insights into chromosomal relationships for genetically diverse lines [17].

However, the application of the FISH techniques is mainly limited by the lack of robust DNA probes in most plant species, especially non-model plants [9, 18]. Besides, a series of technical challenges were encountered in FISH, including the hardness of the plant cell wall and the density of the microsporocyte’s cytoplasm, hampering the accessibility of the probes to the chromosomes [19]. For example, the development of karyotypes for Brassica was challenging due to their small chromosome size and the lack of distinct karyological features in the metaphase chromosomes [13, 20]. Thus, although great progress has been achieved on FISH, there are still many disadvantages, including only a few plants having chromosome identification protocols, special tissues at special growth periods being required as materials, the experimental process being time-consuming, consumables being expensive, the testing equipment having high requirements, aneuploids and polyploids having too many chromosomes to spread out during tissue preparation, specific information about the variations cannot be provided, etc. [9, 18, 21].

Compared to FISH, molecular karyotyping has better resolution, a higher degree of automation, and a faster detection cycle. For example, chromosomal microarray analysis (CMA) involves molecular hybridization of a labeled sample with DNA probes covering important segments of the chromosome, and analysis of the hybridization signals yielding the molecular number and sequence information of the sample [22]. However, due to the limitation of microarray design, the copy number variation (CNV) of uncovered genomic regions on the platform cannot be detected. The application field of CMA is mostly prenatal diagnosis and is yet to be popularized in plants.

Plant scientists face multiple challenges, particularly those working on crop improvement and breeding. Advances in genome sequencing and resequencing play a role in meeting these challenges [23]. By the end of 2020, 1031 genomes of 788 different plant species were sequenced and published, of which 360 species have genomes assembled to the chromosome level [24]. These data provide a great choice for obtaining the copy numbers of each chromosome using whole genome resequencing to make molecular karyotypes by aligning to the genome at the chromosome level [25, 26]. Its advantages are as follows: DNA can be extracted from any material; low coverage resequencing is cheap; results can be obtained from a computer in half an hour; specific variation positions can be obtained; the analysis process is easy to repeat; and samples can be analyzed in batches.

However, published methods infer chromosomal copy numbers from standardized or normalized read counts combined with statistical tests and require a control sample [25, 26], thus presenting a significant challenge for wet-lab biologists. We assumed that the copy number of chromosomes can be inferred from large CNVs. According to Smolander et al. ‘s evaluation, BIC-seq2 and FREEC are the two best-performing tools for identifying large CNVs from low coverage whole-genome resequencing (lcWGR) data [27]. Because FREEC has a much shorter runtime (~ 3min) than BIC-seq2 (> 3 h), and contains ploidy setting parameters, it is reasonable to assume that FREEC is the most suitable tool for the CNV analysis in both aneuploid and polyploid plants. If these aneuploid or polyploid plants are newly generated hybrids, it is necessary to analyze their genome constitutes. Published pipelines such as VcfHunter (https://github.com/SouthGreenPlatform/VcfHunter) use large populations to study the evolution and domestication of crops hundreds or even millions of years ago, performing chromosome painting of accessions based on the contribution of ancestral groups [28]. These pipelines are not suitable for the analysis of a small number of newly generated hybrids. Therefore, we refer to the theory of QTL-seq [29], using variations (SNP and InDel) to characterize genome structure along chromosomes. Finally, we proposed a simpler and more accurate karyotyping pipeline for plants, which was verified through FISH and published resequencing data. Our method was used to demonstrate gene flow from diploid to allopolyploid plants using triploid plants as a bridge [30].

Results

Molecular karyotypes are consistent with cytogenetic karyotypes and have a great advantage

According to the method described, molecular and cytogenetic karyotypes of allopolyploid rapeseeds and autopolyploid potatoes were analyzed simultaneously. Molecular karyotypes were analyzed using the resequencing data with 1× depth. The karyotype of QIS4_8 showed that it was a recessive aneuploid (2n = 38, euploidy alike) with two pairs of homoeologous chromosomes dosage variations (three A1-one C1; one A10-three C9) and two copies of the other chromosomes (Fig. 1A). ESS1_17 was a monosomic alien addition line (MAAL, AA + C3) composed of two sets of A genomes and one C3 chromosome (Fig. 2A). The 21A020 was an allotriploid with two sets of A genomes and one set of C genome (Fig. 2B). Moreover, molecular karyotypes were also applied to autopolyploids, including tetraploid At (Fig. 3A) and hexaploid EA49 (Fig. 3B), having four and six sets of genomes, respectively.

Fig. 1
figure 1

Karyotyping of the resynthesized B. napus QIS4_8. (A) Molecular karyotype of the QIS4_8 with 1× sequence depth. The scatter represents Ratio * 2 (expected ploidy), and the black line shows the copy number of chromosomes. (B, C, and D) Genotyping of the QIS4_8 with different sequence depths (10, 5, and 1×). The scatter represents the index of each position. The black line was obtained by sliding window analysis. The two red dotted lines divide the average index (black line) into three regions as previously described [30]. Thus, 0-0.2 (variation type from Quinta) and 0.8-1 (IMB218 alleles) are homozygous regions, and 0.2–0.8 is the heterozygous region. (E) For cytogenetic karyotype analysis, the A chromosomes are shown in lanes A and B, and the C chromosomes are shown in lanes C and D. The first round of FISH included 45 S rDNA (white), 5 S rDNA (yellow), BAC clone KBrB072L17 (green), and KBrH092N24 (red) probes, and the hybridization results are shown in lanes A and C. The second round of FISH included CentBr1 (white), CentBr2 (green), and BAC BNIH 123L05 (red) probes containing C genome-specific repeated sequences, and the hybridization results are shown in lanes B and D. One near-tetraploid QIS4_8 with 38 chromosomes derived from the AnArCn allotriploid

Fig. 2
figure 2

Karyotyping for different ploidy rapeseeds. (A) For molecular karyotypes of ESS1_17 with 1× sequence depth, the scatter represents Ratio * 2 (expected ploidy). (B) For molecular karyotypes of 21A020 with 1× sequence depth, the scatter represents Ratio * expected ploidy (2 for subgenome A; 1 for subgenome C). The black line shows the copy number of chromosomes. (C-D) For cytogenetic karyotype analysis, the A chromosomes are shown in lanes A and B, and the C chromosomes are shown in lanes C and D. The first round of FISH included 45 S rDNA (white), 5 S rDNA (yellow), BAC clone KBrB072L17 (green), and KBrH092N24 (red) probes, and the hybridization results are shown in lanes A and C. The second round of FISH included CentBr1 (white), CentBr2 (green), and BAC BNIH 123L05 (red) probes containing C genome-specific repeated sequences, and the hybridization results are shown in lanes B and D. (C) A MAAL ESS1_17 with two sets of A genomes plus one more C3 chromosome. (D) An allotriploid 21A020 is composed of two sets of A genomes and one set of C genome

Fig. 3
figure 3

Karyotyping for different ploidy potatoes. Molecular karyotypes of At (A), and EA49 (B) with 1× sequence depth. The scatter represents Ratio * expected ploidy (4 for At; 6 for EA49), and the black line shows the copy number of chromosomes. (C) Genotyping of the EA49 with 10× sequence depth. Using S. etubersoum (EE) as the reference parent to calculate the index, and the scatter represents the index of each position. The black line, obtained by sliding window analysis, represents the proportion (2/3, black dotted line) of AC142 (AA) genotypes. The red line is “1 - black line”, representing the ratio (1/3, red dotted line) of EE genotypes. (D) FISH mapping of At chromosomes using two oligo-FISH probes (lanes A and C). The same cell was reprobed with subtelomeric repeated sequences CL34 (green), CL14 (red), and 45 S rDNA (white) probes (lanes B and D). At is an autotetraploid with 48 chromosomes. (E) GISH mapping of EA49 chromosomes using two genomic probes (AA, green; EE, red). EA49 is a hexaploid with 72 chromosomes consisting of two sets of A genomes and one set of E genome. Scale bar, 10 μm

For cytogenetic karyotypes, combining two rounds of hybridization and seven probes (four in the first round and three in the second round), every chromosome from the A and C genomes of Brassica was unambiguously identified. QIS4_8 was found to contain 38 chromosomes with three A1 and one homoeologous C1 chromosomes, and one A10 and three homoeologous C9 chromosomes (Fig. 1E). ESS1_17 was a MAAL, having two sets of A genomes plus one more C3 chromosome (Fig. 2C), while 21A020 was an allotriploid composed of two sets of A genomes and one set of C genome (Fig. 2D). Two FISH techniques, namely Oligo-FISH and Multi-color FISH using two and three probes, respectively, were then used to identify the potato karyotypes. The result showed that At was an autotetraploid (Fig. 3D).

Finally, molecular and cytogenetic karyotypes were compared, and it was found that the karyotypes obtained by both methods were highly consistent (Figs. 1A and E, 2A, C, B and D and 3A and D). Despite technicians using the cytogenetic technique for seven years, misidentification of C1 and C5 chromosomes of Brassica still occurred due to their similar signals (especially hybridization is less effective) and chromosome length. However, this problem was absent in molecular karyotyping. Notably, molecular karyotyping not only identified each chromosome and obtained the copy number, but also showed the loss and duplication of partial chromosomal segments. For example, QIS4_8 had three A1 chromosomes and one homoeologous C1 chromosome (Fig. 1A, E), with the ends of both A1 (about 32–38 Mb) and C1 (about 50–58 Mb) chromosomes having two copies, as a result of homoeologous exchange [31].

The molecular karyotyping pipeline in this study can reproduce the results of published molecular karyotypes

Further, published resequencing data of Solanum tuberosum cv Desiree for karyotyping [25] was used to verify our method in this study. Consistent with previous studies, p.2D-10 was found to be an autotetraploid (Fig. 4A), while Plant-74 (Fig. 4B) and PSK23 (Fig. 4C) were found to be aneuploid. Plant-74 lost one chromosome 2 and part of chromosome 8, gaining an extra chromosome 4. PSK23 lost one chromosome 5 and half of chromosome 4. Thus, molecular karyotypes derived from different methods showed similar genome dosages, including plants propagated by stem cutting (Fig. 4A), and plants regenerated from protoplasts (Fig. 4B) and stem internodes by Agrobacterium-mediated transformation (Fig. 4C). Some samples were sequenced at depths as low as 0.4×, with accurate karyotypes being obtained. In conclusion, if only chromosome copy number is analyzed, this study suggested sequencing 1× for small genome species or small sample size, while the sequencing depth for large genome species or large sample size can be reduced appropriately (at least 0.01x [27]). As for the appropriate minimum depth for target species, researchers can use the software seqtk to extract different depths for testing.

Fig. 4
figure 4

Karyotyping for potato published data. Molecular karyotypes of p.2D-10 (A), Plant-74 (B), and PSK23 (C). The scatter represents Ratio * 4 (expected ploidy), and the black line shows the copy number of chromosomes

Inference of homologous chromosome and genome origin based on molecular karyotypes

In addition to chromosome copy numbers, genotypes of the offspring of hybrids can be inferred if their parental resequencing data is available. For example, QIS4_8 is an offspring of a triploid hybrid (2n = 29, AnArCn) crossed between allotetraploid B. napus Quinta and diploid B. rapa IMB218, and its C genome is derived from B. napus, while the A genome has different origins. Therefore, after aligning samples to the rapeseed reference genome (containing A and C chromosomes), only the A genome of QIS4_8 was analyzed. QIS4_8 was observed to have two copies of chromosome A7 (Fig. 1A, E), with the first half of the chromosome being almost the IMB218 genotype (ArAr) and the last half almost the Quinta genotype (AnAn; Fig. 1B). Besides, there were three copies of chromosome A1 (Fig. 1A, E), which most of the chromosomal fragments were observed in the heterozygous region with an average index of about 1/3, indicating that the genotype was AnAnAr (Fig. 1B). Using the methods in this study, newly formed allotetraploids were successfully analyzed among the progeny of allotriploids (interploidy hybrids between B. napus and B. rapa). It was found that large chromosomal fragments and even main chromosomes came from the diploid parent B. rapa. The results demonstrated that genome sequences from the diploid B. rapa were transferred to the newly formed allotetraploids [30].

As molecular karyotypes show the origin of homologous chromosomes, different genomes can also be identified using our method. EA49 is a resynthesized hexaploid derived from a diploid potato (AA, 2n = 24) and S. etubersoum (EE, 2n = 24). Our pipeline analysis showed that each chromosome of EA49 has six copies (Fig. 3B) with a genotype ratio (A:E) of 2:1 (Fig. 3C), indicating that EA49 consisted of two sets of AA genomes and one set of EE genome. Further analysis with GISH demonstrated 48 A chromosomes and 24 E chromosomes (Fig. 3E). Thus, molecular karyotypes can not only replace GISH to identify different genomes but also allow the estimation of the size of the added alien chromosomal segment as well as the missing chromosomal segment described above.

If both chromosome copy number and genome structure are obtained, higher resequencing depth is required. In theory, the deeper the resequencing depth is, the more accurate the results, with the cost being higher. Here, 10× depth was found to be sufficient to obtain accurate genotyping results (Figs. 1B and 3C). Some chromosomal regions, such as A10, showed errors at 5× depth (Fig. 1C). Also, it was completely impossible to obtain valid genotypes at 1× depth (Fig. 1D) due to the minimum depth for variation detection being 3–4× [32, 33]. Therefore, it was recommended to choose the resequencing depth between 5–10×.

Discussion

Karyotype fundamentally determines the traits of species. Accurate and rapid karyotype analysis greatly shortens the cycle of chromosome engineering [4]. Compared to euploidy, karyotype identification of aneuploidy is more challenging, especially for samples with euploidy chromosomal variations, such as QIS4_8 (2n = 38). Also, all individual chromosomes of aneuploids cannot be identified by flow cytometry and chromosome counts. Though cytogenetic karyotyping can identify each chromosome, it is challenging and low in efficiency, and aneuploid and polyploid plants have too many chromosomes to spread out during tissue preparation. However, it is easy for molecular karyotyping to identify various aneuploids and polyploids.

Compared to using standardized or normalized read counts to infer chromosome copy numbers, the pipeline in this study utilizes CNVs to make inference more convenient and accurate, without the need for control samples (optional). Control samples are not available most of the time, and in most cases, control samples with large chromosomal copy variations are not completely euploid, which can lead to analysis errors if a relative chromosomal copy is not desired. For example, Fossi et al. used p.2D-10 as a control to obtain the karyotype of Plant-74. However, chromosome 10 of p.2D-10 was not completely four copies (Fig. 4A), and so the variation of chromosome 10 in Plant-74 (Fig. 4B) was not detected [25]. This phenomenon was also observed in other chromosomes, including chromosomes 3, 5, and 6 (Fig. 4A, B). Notably, it is difficult to infer the karyotype when CNVs of a sample are too irregular, as partial CNVs cannot be reflected in chromosome copy numbers. This could also be the reason for the copy number of certain chromosomal fragments being inconsistent between the molecular and cytogenetic karyotypes.

Introgression of alien chromosomal segments containing useful genes into crop plants through wide hybridization is a valuable method for plant breeding. For example, the short arm of rye chromosome lR, carrying several disease-resistance genes, was incorporated into many high-yielding wheat cultivars [34]. Molecular karyotyping used in this study allowed the monitoring of alien chromatin during introgression. Besides, if two genomes are very closely related and share most of the repetitive DNA sequences, the distinction between the two genomes becomes relatively difficult by GISH [18]. On the contrary, molecular karyotyping helps in the most efficient and accurate identification of different genomes. The molecular karyotype system will allow karyotyping of a large number of accessions or ecotypes within species to study genetic adaptation and evolution on a chromosome scale. In addition to plants, the method can be used to analyze data for any organism [35].

Conclusions

This study proposed a new molecular karyotyping method based on low coverage whole-genome resequencing, which had the advantages of wide application, simple operation, easy repetition, less time and cost. The method is of great significance for species evolution analysis, chromosome engineering, crop improvement, and breeding.

Methods

Plant materials

The resynthesized Brassica napus allopolyploid line EL500 (CCAA, 2n = 38) was obtained from a previous study [12]. ESS1_17 was a backcross progeny between a male parent and triploid developed by hybridizing EL500 (egg donor) with the inbred B. rapa parent line Si (AA, 2n = 20, pollen donor). 21A020 was a triploid developed by hybridizing the cultivar B. napus Quinta (AACC, 2n = 38, egg donor) with the doubled haploid B. rapa line IMB218 (AA, 2n = 20, pollen donor) as described previously [36]. QIS4_8 was a self-crossing progeny of the triploid 21A020. Plants were grown at 23 °C during the day and 20 °C at night with 16-h light in a growth chamber.

The somatic cell hybrid EA49 was derived from a protoplast electrofusion of the diploid potato cultivar AC142 (AA, 2n = 24) with S. etubersoum (EE, 2n = 24). EA49 and tetraploid cultivar At (S. tuberosum Atlantic, 2n = 48) were propagated by cutting nodes with an axillary bud and incubated in vitro under fixed conditions (16-h light/8-h dark, 24 °C, 40 µmol m-2s-1). To produce flower buds, plantlets from each line were shifted from tissue culture to greenhouse pots.

Whole genome resequencing

Genomic DNA was extracted from young leaflets using the DNeasy Plant Mini Kit (Qiagen). Libraries were constructed using the KAPA Hyper Prep kit (KAPA Biosystems KK8504) throughout the protocol. Libraries were sequenced with paired-end 150 bp reads on an Illumina NovaSeq 6000 platform (Novogene, Beijing, China). Adaptor sequences and low-quality reads were trimmed by fastp software with default settings [37], and the remaining ones were called clean reads. Sequencing data of IMB218 was obtained from a previous study [38]. Sequencing data with different depths (1, 5, and 10×) were extracted from original FASTQ files using seqtk software (https://github.com/lh3/seqtk).

Molecular karyotyping using resequencing data

The flow chart of karyotype analysis is shown in Fig. 5. A computer cluster node with 48 Intel(R) Xeon(R) CPU E5-2650 at 2.20 GHz cores and 256 GB of random-access memory (RAM) was used to perform the analyses in this study. Clean reads were aligned to the reference genome of rapeseed (ZS11, contains A and C chromosomes) [39] or potato (DM v6.1) [40] using BWA software [41]. Alignment files were converted to BAM files using SAMtools [42], and applied to the absolute copy number variation (CNV) analysis by Control-FREEC [35] with default settings, except that ExpectedGC was 0.3–0.5 for rapeseeds or 0.25–0.45 for potatoes. If the subgenomic ploidy of allopolyploids, such as triploid 21A020 (AAC), is different, it is better to analyze the chromosome copy number of each subgenome separately by setting “chrLenFile” and “ploidy” parameters. Only chromosomes in the chrLenFile list will be considered by Control-FREEC, so we can analyze the A and C subgenomes separately based on the same BAM file that aligns the reference genome ZS11. If the ploidy of the material is in doubt, different values can be set and Control-FREEC will select the one that explains most observed CNVs. See the manual for detailed instructions on this tool (http://boevalab.inf.ethz.ch/FREEC/index.html). Karyotypes were then inferred from CNV visualizations by ggplot2 in R-3.6.3 (Control-FREEC_visualization.R on GitHub). All codes (including detailed notes) and test data are available on GitHub (https://github.com/kangluzhao/karyotyping).

Fig. 5
figure 5

Flow chart of molecular karyotype analysis in this study. The gray background represents the chromosome copy number analysis pipeline, which requires no more than 1x depth data. The blue background represents the hybrid genome structure analysis pipeline, which requires 5-10x depth data

Further, each homologous chromosome or chromosomal segment of progenies provided by which parent was distinguished and traced. The theory was similar to QTL-seq [29]. Taking QIS4_8, a progeny of Quinta and IMB218, as an example, variations (SNP and InDel) were called from BAM files with marked duplicates, and positions with QD < 2.0 or FS > 60.0 were filtered using GATK4 [43]. According to the VCF file, selecting the homozygous and differential genotype positions of parents (select_homozygous_differential_position.pl on GitHub), and then the “index” of QIS4_8 was calculated for these positions (calculate_index.pl on GitHub). If the position’s genotype of QIS4_8 exactly matches the reference parent (Quinta), we assign an index of 0. Otherwise, it’s 1 (i.e., exactly matching the IMB218). Thus, the genetic proportion of each parent could be calculated. Besides, positions with read depth < 7 were excluded (exclude_low_depth_position.pl on GitHub), as their corresponding indexes were less accurate. Finally, sliding window analysis was applied to index plots with a 2 Mb window size and 10 kb increment. The average index of the positions located in the window was calculated to reduce noise. Figures were plotted using ggplot2 in R-3.6.3 (index_visualization.R on GitHub).

Cytogenetic karyotyping by FISH

FISH was performed on pollen mother cells at mitosis metaphase. Probes used for FISH, tissue preparation, hybridization, karyotyping, and imaging were described previously [12, 13, 44, 45].