Chromosome-level genome assembly of the cereal cyst nematode Heterodera filipjevi

Yao, Ke; Cui, Jiangkuan; Jian, Jinzhuo; Peng, Deliang; Huang, Wenkun; Kong, Lingan; Wang, Qianghui; Peng, Huan

doi:10.1038/s41597-024-03487-7

Data Descriptor
Open access
Published: 17 June 2024

Volume 11, article number 637, (2024)

Scientific Data

Ke Yao¹, Jiangkuan Cui², Jinzhuo Jian¹, Deliang Peng¹, Wenkun Huang¹, Lingan Kong¹,³, Qianghui Wang⁴ & Huan Peng¹

Abstract

As an economically important plant-parasitic nematode (PPN), Heterodera filipjevi causes severe damage to wheat and has now been recorded in many countries. Although the genomes of multiple PPNs have been published, no high-quality genome assembly and annotation has been available for H. filipjevi. This study presents a chromosome-scale genome assembly and annotation for H. filipjevi, generated with a combination of Illumina short-read, PacBio long-read, and Hi-C sequencing technologies. The genome consists of 9 pseudo-chromosomes that contain 134.19 Mb of sequence, with a scaffold N50 length of 11.88 Mb. In total, 10,036 genes were annotated, representing 75.20% of the total predicted protein-coding genes. Our study provides the first chromosome-scale genome for H. filipjevi, which is also the first high-quality genome for a cereal cyst nematode (CCN). It provides a valuable genomic resource for further biological research on, and management of, cereal cyst nematode disease.

Background & Summary

The cereal cyst nematodes (CCNs) are a group of 12 closely related species and are considered among the most damaging plant-parasitic nematodes (PPNs), limiting cereal production in many parts of the world, including Australia, China, India, the USA, Europe, North Africa and West Asia¹,². Heterodera filipjevi, H. avenae and H. latipons are among the most economically important species and cause significant economic losses³. Yield losses caused by CCNs have been recorded at 10–35% on wheat in China, 24% on spring wheat in Oregon, and 20% on barley in Australia⁴. Among the CCNs, H. filipjevi is an important constraint on cereal crop production in different climatic regions¹ and has now been recorded in many countries, including Tajikistan, Russia, Morocco, Tunisia, Pakistan, Libya, Turkey⁵,⁶, Estonia, Sweden, India, Norway, Iran, China⁷, the United Kingdom and the USA⁸. Smiley et al.⁴ reported a 35% yield loss in spring wheat in Oregon, USA, due to H. filipjevi, and Karimipour et al.⁹ estimated wheat yield losses of 20% to 25% in Iran caused by the same species. In addition, Hajihasani et al.¹⁰ reported that, in Iran, grain yield loss caused by H. filipjevi occurred even at the lowest population density and reached a maximum of 48% at an initial population density (Pi) of 20 eggs and J2 (g soil)⁻¹.

Genomic data have proven to be powerful tools for explaining how plant-parasitic nematodes successfully parasitize their hosts. The first plant-parasitic nematode genomes, those of Meloidogyne incognita and M. hapla, were published in 2008. More recently, the genomes of several other PPNs have been published, including Globodera pallida¹¹, G. rostochiensis¹², Heterodera glycines¹³,¹⁴, Bursaphelenchus xylophilus¹⁵, B. mucronatus¹⁶, Ditylenchus destructor¹⁷, M. floridensis¹⁸ and M. graminicola¹⁹. However, no reference genome has been available for CCNs; only the transcriptome of H. avenae has been sequenced, using short-read sequencing technology²⁰,²¹,²²,²³. In the present study, a total of 95.79 Gb (711.56×) of raw data was obtained with Illumina, PacBio, 10x Genomics and Hi-C technologies; the sequencing data are summarized in Table 1. A total of 17,119,184,513 17-mers were counted from the 21.8 Gb of Illumina short reads, and the k-mer depth was 124 (Table 1). We then used the PacBio long-read, Illumina short-read, 10x Genomics and Hi-C data to generate a high-quality chromosome-level reference genome for H. filipjevi. The genome assembly spanned 134.19 Mb with a scaffold N50 length of 11.88 Mb (Table 2). After chromosome-level anchoring, 9 chromosomes with a total length of 120 Mb (89.4% of the draft assembly) were constructed, corresponding to the karyotype. In addition, the mapping rate of Illumina short reads was 93.14% and the genome coverage was 99.71% (Table 3). Moreover, 2,226 homozygous single-nucleotide polymorphisms (SNPs), corresponding to a low homozygous SNP rate (0.0032%), were identified, suggesting a low error rate in the assembly (Table 4). Together, these evaluations indicate that the assembled H. filipjevi genome is of high quality. Finally, we annotated 13,352 protein-coding genes in the H. filipjevi genome, with a mean of 8.14 exons per gene (Table 6), and found 61.9 Mb (46.14%) of repeat elements (Table 5). The reference genome obtained in this study will provide a foundation for future investigations of the pathogenesis of CCNs.
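As a quick arithmetic check, the per-technology data volumes quoted in the Methods (21.8 Gb Illumina, 14.74 Gb PacBio, 42.58 Gb 10x Genomics and 16.67 Gb Hi-C) can be summed and compared with the 95.79 Gb total. The Python sketch below only re-adds figures stated in this article; the dictionary name and layout are illustrative and are not part of the authors' pipeline.

```python
# Data volumes in Gb, as quoted in the Methods section of this article.
# The variable names are illustrative only.
raw_data_gb = {
    "Illumina": 21.8,
    "PacBio": 14.74,
    "10x Genomics": 42.58,
    "Hi-C": 16.67,
}

# Round to two decimals to avoid floating-point noise in the sum.
total_gb = round(sum(raw_data_gb.values()), 2)
print(f"Total raw data: {total_gb} Gb")
```

The sum agrees with the total reported for all four technologies.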

Table 1 Sequencing data used for the Heterodera filipjevi genome assembly.

Table 2 Length and number statistics for the de novo assembled Heterodera filipjevi genome.

Table 3 Read coverage statistics for the Heterodera filipjevi genome.

Table 4 SNP statistics of the Heterodera filipjevi genome.

Table 5 Statistics of the repetitive sequences in the Heterodera filipjevi genome.

Table 6 Gene annotation of the Heterodera filipjevi genome via three methods.

Methods

Nematode sample and DNA extraction

Cysts of H. filipjevi were collected from wheat fields in Xuchang City, Henan Province. Ten cysts were chosen and propagated on the susceptible wheat cultivar Wenmai 19 in a greenhouse for six generations. Fresh, healthy and unbroken cysts were picked manually and crushed in sterile water to release the eggs. The eggs were then collected using the sucrose flotation technique²⁴ and washed three times with sterile distilled water. Six developmental stages of H. filipjevi, namely pre-parasitic second-stage juveniles (Pre-J2), parasitic J2 (Para-J2), third-stage juveniles (J3), fourth-stage juveniles (J4), adult females (Fes) and eggs, were collected as previously described²⁰.

DNA isolation and sequencing

Genomic DNA was isolated from the H. filipjevi egg mass using the CTAB method. DNA quality and concentration were assessed using agarose gel electrophoresis and a Qubit 2.0 Fluorometer (Life Technologies, CA, USA). A library with a 20 kb insert size was produced following the manufacturer’s protocol (PacBio, CA) and sequenced with PacBio RS technology. For short-read sequencing, a library with a 350 bp insert size was prepared and sequenced on an Illumina HiSeq 2500 as 2 × 150 bp reads (Table 1). For 10x Genomics sequencing, the GEM (Gel Beads in Emulsion) reaction was conducted using approximately 1 ng of input DNA of 50 kb length, and 16 barcodes were introduced into the droplets. Subsequently, the droplets were fractured and 600 bp fragments were used to construct libraries, which were sequenced on the Illumina HiSeq X platform at the Novogene Bioinformatics Institute, Beijing.

Genome size estimation, assembly and evaluation

For the survey analysis, the H. filipjevi genome size was estimated from the 21.8 Gb of paired-end Illumina sequencing data using the k-mer formula: genome size = (total number of 17-mers) / (position of the homozygous peak). Using the 14.74 Gb of long reads generated on the PacBio Sequel platform, contigs of the H. filipjevi genome were assembled with the FALCON assembler (version 1.2.4)²⁵. The PacBio assembly was then polished with Quiver (smrtlink 5.0.1)²⁶. Redundant heterozygous sequences in the assembly were removed using Purge Haplotigs (version 1.1.1)²⁷. The resulting contigs were connected into super-scaffolds with the 42.58 Gb of 10x Genomics linked-read data using fragScaff (version 140324)²⁸ (Table 1). Finally, the Illumina short reads were used to correct any remaining errors with Pilon (version 1.22)²⁹. These steps yielded the final draft H. filipjevi genome.
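The k-mer formula can be applied directly to the counts quoted in the Background & Summary (17,119,184,513 17-mers and a k-mer depth of 124). The following is a minimal sketch that assumes the reported k-mer depth of 124 is the position of the homozygous peak; the function name is hypothetical and this is not the authors' survey script.

```python
def estimate_genome_size(total_kmers: int, peak_depth: int) -> float:
    """Estimate genome size (bp) as total k-mer count / homozygous peak depth."""
    if peak_depth <= 0:
        raise ValueError("peak_depth must be positive")
    return total_kmers / peak_depth


# Values quoted in this article; treating the k-mer depth (124) as the
# homozygous peak position is an assumption made for this sketch.
TOTAL_17MERS = 17_119_184_513
PEAK_DEPTH = 124

size_bp = estimate_genome_size(TOTAL_17MERS, PEAK_DEPTH)
print(f"Estimated genome size: {size_bp / 1e6:.2f} Mb")  # prints 138.06 Mb
```

Under this assumption the survey estimate is slightly larger than the 134.19 Mb final assembly.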

To obtain a high-quality H. filipjevi genome, the draft assembly was further improved using 16.67 Gb of Hi-C data. First, the Hi-C reads were mapped to the draft assembly using BWA³⁰. Low-quality reads and duplicates were then removed to build raw inter- and intra-chromosomal contact maps. Next, Lachesis (version 201701), which is based on an agglomerative hierarchical clustering algorithm³¹, was applied for clustering, ordering and orienting, and the scaffolds were clustered into 9 pseudo-chromosomes³². Finally, Juicebox (version 2.20.00) was used to manually correct the scaffolded chromosomes and to plot a heatmap of genomic interactions³³. In total, we obtained a 134,189,547 bp H. filipjevi genome comprising 9 pseudo-chromosomes, which cover ~89.4% of the whole genome (Fig. 1), and 652 super-scaffolds. The contig N50 and scaffold N50 are 0.45 Mb and 11.88 Mb, respectively (Table 2). Circos (version 0.64) was used to visualize the H. filipjevi genome data³⁴.
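For readers unfamiliar with the N50 statistic reported here, the sketch below shows how it is conventionally computed: sequences are sorted from longest to shortest, and N50 is the length of the sequence at which the running total first reaches half of the assembly size. The toy lengths are made up for illustration and are not the H. filipjevi scaffolds; the last lines simply multiply the reported assembly size by the reported anchored fraction.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # The first sequence that brings the running total to >= 50%
        # of the assembly defines N50.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical toy assembly (not data from this study).
toy_scaffolds = [10, 8, 6, 4, 2]
print(n50(toy_scaffolds))

# Anchored length implied by the figures quoted in the text.
ASSEMBLY_BP = 134_189_547
ANCHORED_FRACTION = 0.894
anchored_mb = ASSEMBLY_BP * ANCHORED_FRACTION / 1e6
print(f"Anchored length: {anchored_mb:.1f} Mb")
```

For the toy input the total is 30, so the threshold of 15 is first reached at the second-longest sequence.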

The completeness of the genome assembly was assessed as follows. First, the Core Eukaryotic Genes Mapping Approach (CEGMA)³⁵ was run with a core set of 248 evolutionarily conserved genes from six eukaryotic model organisms. Of the 248 core eukaryotic genes (CEGs), 230 were found in the assembly (92.74%), indicating that the assembly is relatively complete. Second, BUSCO³⁶ (version 5.4.3) was run in genome mode with nematoda_odb10 as the database to evaluate completeness. This yielded an assembly completeness of 55.8%, similar to that of other reported cyst nematode genomes (43.4–59.3%)¹⁹. Finally, reads from the small-fragment library were aligned to the assembled genome using BWA. The alignment rate of these reads was about 93.14% and the coverage about 99.71%, indicating good consistency between the reads and the assembled genome (Table 3).
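The CEGMA percentage follows directly from the two counts given above. A short worked check, using only the numbers stated in the text (the helper name is illustrative):

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, 2)


# 230 of the 248 core eukaryotic genes were found in the assembly.
cegma_pct = percent(230, 248)
print(f"CEGMA completeness: {cegma_pct}%")
```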

Genomic repeat annotation

Two approaches were applied to annotate repetitive sequences in the H. filipjevi genome: homology-based comparison and ab initio prediction. For the homology-based comparison, RepeatMasker (version 3.3.0) and the associated RepeatProteinMask were run against the Repbase database³⁷. For ab initio prediction, LTR_FINDER (version 1.0.7)³⁸, RepeatScout (version 1.0.5)³⁹ and RepeatModeler (version 1.0.4) were first used to construct a de novo candidate database of repetitive elements. The repetitive sequences were then annotated using RepeatMasker, while tandem repeats were predicted ab initio using TRF (version 4.07b)⁴⁰. By combining the Repbase and de novo datasets, we obtained a total of 61.91 Mb of consensus, non-redundant repetitive sequences, which occupy 46.14% of the genome (Table 5).
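Because several tools can annotate the same region, a non-redundant repeat total requires overlapping annotations to be merged before their lengths are summed. The sketch below illustrates that idea on a single sequence with made-up coordinates; it is an assumption about how such a total is typically derived, not the authors' procedure, and all names are hypothetical. The final lines reproduce the reported repeat fraction from the figures in the text.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one sequence


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_length(intervals: Iterable[Interval]) -> int:
    """Total number of bases covered after merging."""
    return sum(end - start for start, end in merge_intervals(intervals))


# Hypothetical annotations from two tools on the same scaffold.
repeatmasker_hits = [(0, 100), (250, 400)]
trf_hits = [(50, 150), (400, 450)]
nonredundant_bp = covered_length(repeatmasker_hits + trf_hits)
print(nonredundant_bp)  # 350, versus a naive sum of 400

# Repeat fraction from the figures reported in the text.
repeat_pct = round(100 * 61.91 / 134.19, 2)
print(f"Repeat content: {repeat_pct}%")
```

A genome-wide version would apply the same merge separately to each scaffold.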

Gene prediction and functional annotation

Three approaches were employed to predict the protein-coding genes in the H. filipjevi genome: homology-based prediction, ab initio annotation and transcriptome-based prediction. For homology-based prediction, the protein repertoires of H. glycines (GCA_004148225.2)⁴¹, G. pallida (GCA_000724045.1)⁴², G. rostochiensis (GCA_900079975.1)⁴³, M. incognita (GCA_900182535.1)⁴⁴, M. hapla (GCA_000172435.1)⁴⁵, Caenorhabditis elegans (GCA_000002985.3)⁴⁶, Haemonchus contortus (GCA_000442195.1)⁴⁷, Pristionchus pacificus (GCA_918442795.1)⁴⁸, Brugia malayi (GCA_000002995.5)⁴⁹, Drosophila melanogaster (GCA_029775095.1)⁵⁰ and Homo sapiens (GCA_024586135.1)⁵¹ were first aligned against the H. filipjevi genome using TBLASTN (version 2.2.29)⁵². The BLAST hits were then conjoined with Solar (version 0.9.6)⁵³, and GeneWise (version 2.2.0)⁵⁴ was used to predict the exact gene structure in the genomic region corresponding to each BLAST hit. These homology predictions were denoted the “Homology-set”. In addition, about 33.2 Gb of clean RNA-sequencing (RNA-seq) data derived from six developmental stages of H. filipjevi were assembled with Trinity (version 2.0)⁵⁵, and the assembled sequences were aligned against the H. filipjevi genome using the Program to Assemble Spliced Alignments (PASA) (version 2.0.2)⁵⁶. The resulting valid alignments were clustered by genome mapping location and assembled into gene structures. The gene models created by PASA were denoted the PASA-T-set (PASA Trinity set). For ab initio annotation, five tools were used: Augustus (version 3.0.2)⁵⁷, GeneID (version 1.4)⁵⁸, Genscan (version 1.0)⁵⁹, GlimmerHMM (version 3.0.2)⁶⁰ and SNAP (version 11-29-2013)⁶¹. Of these, Augustus, SNAP and GlimmerHMM were trained on the PASA-T-set gene models. For transcriptome-based prediction, RNA-seq reads were mapped directly to the genome using TopHat (version 2.0.9)⁶², and the mapped reads were assembled into gene models (Cufflinks-set) with Cufflinks (version 2.1.1)⁶³. All gene models from these three approaches were then integrated with EVidenceModeler⁶⁴. The evidence weights were ordered as follows: PASA-T-set > Homology-set > Cufflinks-set > Augustus > GeneID = SNAP = GlimmerHMM = Genscan. To obtain untranslated regions (UTRs) and alternative splicing information, PASA2 was used to update the gene models. Ultimately, a total of 13,352 protein-coding genes were predicted in the H. filipjevi genome. The average transcript length was 3,258.25 bp, with an average coding sequence (CDS) length of 1,235.80 bp. The average number of exons per gene was 8.14, with an average exon length of 151.89 bp and an average intron length of 283.41 bp (Table 6). The gene model statistics for H. filipjevi, including gene, CDS, intron and exon lengths, were comparable to those of closely related species (Fig. 2).
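The reported gene-model averages can be loosely cross-checked against one another. The sketch below assumes (this is not stated in the text) that the average transcript length spans the exons plus introns of the coding region, so that the mean exon length is roughly the CDS length divided by the exon count, and the mean intron length is roughly the remaining length divided by the number of introns. Because the published values are averages of per-gene quantities, only approximate agreement should be expected.

```python
# Averages reported in the text (Table 6).
MEAN_TRANSCRIPT_BP = 3258.25
MEAN_CDS_BP = 1235.80
MEAN_EXONS_PER_GENE = 8.14
REPORTED_EXON_BP = 151.89
REPORTED_INTRON_BP = 283.41

# Ratio-of-means approximations; these are assumptions of this sketch,
# not the way the published averages were necessarily computed.
approx_exon_bp = MEAN_CDS_BP / MEAN_EXONS_PER_GENE
approx_intron_bp = (MEAN_TRANSCRIPT_BP - MEAN_CDS_BP) / (MEAN_EXONS_PER_GENE - 1)

print(f"Approximate exon length:   {approx_exon_bp:.2f} bp (reported {REPORTED_EXON_BP})")
print(f"Approximate intron length: {approx_intron_bp:.2f} bp (reported {REPORTED_INTRON_BP})")
```

Both approximations fall within 1 bp of the reported values, so the figures are mutually consistent under this assumption.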

In addition, the gene structures of transfer RNAs (tRNAs), ribosomal RNAs (rRNAs) and other non-coding RNAs in the H. filipjevi genome were predicted. Specifically, tRNAs were predicted using tRNAscan-SE (version 1.3.1)⁶⁵. rRNA fragments were predicted by searching against an invertebrate rRNA database using BLAST with an E-value cutoff of 1 × 10⁻¹⁰. MicroRNA (miRNA) and small nuclear RNA (snRNA) genes were predicted with INFERNAL (version 1.1.1)⁶⁶ using the Rfam database⁶⁷.
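An E-value threshold of this kind can be applied when post-processing BLAST results. The sketch below assumes the hits are available in BLAST's 12-column tabular format, in which the E-value is the eleventh column; the text does not state which output format was used, and the function name and the two example hits are hypothetical.

```python
import io
from typing import Iterator, List, TextIO

E_VALUE_CUTOFF = 1e-10  # threshold quoted in the text


def filter_hits(handle: TextIO, cutoff: float = E_VALUE_CUTOFF) -> Iterator[List[str]]:
    """Yield tabular BLAST hits whose E-value is at or below the cutoff."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        # Column 11 (index 10) holds the E-value in 12-column tabular output.
        if float(fields[10]) <= cutoff:
            yield fields


# Two made-up hits: one strong, one weak.
example = io.StringIO(
    "rRNA_a\tscaffold_1\t98.5\t200\t3\t0\t1\t200\t1000\t1199\t1e-95\t350\n"
    "rRNA_b\tscaffold_2\t80.0\t40\t8\t0\t1\t40\t500\t539\t2e-05\t45\n"
)
kept = list(filter_hits(example))
print([hit[0] for hit in kept])
```

Only the first hit passes the cutoff in this example.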

The predicted protein-coding genes in the H. filipjevi genome were functionally annotated by homology searches against the SwissProt⁶⁸, NCBI NR⁶⁹, Gene Ontology⁷⁰, InterPro⁷¹ and KEGG pathway⁷² databases. In particular, the InterProScan tool⁷³, together with the InterPro database⁷⁴, was applied to predict protein function based on conserved protein domains and functional sites. A total of 10,036 genes (75.20%) were successfully annotated by at least one public database (Fig. 3).

Data Records

The raw sequence data reported in this paper have been deposited in the Genome Sequence Archive⁷⁵ at the National Genomics Data Center (NGDC)⁷⁶, China National Center for Bioinformation/Beijing Institute of Genomics, Chinese Academy of Sciences (GSA: CRA014195)⁷⁷, and are publicly accessible at https://ngdc.cncb.ac.cn/gsa. The genome assembly has been deposited in DDBJ/ENA/GenBank under the accession number JBDPZO000000000⁷⁸ and in the NGDC under the GSA accession CRA015002⁷⁹. The gene functional annotation data have been deposited at Figshare⁸⁰.

Technical Validation

Nucleic acid quality

The DNA quality and concentration were assessed using agarose gel electrophoresis and a Qubit 2.0 Fluorometer (Life Technologies, CA, USA).

Evaluation of genome assembly

Several strategies were used to evaluate the completeness and accuracy of the H. filipjevi genome. First, CEGMA³⁵ indicated that the assembly is relatively complete (92.74%). Second, BUSCO³⁶ (v5.4.3) was run in genome mode, with nematoda_odb10 as the database, to evaluate the completeness of the genome from this study and of other published genomes. We obtained an assembly completeness of 55.8%, similar to that of other reported plant nematode genomes (43.4–59.3%)¹¹,¹²,¹³,¹⁴,¹⁹. The low BUSCO completeness can be attributed to the substantial genetic divergence between the taxa represented in the nematoda_odb10 database and cyst nematodes, whose protein sequences differ considerably. Moreover, to evaluate the accuracy of the assembly, we used BWA to align reads from the small-fragment library to the assembled genome and calculated the alignment rate, coverage and read depth. The alignment rate of these reads was about 93.14% and the coverage about 99.71%, indicating good consistency between the reads and the assembled genome (Table 3).

Code availability

No custom code was used for this study. All data analyses were conducted using published bioinformatics software with default settings, unless otherwise specified.

References

Nicol, J. M., Elekçioğlu, I. H., Bolat, N. & Rivoal, R. The global importance of the cereal cyst nematode (Heterodera spp.) on wheat and international approaches to its control. Commun. Agric. Appl. Biol. Sci. 72, 677–686 (2007).
Sikora, R. A. Nematodes Parasitic to Cereals & Legumes in Temperate Semi-arid Regions: Plant Parasitic Nematodes of Wheat and Barley in Temperate and Temperate Semiarid Regions—A Comparative Analysis (A Workshop Held at Larnaca, 1988).
Mokrini, F. et al. The importance, biology and management of cereal cyst nematodes (Heterodera spp.). Institut Agronomique et Vétérinaire Hassan II 4, 414 (2017).
Smiley, R. W. et al. Plant-parasitic nematodes associated with reduced wheat yield in Oregon: Heterodera avenae. J. Nematol. 3, 297–307 (2005).
Nicol, J. M. et al. Genomics and Molecular Genetics of Plant-Nematode Interactions: Current Nematode Threats to World Agriculture (The Netherlands: Springer, 2011).
Folkertsma, R. T. et al. Gene pool similarities of potato cyst nematode populations assessed by AFLP analysis. Mol Plant Microbe Interact. 9, 47–54 (1996).
Li, H. L. et al. First record of the cereal cyst nematode Heterodera filipjevi in China. Plant Dis. 94, 1505 (2010).
Smiley, R. W. et al. First record of the cyst nematode Heterodera filipjevi on wheat in Oregon. Plant Dis. 92, 1136 (2008).
Karimipour, F. H. et al. Assessment of yield loss of wheat cultivars caused by Heterodera filipjevi under field conditions. J Phytopathol 166, 299–304 (2018).
Hajihasani et al. Effect of the cereal cyst nematode, Heterodera filipjevi, on wheat in microplot trials. Nematology 3, 357–363 (2010).
Cotton, J. A. et al. The genome and life-stage specific transcriptomes of Globodera pallida elucidate key aspects of plant parasitism by a cyst nematode. Genome Biol. 15, 43 (2014).
Eves-van den Akker, S. et al. The genome of the yellow potato cyst nematode, Globodera rostochiensis, reveals insights into the basis of parasitism and virulence. Genome Biol. 17, 124 (2016).
Lian, Y. et al. Chromosome-level reference genome of X12, a highly virulent race of the soybean cyst nematode Heterodera glycines. Mol. Ecol. Resour. 19, 1637–1646 (2019).
Masonbrink, R. et al. The genome of the soybean cyst nematode (Heterodera glycines) reveals complex patterns of duplications involved in the evolution of parasitism genes. BMC Genomics 20, 119 (2019).
Kikuchi, T. et al. Genomic insights into the origin of parasitism in the emerging plant pathogen Bursaphelenchus xylophilus. Plos Pathogens. 7, e1002219 (2011).
Wu, S. et al. A reference genome of Bursaphelenchus mucronatus provides new resources for revealing its displacement by pinewood nematode. Genes (Basel) 11, 570 (2020).
Zheng, J. W. et al. The Ditylenchus destructor genome provides new insights into the evolution of plant parasitic nematodes. Proc. Biol. Sci. 283, 20160942 (2016).
Lunt, D. H., Kumar, S., Koutsovoulos, G. & Blaxter, M. L. The complex hybrid origins of the root knot nematodes revealed through comparative genomics. PeerJ. 2, e356 (2014).
Dai, D. et al. Unzipped chromosome-level genomes reveal allopolyploid nematode origin pattern as unreduced gamete hybridization. Nat Commun. 14, 7156 (2023).
Cui, J. K. et al. Characterization of putative effectors from the cereal cyst nematode Heterodera avenae. Phytopathology 108, 264–274 (2018).
Kumar, M. et al. De novo transcriptome sequencing and analysis of the cereal cyst nematode, Heterodera avenae. PLoS One 9, e96311 (2014).
Yang, D., Chen, C., Liu, Q. & Jian, H. Comparative analysis of pre- and post-parasitic transcriptomes and mining pioneer effectors of Heterodera avenae. Cell & Bioscience 7, 11 (2017).
Zheng, M. et al. RNA-Seq based identification of candidate parasitism genes of cereal cyst nematode (Heterodera avenae) during incompatible infection to Aegilops variabilis. PLoS One 10, e0141095 (2015).
Hussey, R. S. & Barker, K. R. A comparison of methods of collecting inocula of Meloidogyne species, including a new technique. Plant Dis. Rep. 57, 1025–1028 (1973).
Chin, C. S. et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods 13, 1050–1054 (2016).
Chin, C. S. et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat. Methods 10, 563–569 (2013).
Roach, M. J., Schmidt, S. A. & Borneman, A. R. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics 19, 460 (2018).
Adey, A. et al. In vitro, long-range sequence information for de novo genome assembly via transposase contiguity. Genome Res. 24, 2041–2049 (2014).
Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9, e112963 (2014).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013).
Cotten, J. A. Cytological investigations in the genus Heterodera. Nematologica. 11, 337–342 (1965).
Robinson, J. T. et al. Juicebox.js provides a cloud-based visualization system for Hi-C data. Cell Syst. 6, 256–258 (2018).
Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).
Parra, G., Bradnam, K. & Korf, I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23, 1061–1067 (2007).
Manni, M., Berkeley, M. R., Seppey, M., Simao, F. A. & Zdobnov, E. M. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).
Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA 6, 11 (2015).
Xu, Z. & Wang, H. LTR_FINDER: An efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, 265–268 (2007).
Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).
Benson, G. Tandem repeats finder: A program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_004148225.2 (2021).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_000724045.1 (2014).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_900079975.1 (2016).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_900182535.1 (2017).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_000172435.1 (2008).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_000002985.3 (2013).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_000442195.1 (2013).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_918442795.1 (2021).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_000002995.4 (2019).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_029775095.1 (2023).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_024586135.1 (2022).
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
Castro, D., Duarte, V. C. M. & Andrade, L. Perovskite solar modules: design optimization. ACS Omega. 7, 40844–40852 (2022).
Birney, E., Clamp, M. & Durbin, R. GeneWise and genomewise. Genome Res. 14, 988–995 (2004).
Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 29, 644–652 (2011).
Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc. 8, 1494–1512 (2013).
Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, 435–439 (2006).
Parra, G., Blanco, E. & Guigó, R. GeneID in drosophila. Genome Res. 10, 511–515 (2000).
Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).
Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open-source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).
Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).
Kim, D. & Salzberg, S. L. TopHat-Fusion: an algorithm for discovery of novel fusion transcripts. Genome Biol. 12, R72 (2011).
Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7, 562–578 (2012).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 9, 7 (2008).
Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
Nawrocki, E. P., Kolbe, D. L. & Eddy, S. R. Infernal 1.0: Inference of RNA alignments. Bioinformatics 25, 1335–1337 (2009).
Griffiths-Jones, S., Bateman, A., Marshall, M., Khanna, A. & Eddy, S. R. Rfam: an RNA family database. Nucleic Acids Res. 31, 439–441 (2013).
Kretschmann, E., Fleischmann, W. & Apweiler, R. Automatic rule generation for protein annotation with the C4.5 data mining algorithm applied on SWISS-PROT. Bioinformatics 17, 920–926 (2001).
Schoch, C. L. et al. NCBI Taxonomy: a comprehensive update on curation, resources and tools. Database (Oxford). 2020, baaa062 (2020).
Gene Ontology Consortium. Gene Ontology Consortium: going forward. Nucleic Acids Res. 43, 1049–1056 (2015).
Finn, R. D. et al. InterPro in 2017-beyond protein family and domain annotations. Nucleic Acids Res. 45, 190–199 (2017).
Kanehisa, M., Goto, S., Sato, Y., Furumichi, M. & Tanabe, M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 40, 109–114 (2012).
Jones, P. et al. InterProScan 5: Genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
Li, L., Stoeckert, C. J. Jr. & Roos, D. S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003).
The Genome Sequence Archive Family: Toward Explosive Data Growth and Diverse Data Types. Genom, Proteom & Bioinf. 19, 578–583 (2021).
Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2022. Nucleic Acids Res. 50, 27–38 (2022).
NGDC/CNCB. Genome Sequence Archive. https://ngdc.cncb.ac.cn/gsa/search?searchTerm=CRA014195 (2024).
Yao, K. Heterodera filipjevi isolate KY-2024, whole genome shotgun sequencing project. GenBank https://identifiers.org/ncbi/insdc:JBDPZO000000000 (2024).
NGDC/CNCB. Genome Sequence Archive https://ngdc.cncb.ac.cn/gsa/search?searchTerm=CRA015002 (2024).
Peng, H. This is the Heterodera flipjevi genome of chromosome level, longest transcripts, predicted gene models and proteins. figshare https://doi.org/10.6084/m9.figshare.25243105 (2024).

Acknowledgements

This research was supported by the National Key R&D Program of China (2021YFD1400100), the National Natural Science Foundation of China (32302328 & 31972247), the first batch of “2 + 5” key talent plan in Xinjiang Uygur Autonomous Region and the Science and Technology Innovation Project of the Chinese Academy of Agricultural Sciences.

Author information

These authors contributed equally: Ke Yao, Jiangkuan Cui.

Authors and Affiliations

State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
Ke Yao, Jinzhuo Jian, Deliang Peng, Wenkun Huang, Lingan Kong & Huan Peng
National Key Laboratory of Wheat and Maize Crop Science, College of Plant Protection, Henan Agricultural University, Zhengzhou, 450002, China
Jiangkuan Cui
Zhongyuan Research Center, Chinese Academy of Agricultural Sciences, Xinxiang, 453000, China
Lingan Kong
Novogene, Bioinformatics Institute, Beijing, 100193, China
Qianghui Wang

Contributions

H.P. and Q.H.W. designed the study. K.Y., J.K.C., J.Z.J., D.L.P., W.K.H., L.A.K., S.M.L. carried out genome sequencing and assembly. K.Y. and H.P. drafted the manuscript.

Corresponding authors

Correspondence to Qianghui Wang or Huan Peng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Yao, K., Cui, J., Jian, J. et al. Chromosome-level genome assembly of the cereal cyst nematode Heterodera flipjevi. Sci Data 11, 637 (2024). https://doi.org/10.1038/s41597-024-03487-7

Received: 01 March 2024
Accepted: 06 June 2024
Published: 17 June 2024
DOI: https://doi.org/10.1038/s41597-024-03487-7
Springer Nature Limited
