Abstract
As an economically important plant-parasitic nematode (PPN), Heterodera filipjevi causes serious damage to wheat and has now been recorded in many countries. Although genomes of multiple PPNs have been published, no high-quality genome assembly and annotation has been available for H. filipjevi. This study presents a chromosome-scale genome assembly and annotation for H. filipjevi, generated with a combination of Illumina short-read, PacBio long-read, and Hi-C sequencing technologies. The assembly spans 134.19 Mb, of which about 89.4% is anchored to 9 pseudo-chromosomes, with a scaffold N50 of 11.88 Mb. In total, 10,036 genes were functionally annotated, representing 75.20% of the predicted protein-coding genes. Our study provides the first chromosome-scale genome for H. filipjevi, which is also the first high-quality genome of a cereal cyst nematode (CCN). It is a valuable genomic resource for further biological research and for the management of cereal cyst nematode disease.
Background & Summary
The cereal cyst nematodes (CCNs) are a group of 12 closely related species and are considered among the most damaging plant-parasitic nematodes (PPNs), limiting the production of cereal crops in many parts of the world, including Australia, China, India, the USA, Europe, North Africa and West Asia1,2. Heterodera filipjevi, H. avenae and H. latipons are among the most economically important species and cause significant economic losses3. Yield losses caused by CCNs of 10–35% on wheat in China, 24% on spring wheat in Oregon and 20% on barley in Australia have been recorded4. Among the CCNs, H. filipjevi is an important constraint on cereal production in different climatic regions1 and has now been recorded in many countries, including Tajikistan, Russia, Morocco, Tunisia, Pakistan, Libya, Turkey5,6, Estonia, Sweden, India, Norway, Iran, China7, the United Kingdom and the USA8. Smiley et al.4 reported a 35% yield loss in spring wheat in Oregon, USA, due to H. filipjevi, and Karimipour et al.9 estimated wheat yield losses of 20–25% in Iran caused by the same species. In addition, Hajihasani et al.10 reported that, in Iran, grain yield loss caused by H. filipjevi occurred even at the lowest population density and reached a maximum of 48% at an initial population density (Pi) of 20 eggs and J2 per g of soil.
Genomic data have proven to be powerful tools for explaining how plant-parasitic nematodes parasitize their hosts so successfully. The first PPN genomes, those of Meloidogyne incognita and M. hapla, were published in 2008. Since then, genomes of several other PPNs have been published, including Globodera pallida11, Globodera rostochiensis12, Heterodera glycines13,14, Bursaphelenchus xylophilus15, Bursaphelenchus mucronatus16, Ditylenchus destructor17, M. floridensis18 and M. graminicola19. However, no reference genome has been available for CCNs; only the transcriptome of H. avenae has been sequenced, using short-read sequencing technology20,21,22,23.
In the present study, a total of 95.79 Gb (711.56×) of raw data was obtained with Illumina, PacBio, 10x Genomics and Hi-C technologies; the sequencing data are summarized in Table 1. A total of 17,119,184,513 17-mers were counted from 21.8 Gb of Illumina short reads, with a k-mer depth of 124 (Table 1). We then used the PacBio long-read, Illumina short-read, 10x Genomics and Hi-C data to generate a high-quality chromosome-level reference genome for H. filipjevi. The genome assembly spans 134.19 Mb with a scaffold N50 of 11.88 Mb (Table 2). After chromosome-level anchoring, 9 pseudo-chromosomes with a total length of 120 Mb (89.4% of the draft assembly) were constructed, corresponding to the karyotype. The mapping rate of Illumina short reads was 93.14% and the genome coverage was 99.71% (Table 3). Moreover, only 2,226 homozygous single-nucleotide polymorphisms (SNPs) were identified, a low homozygous SNP rate of 0.0032%, suggesting a low error rate in the assembly (Table 4). Together, these evaluations indicate that the assembled H. filipjevi genome is of high quality. Finally, we annotated 13,352 protein-coding genes in the H. filipjevi genome, with a mean of 8.14 exons per gene (Table 5 and Table 6), and found 61.9 Mb (46.14%) of repeat elements.
The reference genome obtained in this study will provide a foundation for future investigations on the pathogenesis of CCNs.
Methods
Nematode sample and DNA extraction
Cysts of H. filipjevi were collected from wheat fields in Xuchang City, Henan Province. Ten cysts were selected and propagated on the susceptible wheat cultivar Wenmai 19 in a greenhouse for six generations. Fresh, healthy, unbroken cysts were picked manually and crushed in sterile water to release the eggs. The eggs were then collected using the sucrose flotation technique24 and washed three times with sterile distilled water. Six developmental stages of H. filipjevi, namely pre-parasitic second-stage juveniles (Pre-J2), parasitic J2 (Para-J2), third-stage juveniles (J3), fourth-stage juveniles (J4), adult females (Fes) and eggs, were collected as previously described20.
DNA isolation and sequencing
Genomic DNA was isolated from the H. filipjevi egg mass using the CTAB method. DNA quality and concentration were assessed using agarose gel electrophoresis and a Qubit 2.0 Fluorometer (Life Technologies, CA, USA). A library with 20 kb inserts was prepared following the manufacturer’s protocol (PacBio, CA) and sequenced with the PacBio RS technology. For short-read sequencing, a library with 350 bp inserts was prepared and sequenced on the Illumina HiSeq 2500 platform as 2 × 150 bp reads (Table 1). The GEM (Gel Beads in Emulsion) reaction for 10x Genomics sequencing was conducted using approximately 1 ng of input DNA of about 50 kb in length, and 16 barcodes were introduced into the droplets. Subsequently, the droplets were broken and 600 bp fragments were used to construct libraries, which were sequenced on the Illumina HiSeq X platform at the Novogene Bioinformatics Institute, Beijing.
Genome size estimation, assembly and evaluation
For the survey analysis, the H. filipjevi genome size was estimated from the 21.8 Gb of paired-end Illumina sequencing data using the k-mer formula: Genome size = (total number of 17-mers) / (depth at the homozygous peak). Using the 14.74 Gb of long reads generated on the PacBio Sequel platform, contigs of the H. filipjevi genome were assembled with the FALCON assembler (version 1.2.4)25. The PacBio assembly was then polished with Quiver (smrtlink 5.0.1)26. Redundant sequences arising from heterozygosity were removed using the Purge Haplotigs software (version 1.1.1)27. The resulting contigs were connected into super-scaffolds with 42.58 Gb of 10x Genomics linked-read data using the fragScaff software (Version 140324)28 (Table 1). Finally, the Illumina short reads were used to correct remaining errors with Pilon (Version 1.22)29. These steps yielded the final draft H. filipjevi genome.
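The k-mer formula above can be illustrated with a minimal Python sketch. The function name `estimate_genome_size` is hypothetical and is not part of any tool used in this study; the two input values are the 17-mer count and k-mer depth quoted in the Background & Summary. The sketch applies the plain formula only and makes no correction for heterozygosity, repeats or sequencing errors.

```python
def estimate_genome_size(total_kmers: int, peak_depth: float) -> float:
    """Estimate genome size in bp as (total k-mers) / (depth at the main peak)."""
    if peak_depth <= 0:
        raise ValueError("peak_depth must be positive")
    return total_kmers / peak_depth


# Values quoted in the Background & Summary (Table 1 of the paper).
TOTAL_17MERS = 17_119_184_513
KMER_DEPTH = 124

size_bp = estimate_genome_size(TOTAL_17MERS, KMER_DEPTH)
print(f"Estimated genome size: {size_bp / 1e6:.2f} Mb")
```

With these inputs the uncorrected estimate is roughly 138 Mb, which is of the same order as the 134.19 Mb assembly.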
To obtain a high-quality H. filipjevi genome, the draft assembly was further improved with 16.67 Gb of Hi-C data. First, the Hi-C reads were mapped to the draft assembly using BWA30. Next, low-quality reads and duplicates were removed to build raw inter- and intra-chromosomal contact maps. Then, Lachesis (Version 201701), which is based on an agglomerative hierarchical clustering algorithm31, was applied for clustering, ordering and orienting, and the scaffolds were clustered into 9 pseudo-chromosomes32. Finally, Juicebox (Version 2.20.00) was used to manually correct the scaffolded chromosomes and to plot a heatmap of genomic interactions33. Overall, we obtained a 134,189,547 bp H. filipjevi genome comprising 9 pseudo-chromosomes, which cover ~89.4% of the whole genome (Fig. 1), and 652 super-scaffolds; the contig N50 and scaffold N50 are 0.45 Mb and 11.88 Mb, respectively (Table 2). Circos (version 0.64) was used to visualize the H. filipjevi genome data34.
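For readers unfamiliar with the N50 statistic reported for contigs and scaffolds, the following Python sketch shows the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The function name `n50` and the toy lengths are illustrative assumptions, not data from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and accumulated until
    the running sum reaches at least half of the total length; the length
    of the sequence that crosses this threshold is the N50.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy example (hypothetical lengths): total = 30, half = 15.
toy_lengths = [10, 8, 6, 4, 2]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

In the toy example the two longest sequences (10 and 8) already cover more than half of the total, so the N50 is 8.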
Characteristics of the H. filipjevi genome. (a) Hi-C intra-chromosomal contact map of the H. filipjevi genome assembly. (b) Circos plot of the H. filipjevi genome assembly. (1) TRF distribution density; (2) DNA type repeat density; (3) LINE type repeat density; (4) SINE type repeat density; (5) LTR repeat density; (6) gene density; (7) GC content.
The completeness of the genome assembly was assessed in three ways. First, the Core Eukaryotic Genes Mapping Approach (CEGMA)35 was run with a core set of 248 evolutionarily conserved genes from six eukaryotic model organisms. Of the 248 core eukaryotic genes (CEGs), 230 (92.74%) were found in the assembly, indicating that the assembly is relatively complete. Second, BUSCO36 (version 5.4.3) was run in genome mode with the nematoda_odb10 database. The resulting assembly completeness of 55.8% is similar to that of other reported cyst nematode genomes (43.4–59.3%)19. Finally, reads from the small-fragment library were aligned to the assembled genome using BWA. About 93.14% of the reads aligned, covering about 99.71% of the genome, which indicates good consistency between the reads and the assembly (Table 3).
Genomic repeat annotation
Two approaches were used to annotate repetitive sequences in the H. filipjevi genome: homology-based comparison and ab initio prediction. For the homology-based comparison, RepeatMasker (Version 3.3.0) and the associated RepeatProteinMask were run against the Repbase database37. For ab initio prediction, LTR_FINDER (version 1.0.7)38, RepeatScout (Version 1.0.5)39 and RepeatModeler (Version 1.0.4) were first used to build a de novo candidate database of repetitive elements. The repetitive sequences were then annotated using RepeatMasker, while tandem repeats were predicted ab initio using TRF (Version 4.07b)40. By combining the Repbase and de novo datasets, we obtained a total of 61.91 Mb of consensus, non-redundant repetitive sequences, which occupy 46.14% of the genome (Table 5).
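Combining homology-based and de novo repeat annotations into a non-redundant total requires counting overlapping annotations only once. The Python sketch below illustrates one common way to do this by merging overlapping intervals; it is an assumption about the general technique, not the procedure actually used in this study, and the function name, coordinates and genome length are hypothetical.

```python
def merged_length(intervals):
    """Total number of bases covered by half-open [start, end) intervals
    on a single sequence, counting overlapping regions only once."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # No overlap with the current block: close it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered


# Hypothetical annotations on one 1,000 bp sequence.
homology_hits = [(0, 100), (150, 200)]
de_novo_hits = [(50, 120), (300, 350)]
sequence_length = 1000

repeat_bp = merged_length(homology_hits + de_novo_hits)
repeat_fraction = repeat_bp / sequence_length
print(repeat_bp, f"{repeat_fraction:.1%}")
```

Here the first homology hit and the first de novo hit overlap and merge into a single 120 bp block, so the non-redundant total is 220 bp rather than the 270 bp obtained by naive summation.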
Gene prediction and functional annotation
Three approaches were used to predict protein-coding genes in the H. filipjevi genome: homology-based prediction, ab initio annotation, and transcriptome-based prediction. For homology-based prediction, the protein repertoires of H. glycines (GCA_004148225.2)41, G. pallida (GCA_000724045.1)42, G. rostochiensis (GCA_900079975.1)43, M. incognita (GCA_900182535.1)44, M. hapla (GCA_000172435.1)45, Caenorhabditis elegans (GCA_000002985.3)46, Haemonchus contortus (GCA_000442195.1)47, Pristionchus pacificus (GCA_918442795.1)48, Brugia malayi (GCA_000002995.5)49, Drosophila melanogaster (GCA_029775095.1)50 and Homo sapiens (GCA_024586135.1)51 were first aligned against the H. filipjevi genome using TBLASTN (Version 2.2.29)52. The BLAST hits were then joined using the Solar software (version 0.9.6)53, and GeneWise (version 2.2.0)54 was used to predict the exact gene structure in the genomic region corresponding to each BLAST hit. These homology predictions are referred to as the “Homology-set”. In addition, about 33.2 Gb of clean RNA-sequencing (RNA-seq) data from six developmental stages of H. filipjevi were assembled with Trinity (version 2.0)55, and the assembled sequences were aligned against the H. filipjevi genome using the Program to Assemble Spliced Alignments (PASA) (version 2.0.2)56. The resulting valid alignments were clustered by genome mapping location and assembled into gene structures. Gene models created by PASA are referred to as the PASA-T-set (PASA Trinity set).
For ab initio annotation, five tools were used: Augustus (version 3.0.2)57, GeneID (version 1.4)58, Genscan (version 1.0)59, GlimmerHMM (version 3.0.2)60 and SNAP (version 11-29-2013)61. Of these, Augustus, SNAP and GlimmerHMM were trained with the PASA-T-set gene models. For transcriptome-based prediction, RNA-seq reads were mapped directly to the genome using TopHat (version 2.0.9)62, and the mapped reads were then assembled into gene models (Cufflinks-set) with Cufflinks (version 2.1.1)63. Gene models from all three approaches were integrated using EVidenceModeler64. The evidence weights were ordered as follows: PASA-T-set > Homology-set > Cufflinks-set > Augustus > GeneID = SNAP = GlimmerHMM = Genscan. To obtain untranslated regions (UTRs) and alternative splicing information, PASA2 was used to update the gene models. Ultimately, a total of 13,352 protein-coding genes were predicted in the H. filipjevi genome. The average transcript length was 3,258.25 bp, with an average coding sequence (CDS) length of 1,235.80 bp. The average number of exons per gene was 8.14, with an average exon length of 151.89 bp and an average intron length of 283.41 bp (Table 6). The gene model statistics for H. filipjevi, including gene, CDS, intron and exon lengths, were comparable to those of closely related species (Fig. 2).
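Summary statistics such as the mean exon number, exon length and intron length can be derived directly from exon coordinates. The Python sketch below shows the arithmetic on toy gene models; the function name, the gene identifiers and the coordinates are hypothetical, each gene is assumed to be represented by a single transcript with 1-based inclusive exon coordinates, and this is not the script used to produce Table 6.

```python
from statistics import mean


def gene_structure_stats(genes):
    """Compute mean exon count, exon length and intron length.

    `genes` maps a gene ID to a list of (start, end) exon coordinates,
    1-based and inclusive, for one representative transcript.
    """
    exon_counts, exon_lengths, intron_lengths = [], [], []
    for exons in genes.values():
        exons = sorted(exons)
        exon_counts.append(len(exons))
        exon_lengths.extend(end - start + 1 for start, end in exons)
        # An intron is the gap between the end of one exon and the start of the next.
        intron_lengths.extend(
            next_start - prev_end - 1
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
        )
    return {
        "mean_exons_per_gene": mean(exon_counts) if exon_counts else 0.0,
        "mean_exon_length": mean(exon_lengths) if exon_lengths else 0.0,
        "mean_intron_length": mean(intron_lengths) if intron_lengths else 0.0,
    }


# Hypothetical gene models for illustration only.
toy_genes = {
    "geneA": [(1, 100), (201, 300), (401, 450)],
    "geneB": [(1000, 1199)],
}
stats = gene_structure_stats(toy_genes)
print(stats)
```

Single-exon genes contribute to the exon count and exon length averages but not to the intron length average, which is why `geneB` adds no intron in the toy example.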
Composition of gene elements in the H. filipjevi genome compared with other species. (a) CDS length distribution and comparison with other species. (b) Exon length distribution and comparison with other species. (c) Exon number distribution and comparison with other species. (d) Gene length distribution and comparison with other species. (e) Intron length distribution and comparison with other species.
In addition, the gene structures of transfer RNAs (tRNAs), ribosomal RNAs (rRNAs) and other non-coding RNAs in the H. filipjevi genome were predicted. Specifically, tRNAs were predicted using the tRNAscan-SE software (version 1.3.1)65. rRNA fragments were predicted by searching against an invertebrate rRNA database using BLAST with an E-value of 1E−10. MicroRNA (miRNA) and small nuclear RNA (snRNA) genes were predicted with INFERNAL (version 1.1.1)66 using the Rfam database67.
The predicted protein-coding genes in the H. filipjevi genome were functionally annotated by homology searches against the SwissProt68, NR (NCBI)69, Gene Ontology70, InterPro71 and KEGG pathway72 databases. In particular, the InterProScan tool73 was used with the InterPro database74 to predict protein function from conserved protein domains and functional sites. A total of 10,036 genes (75.20%) were successfully annotated by at least one public database (Fig. 3).
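The count of genes "annotated by at least one public database" is a set union across the per-database hit lists. The Python sketch below illustrates the calculation; the function name, gene identifiers and hit sets are hypothetical placeholders and do not reproduce the data behind Fig. 3.

```python
def annotation_summary(all_genes, hits_by_db):
    """Return (number, fraction) of genes annotated by at least one database.

    `all_genes` is an iterable of predicted gene IDs; `hits_by_db` maps a
    database name to the set of gene IDs with a hit in that database.
    """
    gene_set = set(all_genes)
    # Union of all per-database hits, restricted to predicted genes.
    annotated = set().union(*hits_by_db.values()) & gene_set
    fraction = len(annotated) / len(gene_set) if gene_set else 0.0
    return len(annotated), fraction


# Hypothetical example with five predicted genes and two databases.
toy_genes = ["g1", "g2", "g3", "g4", "g5"]
toy_hits = {
    "SwissProt": {"g1", "g2"},
    "KEGG": {"g2", "g3"},
}
n_annotated, frac_annotated = annotation_summary(toy_genes, toy_hits)
print(n_annotated, f"{frac_annotated:.1%}")
```

Gene `g2` has hits in both databases but is counted once, so three of the five toy genes are annotated.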
Data Records
The raw sequence data reported in this paper have been deposited in the Genome Sequence Archive75 at the National Genomics Data Center (NGDC)76, China National Center for Bioinformation/Beijing Institute of Genomics, Chinese Academy of Sciences (GSA: CRA014195)77, and are publicly accessible at https://ngdc.cncb.ac.cn/gsa. The genome assembly has been deposited in DDBJ/ENA/GenBank under the accession number JBDPZO000000000 (ref. 78) and in NGDC under the GSA accession CRA015002 (ref. 79). The gene functional annotation data have been deposited at Figshare80.
Technical Validation
Nucleic acid quality
The DNA quality and concentration were assessed using agarose gel electrophoresis and a Qubit 2.0 Fluorometer (Life Technologies, CA, USA).
Evaluation of genome assembly
Several strategies were used to evaluate the completeness and accuracy of the H. filipjevi genome. First, CEGMA35 indicated that the assembly is relatively complete (92.74%). Second, BUSCO36 (v5.4.3) was run in genome mode with the nematoda_odb10 database to evaluate the completeness of this genome and of other published genomes. We obtained an assembly completeness of 55.8%, similar to that of other reported plant nematode genomes (43.4–59.3%)11,12,13,14,19. The low BUSCO completeness can be attributed to the substantial genetic divergence between cyst nematodes and the species represented in the nematoda_odb10 database, which results in large differences in protein sequences. Moreover, to evaluate the accuracy of the assembly, we used BWA to align reads from the small-fragment library to the assembled genome and calculated the alignment rate, coverage and read depth. About 93.14% of the reads aligned, covering about 99.71% of the genome, indicating good consistency between the reads and the assembly (Table 3).
Code availability
No custom code was used for this study. All data analyses were conducted using published bioinformatics software with default settings, unless otherwise specified.
References
Nicol, J. M., Elekçioğlu, I. H., Bolat, N. & Rivoal, R. The global importance of the cereal cyst nematode (Heterodera spp.) on wheat and international approaches to its control. Commun. Agric. Appl. Biol. Sci. 72, 677–686 (2007).
Sikora, R. A. Nematodes Parasitic to Cereals & Legumes in Temperate Semi-arid Regions: Plant Parasitic Nematodes of Wheat and Barley in Temperate and Temperate Semiarid Regions—A Comparative Analysis (A Workshop Held at Larnaca, 1988).
Mokrini, F. et al. The importance, biology and management of cereal cyst nematodes (Heterodera spp.). Institut Agronomique et Vétérinaire Hassan II 4, 414 (2017).
Smiley, R. W. et al. Plant-parasitic nematodes associated with reduced wheat yield in Oregon: Heterodera avenae. J. Nematol. 3, 297–307 (2005).
Nicol, J. M. et al. Genomics and Molecular Genetics of Plant-Nematode Interactions: Current Nematode Threats to World Agriculture (The Netherlands: Springer, 2011).
Folkertsma, R. T. et al. Gene pool similarities of potato cyst nematode populations assessed by AFLP analysis. Mol Plant Microbe Interact. 9, 47–54 (1996).
Li, H. L. et al. First record of the cereal cyst nematode Heterodera filipjevi in China. Plant Dis. 94, 1505 (2010).
Smiley, R. W. et al. First record of the cyst nematode Heterodera filipjevi on wheat in Oregon. Plant Dis. 92, 1136 (2008).
Karimipour, F. H. et al. Assessment of yield loss of wheat cultivars caused by Heterodera filipjevi under field conditions. J Phytopathol 166, 299–304 (2018).
Hajihasani et al. Effect of the cereal cyst nematode, Heterodera filipjevi, on wheat in microplot trials. Nematology 3, 357–363 (2010).
Cotton, J. A. et al. The genome and life-stage specific transcriptomes of Globodera pallida elucidate key aspects of plant parasitism by a cyst nematode. Genome Biol. 15, 43 (2014).
Eves-van den Akker, S. et al. The genome of the yellow potato cyst nematode, Globodera rostochiensis, reveals insights into the basis of parasitism and virulence. Genome Biol. 17, 124 (2016).
Lian, Y. et al. Chromosome-level reference genome of X12, a highly virulent race of the soybean cyst nematode Heterodera glycines. Mol. Ecol. Resour. 19, 1637–1646 (2019).
Masonbrink, R. et al. The genome of the soybean cyst nematode (Heterodera glycines) reveals complex patterns of duplications involved in the evolution of parasitism genes. BMC Genomics 20, 119 (2019).
Kikuchi, T. et al. Genomic insights into the origin of parasitism in the emerging plant pathogen Bursaphelenchus xylophilus. Plos Pathogens. 7, e1002219 (2011).
Wu, S. et al. A reference genome of Bursaphelenchus mucronatus provides new resources for revealing its displacement by pinewood nematode. Genes (Basel) 11, 570 (2020).
Zheng, J. W. et al. The Ditylenchus destructor genome provides new insights into the evolution of plant parasitic nematodes. Proc. Biol. Sci. 283, 20160942 (2016).
Lunt, D. H., Kumar, S., Koutsovoulos, G. & Blaxter, M. L. The complex hybrid origins of the root knot nematodes revealed through comparative genomics. PeerJ. 2, e356 (2014).
Dai, D. et al. Unzipped chromosome-level genomes reveal allopolyploid nematode origin pattern as unreduced gamete hybridization. Nat Commun. 14, 7156 (2023).
Cui, J. K. et al. Characterization of putative effectors from the cereal cyst nematode Heterodera avenae. Phytopathology 108, 264–274 (2018).
Kumar, M. et al. De novo transcriptome sequencing and analysis of the cereal cyst nematode, Heterodera avenae. PLoS One 9, e96311 (2014).
Yang, D., Chen, C., Liu, Q. & Jian, H. Comparative analysis of pre- and post-parasitic transcriptomes and mining pioneer effectors of Heterodera avenae. Cell & Bioscience 7, 11 (2017).
Zheng, M. et al. RNA-Seq based identification of candidate parasitism genes of cereal cyst nematode (Heterodera avenae) during incompatible infection to Aegilops variabilis. PLoS One 10, e0141095 (2015).
Hussey, R. S. & Barker, K. R. A comparison of methods of collecting inocula of Meloidogyne species, including a new technique. Plant Dis. Rep. 57, 1025–1028 (1973).
Chin, C. S. et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods 13, 1050–1054 (2016).
Chin, C. S. et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat. Methods 10, 563–569 (2013).
Roach, M. J., Schmidt, S. A. & Borneman, A. R. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics 19, 460 (2018).
Adey, A. et al. In vitro, long-range sequence information for de novo genome assembly via transposase contiguity. Genome Res. 24, 2041–2049 (2014).
Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9, e112963 (2014).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013).
Cotten, J. A. Cytological investigations in the genus Heterodera. Nematologica. 11, 337–342 (1965).
Robinson, J. T. et al. Juicebox.js provides a cloud-based visualization system for Hi-C data. Cell Syst. 6, 256–258 (2018).
Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).
Parra, G., Bradnam, K. & Korf, I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23, 1061–1067 (2007).
Manni, M., Berkeley, M. R., Seppey, M., Simao, F. A. & Zdobnov, E. M. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).
Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA 6, 11 (2015).
Xu, Z. & Wang, H. LTR_FINDER: An efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, 265–268 (2007).
Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).
Benson, G. Tandem repeats finder: A program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_004148225.2 (2021).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_000724045.1 (2014).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_900079975.1 (2016).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_900182535.1 (2017).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_000172435.1 (2008).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_000002985.3 (2013).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_000442195.1 (2013).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_918442795.1 (2021).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_000002995.4 (2019).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_029775095.1 (2023).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_024586135.1 (2022).
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
Castro, D., Duarte, V. C. M. & Andrade, L. Perovskite solar modules: design optimization. ACS Omega. 7, 40844–40852 (2022).
Birney, E., Clamp, M. & Durbin, R. GeneWise and genomewise. Genome Res. 14, 988–995 (2004).
Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 29, 644–652 (2011).
Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc. 8, 1494–1512 (2013).
Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, 435–439 (2006).
Parra, G., Blanco, E. & Guigó, R. GeneID in drosophila. Genome Res. 10, 511–515 (2000).
Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).
Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open-source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).
Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).
Kim, D. & Salzberg, S. L. TopHat-Fusion: an algorithm for discovery of novel fusion transcripts. Genome Biol. 12, R72 (2011).
Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7, 562–578 (2012).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 9, 7 (2008).
Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
Nawrocki, E. P., Kolbe, D. L. & Eddy, S. R. Infernal 1.0: Inference of RNA alignments. Bioinformatics 25, 1335–1337 (2009).
Griffiths-Jones, S., Bateman, A., Marshall, M., Khanna, A. & Eddy, S. R. Rfam: an RNA family database. Nucleic Acids Res. 31, 439–441 (2013).
Kretschmann, E., Fleischmann, W. & Apweiler, R. Automatic rule generation for protein annotation with the C4.5 data mining algorithm applied on SWISS-PROT. Bioinformatics 17, 920–926 (2001).
Schoch, C. L. et al. NCBI Taxonomy: a comprehensive update on curation, resources and tools. Database (Oxford). 2020, baaa062 (2020).
Gene Ontology Consortium. Gene Ontology Consortium: going forward. Nucleic Acids Res. 43, 1049–1056 (2015).
Finn, R. D. et al. InterPro in 2017-beyond protein family and domain annotations. Nucleic Acids Res. 45, 190–199 (2017).
Kanehisa, M., Goto, S., Sato, Y., Furumichi, M. & Tanabe, M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 40, 109–114 (2012).
Jones, P. et al. InterProScan 5: Genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
Li, L., Stoeckert, C. J. Jr. & Roos, D. S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003).
The Genome Sequence Archive Family: Toward Explosive Data Growth and Diverse Data Types. Genom, Proteom & Bioinf. 19, 578–583 (2021).
Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2022. Nucleic Acids Res. 50, 27–38 (2022).
NGDC/CNCB. Genome Sequence Archive. https://ngdc.cncb.ac.cn/gsa/search?searchTerm=CRA014195 (2024).
Yao, K. Heterodera filipjevi isolate KY-2024, whole genome shotgun sequencing project. GenBank https://identifiers.org/ncbi/insdc:JBDPZO000000000 (2024).
NGDC/CNCB. Genome Sequence Archive https://ngdc.cncb.ac.cn/gsa/search?searchTerm=CRA015002 (2024).
Peng, H. This is the Heterodera flipjevi genome of chromosome level, longest transcripts, predicted gene models and proteins. figshare https://doi.org/10.6084/m9.figshare.25243105 (2024).
Acknowledgements
This research was supported by the National Key R&D Program of China (2021YFD1400100), the National Natural Science Foundation of China (32302328 & 31972247), the first batch of “2 + 5” key talent plan in Xinjiang Uygur Autonomous Region and the Science and Technology Innovation Project of the Chinese Academy of Agricultural Sciences.
Author information
Contributions
H.P. and Q.H.W. designed the study. K.Y., J.K.C., J.Z.J., D.L.P., W.K.H., L.A.K., S.M.L. carried out genome sequencing and assembly. K.Y. and H.P. drafted the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yao, K., Cui, J., Jian, J. et al. Chromosome-level genome assembly of the cereal cyst nematode Heterodera flipjevi. Sci Data 11, 637 (2024). https://doi.org/10.1038/s41597-024-03487-7