Chromosome genome assembly and whole genome sequencing of 110 individuals of Conogethes punctiferalis (Guenée)

Gao, Bojia; Peng, Yan; Jin, Minghui; Zhang, Lei; Han, Xiu; Wu, Chao; Yuan, He; Awawing, Andongma; Zheng, Fangqiang; Li, Xiangdong; Xiao, Yutao

doi:10.1038/s41597-023-02730-x

Data Descriptor
Open access
Published: 16 November 2023

Volume 10, article number 805, (2023)

Scientific Data

Bojia Gao ORCID: orcid.org/0000-0003-2947-5598¹^na1,
Yan Peng¹^na1,
Minghui Jin¹^na1,
Lei Zhang¹^na1,
Xiu Han²,
Chao Wu¹,
He Yuan¹,
Andongma Awawing³,
Fangqiang Zheng⁴,
Xiangdong Li⁴ &
Yutao Xiao¹

Abstract

The yellow peach moth, Conogethes punctiferalis, is a highly polyphagous pest widespread in eastern and southern Asia. It demonstrates a unique ability to adapt to rotten host fruits and displays resistance to pathogenic microorganisms, including fungi. However, the lack of available genomic resources presents a challenge in comprehensively understanding the evolution of its innate immune genes. Here, we report a high-quality chromosome-level reference genome for C. punctiferalis utilizing PacBio HiFi sequencing and Hi-C technology. The genome assembly was 494 Mb in length with a contig N50 of 3.25 Mb. We successfully anchored 1,226 contigs to 31 pseudochromosomes. Our BUSCO analysis further demonstrated a gene coverage completeness of 96.3% in the genome assembly. Approximately 43% repeat sequences and 21,663 protein-coding genes were identified. In addition, we resequenced 110 C. punctiferalis individuals from east China, achieving an average coverage of 18.4 × and identifying 5.8 million high-quality SNPs. This work provides a crucial resource for understanding the evolutionary mechanism of C. punctiferalis’ innate immune system and will help in developing new antibacterial drugs.

Background & Summary

The yellow peach moth, Conogethes punctiferalis (Guenée) (Lepidoptera: Crambidae), is a polyphagous pest widely distributed in eastern and southern Asia that can damage many crops, fruits, vegetables and spices¹. Ostrinia furnacalis (Guenée) is the most important maize pest in most of Asia². However, in some areas, C. punctiferalis has replaced O. furnacalis and caused even more serious damage in recent years owing to its enhanced fitness on maize³. C. punctiferalis is challenging to control because its larvae bore into fruit and maize plants, enabling them to evade insecticide applications, entomopathogenic fungi and parasitoids⁴. C. punctiferalis infestations inflict serious damage on crops, and despite poor plant health, larvae survive well on moldy maize (Figure S1)¹. Interestingly, recent studies showed that C. punctiferalis prefers to oviposit on apples infected with the fungus Penicillium citrinum, which may be explained by altered emission of host plant volatile organic compounds⁵. The bacterium Wolbachia can infect 66% of insect species, and infection rates follow a ‘most or few’ pattern, being either very high (>90%) or very low (<10%)⁶. In C. punctiferalis, the infection rate is only 4.5%, which suggests that C. punctiferalis can resist some intracellular bacterial infections. In addition, few entomopathogenic fungi have been isolated from C. punctiferalis in its natural environment⁷. Together, these observations suggest that C. punctiferalis can adapt to a diverse microbial environment throughout its life history, which poses challenges for its management using entomopathogenic fungi.

Genome analysis can provide valuable insights for developing strategies to manage agricultural pests, and the lack of a reference genome hinders the understanding of the molecular mechanisms underlying the innate immune system. In this study, we constructed a chromosome-level genome assembly of C. punctiferalis using PacBio circular consensus sequencing (CCS) and Hi-C (high-throughput chromosome conformation capture) technology, which provides high resolution for phylogenetic and comparative genomic analyses. Furthermore, we sequenced 110 C. punctiferalis individuals from 9 provinces in eastern China and generated a variant dataset containing 5.8 million SNPs, which can be analysed directly without re-processing the original data. Our study will facilitate the understanding of innate immune evolution in C. punctiferalis and help in developing new antibacterial drugs or an efficient pest control strategy.

Methods

Sample collection and sequencing

The experiment used a laboratory colony of C. punctiferalis that has been maintained for more than 35 generations and originated from wild individuals collected from ears of maize in Tai’an, China (36° 10′ 00″ N, 117° 09′ 18″ E). The larvae used for sequencing were reared on fresh maize kernels at 25 °C, with a 14:10 (L:D) photoperiod and 75% relative humidity. A male and a female moth were chosen for mating. Among their offspring, a healthy male moth was selected for genomic DNA extraction and PacBio HiFi sequencing, and another male individual was used to construct the Hi-C library. The PacBio library (10 kb) was constructed and sequenced on the Sequel II system. HiFi reads were produced from the raw subreads using the CCS workflow with the suggested parameters⁸. In total, this generated ~13.8 Gb of PacBio long reads (read length N50 ~11.4 kb), corresponding to approximately 28× coverage of the genome.

To achieve a chromosome-level assembly, the Hi-C technique was used to identify contacts between different regions of the chromatin filament. The Hi-C library was built following the method described in a previous study⁹. A fifth-instar male larva from the same offspring was placed in formaldehyde to fix chromatin. The restriction enzyme DpnII was used to digest DNA. The 5′ overhangs were labelled with biotin and the blunt-end fragments were ligated. After the crosslinks were reversed, the purified and sheared DNA was sequenced using a standard procedure on the Illumina platform with a paired-end (PE150) strategy.
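As a rough check on the HiFi coverage figure, sequencing depth can be computed as total sequenced bases divided by genome size. The sketch below is illustrative only: the function name is hypothetical, and it assumes the coverage was computed against the 494 Mb assembly size reported in this article rather than the 440 Mb k-mer estimate.

```python
def sequencing_depth(total_bases: float, genome_size: float) -> float:
    """Mean depth of coverage = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


hifi_bases = 13.8e9      # ~13.8 Gb of HiFi reads (from the text)
assembly_size = 494e6    # assembled genome size reported in the article
kmer_estimate = 440e6    # k-mer based genome size estimate reported in the article

depth_vs_assembly = sequencing_depth(hifi_bases, assembly_size)
depth_vs_estimate = sequencing_depth(hifi_bases, kmer_estimate)

print(f"Depth relative to assembly size: {depth_vs_assembly:.1f}x")
print(f"Depth relative to k-mer estimate: {depth_vs_estimate:.1f}x")
```

Under these assumptions, the depth relative to the assembly size is close to the approximately 28× quoted in the text, while the k-mer estimate would give a somewhat higher value.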

Based on its distribution, a total of 110 C. punctiferalis individuals were sampled from nine locations in China: Anhui (n = 3), Hebei (n = 18), Henan (n = 14), Shandong (n = 50), Liaoning (n = 4), Jiangsu (n = 2), Sichuan (n = 3), Jiangxi (n = 1), and Hubei (n = 15) (Table S1). Genomic DNA (~0.4 μg per individual) was extracted from the whole body using the DNeasy Blood and Tissue Kit (Qiagen, Hilden, Germany). For each individual, the genomic DNA was used to construct a sequencing library with an average insert size of approximately 350 bp. Subsequently, paired-end reads of 2 × 150 bp were generated using both the Illumina NovaSeq 6000 platform and the BGISEQ-500 platform.
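The per-location sample sizes can be tallied to confirm the total and to see how uneven the sampling is. This is a minimal bookkeeping sketch; the variable names are illustrative and the counts are copied from the paragraph above.

```python
# Number of resequenced individuals per sampling location (from the text).
samples_per_location = {
    "Anhui": 3,
    "Hebei": 18,
    "Henan": 14,
    "Shandong": 50,
    "Liaoning": 4,
    "Jiangsu": 2,
    "Sichuan": 3,
    "Jiangxi": 1,
    "Hubei": 15,
}

total_individuals = sum(samples_per_location.values())

# Fraction of all samples contributed by each location.
share = {loc: n / total_individuals for loc, n in samples_per_location.items()}

for loc, n in sorted(samples_per_location.items(), key=lambda kv: -kv[1]):
    print(f"{loc:10s} n = {n:2d} ({share[loc]:.1%})")
print(f"Total: {total_individuals} individuals from {len(samples_per_location)} locations")
```

Shandong alone contributes nearly half of the individuals, which is worth keeping in mind when interpreting population-level statistics.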

Estimation of genome size and assembly of the genome

Genome size, heterozygosity and repetitiveness were estimated using Jellyfish v2.2.10 and GenomeScope (http://qb.cshl.edu/genomescope/0) with 17-mer frequencies¹⁰, which predicted a genome size of approximately 440 Mb in homogametic males (Figure S2). The PacBio HiFi reads were assembled with Hifiasm v0.3.0 (https://github.com/chhylp123/hifiasm)¹¹. Purge_Dups v1.2.3 (https://github.com/dfguan/purge_dups) was used to remove haplotigs¹². The clean Hi-C reads were mapped to the draft assembly with BWA v0.7.5a¹³, and the alignment file was used to produce a scaffold-level assembly with SALSA2 (https://github.com/marbl/SALSA)¹⁴. The Hi-C reads were then mapped to the scaffold-level assembly with BWA v0.7.5a, and linkage information between scaffolds was saved as BAM files. Finally, the chromosome-level assembly was produced with ALLHiC v0.9.13 (https://github.com/tangerzhang/ALLHiC) using the parameters “-e GATC -k 31”¹⁵. BUSCO v4.0.0 was used to assess the genome assembly with the insecta_odb10 database¹⁶.

The assembled genome was 494 Mb in size with a contig N50 of 3.25 Mb. Through Hi-C scaffolding, 1,226 of 1,244 contigs were assigned to 31 pseudochromosomes, with an N50 of 17.97 Mb (Fig. 1, Table 1). Compared with other Lepidoptera genomes, the C. punctiferalis genome had a higher GC content, at 39.5% (Table S2). Transposable elements (TEs) comprise approximately 43% of the C. punctiferalis genome. The most abundant repeat elements were long interspersed elements (LINEs), accounting for 16.95% (Table 2).
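For readers unfamiliar with the N50 statistic used for contigs and pseudochromosomes, the following sketch shows the usual definition: the length of the sequence at which the cumulative length, counted from the longest sequence down, first reaches half of the total. The function name and the toy lengths are illustrative and are not taken from the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop once at least half of all bases are covered.
        if running * 2 >= total:
            return length


# Toy contig lengths (arbitrary units), not real data.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))
```

In practice the lengths would be read from the assembly FASTA (or its index), one value per contig or scaffold.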

Table 1 Statistics of the C. punctiferalis genome assembly.

Full size table

Table 2 Repetitive sequences in C. punctiferalis genome assembly.

Full size table

Genome annotation

Prior to gene prediction, we employed RepeatModeler v1.0.11 (https://www.repeatmasker.org/RepeatModeler/) for de novo identification of repeat elements and used RepeatMasker v4.1.0 (http://www.repeatmasker.org) with the parameter “-e ncbi” to annotate the repeats in the genome sequence¹⁷. The BRAKER2 (https://github.com/Gaius-Augustus/BRAKER) pipeline, which integrates RNA-seq-based, homology-based and de novo methods, was then used to annotate the soft-masked genome with the parameters “--geneMarkGtf, --softmasking, --crf, --useexisting”¹⁸. RNA was extracted from four developmental stages, comprising eggs, fifth instar larvae (male and female), pupae (male and female), and adults (male and female), using Trizol, and sample integrity was analysed on an Agilent 2100 Bioanalyzer (Agilent, Santa Clara, CA, USA). Construction of the cDNA library and paired-end RNA-seq (Illumina) were carried out by Grandomics Co. Ltd. The RNA-seq data of the different developmental stages were mapped to the genome with STAR v2.7.1a (https://github.com/alexdobin/STAR) to generate spliced RNA-seq alignments¹⁹. Meanwhile, protein sequences of six insect genomes (Chilo suppressalis, Ostrinia furnacalis, Amyelois transitella, Galleria mellonella, Drosophila melanogaster, Bombyx mori) obtained from NCBI were mapped to the C. punctiferalis genome using GenomeThreader (https://genomethreader.org/) with the parameters “-gff3out, -skipalignmentout”. The RNA-seq and homologous protein alignment information was then used for training and gene prediction with GeneMark-ET v4 (http://exon.gatech.edu/GeneMark/license_download.cgi) and AUGUSTUS v3.3.3 (https://github.com/Gaius-Augustus/Augustus). Functional annotation was carried out by aligning the predicted gene sets to the NCBI-NR, Kyoto Encyclopedia of Genes and Genomes (KEGG) and eggNOG databases²⁰. GO annotation and InterPro and PFAM domain identification were obtained using InterProScan²¹.

Using the BRAKER2 genome annotation pipeline, a total of 21,663 protein-coding genes were annotated, with an average CDS length of 1,485 bp. The functional annotation showed that 99.35% of genes had hits in the NR database and 78.20% in the eggNOG database (Table 3). Additional assembly and annotation statistics can be found in Tables 1 and 3. The annotations of gene structure, gene function, and repeats were uploaded to figshare (see Data Records).
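A summary statistic such as the average CDS length can be recomputed from a gene structure annotation by summing CDS segment lengths per transcript. The sketch below assumes a GFF3-style file with nine tab-separated columns and a `Parent=` attribute on CDS rows; the actual attribute layout of the deposited annotation may differ, and the toy records and function name are hypothetical.

```python
from collections import defaultdict


def cds_lengths(gff_lines):
    """Sum CDS segment lengths per parent transcript from GFF3-style lines."""
    totals = defaultdict(int)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "CDS":
            continue  # only CDS features contribute
        start, end = int(fields[3]), int(fields[4])
        attrs = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        parent = attrs.get("Parent")
        if parent is None:
            continue
        totals[parent] += end - start + 1  # GFF coordinates are 1-based, inclusive
    return dict(totals)


# Toy records (hypothetical IDs and coordinates), not real annotation data.
toy_gff = [
    "chr1\tsketch\tCDS\t101\t400\t.\t+\t0\tID=cds1;Parent=tx1",
    "chr1\tsketch\tCDS\t501\t800\t.\t+\t0\tID=cds2;Parent=tx1",
    "chr2\tsketch\tCDS\t11\t910\t.\t-\t0\tID=cds3;Parent=tx2",
    "chr2\tsketch\tgene\t1\t1000\t.\t-\t.\tID=g2",
]

lengths = cds_lengths(toy_gff)
mean_cds = sum(lengths.values()) / len(lengths)
print(lengths, mean_cds)
```

If a gene has several transcripts, one representative (for example the longest) would need to be chosen before averaging per gene.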

Table 3 Functional annotation of the C. punctiferalis genome assembly.

Full size table

Variant calling and filtering

Quality control of sequencing data was conducted with fastp (https://github.com/OpenGene/fastp)²². The clean reads were mapped to the reference genome using BWA v0.7.5¹³. Variant calling was carried out with the Genome Analysis Toolkit (GATK v4.0.12.0)²³. PCR duplicates in each sample were marked with MarkDuplicates, and HaplotypeCaller was run on each BAM file to call SNPs. The SNPs were further hard-filtered, removing sites that met any of the following criteria: “QD < 2.0 || FS > 60.0 || MQ < 40.0 || SOR > 3.0 || MQRankSum < −12.5 || ReadPosRankSum < −8.0”. SNPs with a minor allele frequency (MAF) below 5% were discarded, and only sites with no missing genotypes across the population were retained (missing rate of 0). Linkage disequilibrium pruning was performed with PLINK v1.90b6.24 using a window size of 50 SNPs (advancing 5 SNPs at a time) and an r² threshold of 0.5²⁴. The filtered SNP sets were used for PCA performed with GCTA v1.93.2 (https://yanglab.westlake.edu.cn/software/gcta/#Download)²⁵. The resulting SNPs were annotated and categorized with SnpEff (http://pcingola.github.io/SnpEff/download/)²⁶.

We sampled a total of 110 C. punctiferalis individuals from 9 provinces in eastern China and carried out genome sequencing (Fig. 2a). Each individual was sequenced at an average of ~18.4×, yielding a total of 4104.9 Gb of data (Table S1). Analysis of the 5.8 million high-quality single nucleotide polymorphisms (SNPs) revealed that a majority of these SNPs are situated either upstream (36.04%) or downstream (36.17%) of genes (Fig. 3a). The SNP density within the genome is shown in Fig. 2b, and the average nucleotide diversity (Π) was 3 × 10⁻³. According to Gene Ontology (GO) enrichment analysis of genes with the lowest 5% Π score, these genes were enriched in GO terms including egg chorion (GO:0042600), multicellular organism development (GO:0007275) and ATP binding (GO:0005524) (Fisher’s exact test, p < 0.05) (Fig. 3b).
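The hard-filter expression and the MAF cut-off can be restated as simple predicates. The sketch below is not the GATK or PLINK implementation; it mirrors the thresholds given in the text on a plain dictionary of site annotations, and it assumes (as a simplification) that an annotation missing from a site does not trigger its filter. Function names and example values are hypothetical.

```python
def fails_hard_filter(ann):
    """True if a site meets any hard-filter criterion listed in the text."""
    checks = [
        ("QD", lambda v: v < 2.0),
        ("FS", lambda v: v > 60.0),
        ("MQ", lambda v: v < 40.0),
        ("SOR", lambda v: v > 3.0),
        ("MQRankSum", lambda v: v < -12.5),
        ("ReadPosRankSum", lambda v: v < -8.0),
    ]
    # Assumption: annotations absent from the record are simply not evaluated.
    return any(name in ann and test(ann[name]) for name, test in checks)


def passes_maf(alt_allele_count, n_diploid_samples, min_maf=0.05):
    """True if the minor allele frequency of a biallelic site is >= min_maf."""
    total_alleles = 2 * n_diploid_samples
    alt_freq = alt_allele_count / total_alleles
    return min(alt_freq, 1.0 - alt_freq) >= min_maf


# Hypothetical annotation values for two sites.
good_site = {"QD": 18.2, "FS": 1.3, "MQ": 60.0, "SOR": 0.7,
             "MQRankSum": 0.1, "ReadPosRankSum": -0.4}
bad_site = dict(good_site, QD=1.5)  # QD below 2.0 triggers removal

print(fails_hard_filter(good_site), fails_hard_filter(bad_site))
print(passes_maf(11, 110), passes_maf(10, 110))
```

With 110 diploid individuals there are 220 alleles per site, so under this sketch a biallelic SNP needs at least 11 copies of the minor allele to reach the 5% threshold.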

Data Records

The CCS reads (SRR25434291), Hi-C reads (SRR25443395), RNA-Seq reads (SRR25435727, SRR25435728, SRR25435729 and SRR25435730), and resequencing data (SRR25472874–SRR25472913 and SRR25619108–SRR25619177) have been deposited in the NCBI Sequence Read Archive (SRA) under study accession number SRP451679²⁷. The reference genome was deposited in GenBank with the accession number GCA_031163375.1²⁸. The genetic variation data have been deposited in the European Variation Archive (EVA) with accession number ERZ21819523²⁹. The results of genome annotation, variant calling, orthology, genome synteny, and positive selection were deposited in the Figshare database³⁰. For convenience, data are organized into fourteen files:

1. blastp_result.txt (2.88 MB): This file contains the annotation result from the alignment to NCBI-NR.
2. emapper.annotations (12.06 MB): This file contains the annotation result from the alignment to the eggNOG database. It also contains GO and KEGG terms.
3. interproscan.tsv (1019.61 MB): This file contains the annotation results from the alignment to multiple structure databases.
4. gene_structure_ypm.gff (121.11 MB): It provides various genomic features of the genome, such as gene and transcript locations and structure.
5. ypm_genome.codingseq (34.85 MB): This file contains predicted gene sets of the whole genome in FASTA format.
6. positive_selection.tsv (29.17 kB): This file contains information about genes that showed positive selection in the genome of C. punctiferalis.
7. C. punctiferalis-Cn. medinalis.collinearity (346.86 kB): This file contains identified syntenic blocks information between C. punctiferalis and Cn. medinalis.
8. C. punctiferalis-Ch. suppressalis.collinearity (350.41 kB): This file contains identified syntenic blocks information between C. punctiferalis and Ch. suppressalis.
9. Organism.tree (0.43 kB): It provides the phylogenetic tree of fourteen species in NEWICK format.
10. ortho.csv (1.59 kB): This file contains the number of orthologous genes present in different species.
11. go_enrich.csv (16.16 kB): It provides data from the GO enrichment analysis of expanded gene families in C. punctiferalis compared to the ancestor.
12. snp_count.csv (0.14 kB): It provides the count of SNPs based on their genomic locations.
13. low_pi_go_enrich.txt (15.92 kB): This file contains the list of genes with the lowest 5% Π score.
14. repeats.gff (121.11 MB): It provides genomic features of the repeats sequence identified in the genome.
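When retrieving the resequencing data, the two SRA run ranges given in this section can be expanded into individual accessions. The sketch below assumes the run numbers within each range are consecutive, which the text implies but does not state explicitly; the helper name is hypothetical.

```python
def expand_run_range(first, last, prefix="SRR"):
    """Expand an inclusive range of run accessions sharing the same prefix."""
    if not (first.startswith(prefix) and last.startswith(prefix)):
        raise ValueError("accessions must share the expected prefix")
    start, end = int(first[len(prefix):]), int(last[len(prefix):])
    if end < start:
        raise ValueError("range end precedes range start")
    return [f"{prefix}{number}" for number in range(start, end + 1)]


# Resequencing run ranges quoted in Data Records.
resequencing_runs = (
    expand_run_range("SRR25472874", "SRR25472913")
    + expand_run_range("SRR25619108", "SRR25619177")
)

print(len(resequencing_runs), resequencing_runs[0], resequencing_runs[-1])
```

Under that assumption the two ranges contain 40 and 70 runs, matching the 110 resequenced individuals.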

Technical Validation

Genome assembly assessment

The completeness of the genome assembly was assessed by searching for 1,367 single-copy insect genes using Benchmarking Universal Single-Copy Orthologs (BUSCO). The assembly contains 95.4% complete and single-copy BUSCOs, 0.8% complete and duplicated, 0.1% fragmented and 3.7% missing (Table 1). The Hi-C heatmap, used to confirm the accuracy of the chromosome assembly, revealed a well-organized interaction contact pattern along the diagonals within/around the chromosome inversion region (Fig. 1). Furthermore, we used minimap2 v2.17-r954-dirty with the parameters “-ax map-pb” to align the HiFi reads to the assembly³¹. A total of 99.87% of the HiFi reads were mapped to the genome assembly and the coverage rate was 99.83%.
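The BUSCO categories can be combined to obtain overall completeness and converted into approximate gene counts. This is simple arithmetic on the rounded percentages quoted above, so the counts are approximations rather than values taken from the BUSCO output; the variable names are illustrative.

```python
# BUSCO category percentages and marker count quoted in the text.
busco_percent = {
    "complete_single": 95.4,
    "complete_duplicated": 0.8,
    "fragmented": 0.1,
    "missing": 3.7,
}
n_markers = 1367

# Overall completeness = single-copy + duplicated complete BUSCOs.
complete = round(
    busco_percent["complete_single"] + busco_percent["complete_duplicated"], 1
)
total = round(sum(busco_percent.values()), 1)

# Approximate counts reconstructed from rounded percentages.
approx_counts = {
    name: round(n_markers * pct / 100) for name, pct in busco_percent.items()
}

print(f"Complete: {complete}%  (categories sum to {total}%)")
print(approx_counts)
```

The summed value of 96.2% differs slightly from the 96.3% quoted in the abstract, presumably because the per-category percentages are rounded.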

Orthology validation

Protein-coding sequences of the lepidopteran species other than C. punctiferalis and of D. melanogaster (FlyBase Release 6.32) were downloaded from NCBI and Ensembl. For each gene, the longest transcript was used to identify orthologous groups. Orthologous gene clustering was performed with OrthoFinder v2.4.0 (https://github.com/davidemms/OrthoFinder)³². Single-copy orthologous genes were extracted from the clustering results. Multiple alignments were performed independently for each group using MAFFT v7.453 with the “--auto” parameter, and TrimAL v1.2rev59 was used for alignment trimming³³. The alignments of all single-copy orthologs were concatenated into a super gene. A species tree was built using IQ-TREE v2.0.6 (https://github.com/Cibiv/IQ-TREE) with model selection for each partition and 1000 ultrafast bootstrap replicates³⁴. R8s v1.81 was used to estimate species divergence times and mean substitution rates³⁵. Two secondary calibration points obtained from the TimeTree database were used to calibrate our phylogeny: 67 and 76–102 million years ago for the split times of B. mori – M. sexta and D. plexippus – H. melpomene, respectively³⁶.
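The concatenation of per-ortholog alignments into a super gene can be sketched as follows. This is a generic illustration, not the authors' script: each alignment is represented as a dictionary mapping species to an aligned sequence, and the function also records the 1-based column range of each gene, which is the kind of information a partitioned analysis needs. Species labels and sequences are toy values.

```python
def concatenate_alignments(alignments, species):
    """Join per-gene alignments into one supermatrix plus partition ranges."""
    parts = {sp: [] for sp in species}
    partitions = []
    position = 0
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("sequences within an alignment must have equal length")
        width = widths.pop()
        for sp in species:
            # Pad with gaps if a species is absent from this alignment.
            parts[sp].append(aln.get(sp, "-" * width))
        partitions.append((position + 1, position + width))
        position += width
    supermatrix = {sp: "".join(chunks) for sp, chunks in parts.items()}
    return supermatrix, partitions


# Toy alignments for three hypothetical species labels.
toy_alignments = [
    {"spA": "MK-L", "spB": "MKAL", "spC": "MRAL"},
    {"spA": "GG", "spB": "GA", "spC": "G-"},
]
supermatrix, partitions = concatenate_alignments(toy_alignments, ["spA", "spB", "spC"])
print(supermatrix)
print(partitions)
```

For single-copy orthologs every species is present in every alignment, so the gap-padding branch would not normally be exercised.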

The phylogeny was analysed using 1,213 single-copy orthologs of 13 lepidopteran species, with D. melanogaster as an outgroup. The phylogenetic analysis showed that the yellow peach moth C. punctiferalis and the Asian corn borer O. furnacalis formed sister lineages within the Crambidae. The three Crambidae (Ch. suppressalis, C. punctiferalis and O. furnacalis) and two Pyralidae (G. mellonella and A. transitella) species clustered together and diverged less than 100 Myr ago (Fig. 2d). There were 10,971 orthogroups (70.5% of 15,557 orthogroups) shared by Crambidae and Pyralidae (Fig. 2d). Among the three Crambidae species, C. punctiferalis shares more orthologs with O. furnacalis than with Ch. suppressalis (Figure S3).

Gene family expansions and contractions within the phylogeny were identified using CAFE v4.0.1 with a p-value < 0.05 as the cut-off³⁷. Enrichment analysis of the expanded gene set was carried out using the R package clusterProfiler³⁸. A total of 5,686 orthogroups are generally conserved across the different species, and 2,167 are specific to C. punctiferalis (Fig. 2d). In comparison to the common ancestor of C. punctiferalis and O. furnacalis, 1,834 gene families exhibited expansion and 1,121 gene families experienced contraction in C. punctiferalis. The expanded genes were involved in 33 classes of molecular functions, including catalytic activity (GO:0003824) and zinc ion binding (GO:0008270) (Fisher’s exact test, p < 0.05) (Fig. 4). Analysis of the domains of the expanded gene families showed that a majority of them were related to transposon activity and membrane proteins (Table S3).
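To illustrate the kind of over-representation test behind a GO enrichment p-value, the sketch below computes the upper tail of the hypergeometric distribution, which is equivalent to a one-sided Fisher's exact test on the corresponding 2 × 2 table. It is a generic illustration rather than the clusterProfiler implementation, and the counts are made-up toy numbers, not values from this study.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K carry the GO term."""
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    # Number of ways to draw i annotated and n - i unannotated genes.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Toy numbers: 20 background genes, 5 annotated with the term,
# 4 genes in the test set, 3 of which carry the term.
p = enrichment_p_value(k=3, n=4, K=5, N=20)
print(f"p = {p:.4f}")
```

In a real analysis the background would be all annotated genes, the test set would be the expanded gene families, and the resulting p-values would usually be corrected for multiple testing.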

Genome synteny

The genome of C. punctiferalis was compared with the genomes of two other Crambidae species, Ch. suppressalis and Cnaphalocrocis medinalis. Collinear blocks that included a minimum of five genes were identified using MCScanX³⁹. The analysis revealed 413 syntenic blocks between C. punctiferalis and Ch. suppressalis, and 304 syntenic blocks between C. punctiferalis and Cn. medinalis. In the genome of C. punctiferalis, 5,662 genes were relatively conserved and could be identified in both the C. punctiferalis-Ch. suppressalis and the C. punctiferalis-Cn. medinalis syntenic blocks. In addition, 2,266 C. punctiferalis genes were present only in C. punctiferalis-Ch. suppressalis syntenic blocks, while 2,441 were present only in C. punctiferalis-Cn. medinalis syntenic blocks. Information on the identified syntenic blocks has been uploaded to figshare (see Data Records).
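Syntenic blocks in a collinearity file can be counted with a small parser. The sketch below assumes the usual MCScanX text layout, in which each block starts with a `## Alignment` header and is followed by tab-separated lines whose second and third fields are the paired gene IDs; this layout is an assumption about the deposited files, and the gene names in the toy input are hypothetical.

```python
def parse_collinearity(text, min_pairs=5):
    """Return blocks (lists of gene pairs) with at least min_pairs pairs."""
    blocks = []
    current = None
    for line in text.splitlines():
        if line.startswith("## Alignment"):
            current = []          # start a new block
            blocks.append(current)
        elif line.startswith("#") or not line.strip():
            continue              # parameter/comment lines and blanks
        elif current is not None:
            fields = line.split("\t")
            if len(fields) >= 3:
                current.append((fields[1].strip(), fields[2].strip()))
    return [block for block in blocks if len(block) >= min_pairs]


# Toy input with one 5-gene block and one 2-gene block (hypothetical IDs).
toy_text = "\n".join(
    ["## Alignment 0: score=250.0 e_value=1e-10 N=5 chrA&chrB plus"]
    + [f"  0-  {i}:\tgeneA{i}\tgeneB{i}\t  0" for i in range(5)]
    + ["## Alignment 1: score=100.0 e_value=1e-05 N=2 chrA&chrC minus"]
    + [f"  1-  {i}:\tgeneA{i + 10}\tgeneC{i}\t  0" for i in range(2)]
)

blocks = parse_collinearity(toy_text)
genes_in_blocks = {first for block in blocks for first, _ in block}
print(len(blocks), sorted(genes_in_blocks))
```

Applying the same function to both collinearity files and intersecting the resulting gene sets would be one way to find genes present in both comparisons, assuming C. punctiferalis genes occupy the same column in each file.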

Positive selection

We tested for positive selection on 5,584 single-copy genes of Pyraloidea lineages, including three Crambidae species (O. furnacalis, Ch. suppressalis and C. punctiferalis) and two Pyralidae species (A. transitella and G. mellonella). PRANK v130410 and PAL2NAL v14 were used to align proteins and generate codon alignments⁴⁰. The branch-site model aBSREL of the HyPhy v2.5.28 framework (https://github.com/veg/hyphy) was used to analyse positive selection on C. punctiferalis⁴¹. Genes with a dN/dS below 50 across more than 5% of sites and a p-value below 0.05 at the C. punctiferalis node were retained. In total, 154 genes showed the signature of positive selection in the genome of C. punctiferalis. Information on these positively selected genes has been uploaded to figshare (see Data Records).

Population structure

Pairwise F_ST between all populations of C. punctiferalis was calculated using VCFtools v0.1.13 with a window size of 10 kb⁴². The results of principal component analysis (PCA) and F_ST showed no significant differentiation among the different geographic populations (Fig. 2c and Table S4). For various reasons, we could not sample equal numbers of individuals from all locations, and the uneven sampling might affect the final results. However, the results of our population genetic analyses are consistent with those of a previous study⁴³.

Code availability

All software and pipelines were executed following the manuals and protocols provided by the published bioinformatic tools. The versions and parameters of the software are described in Methods. Where no detailed parameters are mentioned for a tool, the default parameters recommended by the developer were used.

References

1. Shwe, S. M., Prabu, S., Jing, D., He, K. & Wang, Z. Synergistic interaction of Cry1Ah and Vip3Aa19 proteins combination with midgut ATP-binding cassette subfamily C receptors of Conogethes punctiferalis (Guenée) (Lepidoptera: Crambidae). Int J Biol Macromol. 213, 871–879 (2022).
2. He, K. et al. Evaluation of transgenic Bt corn for resistance to the Asian corn borer (Lepidoptera: Pyralidae). J Econ Entomol. 96, 935–940 (2003).
3. Yang, S. et al. Impact of durian fruit borer Conogethes punctiferalis on yield loss of summer corn by injuring corn ears. J Plant Prot. 42, 991–996 (2015).
4. Lu, J., Wang, Z., He, K. & Liu, Y. Research history, progresses and prospects in the yellow peach moth, Conogethes punctiferalis. Plant Prot. 36, 31–38 (2010).
5. Guo, H. et al. Penicillium fungi mediate behavioral responses of the yellow peach moth, Conogethes punctiferalis (Guenée) to apple fruits via altering the emissions of host plant VOCs. Arch Insect Biochem Physiol. 110, e21895 (2022).
6. Hilgenboecker, K., Hammerstein, P., Schlattmann, P., Telschow, A. & Werren, J. H. How many species are infected with Wolbachia?-a statistical analysis of current data. FEMS Microbiol Lett. 281, 215–220 (2008).
7. Li, J., Zhang, Y., Wang, Z. & He, K. Wolbachia infection in four geographic populations of yellow peach moth, Conogethes punctiferalis in China. J Environ Entomol. 32, 322–328 (2010).
8. Wenger, A. M. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol. 37, 1155–1162 (2019).
9. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 326, 289–293 (2009).
10. Vurture, G. W. & Sedlazeck, F. J. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics. 33, 2202–2204 (2017).
11. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 18, 170–175 (2021).
12. Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics. 36, 2896–2898 (2020).
13. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 25, 1754–1760 (2009).
14. Ghurye, J. & Rhie, A. et al. Integrating Hi-C links with assembly graphs for chromosome-scale assembly. PLoS Comput Biol. 15, e1007273 (2019).
15. Zhang, X., Zhang, S., Zhao, Q., Ming, R. & Tang, H. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. Nat Plants. 5, 833–845 (2019).
16. Simao, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 31, 3210–3212 (2015).
17. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics. 25, 4.10.1–4.10.14 (2009).
18. Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genomics and Bioinformatics. 3, lqaa108 (2021).
19. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 29, 15–21 (2013).
20. Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462 (2016).
21. Quevillon, E. et al. InterProScan: protein domains identifier. Nucleic Acids Res. 33, W116–W120 (2005).
22. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 34, i884–i890 (2018).
23. Mckenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
24. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 81, 559–575 (2007).
25. Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 88, 76–82 (2011).
26. Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly. 6, 80–92 (2012).
27. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP451679 (2023).
28. Gao, B. et al. GenBank https://identifiers.org/ncbi/insdc.gca:GCA_031163375.1 (2023).
29. European Variation Archive https://identifiers.org/ena.embl:ERZ21819523 (2023).
30. Gao, B. et al. related data of genome of yellow peach moth. figshare https://doi.org/10.6084/m9.figshare.24189168.v2 (2023).
31. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 34, 3094–3100 (2018).
32. Emms, D. M. & Kelly, S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 16, 157 (2015).
33. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 30, 772–780 (2013).
34. Minh, B. Q. et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol. 37, 1530–1534 (2020).
35. Sanderson, M. J. r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics. 19, 301–302 (2003).
36. Kumar, S. et al. TimeTree 5: An expanded resource for species divergence times. Mol Biol Evol. 39, msac174 (2022).
37. De Bie, T., Cristianini, N., Demuth, J. P. & Hahn, M. W. CAFE: a computational tool for the study of gene family evolution. Bioinformatics. 22, 1269–1271 (2006).
38. Yu, G., Wang, L., Han, Y. & He, Q. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS. 16, 284–287 (2012).
39. Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49 (2012).
40. Löytynoja, A. Phylogeny-aware alignment with PRANK. Methods Mol Biol. 1079, 155–170 (2014).
41. Kosakovsky Pond, S. L. et al. HyPhy 2.5-A customizable platform for evolutionary hypothesis testing using phylogenies. Mol Biol Evol. 37, 295–299 (2020).
42. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics. 27, 2156–2158 (2011).
43. Zhang, Y., Li, J., Wang, Z. & He, K. Genetic diversity of Conogethes punctiferalis (Guenée) (Lepidoptera: Crambidae) populations from different geographic regions in China. Acta Entomol. Sin. 53, 1022–1029 (2010).

Acknowledgements

This project was funded by the STI 2030–Major Projects (2022ZD04021), the Shenzhen Science and Technology Program (KQTD20180411143628272), the Agricultural Science and Technology Innovation Program of Chinese Academy of Agricultural Sciences, and the Shandong Modern Agricultural Technology and Industry System (SDAIT-02-10).

Author information

These authors contributed equally: Bojia Gao, Yan Peng, Minghui Jin, Lei Zhang.

Authors and Affiliations

Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Gene Editing Technologies (Hainan), Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
Bojia Gao, Yan Peng, Minghui Jin, Lei Zhang, Chao Wu, He Yuan & Yutao Xiao
Taishan Academy of Forestry Sciences, Taian, 271000, China
Xiu Han
Lancaster Environment Centre, Lancaster University, Lancaster, LA1 4YQ, United Kingdom
Andongma Awawing
College of Plant Protection, Shandong Agricultural University, Taian, 271018, China
Fangqiang Zheng & Xiangdong Li

Contributions

Y.X. and L.Z. conceived, designed and led the project. B.G. performed sequencing, genome assembly and annotation, gene family analysis, and transcriptome analysis. B.G. and Y.P. performed population genetic analysis. B.G. wrote the manuscript. L.Z., M.J., Y.P., C.W., A.A., F.Z., H.Y., X.H. and X.L. revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Yutao Xiao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Chromosome genome assembly and whole genome sequencing of 110 individuals of Conogethes punctiferalis (Guenée)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Gao, B., Peng, Y., Jin, M. et al. Chromosome genome assembly and whole genome sequencing of 110 individuals of Conogethes punctiferalis (Guenée). Sci Data 10, 805 (2023). https://doi.org/10.1038/s41597-023-02730-x

Received: 25 July 2023
Accepted: 07 November 2023
Published: 16 November 2023
DOI: https://doi.org/10.1038/s41597-023-02730-x
Springer Nature Limited
