Abstract
Lettuce (Lactuca sativa L., Asteraceae) is one of the most important vegetable crops, known for its various horticultural types and significant morphological variation. The first reference genome of lettuce, a crisphead type (L. sativa var. capitata cv. Salinas), was previously released. Here, we report a near-complete chromosome-level reference genome for looseleaf lettuce (L. sativa var. crispa). PacBio high-fidelity sequencing, Oxford Nanopore, and Hi-C technologies were employed to produce the genome assembly. The final assembly is 2.59 Gb in length with a contig N50 of 205.47 Mb, anchored onto nine chromosomes, containing 14 recognizable telomeres and only 11 gaps. Repetitive sequences account for 77.11% of the genome, and 41,375 protein-coding genes were predicted, with 99.10% of these assigned functional annotations. This chromosome-level genome enriches the genomic resources available for the various horticultural types of lettuce and will facilitate the characterization of morphological variation and genetic improvement in lettuce.
Background & Summary
Lactuca sativa L. (Asteraceae), known as lettuce, is considered one of the most important vegetable crops1,2,3,4,5. Originating in the coastal Mediterranean regions, lettuce was featured in Egyptian tombs around 2,500 BC2,3. Today, lettuce is cultivated as diverse horticultural varieties for different purposes, including leafy types (looseleaf, crisphead, romaine, and butterhead) and non-leafy types (stem and oilseed), each with distinct morphological characteristics6,7. Leafy lettuces, particularly looseleaf and crisphead, are consumed globally in salads and hamburgers, and are also popular in hotpot cuisine in China and grilled with red meat in other parts of Asia. Compared to crisphead lettuce, looseleaf lettuce grows faster, can be harvested earlier, and tolerates abiotic stress better. Thus, looseleaf lettuce is an important horticultural type for the annual leafy vegetable supply, and genomic research could greatly enhance its economic value.
A high-quality reference genome is crucial for identifying genetic variations, conducting phylogenetic research, and facilitating molecular marker-assisted breeding. Lettuce is a representative species of the genus Lactuca in the Asteraceae family, and the first reference genome for a crisphead lettuce type (L. sativa var. capitata cv. Salinas) was released in 2017, with a genome size of 2.38 Gb and a contig N50 of 36 Kb8. With advancements in sequencing technology and broader use of Lactuca species, additional genome assemblies have been published, including those for two wild relatives (L. saligna and L. virosa) and one stem lettuce (L. sativa var. angustana cv. Yanling1)9,10,11. Although these data are useful for identifying intraspecific variation, only two chromosome-level genome assemblies of cultivated lettuce (the crisphead and stem types) have been generated to date8,11. A single reference genome, or a limited number of them, is insufficient for exploring the genetic diversity of an economically important crop, which hinders genomic research and molecular breeding12,13. A high-quality genome assembly for the looseleaf type is therefore needed for identifying genetic variations, inferring phylogenetic relationships among different horticultural types, and facilitating comparative genomic analysis and genetic improvement in lettuce.
In this study, we generated a chromosome-level and near-complete reference genome assembly for looseleaf lettuce (L. sativa var. crispa cv. Green Elegance) using PacBio high-fidelity reads (~46×), Oxford Nanopore reads (~13×), Illumina short reads (~50.39×), and Hi-C reads (~97×). The assembled genome (Green Elegance) had a total length of 2.59 Gb, with a contig N50 of 205.47 Mb and a BUSCO completeness score of 98.39%. A total of 2,580.61 Mb (99.61%) of the genome sequences were anchored to nine chromosomes, featuring 14 recognizable telomeres and 11 gaps. Genome annotation predicted 41,375 protein-coding genes and 77.11% repetitive sequences. These genomic resources provide a roadmap for further genetic and evolutionary investigation.
Methods
Sample collection, library construction and sequencing
Looseleaf lettuce (Lactuca sativa var. crispa cv. Green Elegance) was provided by the Beijing Vegetable Research Center, Beijing Academy of Agriculture and Forestry Science, Beijing, China (Fig. 1). The seedlings were grown in a growth chamber at the Beijing Vegetable Research Center under a photoperiod of 16-hour light (200 μmol m−2 s−1) and 8-hour dark at 25 °C. Fresh and healthy leaves were collected at the rosette stage and immediately frozen in liquid nitrogen for genome survey and sequencing (Table 1). For transcriptomic sequencing, samples included mature leaves, young seedlings (including roots), and inflorescence (Table 1). Newly developed tender leaves, maintained under moist and low-temperature conditions, were used to construct the Hi-C library (Table 1).
High molecular weight genomic DNA was extracted from leaves using a modified CTAB (cetyltrimethylammonium bromide) method14. RNA was removed by adding RNase A. The quality of the DNA was assessed using agarose gel electrophoresis, which confirmed excellent integrity of the DNA molecules.
For Illumina sequencing, a short-read library with an average insert size of 350 bp was constructed and sequenced on an Illumina Novaseq platform (Illumina, CA, USA) using the PE150 program. This yielded 135.8 Gb of raw data. Finally, 124.16 Gb (50.39×) of clean reads were obtained for genome size estimation, sequence correction, and assessment of heterozygosity and repeat content (Table 1 and Fig. 2).
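The fold-coverage figures quoted for each library follow from dividing the clean yield by a genome size. The sketch below illustrates that arithmetic only; the function name is hypothetical, and the denominator used here (the 2.46 Gb k-mer estimate) is an assumption, so the result differs slightly from the reported 50.39×, which presumably rests on a less rounded genome-size value.

```python
def sequencing_depth(total_bases_gb: float, genome_size_gb: float) -> float:
    """Average fold coverage: sequenced bases divided by genome size."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return total_bases_gb / genome_size_gb


# Clean Illumina yield from the text, against the rounded k-mer estimate
# (assumed denominator; the paper does not state which value it used).
illumina_depth = sequencing_depth(124.16, 2.46)
print(f"Illumina depth: {illumina_depth:.2f}x")
```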
For PacBio HiFi sequencing, genomic DNA was fragmented to ~15 Kb to construct a long-read library following the manufacturer’s instructions (Pacific Biosciences, CA, USA). The library was sequenced on a PacBio Sequel II platform using Circular Consensus Sequencing (CCS) mode. The SMRTbell library was constructed using the SMRTbell Express Template Prep Kit 2.0 (Pacific Biosciences). Library size and quantity were assessed using the FEMTO Pulse (Agilent Technologies, Wilmington, DE) and the Qubit dsDNA HS Assay Kit (Life Technologies, Carlsbad, CA, USA). The library was loaded at a concentration of 55 pM using diffusion loading. Single-molecule real-time (SMRT) sequencing was conducted on a single 8 M SMRT Cell on the Sequel II System. After filtering out low-quality reads and adapter sequences, we obtained 109.27 Gb (46×) of clean subreads with a read-length N50 of 17.81 Kb.
Long-read sequencing using the PromethION platform from Oxford Nanopore Technologies (ONT) was performed to fill assembly gaps. High-quality genomic DNA was fragmented to ~8 Kb using a gTube, and the library was constructed with the Ligation Sequencing Kit 1D (Nanopore, SQK-LSK109). We generated 85.80 Gb of raw data, and after filtering out adapters, low-quality reads, and reads shorter than 2 Kb, 32.55 Gb of clean reads with a clean N50 length of 100,148 Kb were obtained (Table 1).
Genome size and heterozygosity estimation
Short Illumina reads were quality-filtered using fastp15 (v0.12.4; settings ‘-q 10 -u 50 -y -g -Y 10 -e 20 -l 100 -b 150 -B 150’). The quality-filtered reads were used for genome size estimation. We counted 21-mers with Jellyfish16 (v2.1.4; k-mer size 21) and used GenomeScope17 (v2.0; default settings) to estimate a genome size of 2.46 Gb and a genome-wide heterozygosity rate of 0.21% of sites (Fig. 2).
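The idea behind k-mer based genome size estimation can be sketched in a few lines. This is a simplified illustration, not the model used in the study: GenomeScope fits a mixture model to the k-mer spectrum, whereas the sketch below uses the plain "total k-mers divided by main-peak depth" rule. All function names and the `min_depth` error cutoff are assumptions made for the example.

```python
from collections import Counter


def count_kmers(reads, k=21):
    """Count every k-mer in the reads (no canonical strand handling, for brevity)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_histogram(counts):
    """Map each multiplicity to the number of distinct k-mers seen that many times."""
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Peak-based estimate: total k-mer occurrences / depth of the main peak.

    k-mers seen fewer than min_depth times are treated as sequencing errors
    (an assumed cutoff; real tools model the error peak explicitly).
    """
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers above the error cutoff")
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth


# Toy spectrum: an error peak at depth 1 and a main peak at depth 10.
toy = Counter({1: 1000, 9: 10, 10: 100, 11: 10})
print(estimate_genome_size(toy))
```

On real data the histogram would come from a k-mer counter such as Jellyfish rather than from the in-memory `count_kmers` helper, which is only practical for toy inputs.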
De novo genome assembly
High-accuracy Circular Consensus Sequencing (CCS) reads were assembled with hifiasm18 (v0.16) into 156 contigs, with the longest contig spanning 282.47 Mb and a contig N50 of 205.47 Mb (Table 2). This resulted in a total genome sequence size of 2.59 Gb.
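For readers unfamiliar with the N50 statistic reported here, a minimal sketch of its standard definition follows: the length of the contig at which the cumulative sum of lengths, taken from longest to shortest, first reaches half of the assembly size. The function name is illustrative and the example lengths are toy values, not the lettuce contigs.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths supplied")


# Toy assembly: total is 30, so half is 15; 10 + 8 = 18 crosses it at length 8.
print(n50([10, 8, 6, 4, 2]))
```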
To anchor contigs, 251.37 Gb of clean read pairs from the Hi-C library were mapped to the polished Green Elegance genome using BWA (v0.7.17) with the default parameters. Invalid read pairs, such as those arising from self-ligation, non-ligation, PCR amplification, and random breaks, were filtered out. After correction and filtration, we obtained 77 high-accuracy scaffolds with a scaffold N50 length of 320.76 Mb and a total scaffold length of 2,590.68 Mb (Table 2). We successfully anchored 2,590.61 Mb (100%) of the genome into nine groups, which were designated as the nine chromosomes of Green Elegance, using the agglomerative hierarchical clustering method in Lachesis19 (Fig. 3). Lachesis was then used to order and orient the clustered contigs. A total of 2,580.61 Mb (99.61%) was successfully ordered and oriented on the nine chromosomes (Table 3). The Hi-C contact heatmap, generated using HiCExplorer20 (v3.7), revealed nine distinct groups based on interaction intensities between bins (bin size of 800 Kb), indicating high-quality chromosome construction (Fig. 4). The final chromosome-level assembly had chromosome lengths ranging from 205,466,188 bp to 407,155,607 bp, encompassing 99.6% of the total sequence (Table 3). After gap filling with ONT sequencing data, 11 gaps remained across eight chromosomes, with one chromosome being complete. Fourteen telomeres, including 11 complete telomeres longer than 1 Kb, were distributed across the nine chromosomes (Fig. 5, Tables 2 and 3). This genome assembly of L. sativa var. crispa cv. Green Elegance represents a significant improvement in genome continuity (contig N50), gap number, and chromosome anchoring compared to the other sequenced Lactuca plants, including L. sativa var. capitata cv. Salinas, L. sativa var. angustana cv. Yanling1, L. saligna, and L. virosa (Table 2).
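Gap counts such as the 11 reported here can be checked directly on a chromosome-level FASTA. The sketch below assumes the common convention that gaps are stored as runs of N characters; that convention, the function names, and the toy sequences are assumptions of this example rather than details stated in the text.

```python
import re

GAP_PATTERN = re.compile(r"[Nn]+")


def count_gaps(sequence, min_run=1):
    """Count runs of N/n that are at least min_run bases long."""
    return sum(1 for m in GAP_PATTERN.finditer(sequence) if len(m.group()) >= min_run)


def gaps_per_chromosome(chromosomes):
    """Return a {name: gap count} mapping for a {name: sequence} mapping."""
    return {name: count_gaps(seq) for name, seq in chromosomes.items()}


# Toy input: one gapless chromosome and one with two gaps.
toy = {"chr1": "ACGTACGT", "chr2": "ACGTNNNNACGTNNACGT"}
print(gaps_per_chromosome(toy))
```

A chromosome with a count of zero corresponds to the "complete" chromosome described above, under the same N-run assumption.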
Repetitive sequences annotation
Transposable elements (TEs) were identified using a combination of homology-based and de novo approaches. A de novo repeat library was first constructed with RepeatModeler (http://www.repeatmasker.org/RepeatModeler/)21. Full-length long terminal repeat retrotransposons (FL-LTR-RTs) were identified using LTRharvest (v1.5.9)22 and LTR_finder (v2.8)23, and a high-quality library was produced with LTR_retriever24. The de novo TE sequence library and known TE sequences from the Dfam (v3.5) database were combined to create the final TE sequence set for the Green Elegance genome, which was classified using RepeatMasker (v4.12)25. Tandem repeats were annotated using Tandem Repeats Finder (TRF 409)26 and the MIcroSAtellite identification tool (MISA v2.1)27 with the default parameters (definition: 1-10 2-6 3-5 4-5 5-5 6-5; interruptions: 100). In total, transposable elements and tandem repeats accounted for 77.11% and 4.14% of the Green Elegance genome sequence, respectively, amounting to 2.00 Gb and 107.58 Mb (Table 4).
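The MISA definition string pairs each motif length with a minimum repeat count (1-10 means mononucleotide runs of at least 10 units, 2-6 means dinucleotide motifs repeated at least 6 times, and so on). A minimal regex-based sketch of that thresholding is shown below. It is not MISA itself: compound microsatellites (the "interruptions: 100" setting) are not modelled, and the function names are hypothetical.

```python
import re

# Motif length -> minimum number of repeat units, as in the definition string.
MISA_DEFINITION = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(sequence, definition=MISA_DEFINITION):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for unit, min_repeats in sorted(definition.items()):
        # One captured unit followed by at least (min_repeats - 1) copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group()) // unit))
    return sorted(hits)


toy = "GG" + "AT" * 6 + "CC" + "A" * 10 + "C"
for start, motif, repeats in find_ssrs(toy):
    print(start, motif, repeats)
```

The `is_primitive` check prevents a homopolymer run from being reported again as a dinucleotide or longer motif.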
Gene prediction and functional annotation of protein-coding genes
Three approaches (de novo prediction, homology search, and transcript-based assembly) were integrated for annotating protein-coding genes in the genome (Table 5). De novo gene models were predicted using two ab initio gene-prediction software tools, Augustus (v3.1.0)28 and SNAP (Korf, 2004). For homology-based prediction, GeMoMa (v1.7) was used with reference gene models from various species, including Arabidopsis thaliana, Oryza sativa, L. sativa var. capitata cv. Salinas, L. sativa var. angustana, L. serriola, L. virosa, Helianthus annuus, Taraxacum kok-saghyz, and Artemisia annua. For transcript-based prediction, RNA-sequencing data were mapped to the reference genome using HISAT (v2.1.0)29 and assembled with StringTie (v2.1.4)17. GeneMarkS-T (v5.1) was used to predict genes based on these assembled transcripts. Additionally, PASA (v2.4.1) was employed to predict genes based on unigenes and full-length transcripts from PacBio/ONT sequencing assembled by Trinity (v2.11)30. Gene models from these approaches were integrated using EVM (v1.1.1) and updated with PASA. In total, 41,375 protein-coding genes with an average length of 3,744 bp were predicted in the Green Elegance genome (Table 6).
Gene functions were inferred by aligning the predicted proteins to the National Center for Biotechnology Information (NCBI) Non-Redundant (NR), EggNOG31, KOG, TrEMBL32, InterPro33 and Swiss-Prot32 protein databases using Diamond blastp (diamond v2.0.4.142) and to the Kyoto Encyclopedia of Genes and Genomes (KEGG) database34, with an E-value threshold of 1E-5. Protein domains were annotated with InterProScan (v5.34-73.0)35, while motifs and domains within gene models were identified using PFAM databases36. Gene Ontology (GO) IDs for each gene were obtained from TrEMBL, InterPro and EggNOG. A total of 41,004 (99.10%) of the predicted protein-coding genes in Green Elegance could be functionally annotated with known genes, conserved domains, and Gene Ontology terms (Table 6). This annotation ratio is the highest among the five Lactuca assemblies compared, which also include L. sativa var. capitata cv. Salinas, L. sativa var. angustana, L. saligna, and L. virosa (Table 6).
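The E-value cutoff and the annotation ratio can be illustrated with a short sketch. It assumes the alignments are available in the standard 12-column BLAST tabular layout (query, subject, identity, length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score); the paper does not state the output format used, and the function names and toy hits below are hypothetical.

```python
import csv
import io


def best_hits(tabular_text, max_evalue=1e-5):
    """Per query, keep the highest bit-score hit among hits with E-value <= cutoff."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # hit fails the E-value threshold
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


def annotation_rate(n_annotated, n_total):
    """Percentage of predicted genes with at least one functional annotation."""
    return 100.0 * n_annotated / n_total


# Toy hits: gene1 has two passing hits, gene2 only a weak hit above the cutoff.
toy_hits = (
    "gene1\tP1\t90.0\t100\t5\t0\t1\t100\t1\t100\t1e-30\t200\n"
    "gene1\tP2\t95.0\t100\t3\t0\t1\t100\t1\t100\t1e-50\t300\n"
    "gene2\tP3\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.01\t30\n"
)
print(best_hits(toy_hits))
print(f"{annotation_rate(41004, 41375):.2f}%")
```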
Whole genome synteny analysis
For synteny analysis, the genome assemblies of four other Lactuca accessions (L. sativa var. capitata, L. sativa var. angustana, L. virosa and L. saligna) were aligned to the L. sativa var. crispa genome using MUMmer37 (v4.0) with the parameters -c 500 -b 500 -l 100 --maxmatch (Fig. 6). Raw alignment results were filtered using delta-filter with the parameters -1 -i 90 -l 500. Syntenic blocks were identified with MCScanX38 using the parameter -s 15 (number of genes required to call a collinear block) and visualized using jcvi39 (v1.2.8) with the parameter --minspan=30.
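The identity and length thresholds applied when filtering alignments (-i 90 -l 500) amount to a simple record filter, sketched below. This illustrates only those two thresholds: the one-to-one selection performed by the -1 option is not reproduced, and the `Alignment` record and its fields are hypothetical stand-ins for parsed alignment coordinates.

```python
from typing import List, NamedTuple


class Alignment(NamedTuple):
    ref_start: int   # 1-based start on the reference
    ref_end: int     # 1-based end on the reference (inclusive)
    identity: float  # percent identity


def filter_alignments(
    alignments: List[Alignment],
    min_identity: float = 90.0,
    min_length: int = 500,
) -> List[Alignment]:
    """Keep alignments meeting both the identity and the length threshold."""
    kept = []
    for aln in alignments:
        length = abs(aln.ref_end - aln.ref_start) + 1
        if aln.identity >= min_identity and length >= min_length:
            kept.append(aln)
    return kept


toy = [
    Alignment(1, 1000, 98.5),    # passes both thresholds
    Alignment(2000, 2300, 99.0),  # too short (301 bp)
    Alignment(5000, 7000, 85.0),  # identity too low
]
print(filter_alignments(toy))
```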
Data Records
The raw genomic sequencing data used for genome assembly are available in the Genome Sequence Archive (GSA)40 in the National Genomics Data Center (NGDC), Beijing Institute of Genomics (China National Center for Bioinformation)41, Chinese Academy of Sciences (https://bigd.big.ac.cn/gsa). The data are deposited under accession number CRA014873 (ref. 42) and cover genome survey data, transcriptomic sequencing data, PacBio HiFi sequencing data, ONT sequencing data, and Hi-C sequencing data. The genome assembly and annotation files are available in the Genome Warehouse (GWH)43 in NGDC (accession number GWHERDY00000000; ref. 44), GenBank (JBFTWI000000000)45 and Figshare (https://doi.org/10.6084/m9.figshare.25116548)46.
Technical Validation
To evaluate the completeness of the L. sativa var. crispa cv. Green Elegance (version 1.2) assembly, Illumina short reads and PacBio long reads were mapped back to the assembly. The alignments were analyzed using Qualimap v.2.2.2. The mapping rates were 99.75% (average 48× coverage) for Illumina short reads and 99.85% (average 42× coverage) for PacBio long reads. BUSCO47 (v5.2.2) with OrthoDB was used to assess genome completeness. In the genome synteny analysis, L. sativa var. crispa, L. sativa var. capitata and L. sativa var. angustana showed high conservation, providing compelling evidence that the gross genome structure has been accurately assembled (Fig. 6). We observed relatively few ambiguities in genomic arrangement in the Hi-C contact heat map, although there were some discontinuities, which were probably caused by highly repetitive sequences. Visual inspection of the Hi-C map also revealed that some points of ambiguity appeared to be centromeres, likely due to sequence similarity in these regions. Meanwhile, clear antidiagonals were also observed for several chromosomes in the Hi-C contact heat map, such as chromosome 7 and chromosome 8. Such a pattern may suggest a Rabl configuration of the chromosomes, which could be validated in future cytological investigations. Overall, 98.39% of BUSCOs were complete and 0.50% were fragmented in the assembled genome (Table 7). CEGMA (Core Eukaryotic Genes Mapping Approach) (v2.5) analysis showed that 99.78% (457 CEGs, Core Eukaryotic Genes) of CEGMA genes were present in the genome48. The LTR Assembly Index (LAI)49 of 17.34 indicated a high-quality genome assembly for L. sativa var. crispa cv. Green Elegance, with better continuity and completeness compared to other Lactuca species (Table 7). The higher ratio of complete BUSCOs and the higher LAI value, compared to the other four Lactuca assemblies, indicate the superior quality of the genome assembly for L. sativa var. crispa cv. Green Elegance (v1.2).
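The completeness percentages quoted above are simple ratios, and a short sketch makes the arithmetic explicit. The denominator of 458 core genes is inferred from the reported figures (457 genes at 99.78%) rather than stated in the text, and the function names are illustrative.

```python
def percent(part, total):
    """Share of `part` in `total`, as a percentage."""
    return 100.0 * part / total


def busco_missing(complete_pct, fragmented_pct):
    """BUSCO categories sum to 100%, so missing = 100 - complete - fragmented."""
    return 100.0 - complete_pct - fragmented_pct


# 457 core genes found out of an assumed set of 458.
print(f"CEGMA present: {percent(457, 458):.2f}%")
# Complete and fragmented BUSCO percentages taken from the text.
print(f"BUSCO missing: {busco_missing(98.39, 0.50):.2f}%")
```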
Code availability
No custom code was used for this study. All data analyses were conducted using published bioinformatics software with default settings unless otherwise specified.
References
Wei, T. et al. Whole-genome resequencing of 445 Lactuca accessions reveals the domestication history of cultivated lettuce. Nat. Genet. 53, 752–760, https://doi.org/10.1038/s41588-021-00831-0 (2021).
Lindqvist, K. On the origin of cultivated lettuce. Hereditas 46, 319–350, https://doi.org/10.1111/j.1601-5223.1960.tb03091.x (1960).
de Vries, I. M. Origin and domestication of Lactuca sativa L. Genet. Resour. Crop Evol. 44, 165–174, https://doi.org/10.1023/A:1008611200727 (1997).
Zohary, D. The wild genetic resources of cultivated lettuce (Lactuca sativa L.). Euphytica 53, 31–35, https://doi.org/10.1007/BF00032029 (1991).
Křístková, E., Doležalová, I., Lebeda, A., Vinter, V. & Novotná, A. Description of morphological characters of lettuce (Lactuca sativa L.) genetic resources. A review. Hortic. Sci. 35, 113–129 (2018).
Lebeda, A., Ryder, E. J., Sideman, R., Ivana, D. & Křístková, E. in Genetic resources, chromosome engineering, and crop improvement Vol. 3 (ed R. J. Singh) 377–472 (2006).
Zhang, L. et al. RNA sequencing provides insights into the evolution of lettuce and the regulation of flavonoid biosynthesis. Nat. Commun. 8, 2264, https://doi.org/10.1038/s41467-017-02445-9 (2017).
Reyes-Chin-Wo, S. et al. Genome assembly with in vitro proximity ligation data and whole-genome triplication in lettuce. Nat. Commun. 8, 14953, https://doi.org/10.1038/ncomms14953 (2017).
Xiong, W. et al. The genome of Lactuca saligna, a wild relative of lettuce, provides insight into non-host resistance to the downy mildew Bremia lactucae. Plant J. 115, 108–126, https://doi.org/10.1111/tpj.16212 (2023).
Xiong, W. et al. Genome assembly and analysis of Lactuca virosa: implications for lettuce breeding. G3-GENES GENOM GENET 13, jkad204, https://doi.org/10.1093/g3journal/jkad204 (2023).
Shen, F. et al. Comparative genomics reveals a unique nitrogen-carbon balance system in Asteraceae. Nat. Commun. 14, 4334, https://doi.org/10.1038/s41467-023-40002-9 (2023).
Ballouz, S., Dobin, A. & Gillis, J. A. Is it time to change the reference genome? Genome Biol. 20, 159, https://doi.org/10.1186/s13059-019-1774-4 (2019).
Sun, Y., Shang, L., Zhu, Q.-H., Fan, L. & Guo, L. Twenty years of plant genome sequencing: achievements and challenges. Trends Plant Sci. 27, 391–401, https://doi.org/10.1016/j.tplants.2021.10.006 (2022).
Abu Almakarem, A. S., Heilman, K. L., Conger, H. L., Shtarkman, Y. M. & Rogers, S. O. Extraction of DNA from plant and fungus tissues in situ. BMC Res. Notes 5, 266, https://doi.org/10.1186/1756-0500-5-266 (2012).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770, https://doi.org/10.1093/bioinformatics/btr011 (2011).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295, https://doi.org/10.1038/nbt.3122 (2015).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175, https://doi.org/10.1038/s41592-020-01056-5 (2021).
Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125, https://doi.org/10.1038/nbt.2727 (2013).
Wolff, J. et al. Galaxy HiCExplorer 3: a web server for reproducible Hi-C, capture Hi-C and single-cell Hi-C data analysis, quality control and visualization. Nucleic Acids Res. 48, W177–W184, https://doi.org/10.1093/nar/gkaa220 (2020).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA 117, 9451–9457, https://doi.org/10.1073/pnas.1921046117 (2020).
Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinform. 9, 18, https://doi.org/10.1186/1471-2105-9-18 (2008).
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268, https://doi.org/10.1093/nar/gkm286 (2007).
Ou, S. & Jiang, N. LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol. 176, 1410–1422, https://doi.org/10.1104/pp.17.01310 (2018).
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics 25, 4.10.1–4.10.14, https://doi.org/10.1002/0471250953.bi0410s25 (2009).
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580, https://doi.org/10.1093/nar/27.2.573 (1999).
Beier, S., Thiel, T., Münch, T., Scholz, U. & Mascher, M. MISA-web: a web server for microsatellite prediction. Bioinformatics 33, 2583–2585, https://doi.org/10.1093/bioinformatics/btx198 (2017).
Stanke, M., Steinkamp, R., Waack, S. & Morgenstern, B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res. 32, W309–W312, https://doi.org/10.1093/nar/gkh379 (2004).
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360, https://doi.org/10.1038/nmeth.3317 (2015).
Grabherr, M. G. et al. Trinity: reconstructing a full-length transcriptome without a genome from RNA-seq data. Nat. Biotechnol. 29, 644–652, https://doi.org/10.1038/nbt.1883 (2011).
Huerta-Cepas, J. et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 47, D309–D314, https://doi.org/10.1093/nar/gky1085 (2019).
Boeckmann, B. et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31(1), 365–370 (2003).
Mitchell, A. et al. The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Res. 43, D213–D221, https://doi.org/10.1093/nar/gku1243 (2015).
Kanehisa, M., Goto, S., Sato, Y., Furumichi, M. & Tanabe, M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 40, D109–D114, https://doi.org/10.1093/nar/gkr988 (2012).
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240, https://doi.org/10.1093/bioinformatics/btu031 (2014).
Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222–D230, https://doi.org/10.1093/nar/gkt1223 (2014).
Marçais, G. et al. MUMmer4: a fast and versatile genome alignment system. PLoS Comput. Biol. 14, e1005944, https://doi.org/10.1371/journal.pcbi.1005944 (2018).
Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49–e49, https://doi.org/10.1093/nar/gkr1293 (2012).
Tang, H. et al. Synteny and collinearity in plant genomes. Science 320, 486–488, https://doi.org/10.1126/science.1153917 (2008).
Chen, T. et al. The Genome sequence archive family: toward explosive data growth and diverse data types. Genomics Proteom. Bioinform. 19, 578–583, https://doi.org/10.1016/j.gpb.2021.08.001 (2021).
Members, C.-N. & Partners Database resources of the National Genomics Data Center, China National Center for Bioinformation in 2023. Nucleic Acids Res. 51, D18–D28, https://doi.org/10.1093/nar/gkac1073 (2023).
NGDC Genome Sequence Archive. https://ngdc.cncb.ac.cn/gsa/browse/CRA014873 (2024).
Chen, M. et al. Genome warehouse: a public repository housing genome-scale data. Genomics Proteom. Bioinform. 19, 584–589, https://doi.org/10.1016/j.gpb.2021.04.001 (2021).
NGDC Genome Warehouse. https://ngdc.cncb.ac.cn/gwh/Assembly/83750/show (2024).
NCBI GenBank. https://identifiers.org/ncbi/insdc:JBFTWI000000000 (2024).
Zhang, B. Gemome assembly and gene annotation files of Lactuca sativa var. crispa cv. Green Elegance. figshare. https://doi.org/10.6084/m9.figshare.25116548 (2024).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212, https://doi.org/10.1093/bioinformatics/btv351 (2015).
Parra, G., Bradnam, K. & Korf, I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23, 1061–1067, https://doi.org/10.1093/bioinformatics/btm071 (2007).
Ou, S., Chen, J. & Jiang, N. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. 46, e126–e126, https://doi.org/10.1093/nar/gky730 (2018).
Acknowledgements
This research was supported by the Key Project at Central Government Level: The Ability Establishment of Sustainable Use for Valuable Chinese Medicine Resources (2060302), the Innovation and Development Program of Beijing Vegetable Research Center (KYCX202304), Beijing Joint Research Program for Germplasm Innovation and New Variety Breeding (G20220628003-01), and Collaborative Innovation Program of Beijing Vegetable Research Center (XTCX202302).
Author information
Contributions
D.L., J.T., and B.Z. designed and coordinated the study; X.L., H.D., Y.Y., C.W., Z.X., and B.Z. collected and prepared plant samples; Y.X., J.T., and C.S. performed the bioinformatic analyses; B.Z. and D.L. drafted the manuscript; J.T., J.Z. and C.S. revised the manuscript. All authors approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhang, B., Xue, Y., Liu, X. et al. A near-complete chromosome-level genome assembly of looseleaf lettuce (Lactuca sativa var. crispa). Sci Data 11, 961 (2024). https://doi.org/10.1038/s41597-024-03830-y
DOI: https://doi.org/10.1038/s41597-024-03830-y