Objective

Escalloniales comprise approximately 130 species of herbs, shrubs, and trees that grow in diverse habitats ranging from desolate rocky outcrops to rain forests across South America, Australia, Southeast Asia, and the Indian Ocean islands [1]. It is not known how and when Escalloniales diversified so extensively and colonized the Southern Hemisphere because the phylogenetic relationships within Escalloniales and between Escalloniales and other flowering plant lineages remain elusive. Escalloniales are part of the more inclusive clade Campanulidae, a hyperdiverse group of flowering plants with approximately 35,000 species [2]. Yet, the precise phylogenetic relationships among the major lineages of Campanulidae have not been clearly resolved with strong support by current molecular data [3,4,5,6,7]. Clarifying these relationships is critical to elucidate the mechanisms of phenotypic evolution and geographic diversification for a large group of angiosperms [8, 9]. Within Escalloniales, the genus Escallonia represents a remarkable radiation across three hotspots of biodiversity in the mountains of South America [10, 11]. Escallonia species grow from sea level to snow line, and from temperate to tropical regions, showing distinct adaptations related to environmental stress such as extreme temperature and water availability. Further, groups of closely related Escallonia species have diversified independently along elevational gradients in the tropical Andes, Southern Brazil, and the temperate Andes, suggesting that repeated ecological divergence may play an important role in Escallonia speciation [10]. Thus, Escallonia is emerging as a notable system to uncover the ecological and evolutionary processes underpinning tropical plant adaptation, speciation, and the nature of plant species [12]. 
To begin investigating the genomic substrate and biological processes underlying the radiations in Escallonia and Escalloniales, we report draft genomes of two Escallonia species. These data will also be relevant for broader comparative genomics studies across flowering plants.

Data description

Methodology - Leaf tissue from a single Escallonia rubra plant and a single Escallonia herrerae plant, both cultivated at the University of California Botanical Garden at Berkeley (voucher numbers: UCBG92.1500 for E. rubra, UCBG64.0493 for E. herrerae), was used for genomic DNA extraction and sequencing (Table 1; Data File 1). For E. rubra, isolated DNA was prepared following the Nextera XT DNA Library Prep Kit guidelines and sequenced on an Illumina HiSeq 4000 system to generate 376 million 100-bp paired-end WGS reads (Table 1; Data Set 1). In addition, we sequenced high-molecular-weight genomic DNA from both E. rubra and E. herrerae on the Oxford Nanopore Technologies (ONT) PromethION 24 A series, using the LSK114 ligation prep kit and R10.4.1 flow cells, to generate approximately 140 Gb of sequence data in total (Table 1; Data Sets 2 and 3): 3.4 million reads with an average read length of 9.4 kb (approximately 31 Gb) for E. rubra, and 12 million reads with an average read length of 9.1 kb (approximately 111 Gb) for E. herrerae. We used the Canu genome assembler [13] to generate contigs from the ONT data. The contigs were then polished with the WGS reads using NextPolish [14] (E. rubra only) and deduplicated using Purge Haplotigs [15].
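As a rough consistency check, the per-species ONT yields follow from the read count multiplied by the average read length. The sketch below is illustrative only: the function name is hypothetical, and because the inputs are the rounded values given above, the results only approximate the reported 31 and 111 Gb.

```python
def approx_yield_gb(reads_millions: float, mean_read_kb: float) -> float:
    """Approximate yield in Gb from read count (millions) and mean read length (kb)."""
    bases = reads_millions * 1e6 * mean_read_kb * 1e3  # total bases sequenced
    return bases / 1e9                                  # bases -> gigabases


# Rounded values taken from the Methodology text (assumed exact for this sketch).
e_rubra_gb = approx_yield_gb(3.4, 9.4)
e_herrerae_gb = approx_yield_gb(12, 9.1)
total_gb = e_rubra_gb + e_herrerae_gb

print(f"E. rubra:    ~{e_rubra_gb:.0f} Gb")
print(f"E. herrerae: ~{e_herrerae_gb:.0f} Gb")
print(f"Total:       ~{total_gb:.0f} Gb")
```

The combined estimate is in line with the approximately 140 Gb stated for the two ONT data sets.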

Genome descriptions

Escallonia rubra - The Escallonia rubra genome assembly (Table 1; Data Set 4) consists of 3,233 contigs (N50 = 285 kb) with a total length of 566 Mb (Table 1; Data Set 5). The genome annotation includes 31,028 gene models supported by transcriptome and protein sequences and/or the presence of Pfam domains (Table 1; Data Set 6). BUSCO (Benchmarking Universal Single-Copy Orthologs) analyses based on conserved single-copy eudicot genes [16] indicate completeness levels of 97.8% for the genome sequence and 87.8% for the genome annotation (Table 1; Data Set 7).

Escallonia herrerae - The Escallonia herrerae genome assembly (Table 1; Data Set 8) consists of 5,760 contigs (N50 = 317 kb) with a total length of 944 Mb (Table 1; Data Set 9). The genome annotation includes 47,905 gene models supported by transcriptome and protein sequences and/or the presence of Pfam domains (Table 1; Data Set 10). BUSCO analyses based on conserved single-copy eudicot genes [16] indicate completeness levels of 97.8% for the genome sequence and 87.8% for the genome annotation (Table 1; Data Set 11).
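For readers less familiar with the contiguity statistic reported above, N50 is the contig length at which contigs of that length or longer together contain at least half of the total assembly length. The following is a minimal sketch of that definition using a toy list of contig lengths, not the actual assemblies; the function name is hypothetical.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50 of a collection of contig lengths."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise RuntimeError("unreachable for non-empty input")


# Toy example (lengths in kb): total = 280, so half = 140.
toy_contigs = [100, 80, 50, 30, 20]
toy_n50 = n50(toy_contigs)  # 100 + 80 = 180 >= 140, so N50 is 80
print(toy_n50)
```

Applied to the contig lengths of a real assembly (for example, read from a FASTA index), the same function would yield values comparable to the 285 kb and 317 kb reported here.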

Table 1 Overview of data files/data sets

Limitations

The base chromosome number of Escallonia is n = 12 [27], but our assemblies consist of 3,233 and 5,760 contigs for E. rubra and E. herrerae, respectively. As such, additional genome assembly and sequencing technologies, such as Hi-C, are needed to generate chromosome-level assemblies suitable for chromosome-scale comparative genomics.
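To put this fragmentation in perspective, the figures above can be turned into an average number of contigs per chromosome and a mean contig length. This is a back-of-the-envelope sketch, assuming each assembly represents a single set of n = 12 chromosomes (ploidy is not addressed here); the function and variable names are hypothetical.

```python
BASE_CHROMOSOME_NUMBER = 12  # n = 12, as cited in the text


def fragmentation(n_contigs: int, total_mb: float,
                  n_chrom: int = BASE_CHROMOSOME_NUMBER) -> tuple[float, float]:
    """Return (mean contigs per chromosome, mean contig length in kb)."""
    contigs_per_chrom = n_contigs / n_chrom
    mean_contig_kb = total_mb * 1000 / n_contigs  # Mb -> kb
    return contigs_per_chrom, mean_contig_kb


# Contig counts and total lengths (Mb) as reported in the genome descriptions.
assemblies = {"E. rubra": (3233, 566), "E. herrerae": (5760, 944)}

results = {name: fragmentation(*stats) for name, stats in assemblies.items()}
for name, (per_chrom, mean_kb) in results.items():
    print(f"{name}: ~{per_chrom:.0f} contigs per chromosome, "
          f"mean contig length ~{mean_kb:.0f} kb")
```

Under this assumption each chromosome is split, on average, across several hundred contigs, which illustrates why scaffolding data such as Hi-C would be needed to reach chromosome scale.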