Chromosome-level genome assembly and annotation of xerophyte secretohalophyte Reaumuria soongarica

Song, Miaomiao; Gong, Wei; Tian, Yunyun; Meng, Yue; Huo, Tingyu; Liu, Yanan; Zhang, Yeming; Dang, Zhenhua

doi:10.1038/s41597-024-03644-y

Chromosome-level genome assembly and annotation of xerophyte secretohalophyte Reaumuria soongarica

Data Descriptor
Open access
Published: 22 July 2024

Volume 11, article number 812, (2024)
Cite this article

Download PDF

You have full access to this open access article

Scientific Data

Chromosome-level genome assembly and annotation of xerophyte secretohalophyte Reaumuria soongarica

Download PDF

Miaomiao Song¹^na1,
Wei Gong¹^na1,
Yunyun Tian²,
Yue Meng¹,
Tingyu Huo¹,
Yanan Liu¹,
Yeming Zhang¹ &
…
Zhenhua Dang¹

33 Accesses
Explore all metrics

Abstract

Reaumuria soongarica is a xerophytic shrub belonging to the Tamaricaceae family. The species is widely distributed in the deserts of Central Asia and is characterized by its remarkable adaptability to saline and barren desert environments. Using PacBio long-read sequencing and Hi-C technologies, we assembled a chromosome-level genome of R. soongarica. The genome assembly has a size of 1.28 Gb with a scaffold N50 of 116.15 Mb, and approximately 1.25 Gb sequences were anchored in 11 pseudo-chromosomes. A completeness assessment of the assembled genome revealed a BUSCO score of 97.5% and an LTR Assembly Index of 12.37. R. soongarica genome had approximately 60.07% repeat sequences. In total, 21,791 protein-coding genes were predicted, of which 95.64% were functionally annotated. This high-quality genome will serve as a foundation for studying the genomic evolution and adaptive mechanisms to arid-saline environments in R. soongarica, facilitating the exploration and utilization of its unique genetic resources.

Chromosome-level genome assembly of Platycarya strobilacea

Article Open access 05 March 2024

A high-quality chromosomal-level genome assembly of Greater Scaup (Aythya marila)

Article Open access 04 May 2023

Chromosome-level genome assembly and annotation of the prickly nightshade Solanum rostratum Dunal

Article Open access 01 June 2023

Background & Summary

The drought induced by global warming is intensifying, significantly impacting plant survival and reproduction. This occurrence has led to a cascade of ecological and productivity challenges^1,2. In arid regions, the coexistence of soil salinity exacerbates the predicament, further complicating agricultural and livelihood practices reliant on plant-based production². In response to stressors like drought and salt, plants have developed sophisticated adaptive strategies at molecular, physiological, and morphological levels. This involves the coordinated interaction among the genome, transcriptome, proteome, and metabolome³. Exploring the molecular mechanisms underlying these adaptive responses to adverse conditions has been a longstanding focus of academic inquiry.

So far, significant progress has been achieved in the study of plant stress adaptation⁴. Under these investigations, numerous stress response factors have been identified, and a relatively comprehensive theoretical framework has been integrated. For instance, in drought adaptation, the most representative discoveries include abscisic acid (ABA)-dependent and ABA-independent pathways, DREB2A, and ubiquitination-related mechanisms⁵. In the ABA-dependent pathway, key regulatory genes such as NFYA5, OCP3, PLDa1, SAL1, and MYB96 have been identified for their crucial roles in stomatal regulation, osmotic substance modulation, and lateral root growth under drought stress. Research on salty environment adaptation has highlighted pathways such as the salt overly sensitive (SOS) pathway involved in reconstructing ion homeostasis, reactive oxygen species (ROS) scavenging pathways, and physiologically drought-responsive pathways caused by ion and osmotic substance imbalances^6,7. Studies on ion transport and homeostasis regulation have revealed that plants primarily utilize Na⁺/H⁺ antiport proteins on the plasma membrane and vacuolar membrane (SOS1, NHX1) to extrude or compartmentalize the influx of Na⁺ into cells, achieve substantial K⁺ uptake through K⁺ channels (e.g., AKT), and reconstruct proton motive force via plasma membrane and vacuolar membrane H⁺-ATPase⁸.

Reports indicate that species thriving in relatively harsh environments have developed unique adaptive strategies through prolonged natural selection and inherent adaptation⁹. It has been proposed that research directions for plant stress adaptation in the “omics” era. On one hand, through multi-omics integrated analysis, further post-genomic research on model plants is being conducted to comprehensively unveil the molecular mechanisms of their stress responses. On the other hand, research is expanding to encompass widely occurring non-model species in nature to obtain additional information on plant stress adaptation, supplementing and refining existing theories^10,11.

Reaumuria soongarica (Fig. 1 a, b, and c) is a shrub belonging to the family Tamaricaceae¹². This plant exhibits strong resistance to drought, cold, salinity-alkalinity, and barrenness, making it a crucial component of desert ecosystems. The R. soongarica community represents the most widespread and extensive zonal community type in arid regions such as Central Asian dunes and deserts¹³. It plays a vital role as an ecological barrier in sustaining and restoring fragile desert ecosystems in Northwest China¹⁴. R. soongarica also serves as a significant forage shrub in desert areas, providing the main source of food for camels throughout the year and for sheep during the winter and spring seasons. Due to its high salt content, livestock can obtain sufficient salt intake from it, stimulating their appetite and promoting weight gain. Additionally, the tender branches and leaves of R. soongarica can be used for treating eczema and dermatitis, exhibiting febrifuge and diaphoretic effects.

Here, we assembled a high-quality chromosome-level genome of R. soongarica using PacBio HiFi and Hi-C data. The genome has a length of 1.28 Gb, a contig N50 of 116.15 Mb, and a complete BUSCO score of 97.5%. A total of 1.25 Gb of sequences were anchored onto 11 pseudo-chromosomes. Of the genome, 21,791 protein-coding genes, and 60.07% (769.66 Mb) repetitive sequences were identified. This high-quality genome will facilitate the study of adaptive evolution mechanisms in R. soongarica, laying the foundation for exploring its unique stress-resistant genetic resources and related molecular mechanisms.

Methods

Sample collection

In August 2022, sample collection was conducted in Yihewusu Town (E107°26′02′, N40°11′02′), Ordos City, Inner Mongolia, China. Approximately 20 g of tender leaves were collected from a single plant for genomic sequencing. Following sample collection, the samples were rapidly frozen in liquid nitrogen and transported back to the laboratory for storage in a −80 °C freezer. In the same year, during October, seeds were collected from the same plant in the same sampling site.

Karyotype analysis

R. soongarica seeds were germinated at room temperature. When the roots reached a length of 1.5–2 cm, root tips were treated with nitrous oxide for 2.5 hours. Subsequently, they were immersed in acetic acid for 5 minutes and stored in 75% ethanol. During chromosome preparation, ethanol was removed by rinsing with deionized water. The root apical meristem tissues were dispersed using a mixture of cellulase and pectinase (2:1 ratio). After a 45-minute incubation at 37 °C, the mixture was washed away with deionized water. Once the meristematic tissues were completely air-dried, a 20 µL acetic acid suspension solution was added. After drying the slides, they were examined using an Olympus CX23 microscope (Olympus Corporation, Tokyo, Japan).

For karyotype analysis, well-dispersed intermediate chromosomes were selected. The terminal 21-bp repeat sequence (AG₃T₃)₃ 5′-AGGGTTTAGGGTTTAGGGTTT-3′ was used as a probe¹⁵. This oligo-probe, synthesized by Sangon Biotech Co., Ltd. (Shanghai, China), was simultaneously tested in a single round of Fluorescence In Situ Hybridization (FISH). The hybridization solution, totaling 10 µL, consisted of 1.5 µL of each probe, 8.5 µL of a mixture of 2 × SSC, and 1 × TE, which was dropped onto the chromosomes on a cover glass (24 cm × 50 cm). The slides were then incubated at 37 °C for 2 hours. Using an Olympus BX63 fluorescence microscope combined with a Photometric SenSys Olympus DP70 CCD camera (Olympus Corporation, Tokyo, Japan), the slides were recorded and analyzed. The results revealed a total of 22 detected chromosomes, indicating a diploid plant with a karyotype of 2n = 2x = 22 (Fig. 1d).

Flow cytometry-based genome size estimation

A quantity of 0.5 g of tender leaves from R. soongarica was placed in a culture dish, chopped, and disrupted. Subsequently, 1600 µL of PI solution (staining buffer + PI + RNase storage solution) was added. The mixture was incubated in the dark for 45 minutes and then analyzed using the Sysmex CyFlow^® Cube6. The reference species used was Setaria viridis, with a genome size of 0.51 Gb¹⁶. The results estimated the genome size of R. soongarica to be approximately 1.26 Gb (Fig. 1e).

Nucleic acid extraction and quality assessment

For the determination of R. soongarica genome size through k-mer analysis and genomic sequencing, genomic DNA was extracted from the leaves using a modified CTAB method¹⁷. For transcriptome sequencing, R. soongarica seeds were sterilized with a 10% sodium hypochlorite solution, rinsed several times with sterile water, and then sown in seedling trays filled with sterile nutrient soil. The cultivation conditions were maintained at 26 °C during the day and 16 °C at night, under a 16-hour light/8-hour dark cycle. The seedlings were watered with 1/2 Hoagland nutrient solution every 3 days. After 3 weeks of cultivation, a healthy plant with intact leaves, stems, and roots was harvested for full-length transcriptome sequencing. Subsequently, seedlings with similar growth were chosen for Na₂SO₄ treatment, with three concentration gradients: 0 mmol/L (CK), 200 mmol/L (S200), and 400 mmol/L (S400). After the treatment, tender leaves were collected. Each treatment had three biological replicates, for paired-end transcriptome sequencing. The total RNA of the above-collected samples was extracted using TRIzol^® reagent¹⁸ (Invitrogen, Carlsbad, CA, USA). The isolated DNA and RNA samples underwent quality assessment using NanoDrop-2000 (Thermo Fisher Scientific, Wilmington, DE, USA) and Qubit v3.0 fluorometer (Life Technologies).

K-mer based genome size assessment

The genome of R. soongarica was sequenced using the DNBseq platform. The samples were randomly fragmented using the Covaris ultrasonic high-performance sample processing system, resulting in fragments of approximately 350 bp. Subsequently, DNA fragment end repair was performed, and after passing quality control, the samples were sequenced using high-throughput sequencing. For each qualified library, the raw image data obtained from sequencing were converted into raw sequence data (raw reads) in FASTQ¹⁹ file format. The SOAPnuke v2.1.0²⁰ software was employed for data filtering with the following parameters: (-n 0.02 -l 20 -q 0.4 -i -G 2–polyX 50 -Q 2–seqType 0). A total of 135.09 Gb of clean reads were obtained. The frequency of 21-bp K-mers was calculated using Jellyfish v2.2.6²¹ with default parameters. GenomeScope v2.0²² was then utilized to estimate the genome size, heterozygosity, repeat content, and sequencing depth. The results indicated that the estimated genome size of R. soongarica is 1.26 Gb, with a heterozygosity rate of 2.24%, and a repeat rate of 50.68% (Fig. 1f).

HiFi library construction and sequencing

A 20 Kb PacBio library was constructed using the SMRTbell Template Prep Kit-SP v3, following the manufacturer’s instructions (Pacific Biosciences, Menlo Park, CA, USA). Subsequently, Circular Consensus Sequencing (CCS) mode on the PacBio Sequel II platform was employed for sequencing. The raw sequencing data were filtered using smrtlink software v11.0, resulting in a total of 74.62 Gb of clean HiFi data. The filtered CCS data were then converted to a fasta file using samtools v1.18²³. According to survey-based estimations, the sequencing depth was approximately 58.7×.

Hi-C library construction, sequencing, and quality assessment

For DNA samples passing quality checks, a sequential process was carried out, including polyformaldehyde cross-linking, MboI enzyme digestion, end repair, biotin labeling, DNA purification, and capture treatment. The Hi-C library was constructed, and paired-end sequencing was performed on the DNBSEQ platform²⁴.

The sequencing generated 128.23 Gb of raw data. After filtering out data with adapters and lower quality using Soapnuke v2.1.0, 127.19 Gb of clean data was obtained. The sequencing depth was estimated to be approximately 101×.

Genome assembly

HiFi reads were assembled into contigs using Hifiasm v0.16.1²⁵. Then, purge_dups was performed to remove redundant and erroneous assemblies obtained from the HiFi reads. The contig-level genome was indexed using WBA²⁶, and Hi-C data were aligned and merged with the contig-based genome. The generate_site_positions.py program in Juicer v1.5²⁷ was used to obtain potential enzyme cut sites in the genome, extracting Hi-C data that uniquely mapped to the genome (Hi-C Contacts) and performing clustering and redundancy removal. Next, 3D-DNA v190716 and Juicer v1.5 were used to scaffold Hi-C reads, constructing a chromosome-level genome.

The total length of the contig-level genome was 1.28 Gb, approximately matching the K-mer estimate (Table 1). The N50 was 116.15 Mb, and the complete BUSCO score was 97.5%. After Hi-C scaffolding, a total of 1.25 Gb of sequences were anchored onto 11 pseudo-chromosomes, with a scaffolding rate of 97.96% (Fig. 2a and b). The N50 of the chromosome-level genome was 116.15 Mb, and the complete BUSCOs reached 97.8%, indicating a high genome completeness. As shown in Fig. 2a, the Hi-C data signal was strongest along the diagonal, demonstrating effective genome assembly.

Table 1 Summary of Reaumuria soongarica genome assembly.

Full size table

Transcriptome sequencing

For assisting genome annotation, both full-length transcriptome sequencing and short-read transcriptome sequencing were performed. Using PacBio’s Single-Molecule Real-Time (SMRT) technology, full-length cDNA was synthesized, PCR amplified, and libraries were constructed. The sequencing was conducted using PacBio’s SMRT technology, resulting in 63.53 Gb of raw data from a R. soongarica seedling. A total of 339,291 non-redundant full-length transcripts were obtained, with an N50 of 1,396 base pairs. For short-read RNA-seq, libraries were constructed using a paired-end model and sequenced on the DNBSEQ platform. SOAPnuke v2.1.0 was used for filtering, and HISAT v2.2.1²⁸ was employed to align the filtered clean reads to the reference genome sequence. In total, 74.31 Gb of clean data was obtained, with an average output of 8.32 Gb per sample, and the reference genome alignment rate ranged from 94.12% to 95.92%.

Genome annotation

In this study, a homology-based annotation approach was employed for R. soongarica genome annotation. RNA-seq data and full-length transcriptome data were aligned to the genome using HISAT v2.2.1 and Gmap software, respectively. GeMoMa v1.9²⁹ was then used to extract intron information from all samples for subsequent auxiliary annotation. Transdecoder was utilized for ORF (Open Reading Frame) prediction on the transcriptome data, resulting in gene annotation based on the transcriptome. Gene structure prediction and information integration were performed using GeMoMa v1.9 for five reference species (Arabidopsis thaliana, Eutrema salsugineum, Populus trichocarpa, Vitis vinifera, Zea mays). Transcriptome-specific gene information was incorporated, yielding the final gene set. In total, 21,791 genes were predicted in the R. soongarica genome (Table 2).

Table 2 Summary of gene structure predictions.

Full size table

Predicted genes were functionally annotated by comparing them to known protein databases using Diamond v0.8.36³⁰. The protein databases included SwissProt³¹ (http://www.uniprot.org/), TrEMBL³² (http://www.uniprot.org/), KEGG³³ (http://www.genome.jp/kegg/), InterPro³⁴ (https://www.ebi.ac.uk/interpro/), NR³⁵, KOG³⁶, and GO³⁷. The results revealed that 95.64% of genes in the R. soongarica genome obtained functional annotation information. Among them, the majority of genes were annotated in the Nr database (20,772, 95.32%), followed by TrEMBL (20,768, 95.31%), and InterPro (18,130, 83.20%) (Table 3).

Table 3 Statistics of functional annotation result of Reaumuria soongarica genome.

Full size table

Repetitive elements annotation

Two methods were employed for the annotation of transposable elements (TEs), namely the homologous alignment method and de novo prediction³⁸. The former one was based on the RepBase v21.12³⁹ database. It utilizes RepeatMasker v1.332⁴⁰ and RepeatProteinMask v4.0.7⁴¹ to identify sequences similar to known repeat sequences. De novo prediction was based on the assembled R. soongarica genome. A de novo TE library was constructed using RepeatModeler2⁴² and LTRharvest software⁴³. Subsequently, Repeatmasker software was employed for prediction. The results from both methods were then redundantly processed to obtain the annotation of repeat sequences. Tandem repeats were identified by Tandem Repeats Finder v4.09.

In the genome of R. soongarica, a total of 769.66 Mb of repeat sequences were identified, constituting 60.07% of the genome. The most abundant repeat elements were the long terminal repeat (LTR) sequences, accounting for 47.47%, followed by DNA transposons at 4.76% and tandem repeats at 3.98% (Table 4). LTR_retriever⁴⁴ was used to identify LTR retrotransposons (LTR-RTs) in the R. soongarica genome, and the insertion time of LTR-RTs was estimated. The mutation rate for this estimation was set as 1.52 × 10⁻⁸, which is twice the mutation rate of Tamaricaceae plants^45,46,47. TEsorter v1.4.6⁴⁸ was used for the subfamily classification of LTR-RTs, with Ty1/copia and Ty3/gypsy being the two largest subfamilies in LTRs. Subsequently, iTOL v6⁴⁹ was used to visualize the subfamilies separately in evolutionary trees. The results indicated that LTR-RTs in the R. soongarica genome recently burst approximately 0.3 million years ago (Mya) (Fig. 3a). Among them, the most abundant members in the Ty1/copia and Ty3/gypsy families were Tork and Tekay, respectively (Fig. 3b).

Table 4 Repeat elements annotation of Reaumuria soongarica genome.

Full size table

Synteny gene identification

Synteny genes were identified within and between the genomes of R. soongarica and other species (V. vinifera, Z. mays, and T. chinensis). Initially, Python scripts from the Whole-Genome Duplication Integrated analysis (WGDI, v 0.6.2) software⁵⁰ were utilized to generate a modified GFF file for the genome, with the exclusion of alternatively spliced transcripts. Subsequently, Diamond v2.1.6-1 was employed to execute protein-protein BLAST (E-value ≤ 1e⁻⁵), and the results were formatted in fmt6.blast. Following this, the commands -d, -icl, -ks, -bi, -bk, -c, and -kp of WGDI were successively executed with the default parameters. As a result, 559 synteny blocks were identified in the R. soongarica genome, with the gene number ranged from 5 to 103, and the majority of these gene blocks exhibit a 1: 1 relationship in the R. soongarica genome (Fig. 4a). The synonymous substitution rate (Ks) distribution of the syntenic blocks showed a peak at 0.56 (Fig. 4b).

Identification of gene families

The genomes of R. soongarica and other 9 species (A. thaliana, E. salsugineum, P. trichocarpa, V. vinifera, Z. mays, H. ammodendron, P. euphratica, T. chinensis, Ammopiptanthus mongolicus) were used to cluster paralogous and orthologous groups using Orthofinder v2.5.5⁵¹. Among 313,500 analyzed genes, 287,806 were successfully assigned to 27,313 (91.8%) orthogroups. For R. soongarica, 20,418 genes were identified in 19,517 orthogroups, and 508 were species-specific genes in 160 orthogroups (Fig. 5a). Across the ten genomes, 793 single-copy orthologous were identified and aligned using MAFFT v7.505⁵², trimmed using trimAl v1.4.rev15⁵³. The optimal model was used with IQ-TREE v1.6.11⁵⁴ to analyze the tree and infer the divergence dates. Trees of the 10 species were constructed with the ML model using RAxML with 1000 bootstrap replicates v8.2.12⁵⁵, and the best-obtained model was GTR + F + R4. The divergence time was calculated using the MCMCtree program in PAML v4.9 h⁵⁶. The divergence time between Z. mays vs V. vinifera (142.1–163.5 Mya) and E. salsugineum vs A. thaliana (19.7–34.2 Mya) acquired from TimeTree (http://www.timetree.org/) was used as the calibration points. The phylogenetic analyses verified that R. soongarica was a sister to T. chinensis, with strong bootstrap support (>50%), and the two species diverged nearly 34.17 Mya (95% HPD = 30.04, 113.1) (Fig. 5b).

CAFE v5.0⁵⁷ was performed to identify expansions and contractions of gene families in R. soongarica genome. 218 and 196 gene families were found to have significantly expanded and contracted, comprising 1,303 and 207 genes, respectively.

Identification of salt-stress response gene

Differential expression genes (DEGs) were identified using DESeq2⁵⁸, with the criteria of |Log₂FC| > 1 and false discovery rate (FDR) < 0.05 for filtering. A total of 1,449 DEGs were identified through pairwise comparisons (Fig. 6a). Cluster analysis demonstrated distinct expression patterns of these DEGs under different salt concentrations. Functional enrichment analysis showed that these DEGs were enriched in various GO terms and KEGG pathways that may relate to salt-stress response in R. soongarica (Fig. 6b).

Data Records

All sequencing data described in the study have been deposited in the NCBI database. PacBio HiFi long reads, Hi-C reads, and full-length transcriptome Iso-Seq reads were deposited in the Sequence Read Archive (SRA) under accessions SRR27540885⁵⁹, SRR27540886⁶⁰, and SRR27540882⁶¹, respectively. The RNA-seq data of salt-stressed samples were deposited in the SRA under accessions SRR27540875-SRR27540881^{62,63,64,65,66,67,68}, SRR27540883⁶⁹, and SRR27540884⁷⁰, and the genome survey data with SRA accession number SRR28495917⁷¹. The genome assembly and annotation of R. soongarica has been deposited on the Figshare platform⁷², and GenBank with accession number JBEBFM000000000 (2024)⁷³, respectively.

Technical Validation

For the genome assembly, we assessed the quality using BUSCO v5.2.2, embryophyta_odb10). The assembly achieved completeness of 97.5% at the contig level and 97.8% at the chromosome level, indicating a highly complete genome (Table 5). For the full-length transcriptome sequencing data, we employed the single-copy ortholog database BUSCO to assess the quality of the assembled transcripts (Fig. 7a). Additionally, 97.96% of the Hi-C data were successfully anchored to the 11 pseudo-chromosomes, confirming the accuracy of the chromosome assembly (Fig. 2a). The organization of interaction contacts within and around the chromosome region was observed through the Hi-C heatmap, further supporting the quality of the chromosome assembly. We also calculated the LTR Assembly Index (LAI) using LTR_retriever v2.9.0, obtaining a value of 12.37, indicative of a genome of reference quality (Fig. 7b, Table 1).

Table 5 Evaluation of Reaumuria soongarica genome assembly.

Full size table

Code availability

All data processing commands and pipelines were carried out in accordance with the instructions and guidelines provided by the relevant bioinformatic sofware. There were no custom scripts or code utilized in this study.

References

Dandotiya, B., Sharma, H. K. Impacts of Climate Change on Agriculture and Aquaculture. (2021).
Hanin, M., Ebel, C., Ngom, M., Laplaze, L., Masmoudi, K. New Insights on Plant Salt Tolerance Mechanisms and Their Potential Use for Breeding. Front Plant Sci. 7 (2016).
Ahuja, I., de Vos, R. C. H., Bones, A. M. & Hall, R. D. Plant molecular stress responses face climate change. Trends Plant Sci. 15, 664–674 (2010).
Article CAS PubMed Google Scholar
He, M., He, C.-Q., Ding, N.-Z. Abiotic Stresses: General defenses of land plants and chances for engineering multistress tolerance. Front Plant Sci. 9 (2018).
Golldack, D., Li, C., Mohan, H., Probst, N. Tolerance to drought and salt stress in plants: unraveling the signaling networks. Front Plant Sci. 5 (2014).
Deinlein, U. et al. Plant salt-tolerance mechanisms. Trends Plant Sci. 19, 371–379 (2014).
Article CAS PubMed PubMed Central Google Scholar
Flowers, T. J. & Colmer, T. D. Plant salt tolerance: adaptations in halophytes. Ann Bot-London. 115, 327–331 (2015).
Article CAS Google Scholar
Keisham, M., Mukherjee, S., Bhatla, S. Mechanisms of sodium transport in plants—progresses and challenges. Int J Mo Sci. 19 (2018).
Ekblom, R. & Galindo, J. Applications of next generation sequencing in molecular ecology of non-model organisms. Heredity. 107, 1–15 (2010).
Article PubMed PubMed Central Google Scholar
Savolainen, O., Lascoux, M. & Merilä, J. Ecological genomics of local adaptation. Nat Rev Genetic. 14, 807–820 (2013).
Article CAS Google Scholar
Stapley, J. et al. Adaptation genomics: the next generation. Trends Ecol Evol. 25, 705–712 (2010).
Article PubMed Google Scholar
Zhang, Y. et al. Water use strategies of dominant species (Caragana korshinskii and Reaumuria soongorica) in natural shrubs based on stable Isotopes in the Loess Hill, China. Water. 12 (2020).
Li, L. H., Chen, J. Q., Han, X. G., Zhang, W. H., Shao, C. L. Grassland Ecosystems of China: A Synthesis and Resume. (2020).
Li, E., Huang, Y., Chen, H. & Zhang, J. Floristic diversity analysis of the Ordos Plateau, a biodiversity hotspot in arid and semi-arid areas of China. Folia Geobot. 53, 405–416 (2019).
Article Google Scholar
Luo, X. M. et al. Karyotype analysis of Piptanthus concolor based on FISH with a oligonucleotides for rDNA 5S. Sci Hortic-Amsterdam. 226, 361–365 (2017).
Article CAS Google Scholar
Mamidi, S. et al. A genome resource for green millet Setaria viridis enables discovery of agronomically valuable loci. Nat Biotechnol. 38, 1203–1210 (2020).
Article CAS PubMed PubMed Central Google Scholar
Li, J. L., Wang, S., Yu, J., Wang, L. & Zhou, S. L. A modified CTAB protocol for plant DNA extraction. Chinese Bulletin of Botany. 48, 72–78 (2013).
Article Google Scholar
Rio, D. C., Ares, M., Jr., Hannon, G. J., Nilsen, T. W. Purification of RNA using TRIzol (TRI reagent). Cold Spring Harb Protoc. (2010).
McKenna, A. et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Article CAS PubMed PubMed Central Google Scholar
Chen, Y. et al. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. Gigascience. 7, 1–6 (2018).
Article ADS PubMed PubMed Central Google Scholar
Marcais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 27, 764–770 (2011).
Article CAS PubMed PubMed Central Google Scholar
Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. J. Bioinformatics. 33, 2202–2204 (2017).
Article CAS Google Scholar
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics. 25, 2078–2079 (2009).
Article PubMed PubMed Central Google Scholar
Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
Article PubMed PubMed Central Google Scholar
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods. 18, 170–175 (2021).
Article CAS PubMed PubMed Central Google Scholar
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 25, 1754–1760 (2009).
Article CAS PubMed PubMed Central Google Scholar
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
Article CAS PubMed PubMed Central Google Scholar
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Article CAS PubMed PubMed Central Google Scholar
Keilwagen, J. et al. Using intron position conservation for homology-based gene prediction. Nucleic Acids Res. 44, e89 (2016).
Article PubMed PubMed Central Google Scholar
Buchfink, B., Reuter, K. & Drost, H. G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat. Methods. 18, 366–368 (2021).
Article CAS PubMed PubMed Central Google Scholar
Boeckmann, B. et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31, 365–370 (2003).
Article CAS PubMed PubMed Central Google Scholar
Junker, V. et al. The role SWISS-PROT and TrEMBL play in the genome research environment. J. J Biotechnol. 78, 221–234 (2000).
Article CAS PubMed Google Scholar
Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. J. Nucleic Acids Res. 28, 27–30 (2000).
Article CAS Google Scholar
Zdobnov, E. M. & Apweiler, R. InterProScan–an integration platform for the signature-recognition methods in InterPro. J. Bioinformatics. 17, 847–848 (2001).
Article CAS PubMed Google Scholar
Koonin, E. V. et al. A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. J. Genome Biol. 5, 1–28 (2004).
Google Scholar
O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucl Acids Res. 44, D733–D745 (2016).
Article PubMed Google Scholar
Ashburner, M. et al. Gene ontology: tool for the unification of biology. J. Nat Genet. 25, 25–29 (2000).
Article CAS Google Scholar
Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics. 21(Suppl 1), i351–358 (2005).
Article CAS PubMed Google Scholar
Bao, W., Kojima, K. K. & Kohany, O. Repbase update, a database of repetitive elements in eukaryotic genomes. Mob DNA. 6, 11 (2015).
Article PubMed PubMed Central Google Scholar
Tarailo‐Graovac, M., Chen, N. Using repeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics. 25 (2009).
Shapiro, J. Mobile Genetic Elements. Elsevier. (2012).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. PNAS. 17, 9451–9457 (2020).
Article ADS Google Scholar
Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics. 9, 18 (2008).
Article PubMed PubMed Central Google Scholar
Ou, S. & Kwok, K. C. Ferulic acid: pharmaceutical functions, preparation and applications in foods. J Sci Food Agr. 84, 1261–1269 (2004).
Article CAS Google Scholar
Cossu, R. M., Buti, M., Giordani, T., Natali, L. & Cavallini, A. A computational study of the dynamics of LTR retrotransposons in the Populus trichocarpa genome. Tree Genet Genomes. 8, 61–75 (2012).
Article Google Scholar
Ma, J. X. & Bennetzen, J. L. Rapid recent growth and divergence of rice nuclear genomes. PNAS. 101, 12404–12410 (2004).
Article ADS CAS PubMed PubMed Central Google Scholar
Liu, J. N. et al. Genomic analyses provide insights into the evolution and salinity adaptation of halophyte Tamarix chinensis. GigaScience. 12 (2023).
Zhang, R. G. et al. TEsorter: an accurate and fast method to classify LTR-retrotransposons in plant genomes. Hortic Res. 9 (2022).
Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL): an. online tool for phylogenetic tree display and annotation. Bioinformatics. 23, 127–128 (2007).
Article CAS PubMed Google Scholar
Sun, P. et al. WGDI: a user-friendly toolkit for evolutionary analyses of whole-genome duplications and ancestral karyotypes. Mol Plant. 15, 1841–1851 (2022).
Article CAS PubMed Google Scholar
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
Article PubMed PubMed Central Google Scholar
Katoh, K. & Toh, H. Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinform. 9, 286–298 (2008).
Article CAS PubMed Google Scholar
Capella-Gutierrez, S., Silla-Martinez, J. M. & Gabaldon, T. TrimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 25, 1972–1973 (2009).
Article CAS PubMed PubMed Central Google Scholar
Nguyen, L. T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
Article CAS PubMed Google Scholar
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 30, 1312–1313 (2014).
Article CAS PubMed PubMed Central Google Scholar
Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
Article CAS PubMed Google Scholar
De Bie, T., Cristianini, N., Demuth, J. P. & Hahn, M. W. CAFE: a computational tool for the study of gene family evolution. Bioinformatics. 22, 1269–1271 (2006).
Article PubMed Google Scholar
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Article PubMed PubMed Central Google Scholar
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27540885 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27540886 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27540882 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27540875 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27540876 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27540877 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27540878 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27540879 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27540880 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27540881 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27540883 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27540884 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28495917 (2024).
Dang, Z. H. et al. Chromosome-level genome assembly and annotation of Reaumuria soongarica. figshare, https://doi.org/10.6084/m9.figshare.c.7021974.v1 (2024).
Dang, Z. H. et al. Reaumuria songarica ZD-2024a, whole genome shotgun sequencing project. NCBI GenBank https://identifiers.org/ncbi/insdc:JBEBFM000000000 (2024).

Download references

Acknowledgements

This work was supported by grants from the Natural Science Foundation of Inner Mongolia of China (2024JQ11), the National Natural Science Foundation of China (31860078 and 32160088), the Program for Young Talents of Science and Technology in Universities of Inner Mongolia Autonomous Region of China (NJYT22093). Thanks to Professor Liqing Zhao and Associate Professor Zhiyong Li from the School of Ecology and Environment at Inner Mongolia University for providing the photographs of R. soongarica.

Author information

These authors contributed equally: Miaomiao Song, Wei Gong.

Authors and Affiliations

Ministry of Education Key Laboratory of Ecology and Resource Use of the Mongolian Plateau & Inner Mongolia Key Laboratory of Grassland Ecology, School of Ecology and Environment, Inner Mongolia University, Hohhot, 010021, Inner Mongolia, China
Miaomiao Song, Wei Gong, Yue Meng, Tingyu Huo, Yanan Liu, Yeming Zhang & Zhenhua Dang
Ministry of Education Key Laboratory of Herbage & Endemic Crop Biotechnology, School of Life Sciences, Inner Mongolia University, Hohhot, 010021, Inner Mongolia, China
Yunyun Tian

Authors

Miaomiao Song
View author publications
You can also search for this author in PubMed Google Scholar
Wei Gong
View author publications
You can also search for this author in PubMed Google Scholar
Yunyun Tian
View author publications
You can also search for this author in PubMed Google Scholar
Yue Meng
View author publications
You can also search for this author in PubMed Google Scholar
Tingyu Huo
View author publications
You can also search for this author in PubMed Google Scholar
Yanan Liu
View author publications
You can also search for this author in PubMed Google Scholar
Yeming Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Zhenhua Dang
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Z.D. conceived and designed the study. Z.D., Y.T., M.S., W.G., Y.M., T.H., Y.L. and Y.Z. prepared the materials. Z.D., M.S., W.G. and Y.T. analyzed the data and wrote the manuscript. Z.D., M.S. and Y.T. edited and improved the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Zhenhua Dang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Song, M., Gong, W., Tian, Y. et al. Chromosome-level genome assembly and annotation of xerophyte secretohalophyte Reaumuria soongarica. Sci Data 11, 812 (2024). https://doi.org/10.1038/s41597-024-03644-y

Download citation

Received: 12 January 2024
Accepted: 12 July 2024
Published: 22 July 2024
DOI: https://doi.org/10.1038/s41597-024-03644-y
Springer Nature Limited

Associated content

Genomics data for plant ecology, conservation and agriculture

Collection 20 January 2023

Chromosome-level genome assembly and annotation of xerophyte secretohalophyte Reaumuria soongarica

Abstract

Similar content being viewed by others

Chromosome-level genome assembly of Platycarya strobilacea

A high-quality chromosomal-level genome assembly of Greater Scaup (Aythya marila)

Chromosome-level genome assembly and annotation of the prickly nightshade Solanum rostratum Dunal

Background & Summary