High-throughput single-molecule long-read RNA sequencing analysis of tissue-specific genes and isoforms in lettuce (Lactuca sativa L.)

Shi, Zhuo-Xing; Xiang, Lei; Zhao, Hai-Ming; Yang, Lang-Qi; Chen, Zhi-Chao; Pu, Yu-Qing; Li, Yan-Wen; Luo, Bei; Cai, Quan-Ying; Liu, Bai-Lin; Feng, Nai-Xian; Li, Hui; Li, Qing X.; Tang, Chong; Mo, Ce-Hui

doi:10.1038/s42003-024-06598-4

Article
Open access
Published: 31 July 2024

Volume 7, article number 920, (2024)
Communications Biology

Abstract

Lettuce is one of the most widely cultivated and consumed dicotyledonous vegetables globally. Despite the availability of its reference genome sequence, lettuce gene annotation remains incomplete, impeding comprehensive research and the broad application of genomic resources. Long-read RNA isoform sequencing (Iso-Seq) offers substantial advantages for analyzing RNA alternative splicing and aiding gene annotation, yet it faces throughput limitations. We present the HIT-ISOseq method tailored for bulk sample analysis, significantly enhancing RNA sequencing throughput on the PacBio platform by concatenating cDNA. Here we show that HIT-ISOseq generates 3-4 cDNA molecules per CCS read in lettuce, yielding 15.7 million long reads per PacBio Sequel II SMRT Cell 8 M. We validate its effectiveness in analyzing six lettuce tissue samples, including roots, stems, and leaves, revealing tissue-specific gene expression patterns and RNA isoforms. Leveraging diverse tissue long-read RNA sequencing, we refine the transcript annotation of the lettuce reference genome, expanding its GO and KEGG annotation repertoire. Collectively, this study serves as a foundational reference for genome annotation and the analysis of multi-sample isoform expression, utilizing high-throughput long-read transcriptome sequencing.

Introduction

Long-read RNA isoform sequencing (Iso-Seq)^{1,2,3}, built on the Pacific Biosciences (PacBio) single-molecule real-time (SMRT) sequencing technology⁴, allows complete, high-accuracy sequencing of cDNA fragments containing thousands of bases⁵. Iso-Seq can identify RNA isoforms without read assembly and thus is widely used to improve gene annotation in newly generated or highly fragmented plant genome sequences^{6,7,8,9,10,11,12}. Furthermore, its high base accuracy and long read length make Iso-Seq highly sensitive for analyzing RNA alternative splicing, which is an important foundation for plant adaptation to internal and external environmental changes. Improving the annotation of alternatively spliced transcripts is an important goal of gene annotation. Therefore, gene annotation using Iso-Seq data not only improves the accuracy of gene expression quantification, but also allows for accurate analysis of RNA alternative splicing at the transcript level. Currently, Iso-Seq is widely used to study RNA alternative splicing related to tissue specificity and stress response in plants^{13,14,15,16,17,18,19,20,21}.

Despite its significant research and application potential, the widespread adoption of Iso-Seq is constrained by its considerable cost, limiting its scalability for sample sequencing. The PacBio Sequel II sequencer offers high-fidelity (HiFi) sequencing capabilities for 15–20 kb library molecules through circular consensus sequencing (CCS) mode²². This read length of 15 to 20 kb far exceeds the average length of conventional plant transcripts. For instance, the typical length of lettuce reference gene transcripts is approximately 1.64 kb (Table 1). Deploying Iso-Seq analysis on a single lettuce sample may therefore leave much of the read length of the PacBio Sequel II sequencer unused. An alternative approach is to concatenate short amplified fragments into longer molecules for sequencing, which is a viable way to raise sequencing throughput and reduce analysis costs on the PacBio Sequel II sequencing platform. Notably, we have recently developed the HIT-scISOseq²³ technique, tailored for single-cell long-read transcriptome sequencing on the PacBio Sequel II platform. This method is based on the concatenation of barcoded cDNA generated by the 10× Genomics system.

Table 1 Performance of HIT-ISOseq in lettuce samples

Lettuce is among the most widely grown and consumed dicotyledonous vegetables worldwide. It is characterized by easy cultivation and high nutritional value, with an annual production value of $2.4 billion in the United States²⁴. Lettuce is a diploid plant of the Asteraceae family, with 18 chromosomes and an estimated genome size of 2.5 Gb^{25,26}. However, the gene annotation of its reference genome does not incorporate long-read RNA sequencing data^{25,26,27}. The limitations of the reference gene annotation hinder research on and applications of lettuce genomic resources, calling for a feasible solution.

In this study, we adapted the HIT-scISOseq approach to bulk samples. The resulting method, referred to as HIT-ISOseq, performs long-read RNA isoform sequencing by concatenating cDNA into long molecules for sequencing. This method was applied to pooled samples from distinct bulk lettuce tissues, namely root, stem, and leaf, with two biological replicates each. The results showed that HIT-ISOseq generated 15.79 million long-read cDNA sequences from a single Sequel II SMRT Cell 8 M chip, evenly distributed across the six pooled samples. Notably, each sample yielded a balanced output, ranging from 2.55 to 2.74 million full-length non-chimeric (FLNC) reads. The transcriptomic data obtained through HIT-ISOseq allowed robust clustering of the six lettuce samples based on tissue type. The gene and transcript expression levels derived from HIT-ISOseq data correlated highly with the results obtained through Next-Generation Sequencing (NGS), validating the gene expression quantification. Additionally, leveraging HIT-ISOseq data led to the addition of 51,558 transcript annotations, 119 GO terms, and 25 KEGG pathways. The HIT-ISOseq data enabled the identification of 2,611 genes and 4,842 transcripts exhibiting tissue-specific expression in lettuce. Furthermore, we characterized 18 isoforms whose expression patterns differed from those of their corresponding genes and performed qPCR validation for these isoforms. This study’s findings provide a comprehensive long-read transcriptomic atlas encompassing various lettuce tissues, offering invaluable resources for advancing lettuce genome research.

Results

Experimental design

HIT-ISOseq differs from HIT-scISOseq in that its workflow starts from total RNA extraction of tissue samples instead of cDNA generated by the 10× Genomics system. For HIT-ISOseq, we devised a primer pair with an upstream sequence containing a dU base, facilitating cDNA amplification after RNA extraction from bulk samples and reverse transcription of polyA+ RNA into double-stranded cDNA. The cDNA ends were then enzymatically digested using the USER enzyme to generate a sticky end featuring a 6 bp palindromic sequence structure. The digested cDNA molecules were subsequently ligated using T4 DNA ligase. After a secondary round of PCR amplification, the ligated products underwent SMRTbell library preparation utilizing the PacBio official barcoded adapter kit and were sequenced on a Sequel II instrument (Fig. 1A). Following sequencing, the Circular Consensus Sequencing (CCS) reads of each sample were obtained by demultiplexing the pooled, sample-barcoded CCS reads using PacBio’s Lima software. These CCS reads were further processed to isolate the final full-length non-chimeric (FLNC) reads, using the deconcatenate program developed by the authors (refer to the Methods section). For this investigation, three tissues (root, stem, and leaf) derived from the Italian lettuce cultivar were chosen as the pooling samples for HIT-ISOseq sequencing, with two biological replicates for each tissue.

**Fig. 1: Overview of the workflow and the performance of HIT-ISOseq.**

Performance of sequencing runs

We evaluated the performance of the HIT-ISOseq dataset in terms of yield, quality, and sample uniformity using three indicators: CCS Reads, FLNC Reads, and Gene and Isoform Feature Detection (Table 1).

Regarding CCS Reads, HIT-ISOseq generated 4.56 M reads in a single SMRT Cell 8 M chip, with 0.71 to 0.84 M CCS reads for each sample. The average numbers of CCS passes and quality values (QV) for each sample ranged from 17 to 20 and from 0.97 to 0.98, respectively. The average length of CCS reads for each sample (4.04 to 4.86 kb) was 2.45–2.95 times that (1.65 kb) of the existing lettuce reference transcripts (Table 1, Fig. 1B).

Regarding FLNC reads, a total of 15.79 M reads were produced, with an average of 3.46 FLNCs per CCS read. Each sample obtained 2.55 to 2.74 M FLNC reads, with the average read length ranging from 736.31 to 779.97 bp (Table 1, Fig. 1C). The length distribution of FLNC reads among different samples showed a high degree of similarity (Supplementary Fig. 1A). It is worth noting that the average length of FLNC reads is much lower than that of lettuce reference transcripts, indicating that HIT-ISOseq may tend to concatenate shorter fragments. Additionally, we found that longer CCS reads did not necessarily contain more FLNC molecules. In fact, the number of FLNC molecules first increased and then decreased with increasing CCS read length, showing an inverted “U” relationship. The highest number of FLNC molecules was observed at ~10 kb of CCS length (Fig. 1B). Based on the genomic mapping metrics of FLNC reads, an average of 95.3% of FLNC reads could be mapped to the genome, with mapping coverage exceeding 98% and mapping identity above 97.8% (Fig. 1D).
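The yield figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the totals stated in the text and Table 1; the variable names are illustrative and not taken from the authors' pipeline.

```python
# Totals quoted in the text (millions of reads, kilobases)
ccs_reads_m = 4.56          # CCS reads per SMRT Cell 8 M
flnc_reads_m = 15.79        # FLNC reads recovered from those CCS reads
ref_len_kb = 1.65           # mean length of lettuce reference transcripts
ccs_len_kb = (4.04, 4.86)   # per-sample range of mean CCS read length

# Average number of cDNA molecules recovered per CCS read
flnc_per_ccs = flnc_reads_m / ccs_reads_m

# How many times longer the mean CCS read is than the mean reference transcript
length_ratio = tuple(round(x / ref_len_kb, 2) for x in ccs_len_kb)

print(round(flnc_per_ccs, 2))  # 3.46
print(length_ratio)            # (2.45, 2.95)
```

Both results agree with the reported 3.46 FLNCs per CCS read and the 2.45–2.95-fold length ratio.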

In terms of Feature Detection, HIT-ISOseq identified a total of 31,297 genes and 69,973 transcripts, with the numbers of genes and transcripts ranging from 23,550 to 27,629 and from 54,711 to 58,998 per sample, respectively. The detected isoform lengths using HIT-ISOseq generally covered the length range of lettuce isoforms, although the average length (1.56–1.59 kb) of the detected isoforms was shorter than that (1.65 kb) of the reference lettuce annotated transcripts (Fig. 1E). The majority of the detected isoforms (90.87%) exhibited a multi-exonic structure, with most (89.27%) being coding transcripts. Furthermore, the splice sites of these isoforms were predominantly canonical (99.97%, Fig. 1F, Supplementary Fig. 1B–D). Additionally, our method successfully clustered the lettuce samples based on tissue type at the level of gene expression (Fig. 1G).

HIT-ISOseq improves the annotation of reference transcripts and functional annotations for lettuce

We performed quality control and filtering of the non-redundant isoforms generated by HIT-ISOseq using SQANTI3. This process retained a total of 197,363 isoforms (Supplementary Fig. 1B–D). To ensure reliable downstream differential expression analysis, we further filtered out isoforms with a total UMI count lower than 5 across all six samples, resulting in 69,973 isoforms (Figs. 1F, 2A). These isoforms were classified as FSM (full splice match: isoforms that match reference annotations), ISM (incomplete splice match: isoforms whose internal junction sites agree with reference annotations but whose 5’ and/or 3’ exons are truncated compared to reference annotations), NIC (novel in catalog: isoforms that have not been annotated but use a combination of known splice sites and exons), and NNC (novel not in catalog: isoforms that contain at least one splice site not annotated).
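The UMI-count filter described above can be sketched as follows. This is a minimal illustration, assuming per-sample UMI counts are available as a mapping from isoform ID to a list of six counts; the function name, isoform IDs, and counts are hypothetical, not the authors' code or data.

```python
def filter_isoforms(umi_counts, min_total=5):
    """Keep isoforms whose UMI count summed over all samples is at least min_total."""
    return {iso: counts for iso, counts in umi_counts.items()
            if sum(counts) >= min_total}


# Hypothetical counts for six samples (leaf, root, stem; two replicates each)
toy_counts = {
    "iso_A": [3, 2, 0, 0, 1, 0],   # total 6: kept
    "iso_B": [1, 0, 0, 2, 0, 1],   # total 4: removed
    "iso_C": [0, 0, 5, 0, 0, 0],   # total 5: kept (threshold is inclusive)
}

kept = filter_isoforms(toy_counts)
print(sorted(kept))  # ['iso_A', 'iso_C']
```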

**Fig. 2: Updating of lettuce reference annotation by HIT-ISOseq.**

To enhance transcript annotation accuracy, we integrated the long-read transcriptome data obtained from HIT-ISOseq with the reference transcripts of lettuce genes. Following SQANTI3 quality control and filtering, we obtained 197,363 HIT-ISOseq non-redundant isoforms. These isoforms were merged with the reference gene annotation of NCBI Lsat_Salinas_v7 using StringTie software, resulting in a new lettuce transcriptome in GFF format. We conducted thorough validation to ensure the consistency of merging HIT-ISOseq transcripts using different versions of the reference genome and gene annotation. Results demonstrated that the lettuce gene annotation has not been significantly updated to date (Supplementary Fig. 2). Evaluation of the new lettuce transcriptome GFF was based on three criteria: transcript number, transcript structure, and functional database annotation (Fig. 2B–F, Supplementary Fig. 3B–D). After merging, the new lettuce transcriptome GFF contained 117,608 transcripts (Supplementary Table S3), an addition of 51,558 transcripts compared to the lettuce reference transcripts (Fig. 2B). Notably, HIT-ISOseq data added 1285 new genes, comprising 1988 transcripts, to the gene annotation of Lsat_Salinas_v7. These new genes do not overlap with existing annotated regions and thus lie in intergenic regions.

Afterward, the newly obtained lettuce transcripts were subjected to annotation against the NR (NCBI non-redundant protein database), Pfam (Protein families database), Swiss-Prot, KOG (Eukaryotic Orthologous Groups of proteins), and KEGG (Kyoto Encyclopedia of Genes and Genomes) databases. Remarkably, a substantial portion (96.9%) of the transcripts were successfully annotated in at least one of these databases (Fig. 2C). Specifically, 93.2% of the transcripts were annotated in the NR database (Fig. 2D), with the majority (92.03%) of these assigned to lettuce (Supplementary Fig. 3A). Additionally, 71.0% and 70.7% of the transcripts in the new lettuce GFF could be annotated in the GO and KEGG databases, respectively (Fig. 2D). In comparison to the annotation results obtained using the lettuce reference transcripts, the adoption of the new lettuce GFF resulted in an augmentation of 119 and 25 terms in the GO and KEGG annotations, respectively (Fig. 2E, F).

HIT-ISOseq deciphers tissue-specific expressed genes in lettuce

To assess the performance of HIT-ISOseq in gene expression quantification, the identified gene counts from HIT-ISOseq data across six lettuce tissue samples were subjected to saturation curve analysis. The results revealed saturation in gene identification when the number of FLNC reads exceeded 1 million (Fig. 3A). Because the throughput of HIT-ISOseq (~15 million reads) is much higher than the ~1 million reads needed to saturate gene identification in a single plant tissue sample, the method has high potential to analyze many plant samples simultaneously in a single SMRT Cell 8 M chip. Correlation analysis was conducted between HIT-ISOseq and NGS RNA-seq (Supplementary Table S4) in gene expression quantification, with the correlation coefficients of all samples > 0.923 (Fig. 3B). This result indicated the high accuracy of HIT-ISOseq in gene expression quantification. Subsequently, HIT-ISOseq data were utilized to identify tissue-specific expressed genes (TSPGs) in lettuce, revealing 976, 259, and 1,338 TSPGs in leaf, root, and stem samples, respectively. Furthermore, GO and KEGG functional enrichment analyses were performed on the TSPGs using the databases re-annotated in this study (Fig. 3D, E, Supplementary Fig. 4A, B). These analyses highlighted enrichment of photosynthesis, aquaporin TIP, and the auxin-activated signaling pathway among the TSPGs of leaf, root, and stem samples, respectively.
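A saturation curve of this kind can be approximated by subsampling reads at increasing depths and counting the distinct genes detected. The sketch below is one common way to do this and is an assumption, since the text does not describe how the authors computed their curve; `saturation_curve` and the toy gene labels are hypothetical.

```python
import random


def saturation_curve(read_genes, depths, seed=0):
    """Return (depth, distinct genes detected) after one random shuffle of the reads.

    read_genes: list with one gene ID per read (the gene each FLNC read maps to).
    depths: increasing subsample sizes, each at most len(read_genes).
    """
    rng = random.Random(seed)
    shuffled = list(read_genes)
    rng.shuffle(shuffled)
    # Nested prefixes of one shuffle guarantee a non-decreasing curve
    return [(d, len(set(shuffled[:d]))) for d in depths]


# Toy data: four genes with very uneven expression
reads = ["g1"] * 50 + ["g2"] * 30 + ["g3"] * 15 + ["g4"] * 5
curve = saturation_curve(reads, depths=[10, 50, 100])
print(curve)
```

The curve flattens once additional reads stop revealing new genes, which is the behaviour reported above beyond 1 million FLNC reads.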

**Fig. 3: Identification of tissue-specific expressed genes in lettuce using HIT-ISOseq.**

HIT-ISOseq deciphers tissue-specific expressed isoforms in lettuce

Besides gene expression quantification, HIT-ISOseq also excels in quantifying RNA isoform expression. The lettuce samples could be effectively clustered based on the tissue type using RNA isoform expression quantification (Fig. 4A). Consequently, tissue-specific isoforms of lettuce in roots, stems, and leaves could be discerned by analyzing isoform expression data from each sample. In this study, we identified 1651, 342, and 2849 tissue-specific isoforms in root, stem, and leaf samples, respectively (Fig. 4B). Notably, upon integrating the expression of tissue-specific isoforms with their corresponding genes (Fig. 4C), we observed that the expression patterns of 18 isoforms diverged from those of their respective genes (Supplementary Table S5), as illustrated in Fig. 4D. Furthermore, the identified isoforms exhibited exon structures notably distinct from the reference annotated transcripts and principal isoforms of their associated genes, depicted in Fig. 5A, Supplementary Fig. 5A, Supplementary Fig. 6 and Supplementary Fig. 7. Enrichment analyses via GO and KEGG pathways demonstrated the involvement of these isoforms in pivotal biological processes, including cell wall organization, channel activity, and participation in intracellular metabolic processes (Supplementary Table S6). This outcome underscores HIT-ISOseq’s capacity to unveil intricate regulatory expression dynamics at the transcript level with remarkable resolution.

**Fig. 4: Identification of tissue-specific expressed isoforms in lettuce using HIT-ISOseq.**

**Fig. 5: Validation of tissue-specific expressed isoforms.**

Experimental validation of tissue-specific isoform expression

To validate the fidelity of HIT-ISOseq in detecting tissue-specific expression isoforms, we conducted qPCR validation on six isoforms. These isoforms were chosen based on their distinct tissue-specific expression patterns from the previously identified pool of 18 isoforms (refer to Supplementary Table S2). The expression profiles of these six isoforms were consistently observed in both HIT-ISOseq and qPCR results, as depicted in Fig. 5B, C, Supplementary Fig. 5B, C. This consistent observation confirms the precision and robustness of HIT-ISOseq. This outcome provides evidence that HIT-ISOseq not only accurately identifies tissue-specific expression isoforms but also reliably quantifies their expression levels. As a result, HIT-ISOseq is expected to have broad utility in functional investigations and biological interpretations of plant samples. It serves as a dependable tool for delving deeper into the regulatory mechanisms of tissue-specific gene expression and facilitating functional annotation.

Discussion

Lettuce stands as one of the most globally favored and extensively consumed vegetables, rendering the investigation into its genetic variations profoundly significant. Despite the sequencing of the reference genome of cultivated lettuce (Lactuca sativa L.), the current annotation of lettuce genes remains incomplete, impeding comprehensive research and the broad utilization of lettuce genomic resources. Recent advancements in long-read sequencing technology have positioned it as a prevailing method in plant genome assembly. Additionally, long-read DNA sequencing and long-read RNA isoform sequencing (Iso-Seq) offer substantial advantages in analyzing RNA alternative splicing and aiding gene annotation. However, their relatively lower throughput constrains their widespread application on a larger scale.

In this study, we employed a PacBio Circular Consensus Sequencing (CCS) method termed HIT-ISOseq, which involves concatenating multiple cDNA molecules to achieve high-throughput and high-precision RNA isoform sequencing across diverse samples. Using HIT-ISOseq, we conducted sequencing on the root, stem, and leaf tissues of lettuce (Lactuca sativa L.), providing valuable insights into tissue-specific gene expression and RNA isoforms in lettuce. By harnessing multi-tissue long-read RNA sequencing, we refined the transcript annotation of the lettuce reference genome, significantly expanding genomic annotations covering GO and KEGG pathways. Through the utilization of HIT-ISOseq data, we identified tissue-specific expressed genes and isoforms within lettuce’s root, stem, and leaf tissues. Integrating tissue-specific isoforms with their corresponding gene expression profiles uncovered distinctive expression patterns in tissue-specific isoforms.

Furthermore, the throughput or yield of HIT-ISOseq can be further enhanced by increasing the number of concatemers. Notably, while longer CCS reads do not always correlate with higher FLNC molecule counts, the highest number of FLNC molecules was observed at ~10 kb CCS length (Fig. 1B). HIT-ISOseq yield and average FLNC read length can be improved by selectively removing shorter concatemers (e.g., < 5 kb) through gel cutting or BluePippin size selection. With the introduction of the latest PacBio Revio SMRT Cell 25 M chip and the increase in available concatemers, HIT-ISOseq is expected to enable population-scale long-read RNA sequencing for larger sample sizes in the future. Additionally, while this study was an early proponent of a concatenation approach for sequencing multiple bulk samples, it is noteworthy that the recent launch of PacBio’s official Kinnex concatenation library preparation kit (based on the MAS-ISO-seq²⁸ method) is gradually establishing concatenation-based methodologies as a mainstream option in Iso-Seq applications.

In conclusion, this study conducted a thorough investigation into lettuce genetics, highlighting the persisting gaps in gene annotation despite the exhaustive sequencing of the cultivated lettuce genome. Leveraging HIT-ISOseq enabled a comprehensive exploration of RNA isoforms across diverse lettuce tissues. This exploration revealed tissue-specific gene expression patterns and significantly enhanced the transcript annotation of the lettuce genome. This refinement notably enriched the genomic annotations across pivotal pathways, including GO and KEGG. Furthermore, valuable insights were gained into strategies to boost HIT-ISOseq throughput, suggesting its potential scalability for larger-scale RNA sequencing using advanced technologies. While the principal aim of this study is not to provide a reference-level gene annotation for lettuce, our long-read RNA-seq data significantly enhance the annotation of lettuce transcripts. With the establishment and enhancement of lettuce germplasm gene resource databases, such as LettuceGDB²⁹ and LettuceDB³⁰, we envision that the HIT-ISOseq data generated in this study will facilitate exploration of lettuce genetic resources. Additionally, our study offers an effective strategy to improve gene prediction and enhance genome annotation for other plants with incomplete gene annotation. In the future, long-read RNA sequencing of multi-tissue samples using HIT-ISOseq or PacBio Kinnex can be employed to increase the number of full-length transcripts beyond the current annotated gene boundaries and discover novel genes within intergenic regions of annotated genes.

Methods

Plant material and growth condition

Seeds of the Italian lettuce were evenly placed into a substrate for germination. After developing four true leaves, healthy lettuce seedlings of uniform growth were selected and transferred to 1/2 Hoagland’s nutrient solution for hydroponic cultivation (Supplementary Table S1). The hydroponic cultivation lasted for 4 weeks in a greenhouse at natural temperature (22 ± 3 °C), with the nutrient solution renewed every 7 days. During the cultivation, the lettuce plants grew to maturity. Six mature lettuce plants were randomly selected and divided into two biological replicate groups, labeled as s1 and s2, with three plants per group. After being washed three times with ultrapure water, the selected plants were each cut into three parts: roots, stems, and leaves. The obtained plant tissues were immediately placed into sterile, enzyme-free cryotubes, quick-frozen in liquid nitrogen, and then kept at −80 °C for subsequent NGS RNA-seq and HIT-ISOseq.

RNA isolation and cDNA preparation

Total RNA was extracted from each sample using the QIAGEN RNeasy Mini Kit (#74104). To prepare cDNA, 1 μg of RNA was utilized along with SuperScript™ II Reverse Transcriptase (Invitrogen) and an RT primer (5’-AAGCAGTGGTATCAACGCAGAGTACNNNNNNNNTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTVN-3’). The 12 µL final reaction mixture contained RT primer, dNTP mixture, and total RNA, initially incubated at 65 °C for 5 minutes followed by 42 °C for 2 minutes. Subsequently, 5× first-strand buffer, recombinant RNase inhibitor, 0.1 M DTT, SuperScript™ II RT, and nuclease-free water were added to achieve a final volume of 19 µL. Reverse transcription proceeded at 42 °C for 60 min, followed by addition of 1 µL of Template Switching Oligo (TSO, 5’-AAGCAGTGGTATCAACGCAGAGTACATrGrG+G-3’) and further incubation at 42 °C for 30 minutes. The resulting cDNAs were purified using 0.8 volumes of Agencourt AMPure XP Beads (Beckman Coulter) and eluted in 20 μL nuclease-free water.

NGS RNA-seq library preparation and sequencing

The sequencing library was prepared using the MGIEasy mRNA library prep kit. The libraries were sequenced on an MGISEQ-2000 sequencer (BGI Genomics, Wuhan, China) in paired-end 150 mode. Reads that passed the MGI quality filters were retained for further analysis. Reads containing adapter or low-quality sequences were filtered out. To eliminate contamination and obtain passed reads, a series of data processing steps was performed using SOAPnuke³¹. The parameters for the SOAPnuke software were configured as follows: “-n 0.01 -l 20 -q 0.4 --adaMR 0.25 --ada_trim --polyX 50”. The specific filtering conditions are as follows: 1) Trimming adapter: Reads matching the adapter sequence by 25% or more (allowing up to 2 mismatches) will undergo adapter sequence trimming; 2) Filtering low-quality data: Reads will be discarded if 40.0% or more of the bases have a quality value lower than 20; 3) Removal of Ns: Reads will be eliminated if the proportion of Ns exceeds 1.0% of the total read length; 4) Removal of polyX: Reads with a polyX tract (where X can be A, T, G, or C) longer than 50 bp will be discarded entirely.
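The read-level filtering conditions listed above can be illustrated with a short function. This is a minimal sketch of conditions 2 to 4 exactly as worded in the text (adapter trimming is omitted); it is not SOAPnuke's implementation, and the function and parameter names are hypothetical.

```python
import re


def passes_filters(seq, quals, max_n_frac=0.01, low_q=20,
                   max_low_q_frac=0.4, max_polyx=50):
    """Return True if a read survives the N-content, quality, and polyX checks.

    seq: read sequence; quals: list of per-base Phred scores of the same length.
    """
    n = len(seq)
    if n == 0:
        return False
    seq = seq.upper()
    # Condition 3: discard if the proportion of N exceeds 1% of the read length
    if seq.count("N") / n > max_n_frac:
        return False
    # Condition 2: discard if 40% or more of the bases have quality below 20
    if sum(q < low_q for q in quals) / n >= max_low_q_frac:
        return False
    # Condition 4: discard if any homopolymer run is longer than 50 bp
    if re.search(rf"([ACGT])\1{{{max_polyx},}}", seq):
        return False
    return True


good = passes_filters("ACGT" * 25, [30] * 100)
low_quality = passes_filters("ACGT" * 25, [10] * 40 + [30] * 60)
long_poly_a = passes_filters("A" * 51 + "CGT" * 20, [30] * 111)
print(good, low_quality, long_poly_a)  # True False False
```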

Amplification of cDNA for PacBio library construction

The purified cDNA products were subjected to 12 cycles of amplification using KAPA HiFi HotStart Uracil 2× ReadyMix (Kapa Biosystems) and PCR primers incorporating deoxyUracil: 5’-ACTAGUAAGCAGTGGTATCAACGCAGAG-3’. The resulting amplified cDNA products were then purified with 0.8 volumes of Agencourt AMPure XP Beads and eluted in 16 μL of nuclease-free water. Quantification was performed using Qubit dsDNA HS Assay Kits (Thermo Fisher), and the quality was evaluated using Agilent 2100 DNA HS Assays.

USER cloning-based ligation of multiple inserts

The amplified cDNA was treated with 1 μL of USER Enzyme (NEB) and 2 μL of 10× T4 DNA ligase buffer (NEB), then incubated at 37 °C for 20 minutes. During this time, the USER enzyme cleaved at the deoxyuracil sites to create 3’ palindrome overhangs. Next, 1 μL of T4 DNA ligase (NEB, 400,000 U mL⁻¹) was added, and the reaction mixture was incubated at 16 °C for 10 minutes to promote insert ligation. The resulting multi-insert library was purified using 0.4 volumes of Agencourt AMPure XP Beads (Beckman Coulter), followed by end-repair and A-tailing with the NEBNext Ultra II End Repair/dA-Tailing Module, involving incubation at 20 °C for 15 minutes and 65 °C for 30 minutes. The cDNA was then ligated with 2 μL of a dT-overhang selection adapter (10 μM, annealed with primers 5’-GAACGACATGGCTACGATCCGACTT-3’ and 5’ PHO-AGTCGGATCGTAGCCATGTCGTTC-3’) using the NEBNext® Ultra™ II Ligation Module (NEB) at 20 °C for 15 minutes, and purified again with 0.4 volumes of Agencourt AMPure XP Beads (Beckman Coulter). Subsequently, 100 ng of the purified products were PCR amplified for 8–10 cycles using KAPA HiFi HotStart 2× ReadyMix and a selection primer (5’PHO-GAACGACATGGCTACGATCCGACTT-3’) to ensure the multi-insert library had no ligation nicks. Finally, the amplified products were purified once more using 0.4 volumes of Agencourt AMPure XP Beads (Beckman Coulter) and assessed using Agilent DNA 12000 Assays.
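The palindromic nature of the overhang can be checked directly from the dU-containing PCR primer given in the amplification step. The sketch below assumes that the 6 bp palindrome corresponds to the first six bases of that primer with the deoxyuracil read as thymine; this is an inference from the primer sequence, not a statement made in the text.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


# PCR primer from the Methods; U marks the deoxyuracil cleaved by USER enzyme
PCR_PRIMER = "ACTAGUAAGCAGTGGTATCAACGCAGAG"

# Assumed overhang-forming prefix, with U treated as T for base pairing
prefix = PCR_PRIMER[:6].replace("U", "T")

# A sequence is palindromic in the molecular sense if it equals its reverse complement
is_palindromic = prefix == revcomp(prefix)
print(prefix, is_palindromic)  # ACTAGT True
```

Because the prefix is self-complementary, any two digested cDNA ends can anneal to each other, which is what allows inserts to ligate in either orientation.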

Preparing PacBio SMRTbell templates and conducting sequencing

The ligated cDNA PCR products were subjected to end repair and A-tailing using the NEBNext End Repair/dA-Tailing Module. Following this, the products were ligated with a dT-overhang hairpin adapter via the NEBNext® Ultra™ II Ligation Module (NEB) and purified with 0.4 volumes of Agencourt AMPure XP Beads (Beckman Coulter) to create the SMRTbell template. To remove residual adapters and unligated DNA fragments, the library was treated with 1 μL exonuclease I (NEB), 1 μL exonuclease III (NEB), and NEBuffer 1 (NEB), followed by incubation at 37 °C for 1 h. The purified products were then processed with 0.8 volumes of Agencourt AMPure XP beads, eluted in 15 μL elution buffer (10 mM Tris-HCl, pH 8.0), and quantified using Agilent DNA 12000 Kits (Agilent). Sequencing primer annealing and polymerase binding to the PacBio SMRTbell templates were conducted following the manufacturer’s protocols (PacBio, US). Finally, the library complex was sequenced using the SMRT Cell 8 M (PacBio) compatible with the Sequel II sequencer.

Validation of Isoform Expression by RT-qPCR

RNA extraction and cDNA synthesis: Total RNA was isolated from each sample using the QIAGEN RNeasy Mini Kit (#74104) following the manufacturer’s instructions. Next, 50 ng of total RNA was reverse transcribed into cDNA using SuperScript™ II Reverse Transcriptase (Invitrogen) and oligo dT primers. In brief, a 12 µL mixture containing oligo dT primers, dNTPs, and total RNA was heated to 65 °C for 5 minutes and then quickly cooled on ice. Afterward, 5× first-strand buffer, recombinant RNase inhibitor, 0.1 M DTT, and SuperScript™ II RT were added, bringing the final volume to 20 µL. The reverse transcription reaction was performed at 42°C for 50 minutes and then inactivated at 70°C for 15 minutes. The resulting cDNAs were then diluted to a final volume of 60 μL with nuclease-free water.

RT-qPCR analysis: The RT-qPCR experiments were performed on 96-well plates (Axygen) using the StepOnePlus system (Applied Biosystems). Primer sequences and additional details are provided in Supplementary Table S2. Each reaction mixture comprised 5 μL of TB Green Premix Ex Taq II (Tli RNase H Plus) (Takara, #RR82WR), 1 μL of 5 μM primer mix, 1 μL of diluted cDNA, and 3 μL of nuclease-free water. The thermal cycling conditions were set as follows: an initial denaturation at 95°C for 1 minute, followed by 40 cycles at 95°C for 10 seconds, 52°C for 30 seconds, and 68°C for 30 seconds. All RT-qPCR assays were performed in triplicate, with three biological replicates for each sample.

Short-read RNA-seq data analysis

For each sample, we initially aligned the clean reads to the Lactuca sativa L. genome and transcriptome (NCBI Lsat_Salinas_v7) using STAR (version 2.7.2a)³² with the following parameters: “--quantMode GeneCounts”. We then utilized featureCounts (version 2.0.1)³³ to obtain the read count for each gene, with the parameters set as follows: “-p -t exon -g gene_id”. The read counts obtained were subsequently normalized to CPM (Counts per Million) and TPM (Transcripts per Kilobase Million) values.
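The two normalizations mentioned above follow standard definitions, sketched here for a single sample. The gene IDs, counts, and lengths are hypothetical, and the text does not state which tool the authors used for this step.

```python
def cpm(counts):
    """Counts per million: scale raw counts by library size."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def tpm(counts, lengths_bp):
    """TPM: normalize by gene length first, then scale the rates to one million."""
    reads_per_kb = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    scale = sum(reads_per_kb.values())
    return {gene: rate / scale * 1e6 for gene, rate in reads_per_kb.items()}


# Hypothetical example: geneB has 3x the reads of geneA but is also 3x longer
counts = {"geneA": 100, "geneB": 300}
lengths = {"geneA": 1000, "geneB": 3000}

cpm_values = cpm(counts)
tpm_values = tpm(counts, lengths)
print(cpm_values)
print(tpm_values)
```

In this example CPM differs threefold between the two genes, whereas TPM is equal, because TPM removes the effect of gene length on short-read counts.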

Generation of CCS reads

CCS reads were produced using SMRT-Link (version 10.1.0.119588) with adjusted parameters: “--min-passes 0 --min-length 50 --max-length 21000 --min-rq 0.75”. Subsequently, the resulting CCS reads were demultiplexed using the Lima program included in SMRT-Link (version 10.1.0.119588) by grouping sample barcodes.

Generation of FLNC reads

To generate FLNC reads, the following steps were taken: (1) Mapping Primers to CCS Reads: The 5’ and 3’ primers were mapped to the CCS reads using NCBI BLAST (version 2.10.0+)³⁴ with the parameters: “-outfmt 7 -word_size 5”. (2) Extracting UMIs: Primer BLAST results were used as inputs for the classify_by_primer utility to extract Unique Molecular Identifiers (UMIs). (3) Generating FLNC Reads: The final FLNC reads were generated using the parameters: “-umilen 8 -min_primerlen 16 -min_isolen 50”. The classify_by_primer utility performed the following functions: (1) Parsed the 5’ and 3’ primers in the CCS reads to obtain FLNC reads, orienting them from the 5’ to the 3’ end. (2) Trimmed the 5’ and 3’ primer sequences, along with the 8 bp sequences following the 3’ primers as UMIs. (3) Trimmed the 3’ polyA tail using a sliding window algorithm. The program ensured that each read was correctly oriented by strictly pairing the 5’ and 3’ primers. Reads that included primers, UMIs, and polyA tails were considered FLNC reads.
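The sliding-window polyA trimming step can be sketched as follows. This is an illustration only, not the classify_by_primer implementation: the function name, the window size, and the A-fraction threshold are assumptions, and the read is assumed to be already oriented 5’ to 3’ with primers and UMI removed.

```python
def trim_polya_tail(seq, window=10, min_a_fraction=0.8):
    """Trim a 3' polyA tail by sliding a window inward from the 3' end.

    window and min_a_fraction are illustrative values, not the pipeline's.
    """
    end = len(seq)
    # Step the window toward the 5' end while it remains A-rich.
    # Tolerating a few non-A bases lets the tail absorb sequencing errors.
    while end >= window:
        win = seq[end - window:end]
        if win.count("A") / window < min_a_fraction:
            break
        end -= 1
    # If a tail was found, remove the A bases left at the window boundary.
    if end < len(seq):
        while end > 0 and seq[end - 1] == "A":
            end -= 1
    return seq[:end]


body = "GATTACAGGCTCGT"
clean_tail = trim_polya_tail(body + "A" * 20)
noisy_tail = trim_polya_tail(body + "A" * 10 + "G" + "A" * 10)
no_tail = trim_polya_tail(body)
```

A window-based rule is more forgiving than stripping only an exact run of A bases, because a single miscalled base inside the tail does not stop the trimming.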

Genome alignment of FLNC reads

After identifying and trimming the primers, UMIs, and polyA tails from the FLNC reads, the remaining sequences were aligned to the Lactuca sativa L. genome (NCBI Lsat_Salinas_v7). This alignment was performed using minimap2 (version 2.20-r1061)³⁵ in spliced alignment mode, employing the parameters: “-ax splice -uf --secondary=no”.
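As a rough sketch, the alignment command can be assembled in Python before being passed to a process runner. The alignment flags are those quoted above; the file names are placeholders, and the optional thread count (`-t`) is an addition for illustration that the text does not mention.

```python
import shlex


def minimap2_command(genome_fasta, flnc_fasta, threads=None):
    """Build the minimap2 spliced-alignment command as an argument list."""
    # -ax splice: spliced alignment with SAM output
    # -uf: use only the transcript (forward) strand to find splice sites
    # --secondary=no: do not report secondary alignments
    cmd = ["minimap2", "-ax", "splice", "-uf", "--secondary=no"]
    if threads is not None:
        cmd += ["-t", str(threads)]  # optional; not stated in the text
    cmd += [genome_fasta, flnc_fasta]
    return cmd


# Placeholder file names for illustration
cmd = minimap2_command("Lsat_Salinas_v7.fa", "sample1.flnc.fa")
print(shlex.join(cmd))
```

minimap2 writes SAM records to standard output, so in practice the list would be run with something like `subprocess.run(cmd, stdout=sam_handle, check=True)`.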

Generation of the gene count matrix

Following the alignment of FLNC reads to the genome, gffcompare (version 0.11.6)³⁶ was used to assign the FLNC reads to the NCBI Lactuca sativa L. annotation gene models (NCBI Lsat_Salinas_v7). The gene_quantify utility was then employed to produce gene expression data for each sample, leveraging the output from gffcompare and the UMI associated with each FLNC read.
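The internals of gene_quantify are not described here. One plausible reading, sketched below under that assumption, is that reads sharing a UMI within the same gene and sample are counted once, so the expression value is the number of distinct UMIs. The function name and the tuple layout are hypothetical.

```python
from collections import defaultdict


def umi_gene_counts(assignments):
    """Count distinct UMIs per (sample, gene).

    assignments: iterable of (sample, gene_id, umi) tuples, one per FLNC read.
    """
    umis = defaultdict(set)
    for sample, gene_id, umi in assignments:
        umis[(sample, gene_id)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


# Hypothetical reads: the second read is a duplicate of the first molecule
reads = [
    ("leaf_1", "geneA", "ACGTACGT"),
    ("leaf_1", "geneA", "ACGTACGT"),
    ("leaf_1", "geneA", "TTGACCAA"),
    ("root_1", "geneA", "ACGTACGT"),
]
gene_counts = umi_gene_counts(reads)
```

Counting UMIs rather than reads keeps amplification duplicates from inflating the expression estimate.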

Collapsing redundant isoforms

To collapse redundant isoforms, we used the Python script “collapse_isoforms_by_sam.py” from the cDNA_Cupcake package (https://github.com/Magdoll/cDNA_Cupcake). The minimum alignment coverage (--min-coverage) and minimum alignment identity (--min-identity) were kept at their default values of 0.99 and 0.95, respectively. Retaining only alignments that meet these thresholds limits the collapsed set to high-confidence transcripts.
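The effect of the two thresholds can be sketched with a simple predicate. The definitions used here (coverage as aligned read bases over read length, identity as matching bases over aligned bases) are assumptions made for illustration and may differ from how cDNA_Cupcake computes them.

```python
def passes_alignment_filters(read_length, aligned_bases, matching_bases,
                             min_coverage=0.99, min_identity=0.95):
    """Return True if an alignment meets both thresholds.

    Assumed definitions (not taken from cDNA_Cupcake):
      coverage = aligned_bases / read_length
      identity = matching_bases / aligned_bases
    """
    if read_length <= 0 or aligned_bases <= 0:
        return False
    coverage = aligned_bases / read_length
    identity = matching_bases / aligned_bases
    return coverage >= min_coverage and identity >= min_identity


# Hypothetical alignments for a 2000-base read
kept = passes_alignment_filters(2000, 1990, 1950)         # coverage 0.995, identity ~0.98
low_coverage = passes_alignment_filters(2000, 1900, 1890)  # coverage 0.95
low_identity = passes_alignment_filters(2000, 2000, 1800)  # identity 0.90
```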

Nonredundant isoform quality filtering, coding frame prediction, and UTR detection

To analyze and ensure the quality of nonredundant isoforms, SQANTI3³⁷ was employed, utilizing NCBI Lactuca sativa L. annotation gene models (NCBI Lsat_Salinas_v7). Isoforms were categorized as either known or novel. For the prediction of CDS coding frames and the identification of UTRs in nonredundant isoforms, SQANTI3 called upon GeneMarkS-T (version 5.1, March 2014).

Updating reference annotation with Iso-Seq isoforms using StringTie

After the collapsing and SQANTI3 quality filtering procedure, the HIT-ISOseq isoforms were integrated with the NCBI Lactuca sativa L. annotation gene models (NCBI Lsat_Salinas_v7 and Lsat_Salinas_v11) using StringTie (version 2.2.1)³⁸ with the following parameters: “--merge input.gff -G ref.gff”. The output of this process was a GTF file containing the updated gene models.

BUSCO assessments for transcript annotations

To evaluate the quality of transcript annotation for both the reference and updated gene models of NCBI Lsat_Salinas_v7 and Lsat_Salinas_v11, we conducted a BUSCO analysis (version 5.4.7)³⁹. This was done using the BUSCO transcriptome mode (-m transcriptome) with OrthoDB v10, enabling auto lineage selection (--auto-lineage-euk). The results were summarized and visualized using the ggplot2 package in R.

Functional annotation analysis

We performed annotation on the HIT-ISOseq updated transcripts using seven functional databases. Transcripts were annotated with the Pfam (v14.6)⁴⁰ library using hmmscan (v3.1b2)⁴¹. Blastn (v2.2.23)³⁴ was used to annotate transcripts with the NT database (downloaded on November 17, 2021). Diamond (v0.8.31)⁴² was employed to annotate transcripts with NR (downloaded on November 17, 2021), KOG (v20090331)⁴³, KEGG (v101)⁴⁴, and Swiss-Prot (release-2020_02)⁴⁵. Blast2GO (v2.5.0)⁴⁶ and NR annotation results were utilized for GO⁴⁷ annotation.

Generation of the isoform count matrix

Following the collapsing and SQANTI3 quality filtering processes, we employed the isoform_quantify utility to determine the expression levels of isoforms in each sample. Isoforms detected with fewer than 5 UMIs across all samples were subsequently filtered out.
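The filtering step can be sketched as below. It assumes that “fewer than 5 UMIs across all samples” refers to the UMI total summed over samples; the function name and the nested-dictionary layout are hypothetical and do not reflect the isoform_quantify output format.

```python
def filter_low_umi_isoforms(matrix, min_total_umis=5):
    """Keep isoforms whose UMI count summed over all samples reaches the cutoff.

    matrix: {isoform_id: {sample: umi_count}}
    """
    return {
        isoform: per_sample
        for isoform, per_sample in matrix.items()
        if sum(per_sample.values()) >= min_total_umis
    }


# Hypothetical isoform-by-sample UMI counts
matrix = {
    "iso1": {"leaf_1": 3, "root_1": 2, "stem_1": 0},  # total 5, kept
    "iso2": {"leaf_1": 1, "root_1": 1, "stem_1": 2},  # total 4, removed
    "iso3": {"leaf_1": 0, "root_1": 12, "stem_1": 0}, # total 12, kept
}
filtered = filter_low_umi_isoforms(matrix)
```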

Differential expression analysis of genes and isoforms

To identify tissue-specific genes and isoforms, we used the edgeR R package (version 3.32.1)⁴⁸, applying a criterion of logFC > 2 and p-value < 0.001. Enrichment analysis of these tissue-specific genes and isoforms, utilizing GO and KEGG annotations, was conducted with the clusterProfiler R package (version 3.18.1)⁴⁹, referencing our updated functional annotation file.
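Applying the selection criterion to an exported results table might look like the following sketch. It assumes each row carries the `logFC` and `PValue` columns that edgeR reports, plus a hypothetical `feature` identifier; the rows shown are invented for illustration.

```python
def select_tissue_specific(results, min_logfc=2.0, max_pvalue=0.001):
    """Return features with logFC > min_logfc and PValue < max_pvalue."""
    return [
        row["feature"]
        for row in results
        if row["logFC"] > min_logfc and row["PValue"] < max_pvalue
    ]


# Invented rows for one tissue-versus-others comparison
results = [
    {"feature": "geneA", "logFC": 4.1, "PValue": 1e-6},   # passes both cutoffs
    {"feature": "geneB", "logFC": 2.0, "PValue": 1e-8},   # logFC not above 2
    {"feature": "geneC", "logFC": 3.5, "PValue": 0.01},   # p-value too large
    {"feature": "geneD", "logFC": -5.0, "PValue": 1e-9},  # depleted, not enriched
]
specific = select_tissue_specific(results)
```

Both inequalities are strict, matching the wording of the criterion, and only positive fold changes are retained.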

Mfuzz cluster analysis

The CPM values of the tissue-specific isoforms and their corresponding genes in each sample were merged into a single expression matrix. Subsequently, we employed the Mfuzz R package (version 2.50.0)⁵⁰ to group the expressed features into three clusters according to tissue type (leaf, root, and stem). Based on the clustering results, we identified isoforms whose cluster assignment differed from that of their corresponding genes.
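Once cluster labels are available, the comparison between isoforms and genes reduces to a lookup. The sketch below assumes hard cluster labels have already been exported from Mfuzz (for example, the highest-membership cluster per feature); the function name and identifiers are hypothetical.

```python
def inconsistent_isoforms(isoform_cluster, gene_cluster, isoform_to_gene):
    """List isoforms assigned to a different cluster than their parent gene."""
    flagged = []
    for isoform, gene in isoform_to_gene.items():
        # Skip features that were not clustered at all.
        if isoform not in isoform_cluster or gene not in gene_cluster:
            continue
        if isoform_cluster[isoform] != gene_cluster[gene]:
            flagged.append(isoform)
    return sorted(flagged)


# Hypothetical cluster labels (1 = leaf, 2 = root, 3 = stem)
isoform_cluster = {"iso1": 1, "iso2": 2, "iso3": 3}
gene_cluster = {"geneA": 1, "geneB": 3}
isoform_to_gene = {"iso1": "geneA", "iso2": "geneA", "iso3": "geneB"}

flagged = inconsistent_isoforms(isoform_cluster, gene_cluster, isoform_to_gene)
```

Here iso2 is flagged because it clusters with root while its gene clusters with leaf, which is the kind of isoform-level pattern that gene-level analysis alone would miss.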

Generation of the isoforms structure view

Isoforms of interest were selected and imported as transcript annotation files into the ggtranscript R package (version 0.99.3)⁵¹ to visualize their splicing structures.

Statistics and reproducibility

This study collected two samples from Lactuca sativa L., yielding two replicate samples each of roots, leaves, and stems for long-read RNA sequencing. No statistical method was used to predetermine sample size. No data were excluded from the analyses. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment. All the statistical details for the long-read RNA sequencing analysis can be found in the figure legends as well as in the Methods section.

Data availability

All sequencing data generated in this study are available at NCBI Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) under the accession number GSE264560. The reference genome and gene annotation file (Lsat_Salinas_v7) were downloaded from NCBI. The data used to create the figures and supplementary figures were deposited in figshare (https://doi.org/10.6084/m9.figshare.25701069.v1)⁵². All other data are available from the corresponding author upon reasonable request.

Code availability

The HIT-ISOseq analysis pipeline and source code are available from https://github.com/shizhuoxing/Lettuce_HIT-ISOseq_Paper. The source code used in this study has also been published at https://zenodo.org/records/10394889⁵³.

References

Sharon, D., Tilgner, H., Grubert, F. & Snyder, M. A single-molecule long-read survey of the human transcriptome. Nat. Biotechnol. 31, 1009–1014 (2013).
Chen, L. et al. Transcriptional diversity during lineage commitment of human blood progenitors. Science 345, 1251033 (2014).
Rhoads, A. & Au, K. F. PacBio Sequencing and Its Applications. Genomics Proteom. Bioinforma. 13, 278–289 (2015).
Eid, J. et al. Real-time DNA sequencing from single polymerase molecules. Science 323, 133–138 (2009).
Anvar, S. Y. et al. Full-length mRNA sequencing uncovers a widespread coupling between transcription initiation and mRNA processing. Genome Biol. 19, 46 (2018).
Dong, L. et al. Single-molecule real-time transcript sequencing facilitates common wheat genome annotation and grain transcriptome research. BMC Genomics 16, 1039 (2015).
Abdel-Ghany, S. E. et al. A survey of the sorghum transcriptome using single-molecule long reads. Nat. Commun. 7, 11706 (2016).
Wang, B. et al. Unveiling the complexity of the maize transcriptome by single-molecule long-read sequencing. Nat. Commun. 7, 11708 (2016).
Makita, Y., Kawashima, M., Lau, N. S., Othman, A. S. & Matsui, M. Construction of Para rubber tree genome and multi-transcriptome database accelerates rubber researches. BMC Genomics 19, 922 (2018).
Yang, N. et al. Genome assembly of a tropical maize inbred line provides insights into structural variation and crop improvement. Nat. Genet 51, 1052–1059 (2019).
Ma, X. et al. A chromosome-level Amaranthus cruentus genome assembly highlights gene family evolution and biosynthetic gene clusters that may underpin the nutritional value of this traditional crop. Plant J. 107, 613–628 (2021).
Zhang, R. et al. A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis. Genome Biol. 23, 149 (2022).
Xu, Z. et al. Full-length transcriptome sequences and splice variants obtained by a combination of sequencing platforms applied to different root tissues of Salvia miltiorrhiza and tanshinone biosynthesis. Plant J. 82, 951–961 (2015).
Chen, J. et al. Full-length transcriptome sequences and the identification of putative genes for flavonoid biosynthesis in safflower. BMC Genomics 19, 548 (2018).
Zhang, B., Liu, J., Wang, X. & Wei, Z. Full-length RNA sequencing reveals unique transcriptome composition in bermudagrass. Plant Physiol. Biochem 132, 95–103 (2018).
Filichkin, S. A. et al. Abiotic Stresses Modulate Landscape of Poplar Transcriptome via Alternative Splicing, Differential Intron Retention, and Isoform Ratio Switching. Front Plant Sci. 9, 5 (2018).
Zhang, G. et al. PacBio full-length cDNA sequencing integrated with RNA-seq reads drastically improves the discovery of splicing transcripts in rice. Plant J. 97, 296–305 (2019).
Schaarschmidt, S. et al. Utilizing PacBio Iso-Seq for Novel Transcript and Gene Discovery of Abiotic Stress Responses in Oryza sativa L. Int J. Mol. Sci. 21, 8148 (2020).
Wang, Y. et al. High-resolution profile of transcriptomes reveals a role of alternative splicing for modulating response to nitrogen in maize. BMC Genomics 21, 353 (2020).
Hu, Z. et al. Full-Length Transcriptome Assembly of Italian Ryegrass Root Integrated with RNA-Seq to Identify Genes in Response to Plant Cadmium Stress. Int J. Mol. Sci. 21, 1067 (2020).
Li, S. et al. Global identification of full-length cassava lncRNAs unveils the role of cold-responsive intergenic lncRNA 1 in cold stress response. Plant Cell Environ. 45, 412–426 (2022).
Wenger, A. M. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat. Biotechnol. 37, 1155–1162 (2019).
Shi, Z. X. et al. High-throughput and high-accuracy single-cell RNA isoform analysis using PacBio circular consensus sequencing. Nat. Commun. 14, 2631 (2023).
Lelivelt, C. L. C. et al. Stable plastid transformation in lettuce (Lactuca sativa L.). Plant Mol. Biol. 58, 763–774 (2005).
Reyes-Chin-Wo, S. et al. Genome assembly with in vitro proximity ligation data and whole-genome triplication in lettuce. Nat. Commun. 8, 14953 (2017).
Wei, T. et al. Whole-genome resequencing of 445 Lactuca accessions reveals the domestication history of cultivated lettuce. Nat. Genet 53, 752–760 (2021).
Zhang, L. et al. RNA sequencing provides insights into the evolution of lettuce and the regulation of flavonoid biosynthesis. Nat. Commun. 8, 2264 (2017).
Al’Khafaji, A. M. et al. High-throughput RNA isoform sequencing using programmed cDNA concatenation. Nat. Biotechnol. 42, 582–586 (2024).
Guo, Z. et al. LettuceGDB: The community database for lettuce genetics and omics. Plant Commun. 4, 100425 (2023).
Zhou, W. et al. LettuceDB: an integrated multi-omics database for cultivated lettuce. Database (Oxford) 2024, baae018 (2024).
Chen, Y. et al. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. Gigascience 7, 1–6 (2018).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinforma. 10, 421 (2009).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Pertea, G. & Pertea, M. GFF Utilities: GffRead and GffCompare. F1000Res. 9, ISCB Comm J-304 (2020).
Tardaguila, M. et al. SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification. Genome Res. 28, 396–411 (2018).
Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20, 278 (2019).
Seppey, M., Manni, M. & Zdobnov, E. M. BUSCO: Assessing Genome Assembly and Annotation Completeness. Methods Mol. Biol. 1962, 227–245 (2019).
Finn, R. D. et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res 44, D279–D285 (2016).
Johnson, L. S., Eddy, S. R. & Portugaly, E. Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinforma. 11, 431 (2010).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
Koonin, E. V. et al. A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. Genome Biol. 5, R7 (2004).
Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
Boeckmann, B. et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 31, 365–370 (2003).
Conesa, A. et al. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21, 3674–3676 (2005).
Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet 25, 25–29 (2000).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Wu, T. et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innov. (Camb.) 2, 100141 (2021).
Kumar, L. & Futschik, M. E. Mfuzz: a software package for soft clustering of microarray data. Bioinformation 2, 5–7 (2007).
Gustavsson, E. K., Zhang, D., Reynolds, R. H., Garcia-Ruiz, S. & Ryten, M. ggtranscript: an R package for the visualization and interpretation of transcript isoforms using ggplot2. Bioinformatics 38, 3844–3846 (2022).
Shi, Z. X. et al. HIT-ISOseq paper. figshare. Dataset. https://doi.org/10.6084/m9.figshare.25701069.v1 (2024).
Shi, Z. X. et al. Source code for the lettuce HIT-ISOseq analysis paper. Zenodo https://doi.org/10.5281/zenodo.12514652 (2024).

Acknowledgements

This work received support from the National Natural Science Foundation of China (no. 42030713 to C.-H.M.; no. 42177187 and 42077300 to L.X.; no. 42107148 to Z.-X.S.; and no. 42377236 to Y.-W.L.), Fundamental Research Funds for the Central Universities (no. 21622109 to L.X.), and the USDA Hatch project (no. HAW5032-R to Q.-X.L.). The test tubes shown in Fig. 1A were generated using Servier Medical Art (https://smart.servier.com/), which is licensed under a Creative Commons Attribution 4.0 Unported License (https://creativecommons.org/licenses/by/4.0/).

Author information

These authors contributed equally: Zhuo-Xing Shi, Lei Xiang, Hai-Ming Zhao.
These authors jointly supervised this work: Chong Tang, Ce-Hui Mo.

Authors and Affiliations

Guangdong Provincial Research Center for Environment Pollution Control and Remediation Materials, College of Life Science and Technology, Jinan University, Guangzhou, 510632, China
Zhuo-Xing Shi, Lei Xiang, Hai-Ming Zhao, Lang-Qi Yang, Yu-Qing Pu, Yan-Wen Li, Quan-Ying Cai, Bai-Lin Liu, Nai-Xian Feng, Hui Li & Ce-Hui Mo
BGI Genomics, BGI-Shenzhen, Shenzhen, China
Zhuo-Xing Shi, Zhi-Chao Chen, Bei Luo & Chong Tang
Department of Molecular Biosciences and Bioengineering, University of Hawaii at Manoa, Honolulu, HI, 96822, USA
Qing X. Li

Contributions

C.-H.M., L.X., and Z.-X.S. conceived and designed the project. C.T., Z.-C.C., Z.-X.S., and B.L. developed the experimental technology; L.X. and N.-X.F. collected the lettuce samples; C.T., Z.-C.C., and B.L. performed sequencing experiments. C.T., Z.-X.S., H.L., and Q.-Y.C. guided bioinformatics analyses; Z.-X.S. developed the data analysis pipeline and wrote the documentation. Z.-X.S., Y.-Q.P., H.-M.Z., B.-L.L., and L.-Q.Y. performed the informatics analysis; Y.-Q.P. and L.-Q.Y. coordinated data release and assisted with executing the pipeline. C.-H.M., L.X., Z.-X.S., Y.-W.L., and Q.-X.L. wrote the manuscript and created the figures. All authors have read and approved the final version of this manuscript.

Corresponding authors

Correspondence to Chong Tang or Ce-Hui Mo.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Biology thanks Lianfeng Gu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: David Favero and Mengtan Xing. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Peer Review File

Supplementary Information

nr-reporting-summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Shi, ZX., Xiang, L., Zhao, HM. et al. High-throughput single-molecule long-read RNA sequencing analysis of tissue-specific genes and isoforms in lettuce (Lactuca sativa L.). Commun Biol 7, 920 (2024). https://doi.org/10.1038/s42003-024-06598-4

Received: 17 January 2024
Accepted: 17 July 2024
Published: 31 July 2024
DOI: https://doi.org/10.1038/s42003-024-06598-4
Springer Nature Limited
