Advances in long-read single-cell transcriptomics

Kumari, Pallawi; Kaur, Manmeet; Dindhoria, Kiran; Ashford, Bruce; Amarasinghe, Shanika L.; Thind, Amarinder Singh

doi:10.1007/s00439-024-02678-x

Review
Open access
Published: 24 May 2024

Human Genetics

Pallawi Kumari¹,
Manmeet Kaur¹,
Kiran Dindhoria¹,
Bruce Ashford²,
Shanika L. Amarasinghe³,⁴ &
…
Amarinder Singh Thind²,⁵

Abstract

Long-read single-cell transcriptomics (scRNA-Seq) is revolutionizing the way we profile heterogeneity in disease. Traditional short-read scRNA-Seq methods are limited in their ability to provide complete transcript coverage, resolve isoforms, and identify novel transcripts. The scRNA-Seq protocols developed for long-read sequencing platforms overcome these limitations by enabling the characterization of full-length transcripts. Long-read scRNA-Seq techniques initially suffered from poor accuracy compared to short-read scRNA-Seq. However, with improvements in accuracy, accessibility, and cost efficiency, long reads are gaining popularity in the field of scRNA-Seq. This review details the advances in long-read scRNA-Seq, with an emphasis on library preparation protocols and downstream bioinformatics analysis tools.

Introduction

During the past two decades, short-read bulk RNA-Seq (NGS) has been widely used for biological research, particularly because, in comparison to microarrays, it is cost-effective for quantifying gene expression on a genome scale and for identifying novel genes (Conesa et al. 2016; Rao et al. 2019; Thind et al. 2021). However, because of their inherently short fragment length (150–300 bp), short reads still struggle to resolve complex transcriptome events such as alternative splicing and gene fusions. The resulting incomplete transcript reconstruction makes it challenging to analyse data accurately at the transcript level (Midha et al. 2019; Deshpande et al. 2023; Steijger et al. 2013; Angerer et al. 2017; Lähnemann et al. 2020; Adil et al. 2021). In comparison, recent advances in long-read technologies, such as those that use voltage-driven protein nanosensor technology (e.g., Oxford Nanopore Technologies; ONT) or fluorescent sequential binding (e.g., Pacific Biosciences; PacBio) (Mantere et al. 2019; MacKenzie and Argyropoulos 2023), allow the construction of complete transcript isoforms because reads can span entire transcripts (Payne et al. 2019; Wang et al. 2021a). A host of studies using long-read sequencing have shown that there is more isoform diversity in the human genome than anticipated (Zhou et al. 2019; Wright et al. 2022; Heberle et al. 2023; Page et al. 2024). Such studies also show the role of these novel isoforms in various diseases (Ray et al. 2020; Huang et al. 2021; Veiga et al. 2022; Paoli-Iseppi et al. 2024). For example, using long-read sequencing, Veiga et al. (2022) not only pinpointed approximately 300 breast tumor-specific splicing events, showcasing the technology's enhanced splicing event detection, but also highlighted that a few of these events are significantly correlated with patient survival. Apart from generating longer reads than traditional NGS, long-read technologies can also provide technology-specific novel insights, such as the detection of RNA modifications and of expressed mutations in RNA for genotyping.

Bulk RNA-Seq, whether short- or long-read, fails to capture cellular heterogeneity: RNA extracted from a mixed cell population only provides an average gene expression profile (Thind et al. 2021). This averaging can mask or dilute the expression signals of specific cell types, making it difficult to accurately identify differential gene expression between conditions or groups. This limitation is particularly evident in cancer studies, where tumor samples often contain a mixture of tumor cells and surrounding healthy cell types. Only samples with high tumor purity yield accurate detection of tumor-specific gene expression signals, while in lower-purity samples these signals may be obscured within the averaged expression profile. The introduction of single-cell transcriptomics (scRNA-Seq) in 2009 offered greater insight into complex biological systems, including tumour heterogeneity (Hwang et al. 2018), novel cell types (Stuart et al. 2019), drug-resistant clones (Adewale 2020), the role of cellular sub-populations in disease progression (Shi et al. 2023), treatment response (Volden and Vollmers 2022), and overall cellular function (Hazzard et al. 2022; Thijssen et al. 2022). Over time, new single-cell library preparation protocols and analysis methods have been developed with the aim of reducing sequencing costs and/or increasing throughput (Healey et al. 2022; Yang et al. 2023). Over the past decade, short-read single-cell sequencing has been performed more frequently than long-read sequencing, primarily because of its lower cost and faster turnaround times.

Similar to short-read bulk RNA-Seq, short-read single-cell RNA-Seq has difficulty identifying isoforms and gene fusions at the single-cell level (Steijger et al. 2013; Angerer et al. 2017; Lähnemann et al. 2020; Adil et al. 2021). Another issue arises from its bias towards the 3′ end of transcripts: where 3′-UTR annotations are incomplete (Healey et al. 2022), reads may not be assigned to their genes, which could give a misleading impression of the expression of those genes in different cell types. Cell type identification relies on finding patterns in highly variable genes to cluster cells (Stuart et al. 2019). If these crucial genes are systematically missing, the analysis can be misled and the true biological picture obscured. In contrast, long reads capture isoform diversity even at single-cell resolution, which can improve 3′-UTR annotations and hence cell type identification (Adewale 2020; Volden and Vollmers 2022; Shi et al. 2023; Healey et al. 2022). Despite the initial limitations of early long-read technologies, such as higher cost and elevated error rates, their popularity for single-cell applications is now growing with advancements in the field. For example, Thijssen et al. (2022) applied a novel nanopore single-cell approach to identify mutations and alternative transcripts in specific sub-clones of the tumour at relapse. In another study, Hazzard et al. (2022) used PacBio sequencing with 10x Chromium scRNA-Seq libraries to reveal the isoform diversity of Plasmodium vivax transcripts at different developmental stages. Healey et al. (2022) conducted single-cell isoform sequencing alongside single-cell RNA sequencing to rapidly improve 3′-UTR annotations. They demonstrated that gene models derived from a minimal embryonic single-cell isoform sequencing dataset retained 26.1% more single-cell RNA sequencing reads than gene models from Ensembl alone, consequently increasing the number of reads assigned to numerous genes.
Further, long-read scRNA-Seq can accurately predict gene fusion events at the single-cell level (Yang et al. 2023) and facilitate the identification of structural variants (Mahmoud et al. 2019). More recently, PacBio introduced MAS-Iso-Seq, which programmably concatenates DNA fragments into long library molecules and enables single-cell sequencing with higher throughput (a 14–18-fold increase) than their Iso-Seq approach (Al’Khafaji et al. 2023). However, these technologies have yet to match traditional short-read (NGS) scRNA-Seq in several domains, including throughput (Boldogkői et al. 2019; Amarasinghe et al. 2020; Oikonomopoulos et al. 2020).

In this review, we aim to give a snapshot of available long-read sequencing technologies, current long-read single-cell isolation protocols, and downstream bioinformatics analysis methods for long-read scRNA-Seq data (Fig. 1).

Currently available long-read sequencing technologies

PacBio and ONT, the two dominant players in long-read technologies, each offer a range of sequencing instruments, e.g., Revio, Sequel, and Sequel II from PacBio and the MinION, GridION, and PromethION devices from ONT (Amarasinghe et al. 2020). PacBio relies on its proprietary single-molecule, real-time (SMRT) sequencing technology, which uses a sequencing-by-synthesis method. PacBio HiFi reads provide an accuracy of 99.9%, on par with short reads and Sanger sequencing (Logsdon et al. 2020). With a redesigned SMRT cell and improved sequencing chemistry compared with its predecessors, Sequel II now allows generation of up to 10 Gb of data per SMRT cell (Logsdon et al. 2020). Additionally, Sequel II circular consensus sequencing (CCS) improves read accuracy by sequencing the same molecule iteratively (Dijk 2023). The Sequel II with CCS has the highest accuracy of the three PacBio instruments, with accuracy rates approaching 99.9%. It has been reported that, with the novel MAS-Iso-Seq approach, Sequel II can produce up to 40 million cDNA reads per run (Al’Khafaji et al. 2023).

ONT uses a nanopore-based sequencing method (Wang et al. 2021a). In this approach, each DNA or RNA molecule travels through a nanometre-scale protein pore in an electrophysiological solution. As it passes, ions move with it, their concentrations varying according to the molecule's nucleotide composition (Midha et al. 2019). This produces identifiable patterns of current fluctuations that correspond to the sequence of nucleotides. By analysing the digitised current levels, pretrained artificial neural networks can accurately predict the underlying sequence. Since this approach does not rely on the imaging required by popular sequencing-by-synthesis methods, sequencing devices can be made significantly smaller. The ability to be either portable, with low throughput and low cost, or high-throughput, thanks to a large number of flow cells, makes ONT technology popular in metagenomic and microbial research (Zhu 2020) as well as human genome-based research (Bowden et al. 2019). Currently, there are efforts to assess the usefulness of shallow sequencing (Milanez-Almeida et al. 2020; Mock et al. 2023) or targeted sequencing (Gilpatrick et al. 2020; McClinton et al. 2023), which would allow low-throughput ONT machines (e.g., MinION) to be used in clinical settings.

ONT PromethION generates more reads per flow cell than the PacBio Sequel II and offers the longest reads, with some exceeding 2 Mb in length (Payne et al. 2019). ONT has a higher error rate than PacBio, at 10–15% (Table 1), although this can be mitigated by using base-calling algorithms and consensus approaches (Fu et al. 2019). The cost per base is lower for ONT than for PacBio, although the difference varies with the specific application and sequencing strategy (Dijk 2023).

Table 1 Features of available long-read sequencing instruments

Long read sequencing for RNA modifications and expressed mutations in RNA for genotyping

Long-read technologies not only produce longer reads than traditional NGS methods but also offer unique, technology-specific insights. For example, ONT can potentially detect RNA modifications, such as m6A and pseudouridine, directly within RNA molecules (Leger et al. 2021; Felton et al. 2022; Dorney et al. 2023). Traditional methods that are widely used to identify RNA modifications, such as mass spectrometry, NGS, and biophysical assays, have their own drawbacks (Helm and Motorin 2017). Mass spectrometry provides limited site-specific information, NGS can only detect a limited number of abundant modifications, and biophysical assays share both these limitations (Helm and Motorin 2017). Nanopore direct RNA sequencing can sequence entire, unaltered RNA molecules without requiring reverse transcription (RT) or amplification steps (Garalde et al. 2018). As a result, nanopore data inherently capture information about the modifications present within the RNA molecules themselves (Smith et al. 2019; Workman et al. 2019), which opens up a new level of detail in the field of epitranscriptomics. Many software tools are currently being developed for the accurate detection of these RNA modifications from ONT data (Furlan et al. 2021; Stephenson et al. 2022).

PacBio has the advantage of generating more accurate long reads owing to its circular consensus sequencing. In brief, PacBio libraries prepared in the proprietary SMRTbell format are read repeatedly by the sequencer. The repeated passes over the same molecule are then collapsed during downstream analysis into a consensus sequence in which random errors are largely removed. This approach, however, has the drawbacks of requiring more storage for raw reads and making sequencing more expensive and time-consuming than ONT. Nevertheless, PacBio RNA sequencing has allowed the accurate detection of expressed mutations in RNA for genotyping and neo-antigen discovery (Cavelier et al. 2015).
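The idea of collapsing repeated passes into a consensus can be illustrated with a deliberately simplified sketch. It assumes the passes have already been aligned to the same length and takes a per-position majority vote; this is a toy model of the principle, not PacBio's actual CCS algorithm, and the function name and example sequences are hypothetical.

```python
from collections import Counter


def majority_consensus(passes):
    """Per-position majority vote over equal-length, pre-aligned passes.

    Toy illustration only: real consensus callers also handle
    insertions, deletions, and per-base quality values.
    """
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("passes must be pre-aligned to the same length")
    consensus = []
    for column in zip(*passes):
        # Pick the most frequent base observed at this position.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Five hypothetical passes over the same molecule; three of them carry
# one random substitution each, at different positions.
example_passes = [
    "ACGTTAGC",
    "ACGGTAGC",
    "ACGGTTGC",
    "ACGGTAGC",
    "TCGGTAGC",
]
consensus = majority_consensus(example_passes)
```

Because the random errors fall at different positions in different passes, each is outvoted by the correct base, which is why accuracy improves with the number of passes.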

Tissue dissociation and library preparation for long-read scRNA-Seq

Suitability of various tissues for scRNA-Seq library preparation

Generating a homogeneous single-cell suspension from the tissue of interest is the first step of any scRNA-Seq pipeline (Gross et al. 2015). This requires isolating viable cells from the target tissue at an optimal level of dissociation, i.e., a balance between releasing cell types that are difficult to dissociate and avoiding damage to those that are fragile (Denisenko et al. 2020; Burja et al. 2022). The choice of input tissue type suitable for scRNA-Seq depends on several factors, including the availability of adequate amounts of high-quality tissue, the goal of the experiment, and the method used for tissue preservation. Fresh-frozen tissue is often considered the gold standard for scRNA-Seq experiments as it provides the highest quality RNA (Yin et al. 2021). Harvested tissue is frozen in liquid nitrogen immediately after collection and stored at −80 °C to minimize RNA degradation. Whole-cell scRNA-Seq and single-nucleus RNA sequencing (snRNA-Seq) are two widely used single-cell transcriptomics approaches for various tissue types (Slyper et al. 2020). Whole-cell scRNA-Seq offers several advantages over snRNA-Seq, including comprehensive transcriptome analysis, the ability to study cytoplasmic transcripts, cell-type-specific gene expression patterns, intracellular interactions, and higher-quality RNA (Jovic et al. 2022). However, it also has several disadvantages compared to snRNA-Seq, including limited resolution of nuclear gene expression, higher levels of background noise (due to transcripts from the cytoplasm and organelles), RNA degradation, more difficult tissue preparation (isolating whole cells is more difficult than isolating nuclei and may cause more mechanical damage), and challenges in studying specific cell types or subpopulations of cells (Nguyen et al. 2018; Bakken et al. 2018).
In a study of the human liver, snRNA-Seq was more effective at detecting several cell subtypes, including cholangiocytes and mesenchymal cells, that are typically challenging to distinguish by scRNA-Seq alone (Andrews et al. 2022). Nuclei isolation can also be used to analyse gene expression in fixed and/or formalin-fixed paraffin-embedded (FFPE) tissue, which may not be possible with whole-cell scRNA-Seq approaches (Rousselle et al. 2022), and it is a useful approach for studying rare cell types (Wu et al. 2019). Therefore, it is important to consider the specific goals of the experiment and the quality of the tissue available when deciding which approach to use (Sant et al. 2023).

snRNA-Seq protocols can be applied to snap-frozen samples, avoiding many dissociation-related artifacts (Krishnaswami et al. 2016; Lake et al. 2016). Because it is more difficult to isolate live single cells after thawing whole tissue that has been frozen, it is preferable in this situation to isolate the nuclei from the tissue before it is snap frozen. Single-nuclei methods also permit the profiling of nuclei from large cells (> 40 μm) that do not fit through the microfluidics. When fresh tissue is available, manual isolation is a suitable platform for scRNA-Seq. Droplet-based snRNA-Seq is appropriate when only frozen samples are available and isolation of viable whole cells is difficult (Martelotto 2019). Compared with isolating cells from fresh tissue, freezing dissected tissue by either snap-freeze or slow-freeze techniques reduces overall cell recovery. If a well-optimized tissue dissociation protocol is already available, then dissociation followed by cryopreservation is preferred. However, as per the 10x Genomics recommendation, if the tissue dissociation protocol is not optimized, it may be better to snap-freeze the tissue whole (10xGenomics, Are fresh frozen tissue samples compatible with Single Cell RNA sequencing? https://kb.10xgenomics.com/hc/en-us/articles/360019890851-Are-fresh-frozen-tissue-samples-compatible-with-Single-Cell-RNA-sequencing-, Accessed Sept 2023).

Formalin fixation and paraffin embedding (FFPE) of tissue is a commonly used preservation method, but it is not ideal for scRNA-Seq experiments because of the potential for RNA degradation and crosslinking during the fixation process (Chung et al. 2022). However, the largest collection of clinically documented human samples exists as FFPE tissue. Existing techniques for single-cell sequencing of FFPE samples are labour-intensive, slow, insensitive, and low-resolution, making it difficult to fully exploit the huge research and clinical potential of these samples. Despite these challenges, some scRNA-Seq experiments have been successfully performed on FFPE tissue (Vallejo et al. 2022). FFPE tissue types suitable for scRNA-Seq include cancer tissues, where the goal is to study the heterogeneity of tumour cells, and tissues with low cell numbers or rare cell types, where preservation in FFPE may be the only option (Gao et al. 2020). Single nuclei extracted from FFPE specimens can be profiled using an effective and sensitive high-throughput technique known as single-nuclei pathology sequencing (snPATHO-Seq) (Vallejo et al. 2022). snPATHO-Seq combines the 10x Genomics probe-based technology that targets the entire transcriptome with an improved protocol for extracting nuclei from archived samples. In summary, fresh-frozen tissue is considered the best choice for scRNA-Seq experiments, but FFPE tissue can also be used.

Library preparation for long-read scRNA-Seq

As in other scRNA-Seq methods, long-read protocols for ONT or PacBio typically (but not always) use unique molecular identifiers (UMIs) and cell barcodes in combination to enable accurate and reliable quantification of gene expression at the single-cell level (Table 2). UMIs allow reads derived from the same original molecule to be identified, reducing the impact of PCR and sequencing errors, while barcodes enable the identification of individual cells and their respective RNA molecules. Together, these features facilitate the generation of accurate and high-quality scRNA-Seq data. However, the assignment of barcodes and UMIs is more challenging for long reads than for short reads because long-read sequencing platforms have a higher error rate. Long reads can also span multiple genes or transcripts, which can lead to ambiguity in assigning UMIs to specific genes or transcripts. Dropout occurs when a transcript is not detected in a particular cell due to technical noise or low sequencing depth. It can be more frequent with long-read sequencing technologies, whose reads are more error-prone owing to factors such as polymerase errors, signal variability, and template degradation (Amarasinghe et al. 2020; Prawer et al. 2023). Advanced protocols have addressed these library preparation challenges by introducing innovative strategies for error correction and increased base accuracy (Table 2). These advances mitigate the issues associated with barcode and UMI assignment on long-read platforms, ultimately providing more accurate and robust results for single-cell analysis than before (Fig. 3).
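How barcodes and UMIs work together in quantification can be shown with a minimal sketch. It assumes that each read has already been assigned a cell barcode, a UMI, and a gene, and it collapses reads by exact UMI match only; the function name and the example records are hypothetical, and real tools additionally tolerate sequencing errors within the UMI.

```python
from collections import defaultdict


def count_molecules(records):
    """Count unique UMIs per (cell barcode, gene) pair.

    `records` is an iterable of (cell_barcode, umi, gene) tuples,
    one per read. Reads sharing all three values are treated as
    PCR duplicates of a single original molecule.
    """
    umis_seen = defaultdict(set)
    for barcode, umi, gene in records:
        umis_seen[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in umis_seen.items()}


# Hypothetical reads: the second record duplicates the first.
example_reads = [
    ("CELL_A", "UMI1", "GeneX"),
    ("CELL_A", "UMI1", "GeneX"),
    ("CELL_A", "UMI2", "GeneX"),
    ("CELL_B", "UMI1", "GeneX"),
    ("CELL_B", "UMI3", "GeneY"),
]
molecule_counts = count_molecules(example_reads)
```

In this sketch the same UMI observed in two different cells is counted separately, since the cell barcode is part of the key.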

Table 2 Currently available long-read scRNA-Seq library preparation protocols

Handling library preparation artifacts

The template switching oligo (TSO) artifact is a potential issue in single-cell RNA sequencing (scRNA-Seq) methods that use Smart-seq2 or similar protocols. TSOs are short oligonucleotides that are added during the reverse transcription step of scRNA-Seq library preparation to prime the synthesis of the complementary DNA (cDNA) strand from the template RNA (Fig. 2). The TSO artifact occurs when the TSO anneals to the cDNA strand synthesized during reverse transcription and acts as a template for the synthesis of a new cDNA strand, leading to the incorporation of extra nucleotides into the cDNA sequence (Picelli 2017). Such artifacts can be handled computationally. For example, a threshold can be set on read length to remove reads that are likely to contain the TSO sequence. Additionally, statistical models can be used to infer the correct cell barcodes based on the distribution of read counts and UMIs across the barcodes (You et al. 2023).
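A read-level filter of this kind might look like the sketch below. The criteria are assumptions made for illustration and are not taken from the cited tools: a read is flagged if it is shorter than a chosen length threshold or if it contains more than one exact copy of the TSO (in either orientation). The oligo in `EXAMPLE_TSO` is a sequence often quoted for Smart-seq2-style template switching and should be replaced with the one documented for the protocol in use; the threshold values and function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Assumed example oligo; confirm against the protocol documentation.
EXAMPLE_TSO = "AAGCAGTGGTATCAACGCAGAGTACATGGG"


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_tso_copies(read, tso):
    """Count exact TSO matches on either strand (no error tolerance)."""
    return read.count(tso) + read.count(reverse_complement(tso))


def is_suspect(read, tso, min_length=200, max_tso_copies=1):
    """Flag reads that are too short or carry the TSO more than once."""
    too_short = len(read) < min_length
    too_many = count_tso_copies(read, tso) > max_tso_copies
    return too_short or too_many


# Hypothetical reads built around a 240-nt placeholder insert.
insert = "ACGT" * 60
plausible_read = EXAMPLE_TSO + insert
artifact_read = EXAMPLE_TSO + insert + reverse_complement(EXAMPLE_TSO)
short_read = "ACGT" * 10
```

Exact string matching is used here only for brevity; with noisy long reads, an error-tolerant search would be needed in practice.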

Unlike the Smart-seq2 protocol, the 10x Genomics Chromium system uses a unique barcoding technology to capture the RNA molecules. In single-cell 3' RNA sequencing using 10x Genomics, a small fraction of the library is expected to have the TSO sequence at the beginning of the second read. However, if a large fraction of the library has the TSO sequence, this could indicate a problem with library preparation, such as (a) cDNA that is significantly shorter or more degraded than expected before starting the reverse transcription reaction, or (b) TSO sequence that is not efficiently removed from the cDNA construct during library preparation (10xGenomics, Why do a fraction of my Visium reads contain the Template Switch Oligo (TSO) at the beginning of Read 2? https://kb.10xgenomics.com/hc/en-us/articles/360041690731-Why-do-a-fraction-of-my-Visium-reads-contain-the-Template-Switch-Oligo-TSO-at-the-beginning-of-Read-2, Accessed Sept 2023).

Modified short-read scRNA-Seq library preparation protocols for long-reads

10x Genomics, Smart-seq2, and Tn5Prime (a modified Smart-seq2) are some popular methods for scRNA-Seq library preparation. Tn5Prime is a modification of the Smart-seq2 protocol that uses the Tn5 transposase enzyme to simultaneously fragment and tag cDNA molecules (Picelli et al. 2014; Picelli 2017; Cole et al. 2018). The main difference between 10x Genomics scRNA-Seq and Smart-seq2 lies in the way cells are processed and barcoded (Baran-Gale et al. 2018). In 10x Genomics scRNA-Seq, cells are partitioned into droplets containing unique barcodes and molecular identifiers, enabling the capture and amplification of transcripts from individual cells in a high-throughput manner. In contrast, Smart-seq2 is a plate-based method in which individual cells are isolated and lysed in separate wells of a 96- or 384-well plate, and cDNA is synthesized using oligo(dT) primers. Furthermore, the original 10x Chromium platform generates short reads, typically 50–100 bp in length, from the 3' end of transcripts rather than long reads, while Smart-seq2 generates long reads (up to several kilobases) covering the entire length of transcripts, with higher sensitivity than 10x (Lebrigand et al. 2020; Tian et al. 2021). However, 10x has a lower cost per cell than Smart-seq2 (Fig. 3).

To make the 10x Genomics protocol suitable for long-read sequencing, modifications have been introduced, including the use of specific adapters, changes to the reverse transcription step, and optimisation of the reaction conditions (Jabbari and Tian 2019; Lebrigand et al. 2020; Tian et al. 2021). For example, before PromethION flow cell (Oxford Nanopore) sequencing, Lebrigand et al. re-amplified the 10x Genomics PCR product for eight cycles with different primers containing Ns at the 5′ ends to avoid the preferential generation of reverse Nanopore reads (Lebrigand et al. 2020). Similarly, Jabbari et al. subsample the single-cell Gel Bead-in-Emulsions (GEMs) generated by 10x Chromium after reverse transcription for cDNA amplification; their protocol enables full-length cDNA transcriptome sequencing of a flexible number of single cells captured on Chromium using long-read sequencers (Tian et al. 2021). In some of their studies, they followed the standard 10x Genomics user guide with the RT time increased to 2 h to potentially increase the reverse transcription of longer transcripts (Tian et al. 2021). The RAGE-Seq method, developed by Singh et al. (2019), builds upon the 10x Chromium Single Cell 3' protocol with the addition of two extra PCR cycles. These cycles enable a 1:1 split of full-length cDNA, with one portion used for Nanopore long-read sequencing and the other for short-read sequencing. Single-cell isoform RNA sequencing (ScISOr-Seq) (Gupta et al. 2018) is another example of a long-read scRNA-Seq method developed by adapting the 10x Genomics protocol. ScISOr-Seq combines the 10x Genomics protocol with Iso-Seq, a method developed by PacBio for full-length transcript sequencing. In ScISOr-Seq, single cells are captured using the 10x Genomics protocol and then subjected to Iso-Seq library preparation to generate full-length cDNA sequences.

For plate-based approaches, the TSO used in the short-read Smart-seq2 protocol is replaced with a new template-switching oligonucleotide (LMO-PCR TSO) to capture and reverse transcribe the mRNA. This TSO has a UMI, a poly(A) tail, and a hairpin adapter that facilitates the ligation of the cDNA to the sequencing adapter. The resulting cDNA library is amplified by long-range PCR to generate long fragments, which can then be sequenced on long-read platforms. Some methods use a targeted transcript capture approach to enrich full-length transcripts, such as HIT-scISOseq, which uses biotinylated PCR primers and a library preparation procedure that combines head-to-tail concatemeric full-length cDNAs into a long SMRTbell insert for high-accuracy PacBio sequencing (Shi et al. 2023). This allows for more efficient use of sequencing reads and reduces sequencing cost compared to other techniques that rely on the random sampling of transcripts.

Handling sequencing depth

Single-cell sequencing typically requires a much greater sequencing depth than bulk long-read sequencing, owing to factors such as the desired number of reads per cell per sample and the isoform diversity among cells (Sims et al. 2014; Rizzetto et al. 2017). Increased read length is known to improve read-to-transcript assignment (Chen et al. 2021), which means that long-read sequencing may require less sequencing depth than short-read sequencing to achieve the same isoform identification ability. For example, the number of reads per cell obtained by Volden and Vollmers (2022) using the 10x Genomics platform and R2C2 was five times lower than the number of short reads per cell recommended by 10x Genomics. Yet this count of around 4000 R2C2 reads per cell captured around 67% of the molecules present in a comprehensively sequenced Illumina dataset derived from the same cDNA. This level of coverage has been reported as adequate for cell type clustering and generating single-cell transcriptomes. However, when choosing the sequencing depth for a given study aim, one should consider factors such as sub-population rarity, desired resolution, and the complexity of the transcripts to be measured (Zhang et al. 2020; Denyer and Timmermans 2022). The number of genes expressed per cell at a particular time can vary. For instance, in well-characterized and differentiated human cells, it can range from a few thousand to the entire transcriptome (20,000 to 24,000 genes) (Ramsköld et al. 2009). Other factors, such as developmental stage, environmental conditions, and cell type, must therefore also be considered.
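The depth figures above translate into a simple planning calculation, sketched below. The 4000 reads per cell and the five-fold difference come from the text; the number of cells, the fraction of usable reads, and the function name are hypothetical values chosen for illustration.

```python
def total_reads_required(n_cells, reads_per_cell, usable_percent=100):
    """Raw reads needed so each cell receives `reads_per_cell` usable reads.

    `usable_percent` is the assumed percentage of raw reads that survive
    filtering and barcode assignment (an integer from 1 to 100).
    """
    if not 0 < usable_percent <= 100:
        raise ValueError("usable_percent must be between 1 and 100")
    numerator = n_cells * reads_per_cell * 100
    # Ceiling division with integers avoids floating-point rounding.
    return -(-numerator // usable_percent)


LONG_READS_PER_CELL = 4000   # approximate R2C2 depth quoted in the text
FOLD_DIFFERENCE = 5          # "five times lower" than the short-read figure
implied_short_reads_per_cell = LONG_READS_PER_CELL * FOLD_DIFFERENCE

hypothetical_cells = 5000
ideal_total = total_reads_required(hypothetical_cells, LONG_READS_PER_CELL)
# Assuming only 80% of raw reads are usable (hypothetical value).
padded_total = total_reads_required(hypothetical_cells, LONG_READS_PER_CELL, 80)
```

Under these assumptions, the short-read depth implied by the text is 20,000 reads per cell, and profiling 5000 cells at 4000 long reads per cell calls for 20 million usable reads, or 25 million raw reads if 80% are usable.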

Bioinformatics tools and pipelines

The standard pipeline for single-cell long-read analyses involves quality control, read mapping, alignment correction, cell barcode and UMI processing, gene expression quantification, batch effect correction, normalization, imputation, dimensionality reduction, feature selection, cell type clustering, and annotation. The raw data for ONT long reads are mostly stored as .fast5 files and are 5–7 fold larger than the raw data for short reads at the same depth, which are converted and stored as .fastq files. For ONT data in particular, a “base calling” step is needed to convert these .fast5 files into .fastq files. There have been attempts to reduce the size of the raw data produced by ONT (Gamaarachchi et al. 2022). Furthermore, there are tools that can use the native (raw) ONT data as input, although these have so far been tested on smaller genomes. Here we discuss some similarities and differences between long- and short-read scRNA-Seq data analysis, as well as popular specialised tools designed to work with long-read scRNA-Seq. In short, while downstream data processing is not vastly different between long-read and short-read scRNA-Seq, alignment and count matrix generation require specialised tools for long-read analysis.
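The quality-control step at the start of such a pipeline can be sketched as follows. The example assumes base-called reads in four-line FASTQ records with Phred+33 quality encoding and filters on read length and mean quality; the thresholds, function names, and example reads are hypothetical, and dedicated QC tools should be preferred for real data.

```python
import io
import math


def read_fastq(handle):
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual


def mean_phred(qual):
    """Mean quality, averaged over error probabilities (Phred+33)."""
    if not qual:
        raise ValueError("empty quality string")
    probs = [10 ** (-(ord(c) - 33) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def passes_qc(seq, qual, min_length=200, min_quality=7):
    """Keep reads that are long enough and of sufficient mean quality."""
    return len(seq) >= min_length and mean_phred(qual) >= min_quality


# Two tiny hypothetical records; 'I' encodes Q40 and '+' encodes Q10.
example_fastq = "@read1\nACGT\n+\nIIII\n@read2\nAC\n+\n++\n"
records = list(read_fastq(io.StringIO(example_fastq)))
kept = [name for name, seq, qual in records
        if passes_qc(seq, qual, min_length=4)]
```

Averaging error probabilities rather than Phred scores is a design choice: it prevents a few very high-quality bases from masking a poor read.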

Data processing of short-read vs long-read RNA-Seq

Alignment of long reads is more challenging because of the inherently higher error rates of long-read technologies, especially ONT (Križanović et al. 2018). Specialized aligners such as minimap2 (Li 2021), which can handle the insertions, deletions, and other errors present in long reads, are often used, while for short reads, splice-aware aligners such as HISAT2 (Kim et al. 2019) or STAR (Dobin et al. 2013) can be used efficiently. Transcriptome assembly using long reads enables the identification of novel transcripts and alternative splicing events (Moreno-Santillán et al. 2019), foreign RNAs, and intra-species gene-fusion transcripts. Some pipelines, such as SQANTI3 and FLAIR, combine alignment (with STAR/kallisto and minimap2, respectively) with subsequent de novo assembly to collapse long reads into isoforms (Tang et al. 2020; Pardo-Palacios 2024). Conversely, other pipelines, such as RNA-Bloom2, employ a reference-free transcriptome assembly approach (Nip et al. 2023). Differential expression (DE) analysis is a common downstream step to identify genes with statistically significant changes in expression between conditions.
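As an illustration of the long-read alignment step, the sketch below assembles a minimap2 command line using its spliced-alignment preset (`-x splice`) with SAM output (`-a`). The file names, thread count, and helper function are hypothetical, and the command is only printed here, not executed; platform-specific options may be needed for a given dataset.

```python
import shlex


def minimap2_splice_command(reference, reads, threads=4):
    """Build a minimap2 command for spliced alignment of long RNA reads."""
    return [
        "minimap2",
        "-a",             # write alignments in SAM format
        "-x", "splice",   # preset for spliced long-read alignment
        "-t", str(threads),
        reference,
        reads,
    ]


# Hypothetical input files.
command = minimap2_splice_command("genome.fa", "reads.fastq")
command_line = shlex.join(command)
print(command_line)
```

Since minimap2 writes SAM records to standard output, the command would typically be run with its output redirected to a file or piped into a sorting tool.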

Long-read data processing for bulk RNA-Seq vs scRNA-Seq

Compared with bulk RNA-Seq, scRNA-Seq requires additional steps, such as cell barcode processing to identify transcripts originating from individual cells and filtering out empty droplets (containing no cells) or cells with low RNA content. Because of the poorer per-base sequencing accuracy of long reads, the strategies used by long-read tools differ from those used for short reads (explained below). Additionally, protocols that involve unique molecular identifiers (UMIs) require these to be processed to account for amplification bias. Although the same aligner can be used for both bulk and single-cell data, the presence of cell barcodes and UMIs within the reads requires the aligner to handle them effectively (Kaminow et al. 2021). Some short-read tools specifically designed for scRNA-Seq, e.g., STARsolo (Kaminow et al. 2021), CeleScope (https://github.com/singleron-RD/CeleScope), and kallisto | bustools (Melsted et al. 2021), incorporate functionality to account for these sequences (UMIs and barcodes) during the alignment process (Table 3). Processing barcodes from long-read sequencing data, especially from ONT, is somewhat challenging, although this has improved significantly over time. Long-read barcode processing is discussed in detail below.

Table 3 Different aspects of short-read and long-read sequencing

Downstream analyses specific to single-cell RNA sequencing (scRNA-seq) delve deeper into cellular heterogeneity. Techniques such as dimensionality reduction (e.g., PCA, UMAP, tSNE) and cell-type clustering enable researchers to pinpoint distinct cell populations within the sample. Cell-type annotation is typically achieved through reference-based methods [e.g., singleR (Aran et al. 2019), scPred (Alquicira-Hernandez et al. 2019), scClassify (Lin et al. 2020)] or by leveraging known cell-type markers identified by tools like Seurat and sctype. Additionally, popular analyses include differential cellular composition (DCC) and cell–cell interaction (CCI) studies. DCC analyses identify cell types with statistically significant changes in abundance across multiple experimental conditions. CCI provides insights into active regulatory networks within cell subpopulations, shedding light on the mechanisms driving cellular heterogeneity.
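To make the marker-based annotation idea concrete, the sketch below labels each cluster with the cell type whose marker genes show the highest mean expression. It is a deliberately simple illustration and not the scoring used by Seurat, sctype, or any reference-based method; the marker lists and expression values are toy assumptions.

```python
def annotate_clusters(cluster_means, markers):
    """Assign each cluster the cell type with the highest mean marker expression."""
    labels = {}
    for cluster, expr in cluster_means.items():
        scores = {
            cell_type: sum(expr.get(gene, 0.0) for gene in genes) / len(genes)
            for cell_type, genes in markers.items()
        }
        labels[cluster] = max(scores, key=scores.get)
    return labels


# Illustrative marker sets and per-cluster mean expression (toy numbers).
markers = {"T cell": ["CD3D", "CD3E"], "B cell": ["CD79A", "MS4A1"]}
cluster_means = {
    0: {"CD3D": 2.1, "CD3E": 1.8, "CD79A": 0.1},
    1: {"CD79A": 2.5, "MS4A1": 1.9, "CD3D": 0.2},
}
labels = annotate_clusters(cluster_means, markers)
print(labels)
```

In practice, annotation tools also weight marker specificity and report a confidence, so that clusters without a clear match can be left unassigned.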

Cell barcode/UMI assignment for long-read scRNA-Seq

Recovering accurate cell barcodes and UMIs from single-cell ONT sequencing data is a significant challenge because of the poor per-base sequencing accuracy, typically ranging from 87 to 95% (Philpott et al. 2021). Short sequence tags such as barcodes are highly sensitive to sequencing errors (Philpott et al. 2021). Currently used strategies include (a) iterative sequencing for error reduction (Sameith et al. 2017), (b) modifying the read structure to increase error tolerance (Philpott et al. 2021), and (c) employing hybrid (short-read and long-read) sequencing techniques (Sović et al. 2016), in which short reads are used to create reference barcode sets for the extraction of cell barcodes and UMIs. A shallow short-read sequencing run is sufficient to create a reference cell barcode list; however, highly saturated sequencing, which can be expensive, is necessary to create a reference UMI list. The following computational methods have been developed to address this challenge.
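A short calculation shows why short tags are so sensitive to the accuracy range quoted above. The sketch assumes that errors are independent between bases and uses a 16-nt barcode purely as an illustrative length; neither assumption comes from the cited studies.

```python
def error_free_probability(accuracy, length):
    """Probability that every base of a tag is read correctly (independent errors)."""
    return accuracy ** length


BARCODE_LEN = 16  # assumed barcode length, for illustration only

for accuracy in (0.87, 0.95, 0.99):
    p = error_free_probability(accuracy, BARCODE_LEN)
    print(f"per-base accuracy {accuracy:.2f}: P(error-free barcode) = {p:.3f}")
```

Under these assumptions only about 11% of barcodes would be read without error at 87% accuracy and about 44% at 95%, which is why exact matching against a barcode list discards most reads and error-tolerant strategies are needed.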

SiCeLoRe (Single Cell long-read) is a set of tools for highly multiplexed single-cell Nanopore or PacBio long-read sequencing data, used for cell barcode/UMI assignment and bioinformatics analysis (Lebrigand et al. 2020). The workflow incorporates sequential steps for cell barcode and UMI assignment to long reads (guided by short-read data), transcript isoform identification, generation of molecule consensus sequences (UMI-guided error correction), and production of [isoforms/junctions/SNPs x cells] count matrices, so that these new modalities can be integrated into standard single-cell RNA-Seq statistical analyses.
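The idea behind UMI-guided error correction can be illustrated with a toy consensus step: reads that share a cell barcode and UMI derive from the same molecule, so a per-position majority vote removes most random errors. The sketch assumes the reads are already aligned to each other and have equal length; it is not SiCeLoRe's implementation, which has to handle insertions and deletions.

```python
from collections import Counter, defaultdict


def consensus(seqs):
    """Column-wise majority vote over equal-length sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must have equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


def umi_consensus(reads):
    """Group reads by (cell barcode, UMI) and build one consensus per molecule."""
    groups = defaultdict(list)
    for cell, umi, seq in reads:
        groups[(cell, umi)].append(seq)
    return {key: consensus(seqs) for key, seqs in groups.items()}


# Toy reads: (cell barcode, UMI, sequence)
reads = [
    ("AAAC", "TTAG", "ACGTACGT"),
    ("AAAC", "TTAG", "ACGTTCGT"),  # carries one substitution error
    ("AAAC", "TTAG", "ACGTACGT"),
    ("CCGT", "ATAT", "GGGTTTAA"),
]
molecules = umi_consensus(reads)
print(molecules)
```

The benefit grows with the number of reads per UMI, which is one reason sequencing depth per molecule matters for consensus-based approaches.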

Currently, BLAZE (You et al. 2023) can only recognize 10x single-cell barcodes from nanopore reads. Although various long-read (LR) single-cell techniques, including scCOLOR-seq and R2C2, have been used to profile single cells, the 10x Chromium platform is the most accessible and widely used. BLAZE provides accurate single-cell barcode identification for Nanopore long reads. It can be integrated with downstream gene/isoform identification and quantification, and it performs well across a variety of data sets, read depths, and read accuracies. Importantly, BLAZE removes the requirement for additional matched short-read (SR) data, simplifying LR scRNA-Seq methods while drastically lowering cost.

ScNapBar (single-cell Nanopore barcode demultiplexer) demultiplexes Nanopore barcodes and is particularly suited to low-depth Illumina and Nanopore sequencing (Wang et al. 2021b). The high error rate of Nanopore reads poses a challenge for cell barcode assignment; ScNapBar addresses this problem with a hybrid sequencing approach on the Nanopore and Illumina platforms.
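The hybrid strategy described above amounts to matching each error-prone long-read barcode against a reference list obtained from short reads. A minimal sketch of that matching step, using plain edit distance with an ambiguity check, is shown here; the whitelist, the threshold `max_dist`, and the function names are illustrative assumptions, and the scoring is much simpler than that of the published tools.

```python
def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]


def assign_barcode(observed, whitelist, max_dist=1):
    """Return the single closest whitelist barcode within max_dist, else None."""
    scored = sorted((edit_distance(observed, bc), bc) for bc in whitelist)
    if not scored or scored[0][0] > max_dist:
        return None  # nothing close enough
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous: two candidates at the same distance
    return scored[0][1]


# Hypothetical reference barcodes, e.g. derived from a matched short-read run.
whitelist = ["ACGTACGTACGT", "TTGCAATGCCTA", "GGATCCGGTTAA"]
print(assign_barcode("ACGTACCTACGT", whitelist))  # one substitution
print(assign_barcode("ACGTACGACGT", whitelist))   # one deletion
print(assign_barcode("CCCCCCCCCCCC", whitelist))  # no close match
```

Edit distance is used rather than Hamming distance because nanopore errors include insertions and deletions, which shift the remaining bases of the tag.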

Isoform and gene fusion analyses at single cell level

Several specialized tools have been developed to handle single-cell long-read data for isoform identification and downstream analysis. Examples include SiCeLoRe (Lebrigand et al. 2020), FLAMES (Tian et al. 2021), and FLAIR (Tang et al. 2020). These tools are specifically designed to work with long-read data and facilitate isoform-level analysis. Additionally, tools like JAFFAL (Davidson et al. 2022) are dedicated to identifying gene fusions in long-read data. Other tools, such as scTagger and FLAIR, specialize in detecting RNA splicing events from single-cell long-read data.
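A common idea underlying isoform identification from full-length reads is to group reads by their chain of introns, since reads from the same isoform share splice junctions even when their ends are ragged. The sketch below illustrates only this grouping step with toy coordinates; it is not how any of the tools above is implemented, and real pipelines additionally tolerate small splice-site shifts and compare chains against the annotation.

```python
from collections import Counter


def intron_chain(exons):
    """Introns implied by consecutive exons, as (donor, acceptor) coordinate pairs."""
    exons = sorted(exons)
    return tuple((exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1))


def collapse_isoforms(reads):
    """Count reads per distinct intron chain (single-exon reads share the empty chain)."""
    return Counter(intron_chain(exons) for exons in reads.values())


# Toy alignments: read name -> list of aligned exon blocks (start, end)
reads = {
    "read1": [(100, 200), (300, 400), (500, 600)],
    "read2": [(110, 200), (300, 400), (500, 590)],  # same introns, ragged ends
    "read3": [(100, 200), (500, 600)],              # middle exon skipped
}
isoforms = collapse_isoforms(reads)
for chain, n in isoforms.items():
    print(n, chain)
```

Combining the intron chain with the cell barcode of each read would give an isoform-by-cell count table, the kind of matrix that isoform-level single-cell analyses start from.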

Inferring cell trajectories at the isoform level

Long-read sequencing data at the single-cell level provide a good representation of isoform diversity, which can help to better infer cell trajectories at the isoform level. Many RNA velocity models rely on the assumption of a constant splicing rate of pre-mRNAs over time, which may not hold for complex splicing patterns (Wu and Schmitz 2023). In the context of long-read scRNA-Seq, there is therefore scope to develop tools that incorporate complex splicing mechanisms into comprehensive mathematical models of cell-fate determination. Meanwhile, long-read tools such as FLAMES provide deeper insights into isoform expression dynamics and splicing at the single-cell level. Moreover, FLAMES can detect mutations in single-cell long-read data. This tool, along with the others listed in Table 4, contributes to the comprehensive analysis of single-cell long-read data, enabling various functional insights.
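For reference, the constant-splicing-rate assumption mentioned above is usually expressed through a two-equation kinetic model of the kind commonly used in RNA velocity methods. The notation below is assumed for illustration and is not taken from this review: $u$ and $s$ are the unspliced and spliced abundances of a gene, $\alpha$ is the transcription rate, $\beta$ the splicing rate, and $\gamma$ the degradation rate.

```latex
\begin{align}
\frac{du}{dt} &= \alpha(t) - \beta\, u(t), \\
\frac{ds}{dt} &= \beta\, u(t) - \gamma\, s(t).
\end{align}
```

Treating $\beta$ as a single constant per gene implies one pre-mRNA pool feeding one mature product; a gene producing several isoforms at different rates would need isoform-specific terms, which is the kind of extension that isoform-resolved long-read data could inform.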

Table 4 Bioinformatics tools developed for the analysis of long-read scRNA-Seq

Prospects of long-read single cell sequencing

Long-read scRNA-Seq is poised to revolutionize the field of transcriptomics by providing data that can be used for a wide range of computational studies, including isoform and gene fusion analyses at the single-cell level. The exploration of isoform and alternative splicing events at the single-cell level is particularly intriguing, as these high-resolution data enable deep investigation of cell-to-cell variation and are useful for studying the role of alternative splicing in the formation of heterogeneous populations. Additionally, long-read scRNA-Seq has the potential to detect RNA modifications at the single-cell level.

The development of novel long-read single-cell library preparation methods, along with precise bioinformatics tools and robust mathematical models, is imperative for the effective production, processing, and analysis of long-read scRNA-Seq data. These advances will enhance our ability to dissect complex biological mechanisms, including those underlying development and disease progression. This progress promises a comprehensive exploration of isoform diversity and alternative splicing, significantly expanding our understanding of cellular processes.

Data availability

There is no data associated with this manuscript.

Abbreviations

ONT: Oxford Nanopore Technology
UMI: Unique molecular identifier
TSO: Template switching oligo

References

Adewale BA (2020) Will long-read sequencing technologies replace short-read sequencing technologies in the next 10 years? African J Lab Med 9(1):1–5
Adil A et al (2021) Single-cell transcriptomics: current methods and challenges in data acquisition and analysis. Front Neurosci 15:591122
Al’Khafaji AM et al (2023) High-throughput RNA isoform sequencing using programmed cDNA concatenation. Nat Biotechnol 42(4):582–586
Alquicira-Hernandez J, Sathe A, Ji HP, Nguyen Q, Powell JE (2019) scPred: accurate supervised method for cell-type classification from single-cell RNA-seq data. Genome Biol 20:1–17
Amarasinghe SL et al (2020) Opportunities and challenges in long-read sequencing data analysis. Genome Biol 21(1):1–16
Andrews TS et al (2022) Single-cell, single-nucleus, and spatial RNA sequencing of the human liver identifies cholangiocyte and mesenchymal heterogeneity. Hepatol Commun 6(4):821–840
Angerer P et al (2017) Single cells make big data: new challenges and opportunities in transcriptomics. Curr Opin Syst Biol 4:85–91
Aran D, Looney AP, Liu L, Wu E, Fong V, Hsu A, Chak S, Naikawadi RP, Wolters PJ, Abate AR, Butte AJ, Bhattacharya M (2019) Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat Immunol 20:163–172. https://doi.org/10.1038/s41590-018-0276-y
Bakken TE et al (2018) Single-nucleus and single-cell transcriptomes compared in matched cortical cell types. PLoS ONE 13(12):e0209648
Baran-Gale J, Chandra T, Kirschner K (2018) Experimental design for single-cell RNA sequencing. Brief Funct Genomics 17(4):233–239
Boldogkői Z et al (2019) Long-read sequencing–a powerful tool in viral transcriptome research. Trends Microbiol 27(7):578–592
Bowden R et al (2019) Sequencing of human genomes with nanopore technology. Nat Commun 10(1):1869
Burja B et al (2022) An optimized tissue dissociation protocol for single-cell RNA sequencing analysis of fresh and cultured human skin biopsies. Front Cell Dev Biol 10:872688
Cavelier L et al (2015) Clonal distribution of BCR-ABL1 mutations and splice isoforms by single-molecule long-read RNA sequencing. BMC Cancer 15:1–12
Chen Y et al (2021) A systematic benchmark of Nanopore long read RNA sequencing for transcript level analysis in human cell lines. bioRxiv 2021.04.21.440736
Chung H et al (2022) SnFFPE-Seq: towards scalable single nucleus RNA-Seq of formalin-fixed paraffin-embedded (FFPE) tissue. bioRxiv 2022.08.25.505257
Cole C et al (2018) Tn5Prime, a Tn5 based 5′ capture method for single cell RNA-seq. Nucleic Acids Res 46(10):e62–e62
Conesa A et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17(1):1–19
Davidson NM et al (2022) JAFFAL: detecting fusion genes with long-read transcriptome sequencing. Genome Biol 23(1):1–20
De Paoli-Iseppi R et al (2024) Long-read sequencing reveals the RNA isoform repertoire of neuropsychiatric risk genes in human brain. medRxiv 2024.02.22.24303189
Denisenko E et al (2020) Systematic assessment of tissue dissociation and storage biases in single-cell and single-nucleus RNA-seq workflows. Genome Biol 21(1):1–25
Denyer T, Timmermans MC (2022) Crafting a blueprint for single-cell RNA sequencing. Trends Plant Sci 27(1):92–103
Deshpande D et al (2023) RNA-seq data science: from raw data to effective interpretation. Front Genet 14:997383
Dobin A et al (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29(1):15–21
Dorney R et al (2023) Recent advances in cancer fusion transcript detection. Brief Bioinform 24(1):bbac519
Ebrahimi G et al (2022) Fast and accurate matching of cellular barcodes across short-reads and long-reads of single-cell RNA-seq experiments. Iscience 25(7):104530
Fan X et al (2020) Single-cell RNA-seq analysis of mouse preimplantation embryos by third-generation sequencing. PLoS Biol 18(12):e3001017
Felton C et al (2022) Detection of alternative isoforms of gene fusions from long-read RNA-seq with FLAIR-fusion. bioRxiv 2022.08.01.502364
Fu S, Wang A, Au KF (2019) A comparative evaluation of hybrid error correction methods for error-prone long reads. Genome Biol 20:1–17
Fu Y et al (2023) Single cell and spatial alternative splicing analysis with long read sequencing. bioRxiv 2023.02.23.529769
Furlan M et al (2021) Computational methods for RNA modification detection from nanopore direct RNA sequencing data. RNA Biol 18(sup1):31–40
Gamaarachchi H et al (2022) Fast nanopore sequencing data analysis with SLOW5. Nat Biotechnol 40(7):1026–1029
Gao XH et al (2020) Comparison of fresh frozen tissue with formalin-fixed paraffin-embedded tissue for mutation analysis using a multi-gene panel in patients with colorectal cancer. Front Oncol 10:310
Garalde DR et al (2018) Highly parallel direct RNA sequencing on an array of nanopores. Nat Methods 15(3):201–206
Gilpatrick T et al (2020) Targeted nanopore sequencing with Cas9-guided adapter ligation. Nat Biotechnol 38(4):433–438
Gross A et al (2015) Technologies for single-cell isolation. Int J Mol Sci 16(8):16897–16919
Gupta I et al (2018) Single-cell isoform RNA sequencing characterizes isoforms in thousands of cerebellar cells. Nat Biotechnol 36(12):1197–1202
Hazzard B et al (2022) Long read single cell RNA sequencing reveals the isoform diversity of Plasmodium vivax transcripts. PLoS Negl Trop Dis 16(12):e0010991
Healey HM, Bassham S, Cresko WA (2022) Single-cell Iso-Sequencing enables rapid genome annotation for scRNAseq analysis. Genetics 220(3):iyac017
Heberle BA et al (2023) Using deep long-read RNAseq in Alzheimer’s disease brain to assess medical relevance of RNA isoform diversity. bioRxiv
Helm M, Motorin Y (2017) Detecting RNA modifications in the epitranscriptome: predict and validate. Nat Rev Genet 18(5):275–291
Holmqvist I et al (2021) FLAME: long-read bioinformatics tool for comprehensive spliceome characterization. RNA 27(10):1127–1139
Huang KK et al (2021) Long-read transcriptome sequencing reveals abundant promoter diversity in distinct molecular subtypes of gastric cancer. Genome Biol 22:1–24
Hwang B, Lee JH, Bang D (2018) Single-cell RNA sequencing technologies and bioinformatics pipelines. Exp Mol Med 50(8):1–14
Jabbari J, Tian L (2019) Massively parallel long-read sequencing of single cell RNA isoforms. protocols.io
Jovic D et al (2022) Single-cell RNA sequencing technologies and applications: A brief overview. Clin Transl Med 12(3):e694
Kaminow B, Yunusov D, Dobin A (2021) STARsolo: accurate, fast and versatile mapping/quantification of single-cell and single-nucleus RNA-seq data. bioRxiv 2021.05.05.442755
Kim D et al (2019) Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol 37(8):907–915
Krishnaswami SR et al (2016) Using single nuclei for RNA-seq to capture the transcriptome of postmortem neurons. Nat Protoc 11(3):499–524
Križanović K et al (2018) Evaluation of tools for long read RNA-seq splice-aware alignment. Bioinformatics 34(5):748–754
Lähnemann D et al (2020) Eleven grand challenges in single-cell data science. Genome Biol 21(1):1–35
Lake BB et al (2016) Neuronal subtypes and diversity revealed by single-nucleus RNA sequencing of the human brain. Science 352(6293):1586–1590
Lebrigand K et al (2020) High throughput error corrected Nanopore single cell transcriptome sequencing. Nat Commun 11(1):4025
Leger A et al (2021) RNA modifications detection by comparative Nanopore direct RNA sequencing. Nat Commun 12(1):7198
Li H (2021) New strategies to improve minimap2 alignment accuracy. Bioinformatics 37(23):4572–4574
Liao Y et al (2023) High-throughput and high-sensitivity full-length single-cell RNA-seq analysis on third-generation sequencing platform. Cell Discovery 9(1):5
Lin Y, Cao Y, Kim HJ, Salim A, Speed TP, Lin DM, Yang P, Yang JYH (2020) scClassify: sample size estimation and multiscale classification of cells using single and multiple reference. Mol Syst Biol 16(6):e9389
Logsdon GA, Vollger MR, Eichler EE (2020) Long-read human genome sequencing and its applications. Nat Rev Genet 21(10):597–614
Long Y et al (2021) FlsnRNA-seq: protoplasting-free full-length single-nucleus RNA profiling in plants. Genome Biol 22:1–14
MacKenzie M, Argyropoulos C (2023) An introduction to nanopore sequencing: past, present, and future considerations. Micromachines 14(2):459
Mahmoud M et al (2019) Structural variant calling: the long and the short of it. Genome Biol 20:1–14
Mantere T, Kersten S, Hoischen A (2019) Long-read sequencing emerging in medical genetics. Front Genet 10:426
Martelotto L (2019) ‘Frankenstein’ protocol for nuclei isolation from fresh and frozen tissue for snRNAseq
McClinton B et al (2023) Targeted nanopore sequencing enables complete characterisation of structural deletions initially identified using exon-based short-read sequencing strategies. Mol Genet Genomic Med 11(6):e2164
Melsted P et al (2021) Modular, efficient and constant-memory single-cell RNA-seq preprocessing. Nat Biotechnol 39(7):813–818
Midha MK, Wu M, Chiu K-P (2019) Long-read sequencing in deciphering human genetics to a greater depth. Hum Genet 138(11):1201–1215
Milanez-Almeida P et al (2020) Cancer prognosis with shallow tumor RNA sequencing. Nat Med 26(2):188–192
Mock A et al (2023) Transcriptome profiling for precision cancer medicine using shallow nanopore cDNA sequencing. Sci Rep 13(1):2378
Moreno-Santillán DD et al (2019) De novo transcriptome assembly and functional annotation in five species of bats. Sci Rep 9(1):6222
Nguyen QH et al (2018) Experimental considerations for single-cell RNA sequencing approaches. Front Cell Dev Biol 6:108
Nip KM et al (2023) Reference-free assembly of long-read transcriptome sequencing data with RNA-Bloom2. Nat Commun 14(1):2940
Oikonomopoulos S et al (2020) Methodologies for transcript profiling using long-read technologies. Front Genet 11:606
Page ML et al (2024) Surveying the landscape of RNA isoform diversity and expression across 9 GTEx tissues using long-read sequencing data. bioRxiv 2024.02.13.579945
Pardo-Palacios FJ et al (2024) SQANTI3: curation of long-read transcriptomes for accurate identification of known and novel isoforms. Nat Methods. https://doi.org/10.1101/2023.05.17.541248
Pardo-Palacios F et al (2021) Systematic assessment of long-read RNA-seq methods for transcript identification and quantification. bioRxiv 2023.07.25.550582
Payne A et al (2019) BulkVis: a graphical viewer for Oxford nanopore bulk FAST5 files. Bioinformatics 35(13):2193–2198
Philpott M et al (2021) Nanopore sequencing of single-cell transcriptomes with scCOLOR-seq. Nat Biotechnol 39(12):1517–1520
Picelli S (2017) Single-cell RNA-sequencing: the future of genome biology is now. RNA Biol 14(5):637–650
Picelli S et al (2013) Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat Methods 10(11):1096–1098
Picelli S et al (2014) Tn5 transposase and tagmentation procedures for massively scaled sequencing projects. Genome Res 24(12):2033–2040
Prawer YD et al (2023) Pervasive effects of RNA degradation on Nanopore direct RNA sequencing. NAR Genom Bioinform 5(2):lqad060
Ramsköld D et al (2009) An abundance of ubiquitously expressed genes revealed by tissue transcriptome sequence data. PLoS Comput Biol 5(12):e1000598
Rao MS et al (2019) Comparison of RNA-Seq and microarray gene expression platforms for the toxicogenomic evaluation of liver from short-term rat toxicity studies. Front Genet 9:636
Ray TA et al (2020) Comprehensive identification of mRNA isoforms reveals the diversity of neural cell-surface molecules with roles in retinal development and disease. Nat Commun 11(1):3328
Rebboah E et al (2021) Mapping and modeling the genomic basis of differential RNA isoform expression at single-cell resolution with LR-Split-seq. Genome Biol 22(1):1–28
Rizzetto S et al (2017) Impact of sequencing depth and read length on single cell RNA sequencing data of T cells. Sci Rep 7(1):12781
Rousselle TV et al (2022) An optimized protocol for single nuclei isolation from clinical biopsies for RNA-seq. Sci Rep 12(1):9851
Sameith K, Roscito JG, Hiller M (2017) Iterative error correction of long sequencing reads maximizes accuracy and improves contig assembly. Brief Bioinform 18(1):1–8
Sant P, Rippe K, Mallm J-P (2023) Approaches for single-cell RNA sequencing across tissues and cell types. Transcription 14(3–5):127–145
Shi Z-X et al (2023) High-throughput and high-accuracy single-cell RNA isoform analysis using PacBio circular consensus sequencing. Nat Commun 14(1):2631
Shiau C-K et al (2023) High throughput single cell long-read sequencing analyses of same-cell genotypes and phenotypes in human tumors. Nat Commun 14(1):4124
Sims D et al (2014) Sequencing depth and coverage: key considerations in genomic analyses. Nat Rev Genet 15(2):121–132
Singh M et al (2019) High-throughput targeted long-read single cell sequencing reveals the clonal and transcriptional landscape of lymphocytes. Nat Commun 10(1):3120
Slyper M et al (2020) A single-cell and single-nucleus RNA-Seq toolbox for fresh and frozen human tumors. Nat Med 26(5):792–802
Smith AM et al (2019) Reading canonical and modified nucleobases in 16S ribosomal RNA using nanopore native RNA sequencing. PLoS ONE 14(5):e0216709
Sović I et al (2016) Evaluation of hybrid and non-hybrid methods for de novo assembly of nanopore reads. Bioinformatics 32(17):2582–2589
Steijger T et al (2013) Assessment of transcript reconstruction methods for RNA-seq. Nat Methods 10(12):1177–1184
Stephenson W et al (2022) Direct detection of RNA modifications and structure using single-molecule nanopore sequencing. Cell Genomics 2(2):100097
Stuart T et al (2019) Comprehensive integration of single-cell data. Cell 177(7):1888–1902.e21
Tang AD, Soulette CM, van Baren MJ, Hart K, Hrabeta-Robinson E, Wu CJ, Brooks AN (2020) Full-length transcript characterization of SF3B1 mutation in chronic lymphocytic leukemia reveals downregulation of retained introns. Nat Commun 11(1):1438
Thijssen R et al (2022) Single-cell multiomics reveal the scale of multilayered adaptations enabling CLL relapse during venetoclax therapy. Blood J Am Soc Hematology 140(20):2127–2141
Thind AS et al (2021) Demystifying emerging bulk RNA-Seq applications: the application and utility of bioinformatic methodology. Brief Bioinform 22(6):bbab259
Tian L et al (2021) Comprehensive characterization of single-cell full-length isoforms in human and mouse with long-read sequencing. Genome Biol 22(1):1–24
Vallejo AF et al (2022) snPATHO-seq: unlocking the FFPE archives for single nucleus RNA profiling. bioRxiv 2022.08.23.505054
van Dijk EL et al (2023) Genomics in the long-read sequencing era. Trends Genet 39(9):649–671
Veiga DF et al (2022) A comprehensive long-read isoform analysis platform and sequencing resource for breast cancer. Sci Adv 8(3):eabg6711
Volden R, Vollmers C (2022) Single-cell isoform analysis in human immune cells. Genome Biol 23(1):1–21
Volden R et al (2018) Improving nanopore read accuracy with the R2C2 method enables the sequencing of highly multiplexed full-length single-cell cDNA. Proc Natl Acad Sci 115(39):9726–9731
Volden R et al (2022) Identifying and quantifying isoforms from accurate full-length transcriptome sequencing reads with Mandalorion. bioRxiv 2022.06.29.498139
Wang Y et al (2021a) Nanopore sequencing technology, bioinformatics and applications. Nat Biotechnol 39(11):1348–1365
Wang Q et al (2021b) Single-cell transcriptome sequencing on the Nanopore platform with ScNapBar. RNA 27(7):763–770
Workman RE et al (2019) Nanopore native RNA sequencing of a human poly (A) transcriptome. Nat Methods 16(12):1297–1305
Wright DJ et al (2022) Long read sequencing reveals novel isoforms and insights into splicing regulation during cell state changes. BMC Genomics 23:1–12
Wu S, Schmitz U (2023) Single-cell and long-read sequencing to enhance modelling of splicing and cell-fate determination. Comput Struct Biotechnol J 21:2373–2380
Wu H et al (2019) Advantages of single-nucleus over single-cell RNA sequencing of adult kidney: rare cell types and novel cell states revealed in fibrosis. J Am Soc Nephrol 30(1):23
Yang Y et al (2023) Single-cell long-read sequencing in human cerebral organoids uncovers cell-type-specific and autism-associated exons. Cell Rep 42(11):113335
Yin S et al (2021) SMIXnorm: Fast and Accurate RNA-Seq Data Normalization for Formalin-Fixed Paraffin-Embedded Samples. Front Genet 12:650795
You Y, Prawer YD, De Paoli-Iseppi R, Hunt CP, Parish CL, Shim H, Clark MB (2023) Identification of cell barcodes from long-read single-cell RNA-seq with BLAZE. Genome Biol 24(1):66
Zhang MJ, Ntranos V, Tse D (2020) Determining sequencing depth in a single-cell RNA-seq experiment. Nat Commun 11(1):774
Zhou Y et al (2019) Isoform sequencing provides insight into natural genetic diversity in maize. Plant Biotechnol J 17(8):1473
Zhu X et al (2020) The applications of nanopore sequencing technology in pathogenic microorganism detection. Can J Infect Dis Med Microbiol 2020:6675206

Funding

Open Access funding enabled and organized by CAUL and its Member Institutions.

Author information

Authors and Affiliations

Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
Pallawi Kumari, Manmeet Kaur & Kiran Dindhoria
Illawarra Shoalhaven Local Health District (ISLHD), NSW Health, Wollongong, NSW, Australia
Bruce Ashford & Amarinder Singh Thind
Monash Biomedical Discovery Institute, Monash University, Clayton, VIC, 3800, Australia
Shanika L. Amarasinghe
Walter and Eliza Hall Institute of Medical Research, 1G, Royal Parade, Parkville, VIC, 3025, Australia
Shanika L. Amarasinghe
The School of Chemistry and Molecular Bioscience (SCMB), University of Wollongong, Loftus St, Wollongong, NSW, 2500, Australia
Amarinder Singh Thind

Contributions

Author contributions: AST conceived the original idea. Literature search: PK, MK. Drafting the original article: MK, PK, AST. Figure preparation: KD, MK, PK. Major editing: SLA, AST, BA. Critical revision of the article: SLA, AST, KD. Final editing and approval of the version to be published: all authors.

Corresponding author

Correspondence to Amarinder Singh Thind.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Kumari, P., Kaur, M., Dindhoria, K. et al. Advances in long-read single-cell transcriptomics. Hum. Genet. (2024). https://doi.org/10.1007/s00439-024-02678-x

Received: 20 November 2023
Accepted: 07 May 2024
Published: 24 May 2024
DOI: https://doi.org/10.1007/s00439-024-02678-x
