Introduction

Research over the past decade has illuminated the dramatic extent of variation in the human genome. Any two unrelated individuals differ at millions of positions in their DNA sequence [1]. Amid this natural variation, individuals may carry certain sequence changes that cause or predispose them to monogenic hereditary disease. Identifying and correctly classifying these variants are central to clinical genetic testing.

In recent years, the ability to identify and classify disease-causing variants has dramatically improved owing to advances in next-generation sequencing (NGS) technologies; data from large-scale sequencing studies, such as those included in the Genome Aggregation Database (gnomAD); better tools for assessing the functional effects of variants; and advances in functional genomics, computational biology, and predictive algorithms. In addition, the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) have provided useful guidelines for evaluating evidence for the clinical interpretation of DNA sequence variants [2]. As a result, diagnostic laboratories have been able to classify variants more consistently.

Even with these recent advances, the clinical significance of a large proportion of variants observed during genetic testing remains uncertain. These variants of uncertain significance (VUS) are variants for which the data available at the time of clinical testing are insufficient to confidently determine their relationship to human disease. In particular, because most VUS are rare in the human population, clinical data about them are limited and additional evidence is unlikely to become available, which means their significance may remain uncertain. Importantly, this uncertainty and its impact on medical decision making are points of frustration for both clinicians and patients, not only when test results are delivered but also during posttest genetic counseling.

DNA variants can lead to disease in several ways. Any variant that results in a premature termination codon, such as a nonsense or frameshift variant, can lead to nonsense-mediated mRNA decay and consequent reduction of protein synthesis. Alternatively, a missense variant or an in-frame insertion or deletion can lead to changes in the amino acid sequence of a protein. The difficulty of interpreting the clinical significance of these types of variants falls along a spectrum: variants with severe effects on protein synthesis are often easy to interpret, whereas those that alter protein sequence can be very difficult [2]. Missense variants and in-frame insertions or deletions pose challenges because, although substantial progress is being made, we still have limited understanding of the relationships between changes in protein sequence and clinical phenotypes.

Variants that are predicted to disrupt splicing can also result in nonsense-mediated mRNA decay. These variants are moderately difficult to interpret because of the added challenge of determining their effects [2]. Some of the variants change the highly conserved canonical splice sites (typically GT and AG dinucleotides), which occupy the two nucleotide positions immediately flanking exons (Fig. 1a) and are almost universally expected to disrupt splicing. Other variants located at less-conserved nucleotides surrounding the consensus splice sites (e.g., the last nucleotide of the exon or +3 to +6 bases into the intron) may also disrupt mRNA splicing. However, because the likelihood of this is often difficult to estimate, the clinical significance of these variants is less certain, requiring additional evidence to clarify. Still other variants occurring outside the natural splice site may influence splicing or create new consensus splice sites that, if utilized by the splicing machinery, could lead to defective or lost proteins [3].
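The positional categories described above can be sketched in code. The following is a minimal illustration, not a clinical tool: the function names, the category labels, and the simplified HGVS-like pattern are assumptions made for this example, and the +3 to +6 window is taken from the example given in the text rather than from any formal standard.

```python
import re
from typing import Optional


def intronic_offset(hgvs_c: str) -> Optional[int]:
    """Return the signed intronic offset from a simple coding-DNA variant name.

    'c.594-2A>C' gives -2, 'c.100+5G>A' gives 5, and an exonic name such as
    'c.76A>T' gives None. This pattern is a simplification for illustration
    and is not a full HGVS parser.
    """
    match = re.match(r"^c\.\d+([+-]\d+)", hgvs_c)
    return int(match.group(1)) if match else None


def classify_splice_position(offset: Optional[int],
                             last_exon_base: bool = False) -> str:
    """Assign a variant to one of the positional categories named in the text.

    The labels are hypothetical names chosen for this sketch.
    """
    if offset is None:
        # Exonic variant: only the last nucleotide of the exon is singled out.
        return "near-splice-site" if last_exon_base else "exonic"
    if offset == 0:
        raise ValueError("an intronic offset cannot be zero")
    if abs(offset) <= 2:
        # The two intronic positions immediately flanking the exon (GT / AG).
        return "canonical"
    if 3 <= offset <= 6:
        # Less-conserved donor-side positions, +3 to +6 into the intron.
        return "near-splice-site"
    return "other intronic"


# BRCA1 c.594-2A>C, discussed later in this review, is a canonical acceptor-site variant.
print(classify_splice_position(intronic_offset("c.594-2A>C")))
```

A category assigned this way reflects only position; as the text notes, whether a given variant actually disrupts splicing requires additional evidence.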

Fig. 1

RNA analysis of (a) normal and (b) abnormal splicing. Highly conserved sequences in the introns (GT donor and AG acceptor sites) direct the splicing machinery to the exon-intron junctions, where introns are removed and exons are spliced together to form messenger RNA (mRNA). Cellular mRNA is extracted from patient tissue samples and processed in the laboratory to complementary DNA (cDNA) by reverse transcription. Amplification of the cDNA by polymerase chain reaction (PCR) allows the cDNA products to be visualized by agarose gel electrophoresis. The length of each cDNA fragment observed on the gel correlates with the length of the spliced mRNA, which in the normal splicing example (a) includes exons 1–4. In addition to providing information on mRNA size, the optical densities of the gel fragments provide semiquantitative information about the relative abundance of each piece of spliced mRNA. RNA-seq approaches leverage NGS technologies to provide both sequence information about the mRNA and quantification of alternative or abnormal splicing events. This can be achieved by aligning short overlapping sequence reads to the genomic reference sequence. The corresponding read depth for nucleotides across the gene is then plotted. Regions of the gene that are spliced out will have a significant decrease in sequencing read depth. The abnormal splicing examples (b) depict skipping of exon 2 caused by a variant in the GT donor site of intron 1 (top), inclusion of intron 1 caused by a variant in the AG acceptor site of intron 1 (middle), and inclusion of a pseudo-exon caused by the creation of a novel GT donor site of intron 2 (bottom). All examples have RT-PCR and RNA-seq results for normal samples and patients who are heterozygous or homozygous for a variant. Abbreviations: Nl = normal; Pt = patient; +/+ = normal genotype; var/+ = heterozygous for a variant; var/var = homozygous for a variant
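The read-depth view of RNA-seq described in the caption can be illustrated with a short sketch. It assumes that reads have already been aligned and are supplied as half-open (start, end) blocks on a toy gene coordinate system; the function names and the toy coordinates are hypothetical, and real pipelines would work from alignment files rather than hand-written intervals.

```python
from typing import List, Sequence, Tuple


def depth_profile(blocks: Sequence[Tuple[int, int]], gene_length: int) -> List[int]:
    """Per-nucleotide read depth from aligned blocks given as [start, end)."""
    # Difference array: +1 where a block starts, -1 just past where it ends.
    delta = [0] * (gene_length + 1)
    for start, end in blocks:
        if not 0 <= start < end <= gene_length:
            raise ValueError(f"block {(start, end)} lies outside the gene")
        delta[start] += 1
        delta[end] -= 1
    profile, running = [], 0
    for i in range(gene_length):
        running += delta[i]
        profile.append(running)
    return profile


def mean_depth(profile: Sequence[int], start: int, end: int) -> float:
    """Average depth over the region [start, end)."""
    return sum(profile[start:end]) / (end - start)


# Toy gene: exon A = [0, 10), intron = [10, 20), exon B = [20, 30).
# A spliced read contributes one block per exon it covers.
toy_blocks = [(0, 10), (20, 30), (5, 10), (20, 25)]
profile = depth_profile(toy_blocks, gene_length=30)
print(mean_depth(profile, 0, 10), mean_depth(profile, 10, 20), mean_depth(profile, 20, 30))
```

In this toy example the intron has zero depth because it is spliced out of every read, which is the drop in coverage the caption describes; comparing such profiles between a patient and controls is what highlights exon skipping or intron inclusion.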

In recent years, RNA analysis has proven to be a powerful approach for analyzing splicing alterations in patients with a suspected clinical diagnosis [4•, 5]. RNA sequencing (RNA-seq), which detects changes in splicing patterns and measures the expression of RNA transcripts, can be used in many cases to assess the functional impact of VUS and assist with variant classification. Beyond its use in analyzing sequence variants, RNA analysis has utility for interpreting structural aberrations that may have a predicted effect on a transcript produced from a disease gene [6]. In addition, the evaluation of mRNA profiles in blood or tissues can identify genetic variation not routinely observed by standard DNA sequencing techniques alone, thus expanding the use and utility of genetic testing [5].

This review will summarize the need for standardization of RNA-seq as this technology becomes more widely available. It will also address the RNA analysis methods commonly used to support the interpretation of clinical variants, circumstances in which these techniques should be considered as part of a patient’s testing, and considerations for pre- and posttest genetic counseling.

RNA Analysis Practices

Reverse-transcription polymerase chain reaction (RT-PCR) is a common method used for RNA analysis to allow for the detection and evaluation of the splicing effects of genomic variants (Fig. 1a). Variability of the RT-PCR protocol among laboratories has potential implications for the downstream splicing analysis. The Evidence-based Network for the Interpretation of Germline Mutant Alleles (ENIGMA) has evaluated the reliability and effectiveness of RT-PCR approaches among 23 laboratories in the network [7]. The study reviewed methodologies from the participating laboratories and assessed the consistency and reproducibility of results depending on the protocols used. Importantly, of nine BRCA1 or BRCA2 variants previously shown to be experimentally associated with splicing aberrations, fewer than half of the variant carriers tested by RT-PCR received unequivocal results confirming these aberrations. The remaining carriers, depending on the protocol, showed clearly aberrant results, although low-abundance transcripts were detected in both normal and affected individuals by some labs and not others, further complicating the interpretation of these results. This study illustrates that when designing RT-PCR strategies for evaluating the effects of DNA variants on splicing, having empirical knowledge about all potential transcripts is one important way to improve the process. However, questions will inherently arise around whether the presence of a low-abundance transcript is an artifact and whether the presence or absence of such a transcript is due to genetic variation, impacts normal protein function, or is associated with disease.

RNA-seq is an alternative technique for analyzing RNA that takes advantage of high-throughput NGS of short complementary DNA fragments generated from the RT-PCR process (Fig. 1a). Although quantitative RT-PCR can be used to measure gene expression, RNA-seq has the key advantage of both providing sequence information and measuring the quantity of novel and alternative transcripts in the cell [8, 9]. RNA-seq can be used to analyze the whole transcriptome or be targeted to analyze a subset of genes for specific splicing changes observed in a patient relative to control individuals. This is important for the interpretation and evaluation of DNA variants that may disrupt normal splicing [10•, 11]. However, a challenge with RNA-seq (as well as with quantitative RT-PCR) is that mRNA transcript profiles can vary among cell types and tissues. For example, a gene responsible for a neuromuscular disorder may not be expressed in blood or may have tissue-specific expression patterns. Therefore, the utilization of RNA-seq for neuromuscular disorders may require access to muscle tissue (rather than a blood sample) in order to obtain an appropriate readout of mRNA expression and alternative splicing patterns [10•]. Sample-type requirements associated with RNA-seq limit its scope and utility to diseases for which representative tissues are readily accessible or easy to biopsy.

Gene expression patterns may also change with age, so analysis of genes in a pediatric patient, for instance, may require a direct comparison with age-matched controls; however, further studies are needed to better understand the impact of age on the utility of RNA-seq analysis for diagnostic purposes [10•].

The ENIGMA study reviewed various splicing assay protocols, including RT-PCR and RNA-seq, and reported variability in both laboratory processes and findings across methods [7]. The multicenter investigation identified four instances in which laboratories produced inconsistent or conflicting findings when evaluating the same variants at splice sites. An investigation into these discrepancies revealed the need for standardized approaches to ensure consistency and reproducibility of all methods for RNA analysis [7]. While efforts to standardize RNA-seq are ongoing, laboratory technology is evolving and new methods, such as Clone-seq, are being introduced into clinical testing [12]. This underscores the importance of carefully establishing the validity, reliability, and reproducibility of all RNA-seq methodologies before using evidence from RNA analysis to classify splice-site variants in a clinical setting.

RNA Analysis for Variant Classification

Splice-Site Variants

Canonical splice acceptor and donor sites are highly conserved sequences that define exon-intron boundaries. Although it is widely understood that disrupting a canonical splice site most likely disrupts the normal splicing pattern for a transcript and thus impairs protein synthesis, this outcome is not universal or guaranteed [2]. Other possible effects of a disrupted canonical splice site include changes in alternative splicing, in-frame exon skipping, and short in-frame insertions or deletions in the messenger RNA (mRNA) (Fig. 1b) [13]. The interpretation of splice variants must take these effects and the molecular mechanism of the gene-disease relationship into consideration; therefore, observing a splicing change by RNA analysis is not always sufficient to conclude that a splice-site variant is pathogenic. As a result, even with RNA analysis, a canonical splice site variant may remain a VUS if the splicing effect is equivocal. Nevertheless, a subset of variants at canonical splice sites may clearly benefit from complementary RNA analysis to provide supporting evidence for clinical classification, particularly when complete loss of protein function is a known mechanism for a disease.

To determine that a variant at a canonical splice site truly results in a truncated or absent protein, the sequence of the mRNA associated with the DNA variant needs to be evaluated. Cautionary examples of misinterpretation of the significance of splice-site variants have recently emerged. For instance, Rosenthal et al. [14] described a retrospective analysis of the BRCA1 c.594-2A>C variant, which was initially presumed to result in a truncated or absent protein based on predicted molecular consequence. This variant was reevaluated because another variant identified within the same splice site had a frequency in the population that was much higher than would be expected based on the prevalence of hereditary breast cancer. The c.594-2A>C variant was also present in numerous families that had alternate explanations for their disease, and there was a lack of clear genotype-phenotype correlation in one case of Fanconi anemia. This conflicting evidence prompted an investigation into the splicing activity that occurs when the c.594-2 position in BRCA1 is altered from an A to a C. Using mRNA analysis, this variant was shown to result in in-frame skipping of exons 9 and 10, leading to an abnormal protein but not a complete loss of protein, as originally expected. This experiment provided the necessary evidence to justify reevaluating variants at this splice site and consider changing their classifications from likely pathogenic to VUS or likely benign [14].
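The distinction drawn here between in-frame exon skipping and a truncating change comes down to whether the skipped sequence is a multiple of three nucleotides. A minimal sketch of that arithmetic follows; the function name is hypothetical and the exon lengths are illustrative toy values, not the actual lengths of BRCA1 exons 9 and 10.

```python
from typing import Sequence, Tuple


def exon_skipping_effect(skipped_exon_lengths: Sequence[int]) -> Tuple[bool, int]:
    """Return (preserves_reading_frame, nucleotides_removed) for skipped exons.

    Skipping preserves the downstream reading frame only when the total number
    of removed nucleotides is divisible by three. The sketch ignores whether
    the new exon-exon junction itself creates a stop codon.
    """
    if any(length <= 0 for length in skipped_exon_lengths):
        raise ValueError("exon lengths must be positive")
    removed = sum(skipped_exon_lengths)
    return removed % 3 == 0, removed


# Toy values only: two skipped exons totalling 123 nt keep the frame,
# whereas a single skipped 100-nt exon would shift it.
print(exon_skipping_effect([90, 33]))
print(exon_skipping_effect([100]))
```

A frame-preserving result is consistent with an abnormal but still present protein, as in the example above, whereas a frame-shifting result is the kind of outcome expected to introduce a premature termination codon. Either prediction would still need to be confirmed by sequencing the mRNA.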

Results from RNA analysis contribute one line of evidence that potentially resolves the significance of variants with ambiguous splicing effects when there is limited clinical information to suggest that they are benign or pathogenic. One study assessing the utility of RNA-seq analysis in patients undergoing genetic testing concluded that approximately 2.4% of individuals had an inconclusive result for a variant with a possible effect on splicing and therefore would benefit from further experimental evaluation to clarify the significance of the variant [15•]. However, this study originally (before incorporating evidence from RNA analysis) classified some variants as VUS without incorporating other applicable evidence criteria, resulting in classification discrepancies with public databases such as ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/). Current ACMG/AMP guidelines for variant interpretation incorporate evidence based on the predicted molecular consequences of canonical splice-site variants when loss of function is a known mutational mechanism for a specific gene-disease association [2]. Therefore, it is likely that a smaller percentage of variants than what was reported would benefit from follow-up RNA analysis (manuscript in preparation).

Structural Variants

Copy number gains (i.e., duplications) represent another category of variants that can directly impact the structure of a transcript produced from a disease gene. Subgenic or intragenic duplications in particular can disrupt the transcript reading frame, leading to an absent or disrupted protein. In one study, more than 80% of intergenic, multigene, and complex duplications studied using high-resolution comparative genomic hybridization and breakpoint sequencing were tandem duplications in direct orientation [6]. Although the tandem arrangement applies to a majority of duplications, in some cases, the duplicated material is actually located elsewhere in the genome, making it more difficult to predict its effect. Duplications can be further complicated by alternative transcripts, gene-fusion events, and inverted intergenic gene duplications. Under some circumstances, these events may result in haploinsufficiency.

Both mRNA and protein studies can help resolve the functional consequence of these kinds of copy number gains [5]. Depending on the size, location, and position of a copy number gain, the effect on the encoded protein can vary greatly. RNA analysis can help determine whether the gain has a predicted effect on local gene expression through disruption of regulatory sequences, based on where and how the gain is inserted [16]. Therefore, the incorporation of RNA analysis has potential utility beyond splice-site variants, as the resolution of copy number gains can provide much-needed clarity for structural variants.

RNA Analysis for Variant Discovery

In addition to being used to interpret splice-site or copy number variants, RNA analysis can help detect variants that lie in regions not typically covered by DNA analysis. This is especially relevant for patients who have not yet obtained a molecular diagnosis for a suspected genetic disorder through standard DNA analysis. Current DNA sequencing methods in clinical genetic testing do not typically detect deep intronic variants. This is because variants in these noncoding sequences are tolerated more than variants within exons; are less commonly reported in genetic disease; and, aside from those at canonical splice sites, are more difficult to interpret clinically [3, 16, 17]. However, deep intronic variants that affect transcript splicing and, consequently, protein structure or function can provide a molecular diagnosis in some unresolved cases [10•, 18, 19]. For example, in a recent study of patients with a clinical diagnosis of Cornelia de Lange syndrome (CdLS), RNA-seq identified splicing changes specific to patients, which helped the researchers pinpoint DNA variants in the NIPBL gene that were previously undetected by standard NGS approaches [18]. Additionally, a third molecularly undiagnosed patient from this study was found to harbor a missense variant in BRD4, which is known to interact with CdLS-related genes. This variant was identified through RNA-seq, which showed decreased expression of the gene. Only after the patient’s DNA and RNA had been analyzed was BRD4 reported in the literature to be associated with CdLS, based on other unrelated cases [18, 20, 21].

Identification of deep intronic variants has also been informative for individuals with suspected autosomal recessive diseases. Bronstein et al. [22•] investigated a patient with retinal degeneration who had only a single pathogenic variant in the CNGB3 gene, detected by gene-panel and whole-exome sequencing. Pathogenic variants in CNGB3 are known to cause an autosomal recessive form of this retinal abnormality. Follow-up analysis with whole-genome sequencing, RNA-seq, and familial segregation analysis identified a deep intronic variant in the CNGB3 gene on the patient’s other chromosome, thus establishing a compound heterozygous molecular diagnosis. Further analysis of the deep intronic variant revealed an alternative splicing pattern resulting in the incorporation of a cryptic exon and consequent protein mislocalization [22•]. This additional functional evidence and the familial segregation of the two variants in affected siblings provided the necessary supportive evidence to definitively classify the deep intronic variant as pathogenic. Computational splice predictors alone would not have been sufficient to reveal the molecular explanation of disease in this family.

These case examples highlight the importance of searching comprehensively for variants when there is a strong clinical suspicion of genetic disease and an inconclusive molecular diagnosis from DNA sequencing. Although whole-genome sequencing can detect deep intronic variants, prioritizing candidate variants that may explain a patient’s phenotype can be computationally challenging without the additional functional data that RNA-seq provides [23]. Therefore, combining large-scale DNA analysis with RNA-seq can increase diagnostic yield compared with either method alone [10•, 15•, 18, 19, 23, 24•].

Implications for Genetic Counseling

At present and based on current approaches, RNA analysis should be considered a complementary or research-based analysis rather than a primary test when discussing testing options for patients [7, 24•]. Clinicians must consider how to select the appropriate patients for follow-up RNA analysis. Reduced disease penetrance, variable age of onset of clinical features, and limited familial structures are common confounding factors when trying to identify these patients. Additionally, standardized guidelines for RNA analysis in the context of diagnostic genetic testing are lacking, and selecting patients or laboratories without well-designed methods could increase the burden on posttest counseling. This is further complicated by the sample-type requirements for RNA analysis. The desired sample type may not be available or may require an additional medical procedure such as a biopsy, which decreases the frequency with which RNA analysis is conducted [5, 10•]. Interpretation and communication of the results of RNA analysis also contribute to the complexity of posttest counseling. For example, RNA analysis may identify a novel variant in a patient, but if the biological significance of the variant remains unclear, then a definitive molecular diagnosis cannot be made [5, 10•, 23]. This additional uncertainty can further complicate the delivery of results and increase frustration among clinicians and patients. The identification of novel pathogenic variants, however, uncovers a potential for achieving molecular diagnoses that could lead to medical management and future discussions about risk assessment. Before RNA analysis can be integrated more broadly into genetic testing, the assay used, the quality of data, and the interpretation of results must be standardized to ensure high clinical utility [5].

Genetic counseling for RNA analysis should consider all possible outcomes: RNA analysis can result in an upgrade of a VUS to a pathogenic classification, a downgrade of a VUS to a benign classification, or no change in classification [15•]. Communication regarding the best use of RNA analysis will be essential for ensuring that patients and clinicians are prepared for the results as methods evolve over time. A thorough understanding of which patients are eligible and which type of RNA analysis is most appropriate will be critical for decreasing the emotional burden associated with uncertain results.

Conclusion

Germline DNA testing is a well-established approach for identifying variants suspected of contributing to a molecular diagnosis as well as variants that may impact splicing. RNA analysis, which is already being integrated into clinical genetic testing, should be considered as a complementary test to clarify results from DNA testing. It can also serve as a supplemental test when a monogenic explanation is expected but DNA test results are insufficient to make a diagnosis. Its utility is highest for variants predicted or expected to disrupt splicing based on their location in a consensus splice site. RNA analysis can also be useful for unaffected individuals who are considered to have a predisposition to disease due to a variant that is expected to disrupt a canonical splice site. A direct and immediate benefit of RNA analysis is its ability to resolve VUS in conjunction with supportive or contradictory clinical evidence.

The potential of RNA analysis, however, does not come without challenges. Our current understanding of RNA profiles for given genes is continuously evolving, and further research is needed to fully appreciate and interpret variations detected by RNA analysis. Thus, a primary challenge will be identifying and triaging patients who would likely benefit from this analysis. Furthermore, the current lack of standardization around the laboratory processes and predictive analyses will need to be resolved before RNA analysis can be more widely implemented as a baseline test. Requiring clinicians to scrutinize the analytical and clinical validity of a testing methodology and to coordinate appropriate sample types will place a further burden on pre- and posttest genetic counseling. Resolving these issues will improve the utility of RNA analysis in the context of genetic testing and will provide a path toward realizing its full potential.

As the standardization, reproducibility, and reliability of RNA sequencing and analysis increase, the scope of RNA analysis in clinical testing will broaden. Future routine uses of this analysis may include discovery of deep intronic variants and resolution of structural variants. RNA analysis will also facilitate computational targeting and filtering of expansive data from whole-genome sequencing. This will substantially reduce the burden associated with analyzing whole-genome sequences, which may lead to faster and more complete resolution for patients searching for a genetic diagnosis. The lessons learned from the integration and utilization of RNA analysis will also influence and direct the path toward a broader understanding of other analyses that complement DNA testing. This in turn will help uncover new opportunities for more comprehensively evaluating the human genome.