Molecular diagnostics of Mendelian disorders via combined DNA and RNA sequencing

The diagnostic yield in rare disorders is currently less than 50% although sequencing technologies in use are able to detect the majority of possible variants in our genome. The diagnostic gap is in part due to limitations in prioritizing and interpreting identified variants. The integration of functional data, such as transcriptomics, is emerging as a powerful complementary tool in diagnostics. It is able to quantify aberrant splicing, validate nonsense-mediated mRNA decay for potential loss-of-function variants, identify mono-allelically expressed variants, and help prioritize variants not predicted to change the encoded protein. Moreover, RNA-sequencing has been validated as a tool for the discovery of pathogenic variants in novel Mendelian disease genes. As RNA sequencing provides complementary information to DNA sequencing and can easily be established in addition to DNA sequencing, it has great potential for implementation as a routine tool for improving molecular diagnosis.


Next-generation sequencing (NGS) has transformed diagnostic protocols for Mendelian diseases. In the past, finding the cause of an affected child's suffering could be a long, frustrating and often futile battle for parents. The availability of whole-exome sequencing (WES) and whole-genome sequencing (WGS) has made a molecular diagnosis possible, at least conceptually, for every patient. Genetic confirmation of a diagnosis can be key for treatment, removes uncertainty, and may be important for future family planning. However, this promise has not yet been fully kept. For mitochondrial and other diseases, analysis of the coding sequence does not lead to a diagnosis in 50-75% of patients. This figure indicates that in numerous cases the pathogenic variants either escape detection, are detected but erroneously classified as variants of uncertain significance (VUS), or are part of a more complex genetic constellation.

Limitations of DNA sequencing in diagnostics
Only 25-50% of patients receive a firm genetic diagnosis after WES, often because of limitations in the coverage of the target regions, the detection of intronic and regulatory variants, the bioinformatic filtering and prioritization of potentially pathogenic variants, and knowledge about the molecular and clinical consequences of genetic variants. WGS improves coverage and allows the detection of extra-exonic and structural variants [12]. When the analysis focuses on the coding region, however, WGS currently improves the diagnosis of recessive disorders only marginally. When the search space is extended to the full genome, the filter for a minor allele frequency (MAF) of 0.1-1.0%, currently the most effective filter, loses its effectiveness. In a single WES dataset, the frequency fil-
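As a minimal sketch of the frequency-based prioritization discussed above, the Python code below keeps rare variants (MAF at or below an assumed cut-off of 1%, the upper end of the range mentioned) and retains only genes compatible with a recessive model, i.e. carrying one homozygous rare variant or at least two heterozygous rare variants. The `Variant` record, its genotype labels and the function `recessive_candidates` are hypothetical illustrations, not part of any pipeline described in this article. Phase is not checked, so two heterozygous variants could in fact lie on the same allele.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    variant_id: str
    maf: float      # population minor allele frequency, 0..1
    genotype: str   # "het" or "hom" in the patient (hypothetical encoding)


def recessive_candidates(variants, max_maf=0.01):
    """Return {gene: [variants]} for genes compatible with recessive inheritance.

    Assumption: a gene qualifies if it carries one homozygous rare variant or
    at least two heterozygous rare variants (phase is NOT checked here).
    """
    by_gene = {}
    for v in variants:
        if v.maf <= max_maf:  # population frequency filter
            by_gene.setdefault(v.gene, []).append(v)
    return {
        gene: vs
        for gene, vs in by_gene.items()
        if any(v.genotype == "hom" for v in vs) or len(vs) >= 2
    }


if __name__ == "__main__":
    demo = [
        Variant("GENE_A", "a1", 0.001, "hom"),
        Variant("GENE_B", "b1", 0.002, "het"),
        Variant("GENE_B", "b2", 0.005, "het"),
        Variant("GENE_C", "c1", 0.003, "het"),  # single heterozygous variant: dropped
        Variant("GENE_D", "d1", 0.200, "hom"),  # common variant: dropped
    ]
    print(sorted(recessive_candidates(demo)))
```

In this toy input, GENE_C is dropped because it carries a single heterozygous rare variant, which is the situation that mono-allelic expression analysis addresses later in this article.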

Limitations in DNA variant interpretation
A definitive diagnosis is based on the discovery of known pathogenic variant(s) in a patient whose clinical presentation resembles a clinical picture reported multiple times, usually listed in the disease-variant database ClinVar [9]. However, this is not the common situation, either at the variant or at the phenotype level. We observe a continuous extension of the phenotypic spectrum associated with variants in the same gene, or even with the same variant. The increasing overlap between the clinical presentations of genetically distinct disorders further weakens the discriminating power of established genotype-phenotype associations. The interpretation of possible protein-truncating variants in genes for which non-truncating/in-frame pathogenic loss-of-function variants are known can already be challenging. If transcripts affected by such variants escape nonsense-mediated mRNA decay (NMD), they may still produce a functional protein through translation re-initiation [15], through functional alternative transcript isoforms [13], or through preserved/residual function of the truncated protein [7]. A recent systematic study shows that exons present only in tissue-specific isoforms may not be essential for protein function [5]. In many cases, the candidate variant is even more difficult to interpret, for example rare missense, (near) splice-site, intronic, and synonymous variants. Therefore, data describing the functional consequences at the molecular level are required to advance diagnostics.
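Whether a premature termination codon (PTC) is expected to trigger NMD is often estimated with the widely used "50-55 nucleotide rule", a heuristic that is not described in this article: a PTC located more than about 50 nt upstream of the last exon-exon junction is predicted to elicit NMD, whereas PTCs in the last exon or just upstream of the last junction are predicted to escape it. The sketch below encodes only this heuristic; the function name `predicted_to_trigger_nmd` and the coordinate convention are assumptions. As the ATP7B case in this article illustrates, transcripts with early PTCs can nevertheless escape NMD, so such a prediction cannot replace measuring expression by RNA-seq.

```python
def predicted_to_trigger_nmd(ptc_pos, exon_ends, window=50):
    """Heuristic NMD prediction (50-nt rule); not a substitute for RNA data.

    ptc_pos:   1-based position of the first nucleotide of the premature stop
               codon in the spliced mRNA.
    exon_ends: sorted 1-based mRNA positions of the last nucleotide of each exon.
    window:    distance (nt) upstream of the last exon-exon junction within
               which a PTC is assumed to escape NMD.
    """
    if len(exon_ends) < 2:
        return False  # single-exon transcript: no downstream exon junction
    last_junction = exon_ends[-2]  # last nucleotide before the final junction
    # PTCs in the last exon give a negative distance and are predicted to escape.
    return last_junction - ptc_pos > window


if __name__ == "__main__":
    exons = [100, 250, 400]  # hypothetical three-exon transcript
    for pos in (120, 220, 300):
        print(pos, predicted_to_trigger_nmd(pos, exons))
```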

The value of RNA sequencing in diagnostics
Transcriptomics by RNA sequencing (RNA-seq) takes advantage of new sequencing protocols and provides direct insight into the transcriptome of cell lines or tissues, reflecting a snapshot at a specific time point [11]. With a focus on protein-coding genes, the procedure usually includes an enrichment step for full-length poly(A) transcripts, followed by cDNA synthesis and sequencing; however, many other protocols exist, for example to analyze total RNA, circular RNA or microRNA. RNA-seq of full-length mRNA can detect and quantify known, pre-defined RNA species as well as rare and novel transcript variants and isoforms [3]. Hence, it uncovers the transcriptional consequences of genetic variants, whether they were previously prioritized or previously missed by the filters applied in the bioinformatics pipeline. RNA-seq provides a single assay to validate and quantify the impact of potential regulatory or splice defects for all genes expressed in a biological sample. Moreover, RNA-seq has been validated as a tool to indicate novel Mendelian disease genes through the identification of pathogenic variants in these genes [8]. With a diagnostic yield between 10 and 35%, two recent studies convincingly demonstrated the power of combined DNA and RNA sequencing [4,8].

Keywords
Aberrant gene expression · Splicing · Mitochondrial disease · Whole-genome sequencing

… disorders, Cummings studied muscle biopsy samples from patients with muscular disorders. In both cases, the tissue was carefully selected: more than 90% of the known mitochondrial disease genes were reliably detected in fibroblast cell lines, and more than 90% of the known muscular disease genes in muscle biopsies. However, this does not hold for every tissue. In blood, the most readily available tissue, only about two thirds of the known disease genes are expressed.
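How well a given tissue covers a disease-gene panel can be estimated by asking which panel genes are expressed above some threshold. The sketch below assumes a dictionary of normalized expression values (for example TPM) per gene across samples and an arbitrary cut-off of a median value of 1.0; neither the data structure nor the threshold comes from the studies cited, and the gene names are placeholders.

```python
from statistics import median


def fraction_detected(expression, disease_genes, min_median=1.0):
    """Fraction of panel genes whose median expression reaches min_median.

    expression:    dict gene -> list of normalized values (one per sample).
    disease_genes: iterable of gene symbols forming the diagnostic panel.
    Genes absent from the expression table count as not detected.
    """
    panel = list(disease_genes)
    if not panel:
        return 0.0
    detected = [
        g for g in panel
        if expression.get(g) and median(expression[g]) >= min_median
    ]
    return len(detected) / len(panel)


if __name__ == "__main__":
    expr = {"GENE_A": [5.0, 6.0, 7.0], "GENE_B": [0.0, 0.5, 2.0], "GENE_C": [1.0, 1.0, 1.0]}
    print(fraction_detected(expr, ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]))
```

Running the same function on fibroblast, muscle and blood data for one panel would give a rough, threshold-dependent comparison of how suitable each tissue is.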
medizinische genetik 2 · 2019
RNA-seq data can be analyzed with gene-specific questions to refine transcript isoform annotation and to verify the consequence of a suspected variant on a specific transcript. In one comprehensive assay that includes a number of controls, this replaces quantitative RT-PCR and cDNA sequencing. When only the index case is available, RNA-seq also enables haplotype phasing of two variants located in different exons, provided that both are covered by the same continuous RNA reads. Transcriptome analysis provided complementary information to DNA sequencing in several cases, including three with non-pathogenic protein-truncating variants (Fig. 1). In a case with a homozygous frameshift variant in exon 2 of ATP7B, we detected mRNA expression comparable with that of healthy controls, suggesting that NMD could be bypassed by translation re-initiation. This was confirmed by Western blot and by functional tests of copper export capacity [15]. In another case, we identified bi-allelic frameshift variants in FLAD1, which encodes FAD synthase (FADS). Because FADS is essential for the cellular supply of FAD cofactors, the finding of bi-allelic frameshift variants was unexpected. RNA-seq analysis discovered a novel FLAD1 isoform lacking the affected exon, explaining why cells with bi-allelic FLAD1 frameshift variants still harbor substantial FADS activity [13]. In a patient with a mitochondrial disorder, we found homozygous protein-truncating variants in LYRM7 and MTO1, two genes encoding essential mitochondrial proteins. Transcriptome and proteome studies confirmed normal expression of the truncated MTO1, and we found no indication of impaired MTO1 activity [7].
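Phasing from RNA reads works because a spliced read that covers both heterozygous sites reports which alleles sit on the same transcript molecule. A minimal sketch, assuming reads have already been reduced to a mapping from genomic position to observed base, could count the allele combinations as follows; the read representation, the thresholds and the function name `phase_from_reads` are hypothetical. Mono-allelic expression or NMD of one haplotype can deplete reads from that allele and make RNA-based phasing unreliable.

```python
from collections import Counter


def phase_from_reads(reads, var1, var2, min_reads=5, min_fraction=0.9):
    """Infer whether the alternative alleles of two heterozygous variants are
    in cis (same haplotype) or in trans (different haplotypes).

    reads: iterable of dicts mapping genomic position -> base observed in one
           RNA read; spliced reads may cover positions in different exons.
    var1, var2: (position, ref_base, alt_base) tuples.
    Returns "cis", "trans" or "undetermined".
    """
    (p1, r1, a1), (p2, r2, a2) = var1, var2
    counts = Counter()
    for read in reads:
        b1, b2 = read.get(p1), read.get(p2)
        if b1 not in (r1, a1) or b2 not in (r2, a2):
            continue  # read does not span both sites, or shows another base
        # alt+alt or ref+ref on one molecule means the alt alleles share a haplotype
        counts["cis" if (b1 == a1) == (b2 == a2) else "trans"] += 1
    total = counts["cis"] + counts["trans"]
    if total < min_reads:
        return "undetermined"
    for phase in ("cis", "trans"):
        if counts[phase] / total >= min_fraction:
            return phase
    return "undetermined"


if __name__ == "__main__":
    v1, v2 = (100, "G", "A"), (500, "C", "T")  # hypothetical variants
    reads = [{100: "A", 500: "C"}] * 4 + [{100: "G", 500: "T"}] * 4
    print(phase_from_reads(reads, v1, v2))
```

Two variants found in trans, as in this toy input, support compound heterozygosity, whereas variants in cis leave one allele intact.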
In addition to validating the impact of an identified VUS on the corresponding transcript, RNA-seq data can also be analyzed transcriptome-wide to detect aberrant gene expression. In such a systematic analysis of RNA-seq data, searching for outliers (as detailed below) allows candidate disease-causing genes for rare disorders to be identified and prioritized. To focus on rare and recessive diseases, we applied stringent filtering for rare events with strong effect sizes.
Mono-allelic expression (MAE) occurs when one allele is silenced, so that only the other allele is expressed. Under the assumption of a recessive mode of inheritance, genes with a single heterozygous rare coding variant identified by WES or WGS analysis are not prioritized [6]. However, if the variant allele is the only one expressed, the cell effectively produces only variant transcripts, which is compatible with the assumption of recessive inheritance. Detecting MAE can thus help to re-prioritize heterozygous rare variants. Our setting is based on fibroblast cell lines, in which about 7500 heterozygous SNPs identified by genotyping are covered by at least ten RNA-seq reads, allowing the detection of alleles that account for at least 90% of the expression [8]. Six of the MAE alleles carry rare single-nucleotide variants (SNVs) affecting the protein sequence.
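The coverage and allele-fraction criteria above can be written as a simple filter. This is a minimal sketch using a plain threshold (at least ten reads, and at least 90% of the reads from the variant allele) rather than the statistical test of a published pipeline; the `HetSite` record and its `rare_coding` flag are hypothetical annotations assumed to come from WES/WGS.

```python
from dataclasses import dataclass


@dataclass
class HetSite:
    gene: str
    ref_reads: int      # RNA-seq reads supporting the reference allele
    alt_reads: int      # RNA-seq reads supporting the variant allele
    rare_coding: bool   # hypothetical flag: rare, protein-altering variant


def mae_variant_genes(sites, min_coverage=10, min_fraction=0.9):
    """Genes whose rare coding variant allele is (almost) exclusively expressed."""
    genes = []
    for s in sites:
        depth = s.ref_reads + s.alt_reads
        if depth < min_coverage:
            continue  # too few reads to judge allelic balance
        if s.rare_coding and s.alt_reads / depth >= min_fraction:
            genes.append(s.gene)
    return genes


if __name__ == "__main__":
    sites = [
        HetSite("GENE_1", ref_reads=1, alt_reads=19, rare_coding=True),
        HetSite("GENE_2", ref_reads=10, alt_reads=10, rare_coding=True),
        HetSite("GENE_3", ref_reads=0, alt_reads=8, rare_coding=True),
        HetSite("GENE_4", ref_reads=0, alt_reads=30, rare_coding=False),
        HetSite("GENE_5", ref_reads=18, alt_reads=2, rare_coding=True),
    ]
    print(mae_variant_genes(sites))
```

Sites where the reference allele dominates (GENE_5 above) are also mono-allelic, but there the variant is silenced, so they are not returned as candidates for re-prioritization.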
Aberrant expression, identified as gene expression outliers, occurs when expression falls outside the physiological range. It usually implies impaired expression of both alleles, with expression levels below 50% of those of the controls. It can result from RNA degradation through NMD, triggered either by apparently protein-truncating variants or by splice defects, but it can also result from non-coding variants in regulatory regions such as promoters, enhancers and silencers, from variants in the untranslated regions of the transcripts, or from combinations thereof. The genome-wide analysis reveals a median of only one aberrantly expressed gene per sample (Fig. 2; [8]).
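A minimal sketch of outlier calling, assuming a genes x samples matrix of library-size-normalized counts from a cohort processed in the same way: each gene's log-expression is converted to z-scores across samples, and a gene is flagged in a sample if it is strongly reduced (z of -3 or lower) and below 50% of the cohort median. This simplification is for illustration only; it is not OUTRIDER, which additionally models count noise and corrects for confounders. The cut-offs and the function name `expression_outliers` are assumptions.

```python
import numpy as np


def expression_outliers(norm_counts, genes, samples, z_cutoff=3.0, max_ratio=0.5):
    """Flag (gene, sample) pairs with strongly reduced expression.

    norm_counts: genes x samples array of library-size-normalized counts.
    Returns a list of (gene, sample, z_score, ratio_to_cohort_median).
    """
    log_expr = np.log2(norm_counts + 1.0)
    mean = log_expr.mean(axis=1, keepdims=True)
    sd = log_expr.std(axis=1, ddof=1, keepdims=True)
    z = (log_expr - mean) / np.where(sd > 0, sd, np.nan)  # constant genes -> NaN
    median = np.median(norm_counts, axis=1, keepdims=True)
    ratio = norm_counts / np.where(median > 0, median, np.nan)
    # NaN comparisons are False, so constant or unexpressed genes are never flagged
    hits = np.argwhere((z <= -z_cutoff) & (ratio < max_ratio))
    return [(genes[i], samples[j], float(z[i, j]), float(ratio[i, j])) for i, j in hits]


if __name__ == "__main__":
    samples = [f"S{i}" for i in range(20)]
    genes = ["GENE_A", "GENE_B", "GENE_C"]
    counts = np.full((3, 20), 100.0)
    counts[0, 5] = 10.0                # strongly reduced in one sample
    counts[1, :] = [40.0, 60.0] * 10   # ordinary variation
    counts[2, 0] = 500.0               # increased, not reduced
    for hit in expression_outliers(counts, genes, samples):
        print(hit)
```

With n samples, a single value's z-score (using the sample standard deviation) cannot exceed (n-1)/sqrt(n) in magnitude, so a cut-off of 3 requires a cohort of at least 11 samples; this is one reason why cohort size matters for outlier detection.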
Aberrant splicing has long been recognized as a major cause of Mendelian disorders [14]. A systematic study of SNVs in ClinVar predicted that 20 to 30% of VUS and pathogenic variants cause aberrant splicing patterns [10]. However, predicting splicing defects from genomic sequence is difficult, because splicing involves a complex set of cis-regulatory elements that are not yet fully understood. Some of these elements lie deep within introns and are thus not covered by WES. Direct probing of splice isoforms by RNA-seq is therefore important and has led to the discovery of multiple splicing defects in single-gene studies. To detect aberrant splicing events, we adapted an algorithm developed for splicing quantitative trait loci to the context of rare disorders. This pipeline is based on an annotation-free algorithm that can also detect novel splice sites. A median of five aberrantly spliced genes is detected per sample [8]. Aberrant splicing is not only caused by variants affecting known splice sites or splice motifs; it can also result from variants creating novel splice sites or splice motifs within coding or deep-intronic regions (Fig. 3). Splicing abnormalities include exon creation, skipping, extension and truncation, combinations thereof, and intron retention. They often introduce premature termination codons that provoke degradation of the RNA by NMD, which may then be detected as aberrant expression. The RNA-seq data allow all novel transcript isoforms to be characterized. Quantifying the reads that connect the reference and the aberrantly spliced exons can provide a direct readout of the consequences of the DNA variant (Fig. 4).
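Quantifying split reads can be sketched in the spirit of percent-spliced-in measures: for each donor site, the fraction of split reads that use a given junction is computed in the patient and compared with the mean over controls. Because junctions are taken directly from the reads rather than from an annotation, novel junctions are included. The function names, the minimum read count and the difference cut-off of 0.3 are assumptions, and only the donor side is shown (acceptor-side usage is analogous); this is not the algorithm of the pipeline described here.

```python
from collections import defaultdict


def donor_usage(junction_counts):
    """Fraction of split reads from each donor site that use each junction.

    junction_counts: dict (donor_pos, acceptor_pos) -> number of split reads.
    """
    per_donor = defaultdict(int)
    for (donor, _acceptor), n in junction_counts.items():
        per_donor[donor] += n
    return {
        junction: n / per_donor[junction[0]]
        for junction, n in junction_counts.items()
        if per_donor[junction[0]] > 0
    }


def aberrant_junctions(patient, controls, min_reads=10, min_delta=0.3):
    """Junctions used far more often in the patient than in the controls."""
    if not controls:
        raise ValueError("at least one control sample is required")
    patient_usage = donor_usage(patient)
    control_usages = [donor_usage(c) for c in controls]
    hits = []
    for junction, frac in patient_usage.items():
        if patient[junction] < min_reads:
            continue  # too little support for this junction in the patient
        # junctions never seen in a control (e.g. novel ones) count as 0 there
        control_mean = sum(u.get(junction, 0.0) for u in control_usages) / len(control_usages)
        if frac - control_mean >= min_delta:
            hits.append((junction, frac, control_mean))
    return hits


if __name__ == "__main__":
    patient = {(1000, 2000): 40, (1000, 1800): 60}  # hypothetical coordinates
    controls = [{(1000, 2000): 100}, {(1000, 2000): 90, (1000, 1800): 2}]
    print(aberrant_junctions(patient, controls))
```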
Because fewer than 20 aberrantly expressed genes are detected per sample, the RNA-seq data can be inspected and evaluated manually, which improves clinical interpretation in the context of the genetic and clinical data.
The RNA-seq protocols and bioinformatics pipelines currently in use focus on the gene level for expression outliers, on the exon/intron or splice-site level for aberrant splicing, and on SNPs for mono-allelic expression in a specific tissue or cell line. Long-read sequencing will allow more complex situations to be considered in large genes with multiple transcript isoforms, and single-cell RNA-seq protocols will increase the resolution from the average expression level of a tissue to specific cell types, allowing cell-type-specific regulation and imprinting mechanisms to be studied. However, the methods provide only a snapshot of the cells studied, and the absence of aberrant expression or splicing in a surrogate tissue does not prove normal expression or splicing in the affected tissue, which represents a clear limitation. Several RNA-seq analysis pipelines are currently available, but further improvement is necessary to optimize sensitivity and specificity. To automate and optimize the correction for confounding technical, environmental, or common genetic variation, we recently developed OUTRIDER, which improved the detection of aberrant expression based on an assessment of statistical significance [2]. Further method development is nevertheless required, especially for the detection of aberrant splicing events and the prediction of causal variants.
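The effect of confounder correction can be illustrated with a linear stand-in for OUTRIDER's autoencoder: the leading principal components of the centered log-counts are treated as technical or environmental variation and subtracted before z-scores are computed. This is only a sketch under that assumption; OUTRIDER itself fits an autoencoder together with a negative binomial model, and the function name `residual_zscores` and the number of components to remove (`n_confounders`) are hypothetical.

```python
import numpy as np


def residual_zscores(counts, n_confounders=1):
    """Per-gene z-scores after removing the top principal components.

    counts: samples x genes array of (size-factor-normalized) counts.
    n_confounders: number of leading components treated as confounders.
    """
    x = np.log2(counts + 1.0)
    x = x - x.mean(axis=0)  # center each gene
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    k = n_confounders
    confounding = (u[:, :k] * s[:k]) @ vt[:k, :]  # rank-k approximation
    resid = x - confounding
    sd = resid.std(axis=0, ddof=1)
    return resid / np.where(sd > 0, sd, np.nan)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = 100.0 * 2.0 ** rng.normal(0.0, 0.1, size=(20, 50))
    counts[10:, :] *= 4.0   # simulated batch effect on half of the samples
    counts[3, 7] /= 8.0     # one genuinely reduced gene in sample 3
    z_raw = residual_zscores(counts, n_confounders=0)
    z_corr = residual_zscores(counts, n_confounders=1)
    print(round(float(z_raw[3, 7]), 2), round(float(z_corr[3, 7]), 2))
```

In this simulation, the batch shift inflates each gene's variance and masks the reduced gene; after the first component is removed, it becomes the most extreme value in the matrix.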

Practical conclusion
- By integrating phenotype and genotype information only, fewer than 50% of Mendelian disorders are diagnosed.
- This diagnostic gap is in part due to limitations in prioritizing and interpreting identified variants.
- Transcriptomics by RNA sequencing provides complementary functional information to DNA sequencing.
- RNA sequencing delivers quantitative data on RNA expression level, aberrant splicing, and allele-specific expression.
- The systematic analysis helps to prioritize variants, whether or not they are predicted to change the encoded protein.
- RNA sequencing has been validated as a tool for the discovery of pathogenic variants in novel Mendelian disease genes.
- RNA sequencing has high potential for implementation as a routine tool to improve molecular diagnosis.

Corresponding address
Holger Prokisch
Institut für Humangenetik, Klinikum rechts der Isar, Technische Universität München
Trogerstr. 32, 81675 Munich, Germany
prokisch@helmholtzmuenchen.de

Acknowledgements. I am grateful for the cooperation of the patients, clinicians, and scientists in the German and European networks for mitochondrial diseases: mitoNET (BMBF), GENOMIT (BMBF, Horizon2020). I would also like to thank the research team of Prof. Gagneur (Technical University, Munich), for collaborating on establishing the RNA analysis pipeline.

Compliance with ethical guidelines
Conflict of interest H. Prokisch declares that he has no competing interests.
For this article no studies with human participants or animals were performed by any of the authors. All studies performed were in accordance with the ethical standards indicated in each case.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.