Molecular diagnostics of Mendelian disorders via combined DNA and RNA sequencing
The diagnostic yield in rare disorders is currently less than 50% although sequencing technologies in use are able to detect the majority of possible variants in our genome. The diagnostic gap is in part due to limitations in prioritizing and interpreting identified variants. The integration of functional data, such as transcriptomics, is emerging as a powerful complementary tool in diagnostics. It is able to quantify aberrant splicing, validate nonsense-mediated mRNA decay for potential loss-of-function variants, identify mono-allelically expressed variants, and help prioritize variants not predicted to change the encoded protein. Moreover, RNA-sequencing has been validated as a tool for the discovery of pathogenic variants in novel Mendelian disease genes. As RNA sequencing provides complementary information to DNA sequencing and can easily be established in addition to DNA sequencing, it has great potential for implementation as a routine tool for improving molecular diagnosis.
Keywords: Aberrant expression, Mitochondrial disorders, Aberrant splicing, Whole-genome sequencing
Next-generation sequencing (NGS) has transformed diagnostic protocols for Mendelian diseases. In the past, parents of an affected child often faced a long, frustrating, and frequently futile search for the cause of their child’s suffering; the availability of whole-exome sequencing (WES) and whole-genome sequencing (WGS) has made molecular diagnosis, at least conceptually, possible for every patient. Genetic confirmation of a diagnosis can be key for treatment, removes uncertainty, and may be important for future family planning. However, this promise has not been fully kept. For mitochondrial and other diseases, analysis of the coding sequence does not lead to a diagnosis in 50–75% of patients. This figure indicates that in numerous cases the pathogenic variants escaped detection, were detected but erroneously classified as variants of uncertain significance (VUS), or were part of a more complex genetic constellation.
Limitations of DNA sequencing in diagnostics
Only 25–50% of patients receive a firm genetic diagnosis after WES, often because of limitations concerning the coverage of the target regions, the detection of intronic and regulatory variants, the bioinformatic filtering and prioritization of potentially pathogenic variants, and knowledge about the molecular and clinical consequences of genetic variants. WGS improves the coverage and allows detection of extra-exonic variants and structural variants. When focusing on the coding region, however, WGS currently improves diagnostics of recessive disorders only marginally. When the search space is extended to the full genome, the currently most effective filter, a minor allele frequency threshold of 0.1–1.0%, is no longer effective. In a single WES dataset, the frequency filter already yields on average 100–200 variants (25 bi-allelic) requiring manual interpretation. Outside the exome, the number of such variants is two orders of magnitude higher. Moreover, although our understanding of coding variants is incomplete, our understanding of non-coding sequences is even more severely restricted. The capabilities of sequencing technology and bioinformatics tools are developing quickly and provide comprehensive genome annotation much faster than our ability to define the clinically relevant impact of detected variants.
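The frequency-based filtering described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the cited studies: the `Variant` record, the gene and variant names, and the 1% default threshold (the upper end of the 0.1–1.0% range mentioned) are assumptions chosen for the example. Genes carrying at least two rare alleles are kept as candidates under a recessive model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record, for illustration only."""
    gene: str
    variant_id: str
    maf: float      # minor allele frequency in a reference population (0..1)
    genotype: str   # "het" (one copy) or "hom" (two copies)


def rare_variants(variants, maf_threshold=0.01):
    """Keep variants rarer than the threshold (assumed default: 1%)."""
    return [v for v in variants if v.maf < maf_threshold]


def biallelic_candidates(variants):
    """Return genes with at least two rare alleles.

    A homozygous variant counts as two alleles, a heterozygous one as one.
    Two heterozygous variants in one gene are only *potentially*
    bi-allelic: phasing is needed to show they lie on different alleles.
    """
    alleles_per_gene = defaultdict(int)
    for v in variants:
        alleles_per_gene[v.gene] += 2 if v.genotype == "hom" else 1
    return sorted(g for g, n in alleles_per_gene.items() if n >= 2)


# Made-up example data (gene and variant names are placeholders).
variants = [
    Variant("GENE_A", "var1", 0.0005, "het"),
    Variant("GENE_A", "var2", 0.002, "het"),
    Variant("GENE_B", "var3", 0.0001, "hom"),
    Variant("GENE_C", "var4", 0.0003, "het"),
    Variant("GENE_D", "var5", 0.15, "hom"),   # too common, filtered out
]

rare = rare_variants(variants)
candidates = biallelic_candidates(rare)
```

In this toy input, `GENE_C` survives the frequency filter but is dropped by the bi-allelic criterion because it carries a single heterozygous rare variant. This is the class of variant that functional data such as RNA-seq can help to re-prioritize.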
Limitations in DNA variant interpretation
A definitive diagnosis is based on the discovery of known pathogenic variant(s) in a patient whose clinical presentation matches a clinical picture that has been reported multiple times; such variants are usually listed in the disease-variant database ClinVar. However, this is not the common situation, on either the variant or the phenotype level. We observe a continuous extension of the phenotypic spectrum associated with variants in the same gene or even with the same variant. The increasing overlap of clinical presentations of genetically different disorders additionally weakens the discriminating power of established genotype–phenotype associations. Even the interpretation of possible protein-truncating variants in genes for which non-truncating/in-frame pathogenic loss-of-function variants are known can be challenging. If transcripts affected by such variants escape nonsense-mediated mRNA decay (NMD), they may still produce a functional protein through translation re-initiation, through functional alternative transcript isoforms, or through preserved/residual function of the truncated protein. A recent systematic study shows that exons present only in tissue-specific isoforms may not be essential for protein function. In many cases, the candidate variant is even more difficult to interpret, as with rare missense, (near) splice site, intronic, and synonymous variants. Therefore, data describing the functional consequences on the molecular level are required to advance diagnostics.
The value of RNA sequencing in diagnostics
Transcriptomics by RNA sequencing (RNA-seq) takes advantage of new sequencing protocols and allows direct insights into the transcriptome of cell lines or tissues, reflecting a snapshot of a specific time point. With a focus on protein-coding genes, the procedures usually include an enrichment step for full-length poly(A) transcripts followed by cDNA synthesis and sequencing; however, many other protocols exist to analyze total RNA, circular RNA, or microRNA, to name a few. RNA-seq of full-length mRNA can detect and quantify known, pre-defined RNA species in addition to rare and novel RNA transcript variants and isoforms. Hence, it uncovers the transcriptional consequences of genetic variants that were either previously prioritized or previously missed by the filters applied in the bioinformatics pipeline. RNA-seq provides a single assay to validate and quantify the impact of potential regulatory or splice defects for all genes expressed in a biological sample. Moreover, RNA-seq has been validated as a tool to indicate novel Mendelian disease genes through the identification of pathogenic variants in the respective genes. With a diagnostic yield between 10 and 35%, two recent studies convincingly demonstrated the power of combined DNA and RNA sequencing [4, 8]. Whereas Kremer et al. performed RNA-seq on fibroblast cell lines from patients with suspected mitochondrial disorders, Cummings et al. studied muscle biopsy samples from patients with muscular disorders. In both cases, the tissue was carefully selected. More than 90% of the known mitochondrial disease genes were reliably detected in fibroblast cell lines, as were more than 90% of the muscular disease genes in muscle biopsies. However, this does not hold for the whole spectrum of tissues: in blood, the tissue that is usually available, only about two thirds of the known disease genes are expressed.
In addition to validating the impact of an identified VUS on the corresponding transcript, RNA-seq data can also be analyzed transcriptome-wide to detect aberrant gene expression. In such a systematic analysis, searching for extremes allows candidate disease-causing genes for rare disorders to be identified and prioritized. To focus on rare and recessive diseases, we applied stringent filtering for rare events with strong effect sizes, as described below.
Mono-allelic expression (MAE) denotes the silencing of one allele, so that only the second allele is expressed. Under the assumption of a recessive mode of inheritance, genes with a single heterozygous rare coding variant identified by WES or WGS analysis are not prioritized. However, MAE of such a variant fits the recessive mode of inheritance assumption. Detection of mono-allelic expression can thus help to re-prioritize heterozygous rare variants. Our setting is based on the use of fibroblast cell lines, where about 7500 heterozygous SNPs identified by genotyping are covered by at least ten RNA-seq reads, allowing detection of alleles that account for at least 90% of the expression. Six of the MAE alleles carry rare single-nucleotide variants (SNVs) affecting the protein sequence.
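A simple way to picture such an MAE call at a single heterozygous SNP is sketched below. The coverage threshold of ten reads and the 90% allelic fraction follow the figures given above; the exact two-sided binomial test against balanced 50:50 expression, the significance level, and the function names are assumptions made for this sketch and are not claimed to be the statistical model of the published pipeline.

```python
from math import comb


def binom_upper_tail(k, n, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def mae_call(ref_reads, alt_reads, min_coverage=10, min_fraction=0.9, alpha=0.05):
    """Return "ref" or "alt" if that allele appears mono-allelically expressed.

    Returns None if the site has too few reads, if the major allele makes
    up less than `min_fraction` of the reads, or if the imbalance is not
    significant under a balanced (p = 0.5) binomial model.
    """
    total = ref_reads + alt_reads
    if total < min_coverage:
        return None
    major = max(ref_reads, alt_reads)
    if major / total < min_fraction:
        return None
    # Two-sided p-value: double the upper tail of the major allele, cap at 1.
    p_value = min(1.0, 2 * binom_upper_tail(major, total))
    if p_value >= alpha:
        return None
    return "alt" if alt_reads > ref_reads else "ref"


# Made-up read counts at four heterozygous sites.
calls = [mae_call(1, 9), mae_call(10, 0), mae_call(4, 6), mae_call(0, 5)]
```

Because several thousand SNPs are tested per sample, a real analysis would additionally need to correct the p-values for multiple testing, which this sketch omits. A call of `"alt"` at a rare coding variant is the case of interest here: the heterozygous variant is effectively the only allele expressed.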
The small number of fewer than 20 aberrantly expressed genes per sample allows manual inspection and evaluation of the RNA-seq data and an improved clinical interpretation in the context of the genetic and clinical data.
The RNA-seq protocols and bioinformatics pipelines presently in use focus on the gene level for expression outliers, on the exon/intron or splice-site level for aberrant splicing, and on SNPs for mono-allelic expression in a specific tissue or cell line. The development of long-read sequencing will also allow more complex situations in large genes with multiple transcript isoforms to be considered. Single-cell RNA-seq protocols will increase the resolution from the average expression level of a tissue to specific cell types and will allow cell-specific regulation and imprinting mechanisms to be studied. However, these methods provide only a snapshot of the cells studied, and the non-detection of aberrant expression in a surrogate tissue does not permit the conclusion that splicing is normal in the affected tissue, which represents a clear limitation. Currently, several RNA-seq analysis pipelines are available, but further improvement is necessary to optimize sensitivity and specificity. To automate and optimize the correction for confounding technical, environmental, or common genetic variation, we recently developed OUTRIDER. OUTRIDER improves the detection of aberrant expression by basing outlier calls on an assessment of statistical significance. Further method development is nevertheless required, especially for the detection of aberrant splicing events and the prediction of causal variants.
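To make the idea of a gene-level expression outlier concrete, the following sketch flags genes whose expression in one sample deviates strongly from all other samples. It is a deliberately simplified stand-in and not OUTRIDER: it uses a leave-one-out z-score on log-transformed counts, applies no correction for confounders, and does not model count statistics. The cutoff of |z| ≥ 3, the pseudocount, the input layout, and all gene and sample names are assumptions for illustration.

```python
from math import log2
from statistics import mean, stdev


def expression_outliers(counts, sample, z_cutoff=3.0, pseudocount=1.0):
    """Return {gene: z} for genes aberrantly expressed in `sample`.

    counts: {gene: {sample_id: normalized_read_count}}
    The z-score compares the sample's log2 expression with the mean and
    standard deviation of all *other* samples (leave-one-out), so that an
    extreme value does not inflate its own reference distribution.
    """
    hits = {}
    for gene, per_sample in counts.items():
        if sample not in per_sample:
            continue
        logs = {s: log2(c + pseudocount) for s, c in per_sample.items()}
        others = [v for s, v in logs.items() if s != sample]
        if len(others) < 2:
            continue  # a standard deviation needs at least two references
        sd = stdev(others)
        if sd == 0:
            continue  # no variation among references, z-score undefined
        z = (logs[sample] - mean(others)) / sd
        if abs(z) >= z_cutoff:
            hits[gene] = z
    return hits


# Made-up normalized counts: GENE_A is strongly reduced in the patient.
counts = {
    "GENE_A": {"patient": 10, "c1": 100, "c2": 110, "c3": 95, "c4": 105, "c5": 102},
    "GENE_B": {"patient": 53, "c1": 50, "c2": 55, "c3": 48, "c4": 52, "c5": 51},
}
outliers = expression_outliers(counts, "patient")
```

With only a handful of flagged genes per sample, the output of such a step remains small enough for manual review alongside the genetic and clinical data.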
- This diagnostic gap is in part due to limitations in prioritizing and interpreting identified variants.
- Transcriptomics by RNA sequencing provides complementary functional information to DNA sequencing.
- RNA sequencing delivers quantitative data on RNA expression level, aberrant splicing, and allele-specific expression.
- The systematic analysis helps prioritize variants, whether or not they are predicted to change the encoded protein.
- RNA sequencing has been validated as a tool for the discovery of pathogenic variants in novel Mendelian disease genes.
- RNA sequencing has a high potential to be implemented as a routine tool to improve molecular diagnosis.
I am grateful for the cooperation of the patients, clinicians, and scientists in the German and European networks for mitochondrial diseases: mitoNET (BMBF), GENOMIT (BMBF, Horizon2020). I would also like to thank the research team of Prof. Gagneur (Technical University, Munich), for collaborating on establishing the RNA analysis pipeline.
Compliance with ethical guidelines
Conflict of interest
H. Prokisch declares that he has no competing interests.
For this article no studies with human participants or animals were performed by any of the authors. All studies performed were in accordance with the ethical standards indicated in each case.
- 4. Cummings BB, Marshall JL, Tukiainen T, Lek M, Donkervoort S, Foley AR, Bolduc V, Waddell LB, Sandaradura SA, O’Grady GL, Estrella E, Reddy HM, Zhao F, Weisburd B, Karczewski KJ, O’Donnell-Luria AH, Birnbaum D, Sarkozy A, Hu Y, Gonorazky H, Claeys K, Joshi H, Bournazos A, Oates EC, Ghaoui R, Davis MR, Laing NG, Topf A, Genotype-Tissue Expression Consortium, Kang PB, Beggs AH, North KN, Straub V, Dowling JJ, Muntoni F, Clarke NF, Cooper ST, Bönnemann CG, MacArthur DG (2017) Improving genetic diagnosis in Mendelian disease with transcriptome sequencing. Sci Transl Med 9(386). https://doi.org/10.1126/scitranslmed.aal5209
- 5. Cummings BB, Karczewski KJ, Kosmicki JA, Seaby EG, Watts NA, Singer-Berk M, Mudge JM, Karjalainen J, Satterstrom KF, O’Donnell-Luria A, Poterba T, Seed C, Solomonson M, Alfoldi J, The Genome Aggregation Database Production Team, The Genome Aggregation Database Consortium, Daly MJ, MacArthur DG (2019) Transcript expression-aware annotation improves rare variant discovery and interpretation. https://www.biorxiv.org/. https://doi.org/10.1101/554444
- 7. Kremer LS, L’hermitte-Stead C, Lesimple P, Gilleron M, Filaut S, Jardel C, Haack TB, Strom TM, Meitinger T, Azzouz H, Tebib N, Ogier de Baulny H, Touati G, Prokisch H, Lombès A (2016) Severe respiratory complex III defect prevents liver adaptation to prolonged fasting. J Hepatol 65(2):377–385
- 8. Kremer LS, Bader DM, Mertes C, Kopajtich R, Pichler G, Iuso A, Haack TB, Graf E, Schwarzmayr T, Terrile C, Koňaříková E, Repp B, Kastenmüller G, Adamski J, Lichtner P, Leonhardt C, Funalot B, Donati A, Tiranti V, Lombes A, Jardel C, Gläser D, Taylor RW, Ghezzi D, Mayr JA, Rötig A, Freisinger P, Distelmaier F, Strom TM, Meitinger T, Gagneur J, Prokisch H (2017) Genetic diagnosis of Mendelian disorders via RNA sequencing. Nat Commun 8:15824
- 9. Landrum MJ, Lee JM, Benson M, Brown GR, Chao C, Chitipiralla S, Gu B, Hart J, Hoffman D, Jang W, Karapetyan K, Katz K, Liu C, Maddipatla Z, Malheiro A, McDaniel K, Ovetsky M, Riley G, Zhou G, Holmes JB, Kattman BL, Maglott DR (2018) ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res 46(D1):D1062–D1067
- 10. Lee H, Deignan JL, Dorrani N, Strom SP, Kantarci S, Quintero-Rivera F, Das K, Toy T, Harry B, Yourshaw M, Fox M, Fogel BL, Martinez-Agosto JA, Wong DA, Chang VY, Shieh PB, Palmer CG, Dipple KM, Grody WW, Vilain E, Nelson SF (2014) Clinical exome sequencing for genetic identification of rare Mendelian disorders. JAMA 312(18):1880–1887
- 12. Merker JD, Wenger AM, Sneddon T, Grove M, Zappala Z, Fresard L, Waggott D, Utiramerur S, Hou Y, Smith KS, Montgomery SB, Wheeler M, Buchan JG, Lambert CC, Eng KS, Hickey L, Korlach J, Ford J, Ashley EA (2018) Long-read genome sequencing identifies causal structural variation in a Mendelian disease. Genet Med 20(1):159–163
- 13. Olsen RKJ, Koňaříková E, Giancaspero TA, Mosegaard S, Boczonadi V, Mataković L, Veauville-Merllié A, Terrile C, Schwarzmayr T, Haack TB, Auranen M, Leone P, Galluccio M, Imbard A, Gutierrez-Rios P, Palmfeldt J, Graf E, Vianey-Saban C, Oppenheim M, Schiff M, Pichard S, Rigal O, Pyle A, Chinnery PF, Konstantopoulou V, Möslinger D, Feichtinger RG, Talim B, Topaloglu H, Coskun T, Gucer S, Botta A, Pegoraro E, Malena A, Vergani L, Mazzà D, Zollino M, Ghezzi D, Acquaviva C, Tyni T, Boneh A, Meitinger T, Strom TM, Gregersen N, Mayr JA, Horvath R, Barile M, Prokisch H (2016) Riboflavin-Responsive and -Non-responsive Mutations in FAD Synthase Cause Multiple Acyl-CoA Dehydrogenase and Combined Respiratory-Chain Deficiency. Am J Hum Genet 98(6):1130–1145
- 15. Stalke A, Pfister ED, Baumann U, Eilers M, Schäffer V, Illig T, Auber B, Schlegelberger B, Brackmann R, Prokisch H, Krooss S, Bohne J, Skawran B (2019) Homozygous frame shift variant in ATP7B exon 1 leads to bypass of nonsense-mediated mRNA decay and to a protein capable of copper export. Eur J Hum Genet. https://doi.org/10.1038/s41431-019-0345-1
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.