Comparison of in silico strategies to prioritize rare genomic variants impacting RNA splicing for the diagnosis of genomic disorders

Rowlands, Charlie; Thomas, Huw B.; Lord, Jenny; Wai, Htoo A.; Arno, Gavin; Beaman, Glenda; Sergouniotis, Panagiotis; Gomes-Silva, Beatriz; Campbell, Christopher; Gossan, Nicole; Hardcastle, Claire; Webb, Kevin; O’Callaghan, Christopher; Hirst, Robert A.; Ramsden, Simon; Jones, Elizabeth; Clayton-Smith, Jill; Webster, Andrew R.; Douglas, Andrew G. L.; O’Keefe, Raymond T.; Newman, William G.; Baralle, Diana; Black, Graeme C. M.; Ellingford, Jamie M.

doi:10.1038/s41598-021-99747-2

Comparison of in silico strategies to prioritize rare genomic variants impacting RNA splicing for the diagnosis of genomic disorders

Article
Open access
Published: 18 October 2021

Volume 11, article number 20607, (2021)
Cite this article

Download PDF

You have full access to this open access article

Scientific Reports

Comparison of in silico strategies to prioritize rare genomic variants impacting RNA splicing for the diagnosis of genomic disorders

Download PDF

Charlie Rowlands^1,2,
Huw B. Thomas^1,2,
Jenny Lord³,
Htoo A. Wai³,
Gavin Arno^4,5,6,
Glenda Beaman^1,2,
Panagiotis Sergouniotis^1,2,
Beatriz Gomes-Silva²,
Christopher Campbell¹,
Nicole Gossan¹,
Claire Hardcastle¹,
Kevin Webb⁷,
Christopher O’Callaghan^8,9,
Robert A. Hirst⁹,
Simon Ramsden¹,
Elizabeth Jones¹,
Jill Clayton-Smith^1,2,
Andrew R. Webster^4,5,
Genomics England Research Consortium,
Andrew G. L. Douglas^3,10,
Raymond T. O’Keefe²,
William G. Newman^1,2,
Diana Baralle^3,10,
Graeme C. M. Black^1,2 &
…
Jamie M. Ellingford^1,2

4675 Accesses
33 Citations
17 Altmetric
Explore all metrics

Abstract

The development of computational methods to assess pathogenicity of pre-messenger RNA splicing variants is critical for diagnosis of human disease. We assessed the capability of eight algorithms, and a consensus approach, to prioritize 249 variants of uncertain significance (VUSs) that underwent splicing functional analyses. The capability of algorithms to differentiate VUSs away from the immediate splice site as being ‘pathogenic’ or ‘benign’ is likely to have substantial impact on diagnostic testing. We show that SpliceAI is the best single strategy in this regard, but that combined usage of tools using a weighted approach can increase accuracy further. We incorporated prioritization strategies alongside diagnostic testing for rare disorders. We show that 15% of 2783 referred individuals carry rare variants expected to impact splicing that were not initially identified as ‘pathogenic’ or ‘likely pathogenic’; one in five of these cases could lead to new or refined diagnoses.

A systematic analysis of splicing variants identifies new diagnoses in the 100,000 Genomes Project

Article Open access 26 July 2022

S-CAP extends pathogenicity prediction to genetic variants that affect RNA splicing

Article 25 February 2019

Predicting the impact of rare variants on RNA splicing in CAGI6

Article Open access 03 January 2024

Introduction

Pinpointing disease-causing genomic variation informs diagnosis, treatment and management for a wide range of rare disorders, and helps bring an end to the “diagnostic odyssey” undergone by some Mendelian disease patients. Molecular testing, in a healthcare setting, now frequently includes genome and exome sequencing^1,2,3. Accurate interpretation and categorization of identified variants remains a key limiting factor despite the availability of guidelines for variant analysis^4,5.

The capability to interpret variation within the non-coding genome is particularly challenging. Variant interpretation is hindered by the vast number of rare/novel non-coding variants identified in each individual^6,7, the depleted levels of evolutionary conservation within non-coding regions⁸, and our current lack of understanding of the motifs and interactions that are required for appropriate control of gene expression and regulation^9,10.

Intragenic genomic variants have the potential to impact splicing¹¹, the ubiquitous process in eukaryotic cells of converting nascent pre-messenger RNA (pre-mRNA) molecules into mature messenger RNA (mRNA) which can be transported out of the nucleus to provide a template for protein synthesis. Genomic variation in protein-coding, splice junction and intronic regions of genes can disrupt normal splicing mechanisms and underpin the onset of rare disease¹². Known mechanisms of splicing disruption include the introduction of cryptic splice sites, disruption of canonical splice acceptor and donor sites, and the disruption of other motifs essential for splicing, e.g. branchpoints and the polypyrimidine tract¹². The significant impact these events have on transcript and protein structure means such disruption is likely to be pathogenic when in transcripts of genes associated with loss-of-function disease mechanisms. This has already been observed in many disease types, for example in autism and intellectual disability¹³, and rare ophthalmic disorders¹⁴.

A number of computational tools have been developed to assist in the interpretation of genomic variation impacting splicing, and these tools have been expanded recently to include an array of machine learning tools that have been trained to prioritize splice disrupting variation through diverse means^{13,15,16,17,18}. Developing standards and recommendations for variants in non-coding regions is an important and emerging area for genome diagnostic services. However, in a similar manner to guidance for missense variants, in silico tools may be used as supporting evidence (PP3) to prioritize variants that impact splicing and can thereby assist in variant classification. While the initial reports of these in silico prioritization tools have shown promising results, there is yet to be a formal assessment of their integration, utilization and comparative performance in clinical environments.

The aim of this study was to compare the performance of nine in silico strategies, including eight state-of-the-art algorithms and a consensus approach, to prioritize variants impacting splicing. By applying these findings to known cohorts of variants identified through clinical testing, we aimed to identify the likely diagnostic benefit of routine integration of bioinformatics splicing predictions into diagnostic pipelines.

Results and discussion

Functional assessment of variants of uncertain significance identified through clinical genetic testing strategies

First, we ascertained and performed functional analyses for 249 VUSs to observe their impact on splicing (Supplementary Table S1). To the best of our knowledge, this is the largest set of VUSs that have been functionally interrogated for impact on splicing as part of diagnostic services for individuals with rare disease. All VUSs investigated are in genes where loss-of-function is an expected mechanism of disease causation. Variants had been identified in individuals undergoing genome sequencing and targeted gene panel analysis, with diverse phenotypic presentations including familial susceptibility to breast cancer (MIM #604370), syndromic disorders such as Marfan syndrome (MIM #154700) and isolated inherited retinal disorders such as retinitis pigmentosa (MIM #300029). The approaches for VUS functional analysis are described elsewhere¹⁹ and in the “Supporting Information”. We observed that 80/249 (32%) of the VUSs significantly impacted splicing, and as a result may be reclassified as ‘likely pathogenic’ according to ACMG guidelines for variant interpretation⁴. This formal reclassification is not conducted as part of this study which focused on the capability of in silico tools to distinguish variants which impact splicing (true positives) and variants which did not impact splicing (true negatives). All VUSs impacted regions outside of canonical splice acceptor and donor sites, and included examples of deeply intronic cryptic splice sites, exonic cryptic splice sites and branchpoint variants. In some cases, functional investigations demonstrated a range of consequences on mRNA splicing (Fig. 1), reinforcing that the precise effect of splicing variants is an important piece of evidence for consideration during clinical variant interpretation that, in the future, may enable refinements in appropriate targeted treatments^20,21.

Assessment of in silico prediction strategies to prioritize variants of uncertain significance

We obtained in silico prediction scores for each of the 249 functionally assessed variants using eight in silico prioritization algorithms (Supplementary Table S1) and calculated sensitivity, specificity and receiver operating characteristic area under the curve (AUC), observing significantly variable performances (Fig. 2). Pairwise statistical comparisons of AUC for the 249 functionally assessed VUSs, after Bonferroni correction for multiple testing, demonstrated that SpliceAI outperformed other single algorithm approaches (Fig. 2; Supplementary Table S2). The AUC analysis for single algorithms calculated the optimal score (based on Youden’s J statistic, as calculated using the pROC software package) for each of the algorithms to distinguish between true positives (80 variants shown to impact splicing in our functional assays) and true negatives (169 variants shown functionally not to impact splicing) in this dataset. We acknowledge that splicing machinery may be influenced by cell-/tissue-specific factors which are outside the scope of assays performed here^22,23,24, and variants may have pathogenic impacts on gene expression and/or regulation without any detrimental impact on splicing^8,25,26,27. Such factors will influence comparative metrics between algorithms, and future investigations may uncover pathogenic roles for variants reported here. However, the optimal thresholds calculated in light of these limitations for the 249 functionally assessed VUSs in this study are reported in Supplementary Table S3.

Global approaches to variant analysis, as assessed through the AUC, may fail to capture region-specific intricacies in splicing disruption¹⁶. For example, variants could be sub-divided by their pathogenic mechanism, their effect on pre-mRNA splicing, their predicted molecular consequence or the location of the variant with respect to known splicing motifs, and each of these sub-groups may require different approaches or thresholds for accurate prioritization of pathogenic variation. We therefore predicted variants to be ‘disruptive’ or ‘undisruptive’ according to thresholds pre-defined by the developers of the tools. This included region-specific thresholds for S-CAP and CADD, across six and five different regions, respectively, dependent on variant location in relation to its nearest exon (Supplementary Table S4, see Fig. 3C in Ref.¹⁶). These regions illustrate if a variant lies in the core splicing dinucleotides, the immediate vicinity of these sites, or at a greater distance. We utilized a single score threshold for tools where region-specific thresholds have not been previously identified (Supplementary Table S4). We compared accuracy of each of the prioritization strategies across 2000 iterations of sampling with replacement. This analysis highlighted differences across the tools and significantly differentiated their ability to accurately predict pathogenicity (Kruskal–Wallis, df = 8, p < 0.0001; Fig. 2c,d). Similar to the AUC analysis, SpliceAI (using a threshold of 0.2) was significantly the best performing strategy across all assessed single algorithms for our set of analyzed VUSs (Kruskal–Wallis, p < 0.0001 for all pairwise comparisons of accuracy between SpliceAI and other tools; Fig. 2c,d).

Combining in silico tools improves accuracy to identify variants of uncertain significance impacting splicing

To determine if combining one or more of these metrics could achieve greater accuracy than prioritization scores in isolation, we developed a consensus score for each variant which considered the region-specific thresholds for each tool (Supplementary Table S4). The score ranged from 0 to 8 and represented the number of tools for which a variant’s score exceeded the respective threshold. We observed that the consensus approach performed similarly to SpliceAI when assessed through the receiver operating characteristic AUC (Fig. 2; Supplementary Tables S2 and S3). The consensus approach (using a threshold of 4/8 algorithms supporting splicing disruption) also performed more similarly to SpliceAI than other strategies when measuring accuracy across sampling iterations (Fig. 2c), but was less frequently the best performing approach (Fig. 2d). Variability in model accuracy was consistently low across sampling iterations for all tools (Supplementary Table S5). To understand if the relative scores from each algorithm could assist interpretation we developed a novel metric which incorporates weighted scores from the prioritization strategies. This analysis considered the actual score of the variant relative to the maximum score possible from each prediction algorithm (see “Methods”). Of note, the weighted approach considering scores from SpliceAI and a consensus approach performs better than these two approaches in isolation (Fig. 2b; Supplementary Table S3). Although not mutually exclusive and underpowered to detect significant statistical differences in the AUC from this combined analysis—due to marginal gains in accuracy and sample size—this demonstrates the potential utility of combined approaches utilizing combinations of scores to improve accuracy for the identification of variants impacting splicing.

Integration of in silico strategies to prioritize variants impacting splicing for a large cohort of individuals with rare disease

Next, we sought to examine the impact of these approaches on clinical variant analysis. Therefore, we integrated region-specific prioritization strategies (Supplementary Table S4) into an accredited diagnostic service for 2783 individuals with rare diseases²⁸. All individuals included in this analysis have received genetic testing for rare disease within the UK National Healthcare Service through a clinically accredited laboratory. We calculated in silico scores for 20,617 variants (of which 18,013 were rare) observed a total of 1,346,744 times in the cohort. We observed substantial variability in the number of rare variants prioritized by each in silico tool (Fig. 3a; Supplementary Table S6) and in the specific variants prioritized by the most correlated in silico splicing tools (Fig. 3b). We observed that while variants which show the highest consensus between in silico splicing tools impact the canonical splice site (Fig. 3c; Supplementary Table S7), 99% (n = 17,871) of variants analyzed impact exonic or intronic regions of genes outside of the canonical splice sites. Splicing variants are often considered as a single class of variants and canonical splice site variants are therefore highly susceptible to over-prioritization by in silico tools, as such canonical splice site variants represent the majority (~ 70%) of known pathogenic splicing variants^12,18,29. Our data further underline the need to develop effective and unbiased strategies for prioritizing variants impacting splicing outside of the canonical splice sites, and this will be especially important for VUSs in known disease genes. Overall, these data demonstrate that different in silico strategies for splicing variant prioritization will alter the burden of variant analysis for clinical scientists. This is an important consideration for the analytical specificity and throughput of diagnostic testing.

To assess the clinical impact of such strategies, we integrated a single prioritization strategy, SpliceAI (using a threshold of 0.2, as above), in parallel to outcomes from routine diagnostic testing. This analysis involved extensive curation of genomic findings for the 2783 referred individuals, all of which were classified in accordance with ACMG guidelines by clinically accredited scientists. We added SpliceAI predictions alongside these analyses and observed that this approach influenced analysis for 420 (15%) individuals receiving genomic testing for rare disease, and prioritized variants that could result in new or refined molecular diagnoses in 81 (3%) cases. Overall, we prioritized 758 variants (528 unique variants) in 646 individuals (23% of cohort) with a range of predicted molecular consequences. Most (99.6%, 526/528) variants were prioritized by at least one other in silico tool (Supplementary Table S8). The strength of the score from SpliceAI correlated highly with prioritization from other in silico tools (Fig. 3d) and differed between regions of genome that were impacted (Supplementary Table S9). We defined prioritized variants as:

New: variant not previously highlighted or reported through diagnostic testing
Clarified: variant previously reported through diagnostic testing but pathogenicity or pathogenic mechanism was unclear
Reported: variant already described or established as ‘pathogenic’ or ‘likely pathogenic’ through diagnostic testing

In this regard, we identified 379 new variants in 337 individuals, 87 clarified variants in 83 individuals and 292 reported variants in 274 individuals. We found most (91%, 697/758) variants to be in genes known as a recessive cause of genetic disease. To understand if these variants impacted normal splicing, we interrogated the GTEx datasets³⁰ for individuals carrying these variants in a heterozygous state, identifying 40 carriers of variants prioritized by this analysis. Of these, 21 had suitable RNA-seq datasets available for evaluation, and we were able to clearly observe significant alterations to splicing in four cases (Table 1). Whilst most variants will require bespoke functional investigations to establish precise effects on splicing and protein synthesis, leveraging publicly available datasets for individuals carrying potentially pathogenic rare variants in the GTEx dataset can quickly increase certainty of variant impact and refine clinical variant analysis.

Table 1 Metrics obtained from the analysis of GTEx v7 datasets to observe the impact of variants prioritized as splice impacting.

Full size table

Conclusions

The incorporation of the prioritization and functional strategies described in this study for variants impacting splicing significantly improved molecular diagnostic services. However, we expect that the true impact of such analysis strategies will be more profound. Targeted next generation sequencing approaches employed within this large cohort ignore deeply intronic regions of genes, which, as shown here (Box 1, Case Example) and in other studies^31,32,33, can harbor variants which result in aberrant splicing through the production of novel cryptic exons. The recent availability of genomic datasets within healthcare amplifies the current limitations in interpreting variation within the non-coding genome, particularly in large genome sequencing cohorts. Our findings demonstrate the opportunity to expand bioinformatics analysis to the pre-mRNA regions of known disease genes and provide immediate increases to diagnostic yield. Further, a wide variety of bioinformatics prediction tools continue to be developed, as seen with the recent release of CADD-Splice³⁴, and SQUIRLS³⁵. As such tools continue to become available, careful analysis of their utility using a framework as described here will allow integration with maximum effect. Future approaches may expand on the consensus model described here through integration of probabilistic models, for example based on Bayesian statistics. Importantly, we demonstrate a requirement to functionally assess variant impact on pre-mRNA splicing as the delineation of the precise effects may be important in considerations for variant pathogenicity. The prioritization and identification of pathogenic variants impacting splicing is therefore an important consideration for diagnostic services and for the development of new targeted treatments.

Methods

Patient recruitment and genomic variant dataset generation

All individuals included in this study have provided informed written consent for the analysis of relevant disease-causing genes through tertiary healthcare centers within the UK. All genetic testing procedures have been approved by and are available through the UK National Health Service and were performed in a UK Accreditation Service Clinical Pathology Accredited Medical Laboratory (North West Genomic Laboratory Hub, Manchester, UK; ISO 15189:2012; UKAS Medical reference 9865). All data collected is part of routine clinical care and all investigations were conducted in accordance to the tenets of the Declaration of Helsinki. Analyses to improve genomic services, as reported in this study, have been approved by the North West Research Ethics Committee (11/NW/0421 and 15/YH/0365). Patients reported in individual case reports have provided informed written consent for publication. All individuals with genome sequencing datasets have consented through the Genomics England 100,000 Genomes Project.

Patients were identified with ‘variants of uncertain significance’ (VUSs) according to ACMG guidelines for variant interpretation⁴. Variants were generated through genome sequencing or gene panel sequencing (see “Methods”). All variants investigated are reported in Supplementary Table S1 and their HGVS cDNA nomenclature and genomic co-ordinates (GRCh37 and GRCh38) were validated using VariantValidator³⁶.

Whole genome sequencing

Whole genome sequencing datasets were created through the UK 100,000 Genomes Project³, using Illumina X10 sequencing chemistry. Sequencing reads were aligned to build GRCh37 of the human reference genome utilizing Isaac³⁷. Small variants were identified through Starline (SNVs and small indels ≤ 50 bp), and structural variants were identified utilizing Manta³⁸ and Canvas³⁹ (CNV Caller). Variants were annotated and analyzed with the Ensembl Variant Effect Predictor (v75), bcftools and bespoke Perl scripts within the Genomics England secure research embassy.

Gene panel sequencing

Enrichments were performed on DNA extracted from peripheral blood using Agilent SureSelect Custom Design target-enrichment kits (Agilent, Santa Clara, CA, USA). Enrichment kits were designed to capture known pathogenic intronic variants and the protein-coding regions ± 50 nucleotides of selected NCBI RefSeq transcripts; conditions tested included inherited retinal disease (105 genes or 176 genes), ophthalmic disorders (114 genes), cardiac disorders (72 genes comprised of ten sub-panels) and severe learning difficulties (82 genes). All genes tested and relevant testing strategies are available through the UK Genetic Testing Network (https://ukgtn.nhs.uk/). All samples included in the large cohort analysis were generated through a previously described methodology⁴⁰, and had been completed prior to August 2017. Briefly, samples were pooled and paired-end sequencing was performed using the manufacturer protocols for the Illumina HiSeq 2000/2500 platform (Illumina, Inc., San Diego, CA, USA). Sequencing reads were demultiplexed with CASAVA v.1.8.2. and aligned to the GRCh37 reference genome using Burrows-Wheeler Aligner short read (BWA-short v0.6.2)^41,42 software before duplicate reads were removed using samtools v0.1.18. The detection and clinical analysis of single nucleotide variants and small insertions/deletions was performed as described previously^40,43, and in accordance with ACMG guidelines for variant interpretation⁴. During variant analysis, we considered inheritance modes associated with monogenic disorders available in OMIM (https://omim.org/) or PanelApp (https://panelapp.genomicsengland.co.uk/), the zygosity of identified variants, additional variants identified to impact the same gene, phenotype-genotype correlations and scores determined by in silico splicing tools. We identified rare variants within our cohort for prioritization (< 20 heterozygous variants and < 10 homozygous variants) by each of the in silico splicing prediction tools, resulting in 18,013 unique variants and 43,744 total variants (42,281 het and 1463 hom). The region of impact for each rare variant was extracted from S-CAP pre-computed files where available¹⁶, or determined through Ensembl Variant Predictor (v75) for specified transcripts where unavailable through S-CAP.

In silico splicing prediction scores

We utilized scores available from CADD⁴⁴, SpliceAI¹³, SPIDEX¹⁸, S-CAP¹⁶, MMSplice¹⁷, TraP⁴⁵, KipoiSplice⁴⁶ and MaxEntScan⁴⁷ to prioritize the 249 variants (we noted on revision that one duplicate variant existed in our dataset). Where multiple scores were available for a variant from the in silico tool, we selected the highest for consideration. To enable comparisons of tool performance and correlation between scores, we converted negative values from SPIDEX, MaxEntScan and MMSplice to positive integers. Whilst these conversions removed directional impact information, i.e. reduced or increased splice site usage, they still reflected the absolute splicing impact of variants. Where scores were unavailable, we assigned the variant a score of 0, i.e. no impact could be predicted. Pre-defined thresholds were applied to determine whether a variant was ‘disruptive’ or ‘undisruptive’ to splicing, as suggested by the authors of the original papers^13,18, by recent refinements of thresholds¹⁶, or through nationally recommended guidelines (Supplementary Table S1). A consensus score was generated by considering whether the variant exceeded the threshold of each in silico prediction tool. ROC curves were generated and compared using the pROC package in R. A comparison of accuracy of the tools was performed through 2,000 iterations of sampling with replacement for the 249 samples. Statistical differences in accuracy were identified through the Kruskal–Wallis test in R.

A novel scaled metric was generated for each variant:

$$ score = \sum\limits_{{i = 1}}^{n} {{\raise0.7ex\hbox{${x_{i} }$} \!\mathord{\left/ {\vphantom {{x_{i} } {max_{i} }}}\right.\kern-\nulldelimiterspace} \!\lower0.7ex\hbox{${max_{i} }$}}} $$

where, n = a given combination of the nine prediction strategies, max = maximum score from prediction tool i, and x = variant score from prediction tool i. For example, for a variant with a SpliceAI score of 0.85 (the maximum SpliceAI score being 1) and above the threshold of 5/8 tools using the consensus approach:

$$ score = {\raise0.7ex\hbox{${0.85}$} \!\mathord{\left/ {\vphantom {{0.85} 1}}\right.\kern-\nulldelimiterspace} \!\lower0.7ex\hbox{$1$}} + {\raise0.7ex\hbox{$5$} \!\mathord{\left/ {\vphantom {5 8}}\right.\kern-\nulldelimiterspace} \!\lower0.7ex\hbox{$8$}} = 1.475 $$

RNA investigations

Appropriate functional assays were selected after consideration of gene expression profiles in GTEx (https://gtexportal.org/home/), and the availability of relevant patient samples. We performed assessments on available patient samples or through cell-based minigene assays.

RNA investigations from patient samples—LCLs and blood

Lymphoblast cell cultures were established for control samples and probands. RNA was extracted using the RNeasy^® Mini Kit (Qiagen, UK, Catalogue No. 74104) following the manufacturer's protocol. RNA was extracted from whole-cell blood using the PAXgene™ Blood RNA System Kit (Qiagen, UK. Catalogue No. 762174), following the manufacturer’s protocol for control samples and probands. Extracted RNA was reverse transcribed using the High Capacity RNA to cDNA Kit (Applied Biosystems, UK. Catalogue No. 4387406) following the manufacturer's protocol. Gene specific primers (available on request) amplified relevant regions of the genes being investigated. PCR products were visualized on an agarose gel using a BioRad Universal Hood II and the Agilent 2200 Tapestation. Visualized bands were cut out and prepared for capillary sequencing on an ABI 3730xl DNA Analyzer.

RNA investigations using cell-based minigene assays

Assays were designed to amplify appropriate genomic regions from patient DNA templates. For variants nearby to wild-type exons, we amplified regions containing one or multiple exons along with flanking ~ 200 intronic nucleotides. For deeply intronic variants we amplified regions containing at least 500 bp of flanking intronic sequence. Primer sequences are available upon request. All regions were amplified from patient DNA templates. For homozygous variants, we also generated a minigene plasmid from a control DNA template. Amplified fragments were checked for size using gel electrophoresis, purified using the QIAquick Gel Extraction kit (Qiagen, UK, Catalogue No. 28706) and then cloned into a customized minigene plasmid (a derivative of the pSpliceExpress vector)⁴⁸ containing an RSV-promoter and two control exons (rat insulin exons 2 and 3) using the NEBuilder^® HiFi DNA assembly (NEB, E2621). Amplified fragments were inserted between the two control exons. Plasmids were transformed into competent bacteria (XL-1 blue) and incubated overnight at 37 °C on LB plates containing Carbenicillin. Individual colonies were cultured overnight before isolation of plasmid DNA using the GenElute™ miniprep kit (Sigma-Aldrich, Catalogue No. PLN350). Purified plasmids were Sanger sequenced to confirm successful cloning and identify plasmids containing the wild-type and variant sequence. Plasmids were transiently transfected into HEK-293 cells using Lipofectamine, and incubated for up to 48 h in Dulbecco’s Modified Eagle Medium (DMEM) supplemented with 10% fetal bovine serum at 37 °C and 5% CO₂.

RNA was isolated using TRI Reagent^® and further purified using the RNeasy Mini Kit (Qiagen, UK, Catalogue No. 74106) which included a DNase digestion step. cDNA was synthesized from up to 4 μg of purified RNA using SuperScript™ reverse transcriptase (ThermoFisher Scientific, Catalogue No. 18091200) and subsequently amplified by Phusion high-fidelity polymerase (ThermoFisher Scientific, Catalogue No. F553) using primers designed to amplify all minigene transcripts. PCR products were visualized by electrophoresis on a 1–2% agarose gel and purified using the QIAquick Gel Extraction kit. Purified PCR products were Sanger sequenced and aligned to the reference sequence for the minigene vector using the SnapGene software suite and assessed for differences in splicing between wild-type and variant minigene constructs.

Comparison with GTEx datasets

Variants identified in GTEx v7 datasets were cross-referenced with prioritized variants from our cohort. FASTQs were downloaded from the Database of Genotypes and Phenotypes (dbGaP) under the project accession phs000424.v8.p2 for GTEx control individuals carrying prioritized variants. RNA-seq datasets for samples carrying prioritized variants were identified, and the TPM value of the tissues available were considered. RNA-seq data from tissues with a TPM value > 5 were considered and FASTQ datasets were processed as described previously⁴⁹. Read alignments were visualized in IGV and Normalized Read Count (NRC) and intron retention levels were quantified. NRC is calculated as the proportional usage of non-canonical splice junctions compared to canonical splice junctions for any given site. NRC and intron retention levels for individuals carrying prioritized variants were compared to ten control individuals in the GTEx dataset.

Data availability

The datasets generated during and/or analyzed during the current study are not publicly available.

References

Lee, K. et al. High diagnostic yield of whole exome sequencing in participants with retinal dystrophies in a clinical ophthalmology setting. Am. J. Ophthalmol. 160, 354–363. https://doi.org/10.1016/j.ajo.2015.04.026 (2015).
Article CAS PubMed PubMed Central Google Scholar
Yang, Y. et al. Molecular findings among patients referred for clinical whole-exome sequencing. JAMA https://doi.org/10.1001/jama.2014.14601 (2014).
Article PubMed PubMed Central Google Scholar
Turnbull, C. et al. The 100,000 Genomes Project: Bringing whole genome sequencing to the NHS. BMJ 361, k1687. https://doi.org/10.1136/bmj.k1687 (2018).
Article PubMed Google Scholar
Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: A joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–423. https://doi.org/10.1038/gim.2015.30 (2015).
Article PubMed PubMed Central Google Scholar
Tavtigian, S. V. et al. Modeling the ACMG/AMP variant classification guidelines as a Bayesian classification framework. Genet. Med. 20, 1054–1060. https://doi.org/10.1038/gim.2017.210 (2018).
Article PubMed PubMed Central Google Scholar
Taylor, J. C. et al. Factors influencing success of clinical genome sequencing across a broad spectrum of disorders. Nat. Genet. https://doi.org/10.1038/ng.3304 (2015).
Article PubMed PubMed Central Google Scholar
Ellingford, J. M. et al. Whole genome sequencing increases molecular diagnostic yield compared with current diagnostic testing for inherited retinal disease. Ophthalmology 123, 1143–1150. https://doi.org/10.1016/j.ophtha.2016.01.009 (2016).
Article PubMed Google Scholar
Short, P. J. et al. De novo mutations in regulatory elements in neurodevelopmental disorders. Nature 555, 611–616. https://doi.org/10.1038/nature25983 (2018).
Article ADS CAS PubMed PubMed Central Google Scholar
ENCODE. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74. https://doi.org/10.1038/nature11247 (2012).
Article ADS CAS Google Scholar
Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330. https://doi.org/10.1038/nature14248 (2015).
Article CAS PubMed PubMed Central Google Scholar
Faustino, N. A. & Cooper, T. A. Pre-mRNA splicing and human disease. Genes Dev. 17, 419–437. https://doi.org/10.1101/gad.1048803 (2003).
Article CAS PubMed Google Scholar
Stenson, P. D. et al. The Human Gene Mutation Database: Building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine. Hum. Genet. 133, 1–9. https://doi.org/10.1007/s00439-013-1358-4 (2014).
Article CAS PubMed Google Scholar
Jaganathan, K. et al. Predicting splicing from primary sequence with deep learning. Cell 176, 535-548.e524. https://doi.org/10.1016/j.cell.2018.12.015 (2019).
Article CAS PubMed Google Scholar
Weisschuh, N., Buena-Atienza, E. & Wissinger, B. Splicing mutations in inherited retinal disease. Prog. Retin. Eye Res. 80, 100874. https://doi.org/10.1016/j.preteyeres.2020.100874 (2021).
Article CAS PubMed Google Scholar
Lee, M. et al. Systematic computational identification of variants that activate exonic and intronic cryptic splice sites. Am. J. Hum. Genet. 100, 751–765. https://doi.org/10.1016/j.ajhg.2017.04.001 (2017).
Article CAS PubMed PubMed Central Google Scholar
Jagadeesh, K. A. et al. S-CAP extends pathogenicity prediction to genetic variants that affect RNA splicing. Nat. Genet. 51, 755–763. https://doi.org/10.1038/s41588-019-0348-4 (2019).
Article CAS PubMed Google Scholar
Cheng, J. et al. MMSplice: modular modeling improves the predictions of genetic variant effects on splicing. Genome Biol. 20, 48. https://doi.org/10.1186/s13059-019-1653-z (2019).
Article PubMed PubMed Central Google Scholar
Xiong, H. Y. et al. RNA splicing. The human splicing code reveals new insights into the genetic determinants of disease. Science 347, 1254806. https://doi.org/10.1126/science.1254806 (2015).
Article CAS PubMed Google Scholar
Wai, H. A. et al. Blood RNA analysis can increase clinical diagnostic rate and resolve variants of uncertain significance. Genet. Med. https://doi.org/10.1038/s41436-020-0766-9 (2020).
Article PubMed PubMed Central Google Scholar
Shen, X. & Corey, D. R. Chemistry, mechanism and clinical status of antisense oligonucleotides and duplex RNAs. Nucleic Acids Res. 46, 1584–1600. https://doi.org/10.1093/nar/gkx1239 (2018).
Article CAS PubMed Google Scholar
Bauwens, M. et al. ABCA4-associated disease as a model for missing heritability in autosomal recessive disorders: Novel noncoding splice, cis-regulatory, structural, and recurrent hypomorphic variants. Genet. Med. 21, 1761–1771. https://doi.org/10.1038/s41436-018-0420-y (2019).
Article CAS PubMed PubMed Central Google Scholar
Aicher, J. K., Jewell, P., Vaquero-Garcia, J., Barash, Y. & Bhoj, E. J. Mapping RNA splicing variations in clinically accessible and nonaccessible tissues to facilitate Mendelian disease diagnosis using RNA-seq. Genet. Med. 22, 1181–1190. https://doi.org/10.1038/s41436-020-0780-y (2020).
Article PubMed PubMed Central Google Scholar
Vig, A. et al. DYNC2H1 hypomorphic or retina-predominant variants cause nonsyndromic retinal degeneration. Genet. Med. https://doi.org/10.1038/s41436-020-0915-1 (2020).
Article PubMed PubMed Central Google Scholar
Cummings, B. B. et al. Transcript expression-aware annotation improves rare variant interpretation. Nature 581, 452–458. https://doi.org/10.1038/s41586-020-2329-2 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Evans, D. G. R. et al. A dominantly inherited 5’ UTR variant causing methylation-associated silencing of BRCA1 as a cause of breast and ovarian cancer. Am. J. Hum. Genet. 103, 213–220. https://doi.org/10.1016/j.ajhg.2018.07.002 (2018).
Article CAS PubMed PubMed Central Google Scholar
Castel, S. E. et al. Modified penetrance of coding variants by cis-regulatory variation contributes to disease risk. Nat. Genet. 50, 1327–1334. https://doi.org/10.1038/s41588-018-0192-y (2018).
Article CAS PubMed PubMed Central Google Scholar
Zhang, X., Wakeling, M., Ware, J. & Whiffin, N. Annotating high-impact 5’untranslated region variants with the UTRannotator. Bioinformatics https://doi.org/10.1093/bioinformatics/btaa783 (2020).
Article PubMed PubMed Central Google Scholar
Ellingford, J. M. et al. Molecular findings from 537 individuals with inherited retinal disease. J. Med. Genet. https://doi.org/10.1136/jmedgenet-2016-103837 (2016).
Article PubMed Google Scholar
Krawczak, M. et al. Single base-pair substitutions in exon–intron junctions of human genes: Nature, distribution, and consequences for mRNA splicing. Hum. Mutat. 28, 150–158. https://doi.org/10.1002/humu.20400 (2007).
Article CAS PubMed Google Scholar
GTEx. The genotype-tissue expression (GTEx) project. Nat. Genet. 45, 580–585. https://doi.org/10.1038/ng.2653 (2013).
Article CAS Google Scholar
Montalban, G. et al. Screening of BRCA1/2 deep intronic regions by targeted gene sequencing identifies the first germline BRCA1 variant causing pseudoexon activation in a patient with breast/ovarian cancer. J. Med. Genet. 56, 63–74. https://doi.org/10.1136/jmedgenet-2018-105606 (2019).
Article CAS PubMed Google Scholar
den Hollander, A. I. et al. Mutations in the CEP290 (NPHP6) gene are a frequent cause of Leber congenital amaurosis. Am. J. Hum. Genet. 79, 556–561. https://doi.org/10.1086/507318 (2006).
Article Google Scholar
Sangermano, R. et al. Deep-intronic ABCA4 variants explain missing heritability in Stargardt disease and allow correction of splice defects by antisense oligonucleotides. Genet. Med. 21, 1751–1760. https://doi.org/10.1038/s41436-018-0414-9 (2019).
Article CAS PubMed PubMed Central Google Scholar
Rentzsch, P., Schubach, M., Shendure, J. & Kircher, M. CADD-Splice - improving genome-wide variant effect prediction using deep learning-derived splice scores. Genome Med. 13, 31. https://doi.org/10.1186/s13073-021-00835-9 (2021).
Article CAS PubMed PubMed Central Google Scholar
Danis, D. et al. Interpretable prioritization of splice variants in diagnostic next-generation sequencing. Am. J. Hum. Genet. 108, 1564–1577. https://doi.org/10.1016/j.ajhg.2021.06.014 (2021).
Article CAS Google Scholar
Freeman, P. J., Hart, R. K., Gretton, L. J., Brookes, A. J. & Dalgleish, R. VariantValidator: Accurate validation, mapping, and formatting of sequence variation descriptions. Hum. Mutat. 39, 61–68. https://doi.org/10.1002/humu.23348 (2018).
Article PubMed Google Scholar
Raczy, C. et al. Isaac: Ultra-fast whole-genome secondary analysis on Illumina sequencing platforms. Bioinformatics 29, 2041–2043. https://doi.org/10.1093/bioinformatics/btt314 (2013).
Article CAS PubMed Google Scholar
Chen, X. et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220–1222. https://doi.org/10.1093/bioinformatics/btv710 (2016).
Article CAS PubMed Google Scholar
Roller, E., Ivakhno, S., Lee, S., Royce, T. & Tanner, S. Canvas: Versatile and scalable detection of copy number variants. bioRxiv https://doi.org/10.1101/036194 (2016).
Article Google Scholar
Ellingford, J. M. et al. Molecular findings from 537 individuals with inherited retinal disease. J. Med. Genet. 53, 761–767. https://doi.org/10.1136/jmedgenet-2016-103837 (2016).
Article CAS PubMed Google Scholar
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760. https://doi.org/10.1093/bioinformatics/btp324 (2009).
Article CAS PubMed PubMed Central Google Scholar
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. (2013). arXiv preprint arXiv:1303.3997.
Gillespie, R. L. et al. Personalized diagnosis and management of congenital cataract by next-generation sequencing. Ophthalmology 121, 2124-U2302. https://doi.org/10.1016/j.ophtha.2014.06.006 (2014).
Article PubMed Google Scholar
Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310. https://doi.org/10.1038/ng.2892 (2014).
Article CAS PubMed PubMed Central Google Scholar
Gelfman, S. et al. Annotating pathogenic non-coding variants in genic regions. Nat. Commun. 8, 236. https://doi.org/10.1038/s41467-017-00141-2 (2017).
Article ADS CAS PubMed PubMed Central Google Scholar
Avsec, Ž et al. The Kipoi repository accelerates community exchange and reuse of predictive models for genomics. Nat. Biotechnol. 37, 592–600. https://doi.org/10.1038/s41587-019-0140-0 (2019).
Article CAS PubMed PubMed Central Google Scholar
Yeo, G. & Burge, C. B. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J. Comput. Biol. 11, 377–394. https://doi.org/10.1089/1066527041410418 (2004).
Article CAS PubMed Google Scholar
Kishore, S., Khanna, A. & Stamm, S. Rapid generation of splicing reporters with pSpliceExpress. Gene 427, 104–110. https://doi.org/10.1016/j.gene.2008.09.021 (2008).
Article CAS PubMed PubMed Central Google Scholar
Cummings, B. B. et al. Improving genetic diagnosis in Mendelian disease with transcriptome sequencing. Sci. Transl. Med. https://doi.org/10.1126/scitranslmed.aal5209 (2017).
Article PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This research was made possible through access to the data and findings generated by the 100,000 Genomes Project. The 100,000 Genomes Project is managed by Genomics England Limited (a wholly owned company of the Department of Health and Social Care). The 100,000 Genomes Project is funded by the National Institute for Health Research and NHS England. The Wellcome Trust, Cancer Research UK and the Medical Research Council have also funded research infrastructure. The 100,000 Genomes Project uses data provided by patients and collected by the National Health Service as part of their care and support. We acknowledge funding from the Wellcome Trust Transforming Genomic Medicine Initiative (200990/Z/16/Z), and personal support from Health Education England (Ellingford), Medical Research Council (Rowlands) and the Wellcome Trust (Baralle, RP-2016-07-011).

Author information

A list of authors and their affiliations appears at the end of the paper.

Authors and Affiliations

North West Genomic Laboratory Hub, Manchester Centre for Genomic Medicine, Manchester University Hospitals NHS Foundation Trust, St Mary’s Hospital, Manchester, UK
Charlie Rowlands, Huw B. Thomas, Glenda Beaman, Panagiotis Sergouniotis, Christopher Campbell, Nicole Gossan, Claire Hardcastle, Simon Ramsden, Elizabeth Jones, Jill Clayton-Smith, William G. Newman, Graeme C. M. Black & Jamie M. Ellingford
Division of Evolution and Genomic Sciences, Neuroscience and Mental Health Domain, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
Charlie Rowlands, Huw B. Thomas, Glenda Beaman, Panagiotis Sergouniotis, Beatriz Gomes-Silva, Jill Clayton-Smith, Raymond T. O’Keefe, William G. Newman, Graeme C. M. Black & Jamie M. Ellingford
Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, UK
Jenny Lord, Htoo A. Wai, Andrew G. L. Douglas & Diana Baralle
Institute of Ophthalmology, UCL, London, UK
Gavin Arno & Andrew R. Webster
Moorfields Eye Hospital NHS Foundation Trust, London, UK
Gavin Arno & Andrew R. Webster
Great Ormond Street Hospital NHS Foundation Trust, London, UK
Gavin Arno
Manchester Adult Cystic Fibrosis Centre, Manchester University Hospitals NHS Foundation Trust, Manchester, UK
Kevin Webb
Respiratory, Critical Care and Anaesthesia, UCL Great Ormond Street Institute of Child Health & Great Ormond Street Children’s Hospital & NIHR Great Ormond Street Hospital Biomedical Research Centre, London, UK
Christopher O’Callaghan
Centre for PCD Diagnosis and Research, Department of Infection, Immunity and Inflammation, RKCSB, University of Leicester, Leicester, UK
Christopher O’Callaghan & Robert A. Hirst
Wessex Clinical Genetics Service, University Hospital Southampton NHS Foundation Trust, Southampton, UK
Andrew G. L. Douglas & Diana Baralle
Genomics England, London, UK
J. C. Ambrose, P. Arumugam, R. Bevers, M. Bleda, F. Boardman-Pretty, C. R. Boustred, H. Brittain, M. J. Caulfield, G. C. Chan, T. Fowler, A. Giess, A. Hamblin, S. Henderson, T. J. P. Hubbard, R. Jackson, L. J. Jones, D. Kasperaviciute, M. Kayikci, A. Kousathanas, L. Lahnstein, S. E. A. Leigh, I. U. S. Leong, F. J. Lopez, F. Maleady-Crowe, M. McEntagart, F. Minneci, L. Moutsianas, M. Mueller, N. Murugaesu, A. C. Need, P. O’Donovan, C. A. Odhams, C. Patch, D. Perez-Gil, M. B. Pereira, J. Pullinger, T. Rahim, A. Rendon, T. Rogers, K. Savage, K. Sawant, R. H. Scott, A. Siddiq, A. Sieghart, S. C. Smith, A. Sosinsky, A. Stuckey, M. Tanguy, A. L. Taylor Tavares, E. R. A. Thomas, S. R. Thompson, A. Tucci, M. J. Welland, E. Williams, K. Witkowsa & S. M. Wood
William Harvey Research Institute, Queen Mary University of London, London, EC1M 6BQ, UK
F. Boardman-Pretty, M. J. Caulfield, S. Henderson, L. J. Jones, D. Kasperaviciute, L. Moutsianas, M. Mueller, A. C. Need, C. Patch, A. Sosinsky, E. R. A. Thomas, A. Tucci, K. Witkowsa & S. M. Wood

Authors

Charlie Rowlands
View author publications
You can also search for this author in PubMed Google Scholar
Huw B. Thomas
View author publications
You can also search for this author in PubMed Google Scholar
Jenny Lord
View author publications
You can also search for this author in PubMed Google Scholar
Htoo A. Wai
View author publications
You can also search for this author in PubMed Google Scholar
Gavin Arno
View author publications
You can also search for this author in PubMed Google Scholar
Glenda Beaman
View author publications
You can also search for this author in PubMed Google Scholar
Panagiotis Sergouniotis
View author publications
You can also search for this author in PubMed Google Scholar
Beatriz Gomes-Silva
View author publications
You can also search for this author in PubMed Google Scholar
Christopher Campbell
View author publications
You can also search for this author in PubMed Google Scholar
Nicole Gossan
View author publications
You can also search for this author in PubMed Google Scholar
Claire Hardcastle
View author publications
You can also search for this author in PubMed Google Scholar
Kevin Webb
View author publications
You can also search for this author in PubMed Google Scholar
Christopher O’Callaghan
View author publications
You can also search for this author in PubMed Google Scholar
Robert A. Hirst
View author publications
You can also search for this author in PubMed Google Scholar
Simon Ramsden
View author publications
You can also search for this author in PubMed Google Scholar
Elizabeth Jones
View author publications
You can also search for this author in PubMed Google Scholar
Jill Clayton-Smith
View author publications
You can also search for this author in PubMed Google Scholar
Andrew R. Webster
View author publications
You can also search for this author in PubMed Google Scholar
Andrew G. L. Douglas
View author publications
You can also search for this author in PubMed Google Scholar
Raymond T. O’Keefe
View author publications
You can also search for this author in PubMed Google Scholar
William G. Newman
View author publications
You can also search for this author in PubMed Google Scholar
Diana Baralle
View author publications
You can also search for this author in PubMed Google Scholar
Graeme C. M. Black
View author publications
You can also search for this author in PubMed Google Scholar
Jamie M. Ellingford
View author publications
You can also search for this author in PubMed Google Scholar

Consortia

Genomics England Research Consortium

J. C. Ambrose
, P. Arumugam
, R. Bevers
, M. Bleda
, F. Boardman-Pretty
, C. R. Boustred
, H. Brittain
, M. J. Caulfield
, G. C. Chan
, T. Fowler
, A. Giess
, A. Hamblin
, S. Henderson
, T. J. P. Hubbard
, R. Jackson
, L. J. Jones
, D. Kasperaviciute
, M. Kayikci
, A. Kousathanas
, L. Lahnstein
, S. E. A. Leigh
, I. U. S. Leong
, F. J. Lopez
, F. Maleady-Crowe
, M. McEntagart
, F. Minneci
, L. Moutsianas
, M. Mueller
, N. Murugaesu
, A. C. Need
, P. O’Donovan
, C. A. Odhams
, C. Patch
, D. Perez-Gil
, M. B. Pereira
, J. Pullinger
, T. Rahim
, A. Rendon
, T. Rogers
, K. Savage
, K. Sawant
, R. H. Scott
, A. Siddiq
, A. Sieghart
, S. C. Smith
, A. Sosinsky
, A. Stuckey
, M. Tanguy
, A. L. Taylor Tavares
, E. R. A. Thomas
, S. R. Thompson
, A. Tucci
, M. J. Welland
, E. Williams
, K. Witkowsa
& S. M. Wood

Contributions

Design and coordination of the study: C.F.R., J.M.E., G.C.M.B., D.B., W.G.N., R.T.O. Contributing genetic and/or phenotypic data: all authors. Writing of the manuscript: C.F.R. G.C.M.B. & J.M.E. Revision of the manuscript with important intellectual content: all authors.

Corresponding authors

Correspondence to Graeme C. M. Black or Jamie M. Ellingford.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Rowlands, C., Thomas, H.B., Lord, J. et al. Comparison of in silico strategies to prioritize rare genomic variants impacting RNA splicing for the diagnosis of genomic disorders. Sci Rep 11, 20607 (2021). https://doi.org/10.1038/s41598-021-99747-2

Download citation

Received: 09 March 2021
Accepted: 13 September 2021
Published: 18 October 2021
DOI: https://doi.org/10.1038/s41598-021-99747-2
Springer Nature Limited

This article is cited by

Combining a prioritization strategy and functional studies nominates 5’UTR variants underlying inherited retinal disease
- Alfredo Dueñas Rey
- Marta del Pozo Valero
- Frauke Coppieters
Genome Medicine (2024)
Benchmarking splice variant prediction algorithms using massively parallel splicing assays
- Cathy Smith
- Jacob O. Kitzman
Genome Biology (2023)
Systematic analysis of CNGA3 splice variants identifies different mechanisms of aberrant splicing
- Peggy Reuter
- Magdalena Walter
- Nicole Weisschuh
Scientific Reports (2023)
Aberrant splicing caused by exonic single nucleotide variants positioned 2nd or 3rd to the last nucleotide in the COL4A5 gene
- Eri Okada
- Yuya Aoto
- Kandai Nozu
Clinical and Experimental Nephrology (2023)
SpliceAI-visual: a free online tool to improve SpliceAI splicing variant interpretation
- Jean-Madeleine de Sainte Agathe
- Mathilde Filser
- David Baux
Human Genomics (2023)

Comparison of in silico strategies to prioritize rare genomic variants impacting RNA splicing for the diagnosis of genomic disorders

Abstract

Similar content being viewed by others

Introduction

Results and discussion

Functional assessment of variants of uncertain significance identified through clinical genetic testing strategies

Assessment of in silico prediction strategies to prioritize variants of uncertain significance

Combining in silico tools improves accuracy to identify variants of uncertain significance impacting splicing

Integration of in silico strategies to prioritize variants impacting splicing for a large cohort of individuals with rare disease

Conclusions

Methods

Patient recruitment and genomic variant dataset generation

Whole genome sequencing

Gene panel sequencing

In silico splicing prediction scores

RNA investigations

RNA investigations from patient samples—LCLs and blood

RNA investigations using cell-based minigene assays

Comparison with GTEx datasets

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Consortia

Genomics England Research Consortium

Contributions

Corresponding authors

Ethics declarations

Competing interests

Additional information

Publisher's note

Supplementary Information

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Search

Navigation