Abstract
Pre-mRNA alternative splicing is a prevalent mechanism for diversifying eukaryotic transcriptomes and proteomes. Regulated alternative splicing plays a role in many biological processes, and dysregulated alternative splicing is a feature of many human diseases. Short-read RNA sequencing (RNA-seq) is now the standard approach for transcriptome-wide analysis of alternative splicing. Since 2011, our laboratory has developed and maintained Replicate Multivariate Analysis of Transcript Splicing (rMATS), a computational tool for discovering and quantifying alternative splicing events from RNA-seq data. Here we provide a protocol for the contemporary version of rMATS, rMATS-turbo, a fast and scalable re-implementation that maintains the statistical framework and user interface of the original rMATS software, while incorporating a revamped computational workflow with a substantial improvement in speed and data storage efficiency. The rMATS-turbo software scales up to massive RNA-seq datasets with tens of thousands of samples. To illustrate the utility of rMATS-turbo, we describe two representative application scenarios. First, we describe a broadly applicable two-group comparison to identify differential alternative splicing events between two sample groups, including both annotated and novel alternative splicing events. Second, we describe a quantitative analysis of alternative splicing in a large-scale RNA-seq dataset (~1,000 samples), including the discovery of alternative splicing events associated with distinct cell states. We detail the workflow and features of rMATS-turbo that enable efficient parallel processing and analysis of large-scale RNA-seq datasets on a compute cluster. We anticipate that this protocol will help the broad user base of rMATS-turbo make the best use of this software for studying alternative splicing in diverse biological systems.
Key points
- This protocol provides detailed guidelines for using rMATS-turbo, the latest implementation of the popular software for the discovery and quantification of alternative splicing events from RNA sequencing data. The software is exemplified in two representative scenarios.
- rMATS-turbo incorporates a revamped computational workflow with a substantial improvement in speed and data storage efficiency. The software scales up to massive RNA sequencing datasets with tens of thousands of samples.
Data availability
RNA-seq data for the PC3E and GS689 cell lines (Procedure 1) and for 1,019 CCLE human cancer cell lines (Procedure 2) can be downloaded from the Sequence Read Archive (SRA; https://www.ncbi.nlm.nih.gov/sra) under BioProject accessions PRJNA438990 and PRJNA523380, respectively. rMATS-turbo output files for both the Procedure 1 and Procedure 2 datasets are available at https://doi.org/10.5281/zenodo.6647023, and other result files are provided in the companion GitHub repository of this protocol (https://github.com/Xinglab/rmats-turbo-tutorial; ref. 63).
Code availability
rMATS-turbo is publicly available on GitHub (https://github.com/Xinglab/rmats-turbo) and Bioconda (https://anaconda.org/bioconda/rmats). rmats2sashimiplot is publicly available on GitHub (https://github.com/Xinglab/rmats2sashimiplot). Custom scripts for data analysis and visualization of the results generated in Procedures 1 and 2 are provided in the companion GitHub repository of this protocol (https://github.com/Xinglab/rmats-turbo-tutorial; ref. 63).
References
Nilsen, T. W. & Graveley, B. R. Expansion of the eukaryotic proteome by alternative splicing. Nature 463, 457–463 (2010).
Sharp, P. A. Split genes and RNA splicing. Cell 77, 805–815 (1994).
Wang, Z. & Burge, C. B. Splicing regulation: from a parts list of regulatory elements to an integrated splicing code. RNA 14, 802–813 (2008).
Fu, X. D. & Ares, M. Jr Context-dependent control of alternative splicing by RNA-binding proteins. Nat. Rev. Genet. 15, 689–701 (2014).
Kalsotra, A. & Cooper, T. A. Functional consequences of developmentally regulated alternative splicing. Nat. Rev. Genet. 12, 715–729 (2011).
Baralle, F. E. & Giudice, J. Alternative splicing as a regulator of development and tissue identity. Nat. Rev. Mol. Cell Biol. 18, 437–451 (2017).
Scotti, M. M. & Swanson, M. S. RNA mis-splicing in disease. Nat. Rev. Genet. 17, 19–32 (2016).
Wang, Z., Gerstein, M. & Snyder, M. RNA-seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10, 57–63 (2009).
Park, E., Pan, Z., Zhang, Z., Lin, L. & Xing, Y. The expanding landscape of alternative splicing variation in human populations. Am. J. Hum. Genet. 102, 11–26 (2018).
Pan, Q., Shai, O., Lee, L. J., Frey, B. J. & Blencowe, B. J. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat. Genet. 40, 1413–1415 (2008).
Wang, E. T. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476 (2008).
Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).
Alamancos, G. P., Agirre, E. & Eyras, E. Methods to study splicing from high-throughput RNA sequencing data. Methods Mol. Biol. 1126, 357–397 (2014).
Conesa, A. et al. A survey of best practices for RNA-seq data analysis. Genome Biol. 17, 13 (2016).
Stark, R., Grzelak, M. & Hadfield, J. RNA sequencing: the teenage years. Nat. Rev. Genet. 20, 631–656 (2019).
Pan, Y. et al. RNA dysregulation: an expanding source of cancer immunotherapy targets. Trends Pharmacol. Sci. 42, 268–282 (2021).
GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Shen, S. et al. rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-seq data. Proc. Natl Acad. Sci. USA 111, E5593–E5601 (2014).
Katz, Y., Wang, E. T., Airoldi, E. M. & Burge, C. B. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat. Methods 7, 1009–1015 (2010).
Van Nostrand, E. L. et al. A large-scale binding and functional map of human RNA-binding proteins. Nature 583, 711–719 (2020).
Begg, B. E., Jens, M., Wang, P. Y., Minor, C. M. & Burge, C. B. Concentration-dependent splicing is enabled by Rbfox motifs of intermediate affinity. Nat. Struct. Mol. Biol. 27, 901–912 (2020).
Hu, X. et al. The RNA-binding protein AKAP8 suppresses tumor metastasis by antagonizing EMT-associated alternative splicing. Nat. Commun. 11, 486 (2020).
Jourdain, A. A. et al. Loss of LUC7L2 and U1 snRNP subunits shifts energy metabolism from glycolysis to OXPHOS. Mol. Cell 81, 1905–1919 e1912 (2021).
Leclair, N. K. et al. Poison exon splicing regulates a coordinated network of SR protein expression during differentiation and tumorigenesis. Mol. Cell 80, 648–665 e649 (2020).
Liu, W. et al. Ectopic targeting of CG DNA methylation in Arabidopsis with the bacterial SssI methyltransferase. Nat. Commun. 12, 3130 (2021).
Wang, L. et al. RALF1–FERONIA complex affects splicing dynamics to modulate stress responses and growth in plants. Sci. Adv. 6, eaaz1622 (2020).
Phillips, J. W. et al. Pathway-guided analysis identifies Myc-dependent alternative pre-mRNA splicing in aggressive prostate cancers. Proc. Natl Acad. Sci. USA 117, 5269–5279 (2020).
Wang, Y. et al. Role of Hakai in m(6)A modification pathway in Drosophila. Nat. Commun. 12, 2159 (2021).
Lau, E. et al. Splice-junction-based mapping of alternative isoforms in the human proteome. Cell Rep. 29, 3751–3765 e3755 (2019).
Daniels, N. J. et al. Functional analyses of human LUC7-like proteins involved in splicing regulation and myeloid neoplasms. Cell Rep. 35, 108989 (2021).
Zhang, Y. et al. Regional variation of splicing QTLs in human brain. Am. J. Hum. Genet. 107, 196–210 (2020).
Heber, S., Alekseyev, M., Sze, S. H., Tang, H. & Pevzner, P. A. Splicing graphs and EST assembly problem. Bioinformatics 18, S181–S188 (2002).
Xing, Y., Resch, A. & Lee, C. The multiassembly problem: reconstructing multiple transcript isoforms from EST fragment mixtures. Genome Res. 14, 426–441 (2004).
Rahman, M. A., Krainer, A. R. & Abdel-Wahab, O. SnapShot: splicing alterations in cancer. Cell 180, 208–208 e201 (2020).
Anczukow, O. & Krainer, A. R. Splicing-factor alterations in cancers. RNA 22, 1285–1301 (2016).
Mironov, A., Denisov, S., Gress, A., Kalinina, O. V. & Pervouchine, D. D. An extended catalogue of tandem alternative splice sites in human tissue transcriptomes. PLoS Comput. Biol. 17, e1008329 (2021).
Demirdjian, L. et al. Detecting allele-specific alternative splicing from population-scale RNA-seq data. Am. J. Hum. Genet. 107, 461–472 (2020).
Trapnell, C. et al. Transcript assembly and quantification by RNA-seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515 (2010).
Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).
Anders, S., Reyes, A. & Huber, W. Detecting differential usage of exons from RNA-seq data. Genome Res. 22, 2008–2017 (2012).
Wu, J. et al. SpliceTrap: a method to quantify alternative splicing under single cellular conditions. Bioinformatics 27, 3010–3016 (2011).
Alamancos, G. P., Pages, A., Trincado, J. L., Bellora, N. & Eyras, E. Leveraging transcript quantification for fast computation of alternative splicing profiles. RNA 21, 1521–1531 (2015).
Trincado, J. L. et al. SUPPA2: fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions. Genome Biol. 19, 40 (2018).
Vaquero-Garcia, J. et al. RNA splicing analysis using heterogeneous and large RNA-seq datasets. Nat. Commun. 14, 1230 (2023).
Li, Y. I. et al. Annotation-free quantification of RNA splicing using LeafCutter. Nat. Genet. 50, 151–158 (2018).
Lin, K. T. & Krainer, A. R. PSI-Sigma: a comprehensive splicing-detection method for short-read and long-read RNA-seq analysis. Bioinformatics 35, 5048–5054 (2019).
Sterne-Weiler, T., Weatheritt, R. J., Best, A. J., Ha, K. C. H. & Blencowe, B. J. Efficient and accurate quantitative profiling of alternative splicing patterns of any complexity on a laptop. Mol. Cell 72, 187–200 e186 (2018).
Wang, Q. & Rio, D. C. JUM is a computational method for comprehensive annotation-free analysis of alternative pre-mRNA splicing patterns. Proc. Natl Acad. Sci. USA 115, E8181–E8190 (2018).
Mehmood, A. et al. Systematic evaluation of differential splicing tools for RNA-seq studies. Brief. Bioinform. 21, 2052–2065 (2020).
Muller, I. B. et al. Computational comparison of common event-based differential splicing tools: practical considerations for laboratory researchers. BMC Bioinform. 22, 347 (2021).
Vaquero-Garcia, J. et al. A new view of transcriptome complexity and regulation through the lens of local splicing variations. eLife 5, e11752 (2016).
Amarasinghe, S. L. et al. Opportunities and challenges in long-read sequencing data analysis. Genome Biol. 21, 30 (2020).
Byrne, A., Cole, C., Volden, R. & Vollmers, C. Realizing the potential of full-length transcriptome sequencing. Philos. Trans. R. Soc. Lond. B 374, 20190097 (2019).
Gao, Y. et al. ESPRESSO: robust discovery and quantification of transcript isoforms from error-prone long-read RNA-seq data. Sci. Adv. 9, eabq5072 (2023).
Zhang, Z. et al. Deep-learning augmented RNA-seq analysis of transcript splicing. Nat. Methods 16, 307–310 (2019).
Lu, Z. X. et al. Transcriptome-wide landscape of pre-mRNA alternative splicing associated with metastatic colonization. Mol. Cancer Res. 13, 305–318 (2015).
Yang, J. et al. Guidelines and definitions for research on epithelial–mesenchymal transition. Nat. Rev. Mol. Cell Biol. 21, 341–352 (2020).
Ghandi, M. et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019).
Chakraborty, P., George, J. T., Tripathi, S., Levine, H. & Jolly, M. K. Comparative study of transcriptomics-based scoring metrics for the epithelial–hybrid–mesenchymal spectrum. Front. Bioeng. Biotechnol. 8, 220 (2020).
Tan, T. Z. et al. Epithelial–mesenchymal transition spectrum quantification and its efficacy in deciphering survival and drug responses of cancer patients. EMBO Mol. Med. 6, 1279–1293 (2014).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Veeneman, B. A., Shukla, S., Dhanasekaran, S. M., Chinnaiyan, A. M. & Nesvizhskii, A. I. Two-pass alignment improves novel splice junction quantification. Bioinformatics 32, 43–49 (2016).
Wang, Y. et al. rMATS-turbo: an efficient and flexible computational tool for alternative splicing analysis of large-scale RNA-seq data. rMATS-turbo-tutorial https://doi.org/10.5281/zenodo.7931186 (2023).
Klijn, C. et al. A comprehensive transcriptional portrait of human cancer cell lines. Nat. Biotechnol. 33, 306–312 (2015).
Acknowledgements
This work was supported by National Institutes of Health grants R01GM088342 and R01GM117624.
Author information
Contributions
Y.X. conceived the study. Y.W., Z.X. and Y.X. designed the research and developed the methodology. Y.W., Z.X. and E.K. contributed to analytic tools. Y.W., E.K., J.I.A. and Y.X. analyzed the data. Y.W., K.E.K.-E. and Y.X. wrote the paper with input from all other authors.
Ethics declarations
Competing interests
Y.X. is a scientific cofounder of Panorama Medicine. Y.X. and Z.X. receive licensing income for commercial usage of rMATS-turbo. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Protocols thanks Yiwen Chen and Julien Gagneur for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
Key references using this protocol
Phillips, J. W. et al. Proc. Natl Acad. Sci. USA 117, 5269–5279 (2020): https://doi.org/10.1073/pnas.1915975117
Shen, S. et al. Proc. Natl Acad. Sci. USA 111, E5593–E5601 (2014): https://doi.org/10.1073/pnas.1419161111
Key data used in this protocol
Zhang, Z. et al. Nat. Methods 16, 307–310 (2019): https://doi.org/10.1038/s41592-019-0351-9
Supplementary information
Supplementary Information
Supplementary Figs. 1–3.
Supplementary Tables 1, 2
SRA accession numbers and sample information for examples 1 and 2.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, Y., Xie, Z., Kutschera, E. et al. rMATS-turbo: an efficient and flexible computational tool for alternative splicing analysis of large-scale RNA-seq data. Nat Protoc 19, 1083–1104 (2024). https://doi.org/10.1038/s41596-023-00944-2
DOI: https://doi.org/10.1038/s41596-023-00944-2
- Springer Nature Limited