Predicting Alternative Splicing

Part of the Methods in Molecular Biology book series (MIMB, volume 1126)


Alternative splicing of pre-mRNA is a complex process whose outcome depends on elements reviewed in the previous chapters, such as the core spliceosome units, the interactions of these units with one another and with splicing enhancers and repressors, primary sequence motifs, and local RNA secondary structure. Connections between RNA splicing, transcription, and other processes have also been reviewed in the previous chapters. Splicing is inherently a stochastic process: some defective transcripts are produced and handled by mechanisms such as nonsense-mediated decay (NMD), and studies report high variability at the transcript level between cells supposedly in similar states. Nonetheless, splicing is clearly not a random process: many determinants of splicing regulation have been identified, and experimental measurements detect highly robust and conserved splicing changes between developmental stages and tissues. These observations naturally lead to the following questions: Can we devise a method that, given a cellular context and the primary transcript, predicts the splicing outcome? What can such a method tell us about the underlying mechanisms that govern alternative splicing?

This chapter describes how these questions can be framed and addressed using machine-learning methodology. We describe how to extract putative RNA regulatory features from the genomic sequence of exons and proximal introns, how to define target values based on experimental measurements of exon inclusion, how to learn a simple splicing model that optimizes the prediction of the observed exon inclusion levels from the identified RNA features, and how to subsequently evaluate the accuracy of the learned model.
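The first two steps, feature extraction and target definition, can be illustrated with a minimal Python sketch. It is not the chapter's own procedure: it assumes that putative regulatory features are approximated by overlapping k-mer counts in three regions (upstream intron, exon, downstream intron) and that the target is the fraction of reads supporting inclusion. The function names, region names, sequences, and read counts are all hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k=3):
    """Count overlapping k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def region_features(regions, k=3, alphabet="ACGT"):
    """Build one flat feature dict: '<region>:<kmer>' -> count.

    `regions` maps a hypothetical region name (e.g. 'upstream_intron')
    to its sequence. Every possible k-mer gets an entry, so feature
    vectors from different exons line up column by column.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    features = {}
    for name, seq in regions.items():
        counts = kmer_counts(seq, k)
        for kmer in kmers:
            features[f"{name}:{kmer}"] = counts.get(kmer, 0)
    return features


def inclusion_level(inclusion_reads, exclusion_reads):
    """Fraction of reads supporting exon inclusion; None if no reads.

    Assumption: a plain read-count ratio, with no length normalization.
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return inclusion_reads / total


# Toy example with made-up sequences and read counts.
exon = {
    "upstream_intron": "TTTCTCTCTGCAG",
    "exon": "GCATGAAGGCATG",
    "downstream_intron": "GTAAGTTGCATG",
}
x = region_features(exon, k=3)
y = inclusion_level(inclusion_reads=30, exclusion_reads=10)
```

Feature dictionaries of this form, one per exon, could then be stacked into a matrix and paired with the inclusion levels as input to any regression or classification method. Which model to use, and how to evaluate it on held-out exons, is left open here.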

Key words

RNA; Alternative splicing; Machine learning; Computational biology; Posttranscriptional regulation



Copyright information

© Springer Science+Business Media, LLC 2014

Authors and Affiliations

  1. Department of Genetics, University of Pennsylvania, Philadelphia, USA
