Incorporating significant amino acid pairs and protein domains to predict RNA splicing-related proteins with functional roles

Journal of Computer-Aided Molecular Design

Abstract

Pre-mRNA splicing is carried out through the interaction of RNA sequence elements with a variety of RNA splicing-related proteins (SRPs) (e.g. spliceosomal proteins and splicing factors). Alternative splicing, an important post-transcriptional regulatory mechanism in eukaryotes, gives rise to multiple mature mRNA isoforms that encode functionally diverse proteins. However, the regulation of RNA splicing is not yet fully elucidated, partly because SRPs have not been exhaustively identified and their experimental identification is labor-intensive. We were therefore motivated to design a new method for identifying SRPs together with their functional roles in the regulation of RNA splicing. Experimentally verified SRPs were manually curated from research articles. According to the functional annotation of the Splicing Related Gene Database, the collected SRPs were further categorized into four functional groups: small nuclear Ribonucleoprotein, Splicing Factor, Splicing Regulation Factor and Novel Spliceosome Protein. The composition of amino acid pairs differs markedly among the four functional groups of SRPs. Support vector machines (SVMs) were then used to learn predictive models for identifying SRPs as well as their functional roles. Cross-validation shows that the SVM models trained with significant amino acid pairs and functional domains provide better predictive performance. In addition, independent testing demonstrates that the proposed method can accurately identify SRPs in mammals and plants and can effectively distinguish SRPs from RNA-binding proteins. This investigation provides a practical means of identifying potential SRPs and a perspective for exploring the regulation of RNA splicing.
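
As an illustration of the amino acid pair composition mentioned in the abstract, the following is a minimal Python sketch. It assumes that a "pair" means two adjacent residues and that composition means relative frequency over the 400 ordered pairs of the 20 standard amino acids; the abstract does not give the exact definition, so both are assumptions. The function name `pair_composition` and the toy sequence are hypothetical and not taken from the article.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 400 ordered pairs, in a fixed order so feature vectors are comparable.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def pair_composition(sequence):
    """Return the relative frequency of each adjacent amino acid pair.

    Assumption: pairs are adjacent residues; the article may define
    pairs differently (e.g. with gaps between the two residues).
    """
    seq = sequence.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for a, b in zip(seq, seq[1:]):
        pair = a + b
        if pair in counts:  # skip pairs containing non-standard symbols
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]


# Toy sequence for illustration only, not a protein from the article.
features = pair_composition("MSRSRSRSPG")
```

A 400-dimensional vector of this kind could then be passed, alone or combined with domain features, to any SVM implementation for training; which pairs count as "significant" would depend on a selection step that the abstract does not detail.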

Acknowledgments

The authors sincerely thank the National Science Council of the Republic of China for financially supporting this research under Contract Nos. 101-2628-E-155-002-MY2 and 102-2221-E-155-069.

Author information

Corresponding author

Correspondence to Tzong-Yi Lee.

Additional information

Justin Bo-Kai Hsu and Kai-Yao Huang have contributed equally to this work.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 869 kb)

About this article

Cite this article

Hsu, J.BK., Huang, KY., Weng, TY. et al. Incorporating significant amino acid pairs and protein domains to predict RNA splicing-related proteins with functional roles. J Comput Aided Mol Des 28, 49–60 (2014). https://doi.org/10.1007/s10822-014-9706-6

  • DOI: https://doi.org/10.1007/s10822-014-9706-6
