Identification of Regulatory Binding Sites on mRNA Using in Vivo Derived Informations and SVMs

  • Carmen Maria Livi
  • Luc Paillard
  • Enrico Blanzieri
  • Yann Audic
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 154)


Proteins that interact with ribonucleic acids (RNA) are involved in many cellular processes. Detailed knowledge of the binding pairs is necessary to construct computational models that can avoid time-consuming biological experiments. This paper addresses the creation of a model based on support vector machines and trained on experimentally validated data. The goal is to identify RNA molecules that bind specifically to a regulatory protein called CELF1.
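As a rough illustration of the kind of classifier described above, the following minimal sketch trains a linear support vector machine on k-mer counts of RNA sequences. The abstract does not state which features, kernel, or library the authors used, so everything here is an assumption: the 3-mer encoding, the bias-free linear model, the Pegasos-style subgradient training loop, and the function names are hypothetical, and the toy sequences are invented placeholders, not experimental CELF1 data.

```python
from itertools import product

ALPHABET = "ACGU"
K = 3  # k-mer length; an assumption for this sketch
KMER_INDEX = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=K))}


def kmer_features(seq):
    """Return normalised counts of overlapping k-mers in an RNA sequence."""
    vec = [0.0] * len(KMER_INDEX)
    n_windows = len(seq) - K + 1
    for start in range(n_windows):
        idx = KMER_INDEX.get(seq[start:start + K])
        if idx is not None:  # skip windows containing non-ACGU symbols
            vec[idx] += 1.0 / n_windows
    return vec


def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Minimise the regularised hinge loss with Pegasos-style subgradient steps.

    Labels must be +1 or -1. No bias term is fitted, to keep the sketch short.
    """
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - 1.0 / t) * wj for wj in w]  # shrink by 1 - eta * lam
            if margin < 1.0:  # hinge loss is active for this example
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def predict(w, seq):
    """Return +1 (predicted binder) or -1 (predicted non-binder)."""
    score = sum(wj * xj for wj, xj in zip(w, kmer_features(seq)))
    return 1 if score > 0 else -1


# Toy sequences invented for illustration only (not experimental data).
positives = ["UGUUUGUUUGU", "UUGUGUUGUGU", "GUUUGUUUGUU"]
negatives = ["ACACACACACA", "CCACCACCAAC", "AAACCCAAACC"]
X = [kmer_features(s) for s in positives + negatives]
y = [1] * len(positives) + [-1] * len(negatives)
w = train_linear_svm(X, y)

print(predict(w, "UGUUGUUGU"))  # → 1
print(predict(w, "CACACACAC"))  # → -1
```

In practice a model of this kind would be trained on experimentally validated binding data and evaluated on held-out sequences. The toy set only shows the data flow from sequence to feature vector to decision.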


Support vector machines, bioinformatics, machine learning, classification, RNA binding site prediction





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Carmen Maria Livi (1)
  • Luc Paillard (2)
  • Enrico Blanzieri (1)
  • Yann Audic (2)
  1. Department of Computer Science DISI, University of Trento, Trento, Italy
  2. CNRS, Institut genetique et developpement de Rennes, University of Rennes 1, Rennes, France
