Predicting SUMOylation Sites

  • Denis C. Bauer
  • Fabian A. Buske
  • Mikael Bodén
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5265)


Recent evidence suggests that SUMOylation of proteins plays a key regulatory role in the assembly and disassembly of nuclear sub-compartments, and may repress transcription by modifying chromatin. Determining whether a protein contains a SUMOylation site thus provides essential clues about a substrate’s intra-nuclear spatial association and function.

Previous SUMOylation predictors rely largely on a degenerate and functionally unreliable consensus motif description, and their accuracy is too low to map the extent of this essential class of regulatory modifications with confidence. This paper explores predictive dependencies among the amino acids at SUMOylation sites and non-local and structural properties (including secondary structure, solvent accessibility and evolutionary profiles).
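To make the motif-based baseline concrete, the following is a minimal sketch of a consensus scan and of the local sequence windows that a learned predictor might take as input. It is not the authors' implementation. The abstract does not spell out the motif, so the pattern used here (a hydrophobic residue, the lysine, any residue, then D or E) is the commonly cited form and is an assumption, as are the residue set `PSI`, the function names and the window size.

```python
import re

# Assumption: the commonly cited consensus is psi-K-x-[DE], with psi a large
# hydrophobic residue. The residue set below is illustrative, not from the paper.
PSI = "ILVMF"

# A lookahead is used so that overlapping motif occurrences are all reported.
MOTIF = re.compile(rf"(?=[{PSI}]K[A-Z][DE])")


def consensus_lysines(sequence):
    """Return 0-based positions of lysines inside a psi-K-x-[DE] match."""
    return [m.start() + 1 for m in MOTIF.finditer(sequence.upper())]


def lysine_windows(sequence, flank=3, pad="-"):
    """Yield (position, window) for every lysine, padding at the termini."""
    seq = sequence.upper()
    padded = pad * flank + seq + pad * flank
    for i, residue in enumerate(seq):
        if residue == "K":
            # Position i in seq sits at i + flank in padded, so this slice
            # is centred on the lysine.
            yield i, padded[i:i + 2 * flank + 1]
```

Every lysine yields a window, but only those matching the pattern are flagged by the scan, which is why a purely motif-based predictor misses any site outside the consensus.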

An extensive examination of two main machine learning paradigms, Support Vector Machines and Bidirectional Recurrent Neural Networks, demonstrates that (1) with careful attention to generalization issues both methods achieve comparable performance, and (2) local features enable the best generalization, with structural features having little to no impact. The predictive model for SUMOylation sites based on the primary protein sequence achieves an area under the ROC curve of 0.92 using 5-fold cross-validation, and 96% accuracy on an independent hold-out test set. However, similar to other predictors, the new predictor is unable to generalize beyond the simple consensus motif.
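The evaluation protocol named above (area under the ROC curve under 5-fold cross-validation) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the paper's pipeline: the stratified fold assignment, the pooling of held-out scores into a single AUC, and the `train_and_score` callback are all hypothetical choices made here for the example.

```python
import random


def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=5, seed=0):
    """Assign every example index to one of k folds with similar class ratios."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validated_auc(features, labels, train_and_score, k=5, seed=0):
    """Pool held-out scores from k folds and compute one AUC (an assumption;
    averaging per-fold AUCs is an equally common convention).

    train_and_score(train_X, train_y, test_X) is a hypothetical callback that
    fits any model and returns one score per test example.
    """
    scores = [0.0] * len(labels)
    for fold in stratified_folds(labels, k, seed):
        held_out = set(fold)
        train = [i for i in range(len(labels)) if i not in held_out]
        fold_scores = train_and_score(
            [features[i] for i in train],
            [labels[i] for i in train],
            [features[i] for i in fold],
        )
        for i, s in zip(fold, fold_scores):
            scores[i] = s
    return roc_auc(labels, scores)
```

Any classifier that returns a real-valued score per candidate lysine can be plugged in through the callback, so the same harness would serve for comparing different model families on identical folds.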


Radial Basis Function; Consensus Motif; Radial Basis Function Kernel; Sequence Logo; Relative Solvent Accessibility
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Denis C. Bauer (1)
  • Fabian A. Buske (1)
  • Mikael Bodén (1)
  1. Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
