Amino Acids, Volume 43, Issue 1, pp 447–455

Predicting protein sumoylation sites from sequence features

Original Article

DOI: 10.1007/s00726-011-1100-2

Cite this article as:
Teng, S., Luo, H. & Wang, L. Amino Acids (2012) 43: 447. doi:10.1007/s00726-011-1100-2

Abstract

Protein sumoylation is a post-translational modification that plays an important role in a wide range of cellular processes. Small ubiquitin-related modifier (SUMO) can be covalently and reversibly conjugated to the sumoylation sites of target proteins, many of which are implicated in various human genetic disorders. The accurate prediction of protein sumoylation sites may help biomedical researchers to design their experiments and understand the molecular mechanism of protein sumoylation. In this study, a new machine learning approach has been developed for predicting sumoylation sites from protein sequence information. Random forests (RFs) and support vector machines (SVMs) were trained with the data collected from the literature. Domain-specific knowledge in terms of relevant biological features was used for input vector encoding. It was shown that RF classifier performance was affected by the sequence context of sumoylation sites, and 20 residues with the core motif ΨKXE in the middle appeared to provide enough context information for sumoylation site prediction. The RF classifiers were also found to outperform SVM models for predicting protein sumoylation sites from sequence features. The results suggest that the machine learning approach gives rise to more accurate prediction of protein sumoylation sites than the other existing methods. The accurate classifiers have been used to develop a new web server, called seeSUMO (http://bioinfo.ggc.org/seesumo/), for sequence-based prediction of protein sumoylation sites.
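The sequence-context idea in the abstract can be illustrated with a minimal sketch that scans a protein sequence for the ΨKXE core motif and cuts out a 20-residue window around each match. This is not the authors' implementation. The residue set used for Ψ, the 8 + 4 + 8 window layout, the padding symbol, and the function name `motif_windows` are all assumptions made for illustration. The biological feature encoding and the RF/SVM training described in the abstract are not reproduced here.

```python
import re

# Assumptions (not taken from the article):
PSI = "ILVMF"   # residues accepted at the hydrophobic position Ψ; definitions vary
FLANK = 8       # 8 flanking + 4 motif + 8 flanking = 20 residues
PAD = "-"       # filler for windows that run past a sequence terminus

# A lookahead is used so that overlapping motif matches are all reported.
MOTIF = re.compile(rf"(?=([{PSI}]K[A-Z]E))")


def motif_windows(sequence):
    """Return (lysine_position, window) for each ΨKXE match.

    lysine_position is 1-based; window is a 20-character string with the
    four motif residues in the middle.
    """
    seq = sequence.upper()
    padded = PAD * FLANK + seq + PAD * FLANK
    hits = []
    for match in MOTIF.finditer(seq):
        start = match.start()  # 0-based index of Ψ in the unpadded sequence
        # In the padded string Ψ sits at start + FLANK, so the window
        # beginning FLANK residues earlier starts at index `start`.
        window = padded[start:start + 2 * FLANK + 4]
        hits.append((start + 2, window))  # the lysine follows Ψ directly
    return hits


if __name__ == "__main__":
    for position, window in motif_windows("AAAAAAAAAVKTEAAAAAAAAA"):
        print(position, window)
```

Each window returned this way would still need to be converted into a numeric feature vector before it could be passed to a classifier; how that encoding is done is described in the full article, not in this sketch.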

Keywords

Protein sumoylation site prediction, Random forests, Support vector machines, Biological features, SeeSUMO

Supplementary material

726_2011_1100_MOESM1_ESM.pdf (38 kb)
Supplementary Tables (PDF 38 kb)

Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  1. Department of Genetics and Biochemistry, Clemson University, Clemson, USA
  2. J.C. Self Research Institute of Human Genetics, Greenwood Genetic Center, Greenwood, USA