Molecular Diversity, Volume 14, Issue 1, pp 81–86

Protein sumoylation sites prediction based on two-stage feature selection

  • Lin Lu
  • Xiao-He Shi
  • Su-Jun Li
  • Zhi-Qun Xie
  • Yong-Li Feng
  • Wen-Cong Lu
  • Yi-Xue Li
  • Haipeng Li
  • Yu-Dong Cai
Full-Length Paper


Protein sumoylation is one of the most important post-translational modifications, and accurate prediction of sumoylation sites is valuable for proteome analysis. Although the putative motif ΨKXE can be used, optimizing prediction models remains a challenge. In this study, we developed a prediction system based on a feature selection strategy. A total of 1,272 peptides of 14 residues each from SUMOsp (Xue et al. [8], Nucleic Acids Res 34:W254–W257, 2006) were investigated, comprising 212 substrates and 1,060 non-substrates. Among the substrates, only 162 comply with the motif ΨKXE. First, the 1,272 peptides were divided into a training set and a test set. All peptides were encoded into feature vectors using hundreds of amino acid properties collected in the Amino Acid Index Database (AAIndex). Then, the mRMR (minimum redundancy–maximum relevance) method was applied to extract the most informative features. Finally, the Nearest Neighbor Algorithm (NNA) was used to build the prediction models. Tested by leave-one-out (LOO) cross-validation, the optimal prediction model reaches an accuracy of 84.4% for the training set and 76.4% for the test set. Notably, 180 substrates were correctly predicted, 18 more than with the motif ΨKXE alone. The finally selected features indicate that the residues two positions downstream and one position upstream of the sumoylation site play the most important role in determining whether sumoylation occurs. Built on this feature selection strategy, our prediction system can be used not only for high-throughput prediction of sumoylation sites but also as a tool to investigate the mechanism of sumoylation.
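The pipeline summarized in the abstract (encode peptides by amino acid properties, rank features with mRMR, classify with a nearest-neighbor rule, evaluate by leave-one-out) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two-property table `TOY_PROPS` is invented (these are not AAIndex values), the peptides are 4-residue toys rather than 14-residue windows, the mRMR score uses the common "relevance minus mean redundancy" form, and the nearest-neighbor distance is squared Euclidean.

```python
import math
from collections import Counter

# Hypothetical property table (NOT real AAIndex values):
# residue -> (hydrophobic flag, charge sign)
TOY_PROPS = {
    "A": (1, 0), "V": (1, 0), "L": (1, 0), "I": (1, 0),
    "K": (0, 1), "R": (0, 1), "E": (0, -1), "D": (0, -1),
    "S": (0, 0), "G": (0, 0),
}

def encode(peptide):
    """Concatenate the property values of every residue into one vector."""
    return [value for residue in peptide for value in TOY_PROPS[residue]]

def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def mrmr(X, y, k):
    """Greedy mRMR: add the feature with the highest relevance to the
    labels minus its mean redundancy with the already selected features."""
    cols = list(zip(*X))
    relevance = [mutual_information(col, y) for col in cols]
    selected = [max(range(len(cols)), key=relevance.__getitem__)]
    while len(selected) < k:
        def score(j):
            redundancy = sum(mutual_information(cols[j], cols[s])
                             for s in selected) / len(selected)
            return relevance[j] - redundancy
        candidates = [j for j in range(len(cols)) if j not in selected]
        selected.append(max(candidates, key=score))
    return selected

def nearest_neighbor(train_X, train_y, query):
    """Label of the closest training vector (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, query)) for row in train_X]
    return train_y[dists.index(min(dists))]

def loo_accuracy(X, y):
    """Leave-one-out: predict each sample from all the other samples."""
    hits = 0
    for i in range(len(X)):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += nearest_neighbor(rest_X, rest_y, X[i]) == y[i]
    return hits / len(X)

# Toy peptides with the lysine at index 1; label 1 = substrate (invented data).
peptides = ["VKAE", "IKSE", "SKGD", "AKLE", "LKAG", "GKSA", "RKGS", "EKLK"]
y = [1, 1, 1, 1, 0, 0, 0, 0]
X = [encode(p) for p in peptides]

selected = mrmr(X, y, k=1)                       # indices of the kept features
X_sel = [[row[j] for j in selected] for row in X]
print("selected feature indices:", selected)
print("LOO accuracy, all features:", loo_accuracy(X, y))
print("LOO accuracy, selected features:", loo_accuracy(X_sel, y))
```

In this toy data the labels are driven by the charge of the residue two positions downstream of the lysine, so restricting the classifier to the top-ranked feature removes the noise contributed by the other positions. The sketch covers only the ranking and evaluation steps; how many ranked features to keep is a separate choice that it does not address.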


Keywords

Prediction, Protein sumoylation, mRMR, AAIndex, Nearest Neighbor Algorithm, Leave-one-out cross-validation, Bioinformatics


Supplementary material

ESM 1: 11030_2009_9149_MOESM1_ESM.doc (DOC, 51.0 kb)
ESM 2: 11030_2009_9149_MOESM2_ESM.doc (DOC, 25.5 kb)
ESM 3: 11030_2009_9149_MOESM3_ESM.pdf (PDF, 116 kb)
ESM 4: 11030_2009_9149_MOESM4_ESM.doc (DOC, 88.0 kb)


References

  1. Mann M, Jensen ON (2003) Proteomic analysis of post-translational modifications. Nat Biotechnol 21: 255–261. doi: 10.1038/nbt0303-255
  2. Johnson ES (2004) Protein modification by SUMO. Annu Rev Biochem 73: 355–382. doi: 10.1146/annurev.biochem.73.011303.074118
  3. Girdwood DW, Tatham MH, Hay RT (2004) SUMO and transcriptional regulation. Semin Cell Dev Biol 15: 201–210. doi: 10.1016/j.semcdb.2003.12.001
  4. Liang M, Melchior F, Feng XH, Lin X (2004) Regulation of Smad4 sumoylation and transforming growth factor-beta signaling by protein inhibitor of activated STAT1. J Biol Chem 279: 22857–22865. doi: 10.1074/jbc.M401554200
  5. Li M, Guo D, Isales CM, Eizirik DL, Atkinson M, She JX, Wang CY (2005) SUMO wrestling with type 1 diabetes. J Mol Med 83: 504–513. doi: 10.1007/s00109-005-0645-5
  6. Shinbo Y, Niki T, Taira T, Ooe H, Takahashi-Niki K, Maita C, Seino C, Iguchi-Ariga SM, Ariga H (2006) Proper SUMO-1 conjugation is essential to DJ-1 to exert its full activities. Cell Death Differ 13: 96–108. doi: 10.1038/sj.cdd.4401704
  7. Hay RT (2005) SUMO: a history of modification. Mol Cell 18: 1–12. doi: 10.1016/j.molcel.2005.03.012
  8. Xue Y, Zhou F, Fu C, Xu Y, Yao X (2006) SUMOsp: a web server for sumoylation site prediction. Nucleic Acids Res 34: W254–W257. doi: 10.1093/nar/gkl207
  9. Harder Z, Zunino R, McBride H (2004) Sumo1 conjugates mitochondrial substrates and participates in mitochondrial fission. Curr Biol 14: 340–345
  10. Kawashima S, Kanehisa M (2000) Amino acid index database. Nucleic Acids Res 28: 374
  11. Kawashima S, Ogata H, Kanehisa M (1999) Amino acid index database. Nucleic Acids Res 27: 368–369. doi: 10.1093/nar/27.1.368
  12. Peng HC, Long FH, Ding C (2005) Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27: 1226–1238. doi: 10.1109/TPAMI.2005.159
  13. Cai YD, Chou KC (2006) Predicting membrane protein type by functional domain composition and pseudo-amino acid composition. J Theor Biol 238: 395–400. doi: 10.1016/j.jtbi.2005.05.035
  14. Chou KC, Zhang CT (1995) Prediction of protein structural classes. Crit Rev Biochem Mol Biol 30: 275–349. doi: 10.3109/10409239509083488
  15. Zhou GP (1998) An intriguing controversy over protein structural class prediction. J Protein Chem 17: 729–738. doi: 10.1023/A:1020713915365
  16. Cai YD (2001) Is it a paradox or misinterpretation? Proteins 43: 336–338. doi: 10.1002/prot.1045
  17. Lin D, Tatham MH, Yu B, Kim S, Hay RT, Chen Y (2002) Identification of a substrate recognition site on Ubc9. J Biol Chem 277: 21740–21748. doi: 10.1074/jbc.M108418200
  18. Bernier-Villamor V, Sampson DA, Matunis MJ, Lima CD (2002) Structural basis for E2-mediated SUMO conjugation revealed by a complex between ubiquitin-conjugating enzyme Ubc9 and RanGAP1. Cell 108: 345–356. doi: 10.1016/S0092-8674(02)00630-X

Copyright information

© Springer Science+Business Media B.V. 2009

Authors and Affiliations

  1. Institute of System Biology, Shanghai University, Shanghai, China
  2. Department of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
  3. CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
  4. Life Science and Technology, School of Shanghai Jiao Tong University, Shanghai, China
  5. Institute of Health Science, Shanghai Institute for Biological Science, Chinese Academy of Science, Shanghai, China
  6. Department of Chemistry, College of Sciences, Shanghai, China
  7. Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
