Amino Acids, Volume 42, Issue 4, pp 1387–1395

Prediction of lysine ubiquitination with mRMR feature selection and analysis

Original Article

DOI: 10.1007/s00726-011-0835-0

Cite this article as:
Cai, Y., Huang, T., Hu, L. et al. Amino Acids (2012) 42: 1387. doi:10.1007/s00726-011-0835-0


Ubiquitination, one of the most important post-translational modifications of proteins, occurs when ubiquitin (a small, 76-amino-acid protein) is attached to a lysine residue on a target protein. It often commits the labeled protein to degradation and plays important roles in regulating many cellular processes implicated in a variety of diseases. Because ubiquitination is rapid and reversible, identifying ubiquitination sites with conventional experimental approaches is time-consuming and labor-intensive. To discover lysine ubiquitination sites efficiently, we developed a sequence-based ubiquitination site predictor built on the nearest neighbor algorithm. We used the maximum relevance and minimum redundancy principle to identify the key features and the incremental feature selection procedure to optimize the prediction engine. PSSM conservation scores, amino acid factors and disorder scores of the surrounding sequence formed the optimized set of 456 features. Our ubiquitination site predictor achieved a Matthews correlation coefficient (MCC) of 0.142 in a jackknife cross-validation test on a large benchmark dataset. In an independent test, the MCC of our method was 0.139, higher than those of the existing ubiquitination site predictors UbiPred and UbPred, whose MCCs on the same test set were 0.135 and 0.117, respectively. Our analysis shows that the conservation of amino acids at and around the lysine plays an important role in ubiquitination site prediction. Moreover, protein disorder is strongly related to ubiquitination. These findings might provide useful insights for studying the mechanisms of ubiquitination and modulating the ubiquitination pathway, potentially leading to therapeutic strategies in the future.
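The evaluation loop described above (a nearest neighbor classifier, a jackknife test scored by MCC, and incremental feature selection over a ranked feature list) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the feature ranking is assumed to have been produced already by mRMR, and squared Euclidean distance with a single neighbor is an assumption, since the abstract does not specify the distance measure.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


def nearest_neighbor_label(query, samples, labels, feature_idx):
    """Label of the closest sample, using only the selected features.

    Assumption: squared Euclidean distance; the paper may use another measure.
    """
    def dist(i):
        return sum((samples[i][j] - query[j]) ** 2 for j in feature_idx)

    return labels[min(range(len(samples)), key=dist)]


def jackknife_mcc(samples, labels, feature_idx):
    """Leave-one-out (jackknife) test: predict each sample from all the others."""
    tp = tn = fp = fn = 0
    for i in range(len(samples)):
        rest_x = samples[:i] + samples[i + 1:]
        rest_y = labels[:i] + labels[i + 1:]
        pred = nearest_neighbor_label(samples[i], rest_x, rest_y, feature_idx)
        if pred == 1 and labels[i] == 1:
            tp += 1
        elif pred == 0 and labels[i] == 0:
            tn += 1
        elif pred == 1:
            fp += 1
        else:
            fn += 1
    return mcc(tp, tn, fp, fn)


def incremental_feature_selection(samples, labels, ranked_features):
    """Score the top-1, top-2, ... ranked features and keep the best prefix.

    ranked_features is assumed to come from an mRMR ranking computed elsewhere.
    Returns (number_of_features, jackknife_mcc) for the best-scoring prefix.
    """
    best_k, best_score = 0, -2.0  # MCC lies in [-1, 1], so -2 is below any score
    for k in range(1, len(ranked_features) + 1):
        score = jackknife_mcc(samples, labels, ranked_features[:k])
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score
```

Given a list of feature vectors, binary labels (1 for a ubiquitinated lysine, 0 otherwise) and a ranked list of feature indices, incremental_feature_selection returns the prefix length with the highest jackknife MCC. In the study, that optimum was reported at 456 features.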


Ubiquitination; Maximum relevance and minimum redundancy (mRMR); Incremental feature selection (IFS); Nearest neighbor algorithm (NNA)

Supplementary material

726_2011_835_MOESM1_ESM.xls (128 kb)
Dataset S1 - Benchmark dataset (XLS 127 kb)
726_2011_835_MOESM2_ESM.xls (74 kb)
Table S1 - The IFS result (XLS 73 kb)
726_2011_835_MOESM3_ESM.xls (67 kb)
Table S2 - The 456 optimal features (XLS 67 kb)

Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  1. Institute of Systems Biology, Shanghai University, Shanghai, People’s Republic of China
  2. Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People’s Republic of China
  3. Shanghai Center for Bioinformation Technology, Shanghai, People’s Republic of China
  4. Centre for Computational Systems Biology, Fudan University, Shanghai, People’s Republic of China
  5. Singapore Bioimaging Consortium, Agency for Science, Technology and Research, Singapore, Singapore