Amino Acids, Volume 42, Issue 4, pp 1387–1395

Prediction of lysine ubiquitination with mRMR feature selection and analysis

Original Article


Ubiquitination, one of the most important post-translational modifications of proteins, occurs when ubiquitin (a small, 76-amino-acid protein) is attached to a lysine residue of a target protein. It often commits the labeled protein to degradation and plays important roles in regulating many cellular processes implicated in a variety of diseases. Because ubiquitination is rapid and reversible, identifying ubiquitination sites with conventional experimental approaches is time-consuming and labor-intensive. To discover lysine ubiquitination sites efficiently, we developed a sequence-based ubiquitination site predictor based on the nearest neighbor algorithm. We used the maximum relevance, minimum redundancy (mRMR) principle to identify the key features and the incremental feature selection (IFS) procedure to optimize the prediction engine. PSSM conservation scores, amino acid factors and disorder scores of the surrounding sequence formed the optimal set of 456 features. Our predictor achieved a Matthews correlation coefficient (MCC) of 0.142 in a jackknife cross-validation test on a large benchmark dataset. On an independent test set, its MCC was 0.139, higher than those of the existing ubiquitination site predictors UbiPred (0.135) and UbPred (0.117) on the same set. Our analysis shows that the conservation of amino acids at and around the lysine plays an important role in ubiquitination site prediction. Moreover, disorder and ubiquitination are strongly associated. These findings might provide useful insights for studying the mechanisms of ubiquitination and for modulating the ubiquitination pathway, potentially leading to therapeutic strategies in the future.
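The performance figures reported above are Matthews correlation coefficients: MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)), where TP, TN, FP and FN count true and false positive and negative site predictions. MCC ranges from −1 (total disagreement) through 0 (no correlation between prediction and truth) to 1 (perfect prediction). The minimal sketch below shows the arithmetic; the function names are illustrative, and returning 0 when a marginal sum is zero is a common convention assumed here, not something the abstract specifies.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero, a common convention
    because the coefficient is otherwise undefined.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def mcc_from_labels(y_true, y_pred) -> float:
    """MCC from paired binary labels (1 = ubiquitination site, 0 = non-site)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return mcc(tp, tn, fp, fn)


# Example: 6 true sites found, 3 correct rejections, 1 false alarm, 2 missed sites.
print(round(mcc(tp=6, tn=3, fp=1, fn=2), 3))  # prints 0.478
```

Because MCC uses all four cells of the confusion matrix, it stays informative when sites are far rarer than non-sites, which is why small absolute values such as 0.142 and 0.139 can still separate competing predictors.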
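The mRMR principle ranks features by trading relevance to the class label against redundancy with the features already chosen, both measured by mutual information. The sketch below implements the greedy "mutual information difference" (MID) form of mRMR on pre-discretized features. Whether the authors used MID or the quotient (MIQ) form, and how they discretized the continuous PSSM, amino acid factor and disorder scores, is not stated in this abstract, so both are assumptions; `mutual_information` and `mrmr_rank` are hypothetical helper names.

```python
import math
from collections import Counter


def mutual_information(x, y) -> float:
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[a] * count_y[b]))
    return mi


def mrmr_rank(features, target):
    """Greedy mRMR ranking with the MID criterion: relevance minus mean redundancy.

    features: dict of feature name -> list of discretized values, one per sample
    target:   list of class labels, one per sample
    Returns the feature names in the order they are selected.
    """
    relevance = {name: mutual_information(vals, target) for name, vals in features.items()}
    selected = []
    remaining = set(features)
    while remaining:
        def mid_score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # Iterate in sorted order so ties are broken deterministically by name.
        best = max(sorted(remaining), key=mid_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: every combination of three binary inputs; the label is their majority vote.
x1 = [0, 0, 0, 0, 1, 1, 1, 1]
x2 = [0, 0, 1, 1, 0, 0, 1, 1]
label = [0, 0, 0, 1, 0, 1, 1, 1]
# "b" duplicates "a", so it is ranked last despite its high relevance.
print(mrmr_rank({"a": x1, "b": list(x1), "c": x2}, label))
```

An exact duplicate of an already selected feature has a redundancy at least as large as its relevance, so its MID score drops to zero or below and it sinks toward the end of the ranking; this is the behavior that keeps a compact feature set from being filled with near-copies of the same signal.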
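Incremental feature selection walks down an mRMR ranking, building a predictor on the top-1, top-2 and larger prefixes, and keeps the prefix that scores best. The sketch below scores each prefix by the MCC of a 1-nearest-neighbor classifier under jackknife (leave-one-out) cross-validation. Squared Euclidean distance, MCC as the selection criterion, and all helper names are assumptions for illustration; the abstract does not specify the distance measure used by the authors' nearest neighbor algorithm.

```python
import math


def squared_euclidean(u, v) -> float:
    """Squared Euclidean distance (an assumed metric, chosen for simplicity)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def jackknife_nn_predictions(X, y):
    """Jackknife (leave-one-out): predict each sample's label from its nearest other sample."""
    preds = []
    for i, xi in enumerate(X):
        others = [j for j in range(len(X)) if j != i]
        nearest = min(others, key=lambda j: squared_euclidean(xi, X[j]))
        preds.append(y[nearest])
    return preds


def mcc(y_true, y_pred) -> float:
    """Matthews correlation coefficient for binary labels (0.0 if undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def incremental_feature_selection(X, y, ranked):
    """Score every top-k prefix of a feature ranking and return the best one.

    X:      list of samples, each a list of feature values
    y:      binary labels (1 = site, 0 = non-site)
    ranked: feature column indices in mRMR order
    Returns (best_k, best_mcc, mcc_per_k).
    """
    best_k, best_score, curve = 0, float("-inf"), []
    for k in range(1, len(ranked) + 1):
        cols = ranked[:k]
        X_k = [[row[c] for c in cols] for row in X]
        score = mcc(y, jackknife_nn_predictions(X_k, y))
        curve.append(score)
        if score > best_score:  # strict '>' keeps the smallest k among ties
            best_k, best_score = k, score
    return best_k, best_score, curve


# Column 0 separates the classes; column 1 is noise that pairs each sample with the wrong class.
X = [[0.0, 5], [0.1, -5], [0.2, 0], [1.0, 5], [1.1, -5], [1.2, 0]]
y = [0, 0, 0, 1, 1, 1]
print(incremental_feature_selection(X, y, ranked=[0, 1]))  # prints (1, 1.0, [1.0, -1.0])
```

This naive version repeats the full leave-one-out nearest neighbor search for every prefix, which is quadratic in the number of samples per prefix; it is meant to show the control flow of IFS, not to scale to a large benchmark dataset.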


Keywords: Ubiquitination · Maximum relevance and minimum redundancy (mRMR) · Incremental feature selection (IFS) · Nearest neighbor algorithm (NNA)

Supplementary material

Dataset S1 - Benchmark dataset (726_2011_835_MOESM1_ESM.xls, 128 kb)
Table S1 - The IFS results (726_2011_835_MOESM2_ESM.xls, 74 kb)
Table S2 - The 456 optimal features (726_2011_835_MOESM3_ESM.xls, 67 kb)



Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  1. Institute of Systems Biology, Shanghai University, Shanghai, People’s Republic of China
  2. Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People’s Republic of China
  3. Shanghai Center for Bioinformation Technology, Shanghai, People’s Republic of China
  4. Centre for Computational Systems Biology, Fudan University, Shanghai, People’s Republic of China
  5. Singapore Bioimaging Consortium, Agency for Science, Technology and Research, Singapore, Singapore
