Prediction of lysine ubiquitination with mRMR feature selection and analysis

  • Original Article
  • Amino Acids

Abstract

Ubiquitination, one of the most important post-translational modifications of proteins, occurs when ubiquitin (a small, 76-amino-acid protein) is attached to a lysine residue on a target protein. It often commits the labeled protein to degradation and plays important roles in regulating many cellular processes implicated in a variety of diseases. Because ubiquitination is rapid and reversible, identifying ubiquitination sites with conventional experimental approaches is time-consuming and labor-intensive. To discover lysine ubiquitination sites efficiently, we developed a sequence-based ubiquitination site predictor built on the nearest neighbor algorithm. We used the maximum relevance and minimum redundancy (mRMR) principle to identify the key features and the incremental feature selection (IFS) procedure to optimize the prediction engine. The optimized set of 456 features comprised PSSM conservation scores, amino acid factors and disorder scores of the surrounding sequence. In a jackknife cross-validation test on a large benchmark dataset, our predictor achieved a Matthews correlation coefficient (MCC) of 0.142. In an independent test, the MCC of our method was 0.139, higher than those of the existing ubiquitination site predictors UbiPred and UbPred, whose MCCs on the same test set were 0.135 and 0.117, respectively. Our analysis shows that the conservation of amino acids at and around the lysine plays an important role in ubiquitination site prediction. Moreover, disorder is strongly associated with ubiquitination. These findings might provide useful insights for studying the mechanisms of ubiquitination and modulating the ubiquitination pathway, potentially leading to therapeutic strategies in the future.
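
The evaluation loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the squared Euclidean distance and the toy data are assumptions made for illustration, and the feature ranking is taken as given (in the paper it would come from mRMR). The sketch computes the MCC, makes jackknife (leave-one-out) predictions with a 1-nearest-neighbor rule, and runs incremental feature selection over nested feature sets.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = site, 0 = non-site)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def jackknife_1nn(X, y, feats):
    """Leave-one-out predictions using the nearest neighbor on the chosen features.

    Assumption: squared Euclidean distance; the paper's metric may differ.
    """
    preds = []
    for i, xi in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(X):
            if j == i:
                continue  # the held-out sample may not vote for itself
            d = sum((xi[f] - xj[f]) ** 2 for f in feats)
            if d < best_d:
                best_j, best_d = j, d
        preds.append(y[best_j])
    return preds


def incremental_feature_selection(X, y, ranked):
    """Score the nested sets ranked[:1], ranked[:2], ... by jackknife MCC.

    Returns (best_k, best_mcc, curve), where curve[k - 1] is the MCC
    obtained with the top k ranked features.
    """
    curve = [mcc(y, jackknife_1nn(X, y, ranked[:k]))
             for k in range(1, len(ranked) + 1)]
    best_k = max(range(len(curve)), key=curve.__getitem__) + 1
    return best_k, curve[best_k - 1], curve


if __name__ == "__main__":
    # Hypothetical toy data: feature 0 is informative, feature 1 is noise.
    X = [[0.0, 5.0], [0.1, 1.0], [0.2, 4.0], [1.0, 2.0], [1.1, 5.5], [0.9, 0.5]]
    y = [0, 0, 0, 1, 1, 1]
    best_k, best_mcc, curve = incremental_feature_selection(X, y, ranked=[0, 1])
    print(best_k, best_mcc, curve)
```

In this toy setting the noisy second feature lowers the jackknife MCC, so the procedure stops at the smaller feature set. The same logic, applied to a ranked list of sequence-derived features, is how an optimal subset size such as the 456 features reported above would be picked from the IFS curve.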

Acknowledgments

The authors acknowledge Yvonne Poindexter at the Vanderbilt University Cancer Biostatistics Center for her editing. This work was supported by grants from the National High-Tech R&D Program of China (863 Program) (2006AA02Z334, 2007DFA31040), the China National Key Projects for Infectious Disease (2008ZX10002-021), the National Basic Research Program of China (2006CB910700), the National Natural Science Foundation of China (Grant No. 31070752) and the Key Research Program (CAS) (KSCX2-YW-R-112).

Author information

Corresponding authors

Correspondence to Yudong Cai or Yixue Li.

Additional information

Y. Cai and T. Huang contributed equally to this work.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material

Dataset S1 - Benchmark dataset (XLS 127 kb)

Table S1 - The IFS result (XLS 73 kb)

Table S2 - The 456 optimal features (XLS 67 kb)

About this article

Cite this article

Cai, Y., Huang, T., Hu, L. et al. Prediction of lysine ubiquitination with mRMR feature selection and analysis. Amino Acids 42, 1387–1395 (2012). https://doi.org/10.1007/s00726-011-0835-0

  • DOI: https://doi.org/10.1007/s00726-011-0835-0
