Original Article

Amino Acids, Volume 42, Issue 4, pp 1387-1395

Prediction of lysine ubiquitination with mRMR feature selection and analysis

  • Yudong Cai (corresponding author): Institute of Systems Biology, Shanghai University; Centre for Computational Systems Biology, Fudan University
  • Tao Huang: Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences; Shanghai Center for Bioinformation Technology
  • Lele Hu: Institute of Systems Biology, Shanghai University
  • Xiaohe Shi: Singapore Bioimaging Consortium, Agency for Science, Technology and Research
  • Lu Xie: Shanghai Center for Bioinformation Technology
  • Yixue Li (corresponding author): Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences; Shanghai Center for Bioinformation Technology

Abstract

Ubiquitination, one of the most important post-translational modifications of proteins, occurs when ubiquitin (a small 76-amino acid protein) is attached to a lysine on a target protein. It often commits the labeled protein to degradation and plays important roles in regulating many cellular processes implicated in a variety of diseases. Since ubiquitination is rapid and reversible, identifying ubiquitination sites with conventional experimental approaches is time-consuming and labor-intensive. To discover lysine ubiquitination sites efficiently, we developed a sequence-based predictor of ubiquitination sites based on the nearest neighbor algorithm. We used the maximum relevance and minimum redundancy (mRMR) principle to identify the key features and the incremental feature selection (IFS) procedure to optimize the prediction engine. PSSM conservation scores, amino acid factors and disorder scores of the surrounding sequence made up the optimized set of 456 features. Our ubiquitination site predictor achieved a Matthews correlation coefficient (MCC) of 0.142 in a jackknife cross-validation test on a large benchmark dataset. In an independent test, the MCC of our method was 0.139, higher than that of the existing ubiquitination site predictors UbiPred and UbPred, whose MCCs on the same test set were 0.135 and 0.117, respectively. Our analysis shows that the conservation of amino acids at and around the lysine plays an important role in ubiquitination site prediction. Moreover, disorder and ubiquitination are strongly related. These findings might provide useful insights for studying the mechanisms of ubiquitination and modulating the ubiquitination pathway, potentially leading to new therapeutic strategies in the future.
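The evaluation loop described in the abstract (a nearest neighbor predictor scored by MCC under jackknife testing, with incremental feature selection over a ranked feature list) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the squared Euclidean distance, and the toy data are assumptions made for illustration, and the feature ranking is taken as given rather than computed by mRMR.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when the denominator vanishes.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def nearest_neighbor_label(query, samples, labels, features):
    """Label of the closest sample, using only the selected feature indices.

    Squared Euclidean distance is an assumption; the abstract does not
    state which distance measure the predictor uses.
    """
    def dist(i):
        return sum((samples[i][f] - query[f]) ** 2 for f in features)
    return labels[min(range(len(samples)), key=dist)]


def jackknife_mcc(samples, labels, features):
    """Leave-one-out (jackknife) MCC of the 1-nearest-neighbor predictor."""
    tp = tn = fp = fn = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        rest = samples[:i] + samples[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        pred = nearest_neighbor_label(x, rest, rest_labels, features)
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 0 and y == 0:
            tn += 1
        elif pred == 1 and y == 0:
            fp += 1
        else:
            fn += 1
    return mcc(tp, tn, fp, fn)


def incremental_feature_selection(samples, labels, ranked_features):
    """Evaluate the top-k ranked features for k = 1..N and keep the best k.

    `ranked_features` is assumed to come from a ranking step such as mRMR.
    """
    best_k, best_score = 0, -2.0  # MCC is never below -1
    for k in range(1, len(ranked_features) + 1):
        score = jackknife_mcc(samples, labels, ranked_features[:k])
        if score > best_score:
            best_k, best_score = k, score
    return ranked_features[:best_k], best_score


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
toy_samples = [[0.0, 5.0], [0.1, 1.0], [1.0, 5.1], [1.1, 1.1]]
toy_labels = [0, 0, 1, 1]
selected, score = incremental_feature_selection(toy_samples, toy_labels, [0, 1])
print(selected, score)
```

On the toy data, adding the noisy second feature lowers the jackknife MCC, so the selection stops at the first feature. The same mechanism is what would let an IFS procedure settle on a subset such as the 456 features reported above.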

Keywords

Ubiquitination; Maximum relevance and minimum redundancy (mRMR); Incremental feature selection (IFS); Nearest neighbor algorithm (NNA)