Predicting SUMOylation Sites
Recent evidence suggests that SUMOylation of proteins plays a key regulatory role in the assembly and disassembly of nuclear sub-compartments, and may repress transcription by modifying chromatin. Determining whether a protein contains a SUMOylation site thus provides essential clues about a substrate’s intra-nuclear spatial association and function.
Previous SUMOylation predictors are largely based on a degenerate and functionally unreliable description of the consensus motif, and they are not accurate enough to map the extent of this essential class of regulatory modifications with confidence. This paper explores predictive dependencies among the amino acids at SUMOylation sites, non-local properties and structural properties (including secondary structure, solvent accessibility and evolutionary profiles).
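To make the consensus-motif baseline and the notion of local features concrete, the following is a minimal sketch rather than the method used in this paper. It assumes the commonly cited motif ψ-K-x-E/D (a large hydrophobic residue, the target lysine, any residue, then an acidic residue). The residue set `PSI`, the function names and the window size are illustrative assumptions.

```python
import re

# Assumption: psi-K-x-[E/D], with psi drawn from an illustrative set of
# large hydrophobic residues. The paper does not specify this pattern.
PSI = "ILVMF"
# Zero-width lookahead so that overlapping motif instances are all found.
MOTIF = re.compile(rf"(?=[{PSI}]K.[ED])")


def consensus_lysines(seq):
    """Return 0-based positions of lysines that match the assumed motif."""
    # m.start() is the position of psi; the lysine follows it directly.
    return [m.start() + 1 for m in MOTIF.finditer(seq)]


def lysine_windows(seq, flank=3, pad="-"):
    """Yield (position, window) for every lysine, padding at the termini.

    Each window holds the lysine plus `flank` residues on either side and
    is the kind of local sequence feature a classifier could be trained on.
    """
    padded = pad * flank + seq + pad * flank
    for i, aa in enumerate(seq):
        if aa == "K":
            yield i, padded[i : i + 2 * flank + 1]
```

For a hypothetical sequence such as `"MAVKTEGGKLKSE"`, `consensus_lysines` flags the lysines at positions 3 and 10, while `lysine_windows` returns a window for all three lysines, including the one that does not match the motif.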
An extensive examination of two main machine learning paradigms, Support Vector Machines and Bidirectional Recurrent Neural Networks, demonstrates that (1) with careful attention to generalization issues, both methods achieve comparable performance, and that (2) local features enable the best generalization, with structural features having little to no impact. The predictive model for SUMOylation sites based on the primary protein sequence achieves an area under the ROC curve of 0.92 using 5-fold cross-validation, and 96% accuracy on an independent hold-out test set. However, similar to other predictors, the new predictor is unable to generalize beyond the simple consensus motif.
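The evaluation protocol named above (area under the ROC curve, estimated with k-fold cross-validation) can be sketched as follows. This is an illustrative outline under stated assumptions, not the evaluation code used in this paper: the function names are hypothetical, the folds are simple interleaved splits, and no classifier is included.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from binary labels (1/0) and scores.

    Computed as the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=5):
    """Yield k (train, test) index lists that partition range(n)."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test
```

In a real experiment each fold would train a model on `train`, score the examples in `test`, and average `roc_auc` over the folds. Keeping windows from the same or closely related proteins within a single fold is one way to attend to the generalization issues mentioned above, though whether the authors did so is not stated here.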
Keywords: Radial Basis Function, Consensus Motif, Radial Basis Function Kernel, Sequence Logo, Relative Solvent Accessibility