Abstract
Our algorithm predicts short linear functional motifs in proteins using only sequence information. Statistical models for these motifs are built from a database of short sequence fragments taken from proteins in the current release of the Swiss-Prot database. Each of these segments is experimentally confirmed to carry a single-residue post-translational modification. The sensitivity of the classification for the various types of short linear motifs is around 70%. The query protein sequence is dissected into short overlapping fragments, and each fragment is represented as a vector. Each vector is then classified by a machine learning algorithm (Support Vector Machine) as potentially modifiable or not. The resulting list of plausible post-translational modification sites in the query protein is returned to the user. We also present a study of the human protein kinase C family as a biological application of our method.
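The prediction steps described in the abstract (dissecting the query into overlapping fragments, representing each fragment as a vector, and classifying each vector) can be illustrated with a minimal sketch. It is not the authors' implementation. The fragment length of 9, the one-hot encoding, the linear decision function, and all function names are assumptions made for illustration; the abstract does not specify the encoding or the kernel, and the weights here are placeholders rather than a trained model.

```python
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def overlapping_fragments(sequence: str, length: int = 9) -> List[Tuple[int, str]]:
    """Return (center_position, fragment) for every full-length window.

    Positions are 1-based and refer to the central residue of the window.
    The default length of 9 is an assumption, not a value from the paper.
    """
    half = length // 2
    return [
        (start + half + 1, sequence[start:start + length])
        for start in range(len(sequence) - length + 1)
    ]


def one_hot(fragment: str) -> List[float]:
    """Encode a fragment as a flat vector with 20 indicators per position.

    This encoding is an assumed, commonly used representation.
    """
    vector = [0.0] * (len(fragment) * len(AMINO_ACIDS))
    for pos, aa in enumerate(fragment):
        if aa in AA_INDEX:  # unknown residues (e.g. X) stay all-zero
            vector[pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1.0
    return vector


def linear_decision(vector: List[float], weights: List[float], bias: float) -> float:
    """Decision value w.x + b of a linear SVM; positive means 'modifiable'."""
    return sum(w * x for w, x in zip(weights, vector)) + bias


def predict_sites(sequence: str, weights: List[float], bias: float,
                  length: int = 9) -> List[Tuple[int, str, float]]:
    """List (position, fragment, score) for windows classified as positive."""
    sites = []
    for center, fragment in overlapping_fragments(sequence, length):
        score = linear_decision(one_hot(fragment), weights, bias)
        if score > 0:
            sites.append((center, fragment, score))
    return sites
```

In practice the weights and bias would come from a Support Vector Machine trained on the experimentally confirmed Swiss-Prot fragments, and a non-linear kernel would replace `linear_decision` with a kernel expansion over the support vectors.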
Acknowledgements
This work was supported by the US grant “SPAM” (GM63208) and by the ELM (QLRT-CT2000-00127), BioSapiens (LHSG-CT-2003-503265), and GeneFun (LSHG-CT-2004-503567) projects within the Fifth and Sixth Framework Programmes of the European Commission. A. K. acknowledges financial support from NIH grant 1R01GM072014-01. L. S. W. is supported by the Foundation for Polish Science within the Program for Young Researchers.
Cite this article
Plewczynski, D., Tkacz, A., Wyrwicz, L.S. et al. Support-vector-machine classification of linear functional motifs in proteins. J Mol Model 12, 453–461 (2006). https://doi.org/10.1007/s00894-005-0070-2
DOI: https://doi.org/10.1007/s00894-005-0070-2