Prediction of the O-glycosylation Sites in Protein by Layered Neural Networks and Support Vector Machines
O-glycosylation is one of the main types of mammalian protein glycosylation. It is specific to serine and threonine residues, though no consensus sequence is known. In this paper, a layered neural network and a support vector machine (SVM) are used to predict O-glycosylation sites. Three encodings of the protein sequence within a fixed-size window are used as input: a sparse coding that distinguishes all 20 amino acid residues, a 5-letter coding, and a hydropathy coding. In the neural network, a single output unit predicts whether a particular serine or threonine site is glycosylated, while the SVM classifies each site into one of two classes. Performance is evaluated by the Matthews correlation coefficient. Preliminary results with the neural network show that the sparse and 5-letter codings perform better than the hydropathy coding, while the SVM results show that the improvement gained by enlarging the window is limited to a certain extent.
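As a rough illustration of two ideas named in the abstract, the sketch below shows one way a sparse (one-hot) window encoding and the Matthews correlation coefficient could be computed. It is not the authors' implementation. The function names, the residue ordering in `AMINO_ACIDS`, and the choice to encode positions beyond the sequence ends as all zeros are assumptions made for this example; the 5-letter and hydropathy codings are not shown because the abstract does not define them.

```python
import math

# Assumed ordering of the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sparse_encode_window(sequence, site, half_width):
    """One-hot encode the residues in a window centred on `site`.

    Each window position contributes 20 inputs. Positions that fall
    outside the sequence, or hold a non-standard residue, are encoded
    as all zeros (an assumption of this sketch).
    """
    features = []
    for pos in range(site - half_width, site + half_width + 1):
        one_hot = [0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence):
            idx = AMINO_ACIDS.find(sequence[pos])
            if idx >= 0:
                one_hot[idx] = 1
        features.extend(one_hot)
    return features


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

For a window of half-width 2 around a serine at index 3, `sparse_encode_window("MKTSA", 3, 2)` yields 5 × 20 = 100 inputs. The resulting vector could be fed to either classifier, and `matthews_cc` applied to the counts of correct and incorrect predictions on held-out sites.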
Keywords: Support Vector Machine, Window Size, Feedforward Neural Network, Sparse Code, Matthews Correlation