A Novel Method for Classifying Subfamilies and Sub-subfamilies of G-Protein Coupled Receptors
G-protein coupled receptors (GPCRs) are a large superfamily of integral membrane proteins that transduce signals across the cell membrane. Because of this property and the other physiological roles GPCRs play, they are an important target of therapeutic drugs. The function of many GPCRs is unknown, and accurate classification can help predict it. In this study we propose a kernel-based method to classify GPCRs at the subfamily and sub-subfamily levels. At the sub-subfamily level, where only a small number of sequences is available for some classes (imbalanced data), we used our new synthetic protein sequence oversampling (SPSO) algorithm to improve the accuracy and sensitivity of the classifiers. At the subfamily level we obtained an overall accuracy and Matthews correlation coefficient (MCC) of 98.4% and 0.98 for class A, nearly 100% and 1 for class B, and 96.95% and 0.91 for class C; at the sub-subfamily level the overall accuracy and MCC were 97.93% and 0.95. The results suggest that our oversampling technique can be applied to other protein classification problems with imbalanced data.
Keywords: Kernel Matrix; Minority Class; Imbalanced Data; Error Cost; Imbalanced Dataset
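The abstract reports both overall accuracy and MCC because accuracy alone can be misleading on imbalanced data. The following is a minimal sketch of how the two measures are computed from a binary confusion matrix. The function names and the toy labels are illustrative assumptions, not taken from the paper, and the convention of returning 0 when the MCC denominator vanishes is a common choice rather than one stated by the authors.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one class treated as positive."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all examples that were classified correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero (assumed convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy imbalanced set (hypothetical): 5 minority-class and 95 majority-class sequences.
y_true = [1] * 5 + [0] * 95

# A classifier that always predicts the majority class.
always_majority = [0] * 100
counts_majority = confusion_counts(y_true, always_majority)

# A classifier that labels every sequence correctly.
counts_perfect = confusion_counts(y_true, y_true)

print(accuracy(*counts_majority), mcc(*counts_majority))
print(accuracy(*counts_perfect), mcc(*counts_perfect))
```

In this toy case the always-majority classifier still reaches high accuracy while its MCC is zero, which is why MCC is a useful companion measure when a minority class has few sequences.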