
A Novel Method for Classifying Subfamilies and Sub-subfamilies of G-Protein Coupled Receptors

  • Majid Beigi
  • Andreas Zell
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4345)

Abstract

G-protein coupled receptors (GPCRs) are a large superfamily of integral membrane proteins that transduce signals across the cell membrane. Because of this property and the other physiological roles of the GPCR family, they are an important target of therapeutic drugs. The function of many GPCRs is not known, and accurate classification of GPCRs can help predict their function. In this study we propose a kernel-based method to classify them at the subfamily and sub-subfamily levels. To enhance the accuracy and sensitivity of the classifiers at the sub-subfamily level, where we faced a low number of sequences (imbalanced data), we used our new synthetic protein sequence oversampling (SPSO) algorithm. We obtained an overall accuracy and Matthews correlation coefficient (MCC) of 98.4% and 0.98 for class A, nearly 100% and 1 for class B, and 96.95% and 0.91 for class C, respectively, at the subfamily level, and an overall accuracy and MCC of 97.93% and 0.95 at the sub-subfamily level. The results show that our oversampling technique can be used for other protein classification applications that suffer from imbalanced data.
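The abstract reports both overall accuracy and MCC. The sketch below illustrates why the two are usually reported together on imbalanced data, using the standard binary definitions of each metric. It is an illustration only: the function names and the toy labels are assumptions, not taken from the paper, and the paper's own multi-class evaluation protocol is not reproduced here.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for one class treated as 'positive' (one-vs-rest)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical toy data: 5 minority-class sequences among 100.
y_true = [1] * 5 + [0] * 95

# A classifier that always predicts the majority class.
always_majority = [0] * 100
counts_majority = confusion_counts(y_true, always_majority)

# A classifier that labels every sequence correctly.
counts_perfect = confusion_counts(y_true, list(y_true))

print(accuracy(*counts_majority), mcc(*counts_majority))
print(accuracy(*counts_perfect), mcc(*counts_perfect))
```

On this toy set the always-majority classifier reaches high accuracy while its MCC is zero, which is the behavior that makes MCC a useful companion metric when minority classes have few sequences.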

Keywords

Kernel Matrix; Minority Class; Imbalanced Data; Error Cost; Imbalanced Dataset
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Majid Beigi (1)
  • Andreas Zell (1)
  1. Center for Bioinformatics Tübingen (ZBIT), University of Tübingen, Tübingen, Germany
