Finding Class C GPCR Subtype-Discriminating N-grams through Feature Selection
G protein-coupled receptors (GPCRs) are a large and heterogeneous superfamily of receptors that play a key role in cells as transmitters of extracellular signals. Class C GPCRs, in particular, are of great interest in pharmacology. Because their full 3-D structure is not known, their primary amino acid sequences are used instead to build robust classifiers capable of discriminating between their subtypes. In this paper, we describe the use of feature selection techniques to build Support Vector Machine (SVM)-based classification models from selected receptor subsequences described as n-grams. We show that this approach to classification is useful for finding class C GPCR subtype-specific motifs.
Keywords: G-Protein coupled receptors, pharmaco-proteomics, feature selection, n-grams, support vector machines
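To make the n-gram representation concrete, the following is a minimal sketch of how a primary amino acid sequence could be turned into a vector of n-gram relative frequencies. It is an illustration under stated assumptions, not the procedure used in the paper: the function names, the choice of relative frequencies, and the toy sequences are all hypothetical. The feature selection criterion and the SVM training step are not specified in the abstract and are not shown here.

```python
from collections import Counter


def ngram_counts(sequence, n):
    """Count every overlapping n-gram (length-n substring) in a sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def build_vocabulary(sequences, n):
    """Return the sorted list of all n-grams seen in any of the sequences."""
    vocab = set()
    for seq in sequences:
        vocab.update(ngram_counts(seq, n))
    return sorted(vocab)


def to_vector(sequence, vocabulary, n):
    """Relative-frequency vector of a sequence over a fixed vocabulary."""
    counts = ngram_counts(sequence, n)
    total = sum(counts.values()) or 1  # avoid dividing by zero for short input
    return [counts[g] / total for g in vocabulary]


# Toy sequences invented for illustration; these are not real receptor data.
toy = ["MKVLAAGL", "MKVLSAGL", "GGLAAMKV"]
vocab = build_vocabulary(toy, 2)
vectors = [to_vector(s, vocab, 2) for s in toy]
```

Each row of `vectors` has one entry per n-gram in `vocab`. A feature selection step would then keep only the columns that best separate the subtypes, and a classifier would be trained on the reduced vectors.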