Abstract
G protein-coupled receptors (GPCRs) are integral cell membrane proteins of relevance for pharmacology. The tertiary structure of the transmembrane domain, a gateway to the study of protein functionality, is unknown for almost all members of class C GPCRs, which are the target of the current study. As a result, their investigation must often rely on alignments of their amino acid sequences. Sequence alignment, however, entails the risk of missing relevant information. Various approaches have attempted to circumvent this risk through alignment-free transformations of the sequences on the basis of different amino acid physicochemical properties. In this paper, we use several of these alignment-free methods, as well as a basic amino acid composition representation, to transform the available sequences. Novel semi-supervised statistical machine learning methods are then used to discriminate the different class C GPCR types from the transformed data. This approach is relevant because of the existence of orphan proteins, to which type labels should be assigned in a process of deorphanization or reverse pharmacology. The reported experiments show that the proposed techniques provide accurate classification even in settings of extreme class-label scarcity, and that fair accuracy can be achieved even with very simple transformation strategies that ignore the sequence ordering.
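As an illustration of the simplest transformation mentioned above, the following minimal sketch computes an amino acid composition vector: the relative frequency of each of the 20 standard residues, which by construction ignores sequence ordering. The function name `aa_composition`, the residue ordering, and the toy sequence are assumptions made for this example; they are not taken from the paper, whose exact preprocessing is not described in the abstract.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids in one-letter code (ordering is an arbitrary choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence: str) -> List[float]:
    """Return the relative frequency of each standard amino acid.

    Non-standard symbols (e.g. 'X' or gaps) are skipped; this handling is an
    assumption of the sketch, not a documented step of the study.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy fragment, not a real receptor sequence.
toy = "MKTAYIAKQR"
vec = aa_composition(toy)
```

Each sequence, whatever its length, maps to a fixed 20-dimensional vector whose entries sum to one, so sequences of different lengths can be passed to a classifier without alignment. Two sequences with the same residues in a different order yield the same vector, which is the sense in which this representation discards ordering.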
Acknowledgments
R. Cruz-Barbosa acknowledges the Mexican National Council for Science and Technology for his postdoctoral fellowship. This research is partially funded by Spanish research projects TIN2012-31377, SAF2010-19257, Fundació La Marató de TV3 (110230), RecerCaixa 2010ACUP 00378 and ERA-NET NEURON PCIN-2013-018-C03-02.
Cite this article
Cruz-Barbosa, R., Vellido, A. & Giraldo, J. The influence of alignment-free sequence representations on the semi-supervised classification of class C G protein-coupled receptors. Med Biol Eng Comput 53, 137–149 (2015). https://doi.org/10.1007/s11517-014-1218-y
DOI: https://doi.org/10.1007/s11517-014-1218-y
Keywords
- Class C G protein-coupled receptors
- Semi-supervised learning
- Alignment-free sequence representations