Abstract
Transmembrane proteins (TMPs) are major drug targets, but knowledge of their precise topology and structure remains highly limited compared with globular proteins. Despite the difficulties in obtaining their structures, a substantial experimental and computational effort has been made in recent years to increase the number of available structures. In view of this emerging challenge, developing computational methods to extract knowledge from these data is crucial for better understanding TMP functions and for improving the quality of structural models. Here, we revisit an efficient unsupervised learning procedure, called the Hybrid Protein Model (HPM), and apply it to the analysis of transmembrane proteins belonging to the all-α structural class. HPM is an original classification procedure that efficiently combines sequence and structure learning. The procedure was initially applied to the analysis of globular proteins. In the present case, HPM classifies a set of overlapping protein fragments extracted from a non-redundant databank of TMP 3D structures. After fine-tuning of the learning parameters, the optimal classification results in 65 clusters, which best capture the relationships between sequence and local structure properties of TMPs. Interestingly, HPM distinguishes among the resulting clusters two helical regions with distinct hydrophobic patterns, which underlines the complexity of the topology of these proteins. The HPM classification highlights unusual relationships between amino acids in TMP fragments, which can be useful for elaborating new amino acid substitution matrices. Finally, two challenging applications are described: the first aims at annotating protein function (channel or not), and the second assesses the quality of structures (X-ray or models) via a new scoring function derived from the HPM classification.
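To make the notion of overlapping fragments concrete, the following is a minimal sketch of a sliding-window extraction over a sequence. The function name `overlapping_fragments`, the fragment length, the step, and the toy sequence are all illustrative assumptions; the abstract does not state the values used by HPM, and this is not the authors' implementation.

```python
from typing import List, Tuple


def overlapping_fragments(
    sequence: str, length: int, step: int = 1
) -> List[Tuple[int, str]]:
    """Return (start_index, fragment) pairs from a sliding window.

    `length` and `step` are hypothetical parameters chosen for
    illustration; they are not the values used by HPM.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    # A sequence shorter than `length` yields an empty list.
    return [
        (start, sequence[start:start + length])
        for start in range(0, len(sequence) - length + 1, step)
    ]


if __name__ == "__main__":
    toy_sequence = "LLVAGIALFW"  # made-up sequence, not taken from the study
    for start, fragment in overlapping_fragments(toy_sequence, length=5):
        print(start, fragment)
```

With a step of 1, consecutive fragments share all but one residue, so every residue away from the sequence ends appears in several fragments.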
Acknowledgments
We thank our colleagues Jean-Christophe Gelly and Stéphane Téletchéa for their valuable advice on this article. This work was supported by grants from the Ministry of Research (France), University Paris Diderot, Sorbonne Paris Cité (France), National Institute for Blood Transfusion (INTS, France), National Institute for Health and Medical Research (INSERM, France) and labex GR-Ex to JE, CE and AdB, National Institute for Agricultural Research (INRA, France) to AU, and National Center for Scientific Research (CNRS) to JE. JE also acknowledges an ATER (research and teaching) position from University Paris Diderot (France). The labex GR-Ex, reference ANR-11-LABX-0051, is funded by the program “Investissements d’avenir” of the French National Research Agency, reference ANR-11-IDEX-0005-02.
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Handling Editor: L. Taher.
J. Esque and A. Urbain, the first two authors, should be regarded as joint first authors.
C. Etchebest and A. G. de Brevern, the last two authors, should be regarded as joint last authors.
About this article
Cite this article
Esque, J., Urbain, A., Etchebest, C. et al. Sequence–structure relationship study in all-α transmembrane proteins using an unsupervised learning approach. Amino Acids 47, 2303–2322 (2015). https://doi.org/10.1007/s00726-015-2010-5
DOI: https://doi.org/10.1007/s00726-015-2010-5