Multiple alignment of transmembrane protein sequences

  • Walter Pirovano
  • Sanne Abeln
  • K. Anton Feenstra
  • Jaap Heringa


Multiple sequence alignment remains one of the most powerful tools for assessing evolutionary sequence relationships and for identifying structurally and functionally important protein regions. Membrane-bound proteins represent a special class of proteins: the regions that insert into the cell membrane have a hydrophobicity pattern profoundly different from that of soluble proteins. Multiple alignment techniques that employ scoring schemes tailored to soluble protein sequences are therefore, in principle, not optimal for aligning membrane-bound proteins. In this chapter we describe some of the characteristics that cause transmembrane proteins to differ at the sequence level. We also cover computational strategies and methods developed over the years for aligning this special class of proteins, discuss some current bottlenecks, and suggest some avenues for improvement.


Abbreviations: TM, transmembrane; MSA, multiple sequence alignment; SP, sum of pairs (score); TC, total column (score)





Copyright information

© Springer-Verlag/Wien 2010

Authors and Affiliations

  • Walter Pirovano (1)
  • Sanne Abeln (1)
  • K. Anton Feenstra (1)
  • Jaap Heringa (1)

  1. Centre for Integrative Bioinformatics, VU University Amsterdam, Amsterdam, The Netherlands
