Abstract
The diversity and complexity of the bioinformatics tools currently available for protein sequence analysis can make it difficult to know where to begin when presented with a new sequence. In this chapter we assume that the reader has a protein sequence (full-length or partial), identified by mass spectrometry or by translation of a putative gene, and wishes to use bioinformatics to identify aspects of its structure and function. We go through a protocol outlining one approach that should give the most complete picture possible within the limits of the available tools, and then provide a worked example to illustrate the procedures involved. Because of the scope of this chapter, we are unable to give complete details of all the methods discussed; we refer the reader to the references and websites described in the text for more information.
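As a concrete illustration of one early, sequence-only step in this kind of analysis, the sketch below flags compositionally biased (low-complexity) stretches, such as a polyglutamine tract of the kind discussed by Perutz (1999), by computing the Shannon entropy of residue composition in a sliding window. It is a simplified stand-in inspired by the complexity measures of Wootton (1994), not a reimplementation of that published algorithm and not the authors' own procedure. The function names, the 12-residue window, the 2.2-bit threshold and the test sequence are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq: str, window: int = 12, threshold: float = 2.2):
    """Return (start, end, entropy) for each window whose entropy is below
    threshold. Coordinates are 0-based, end exclusive. The default window and
    threshold are hypothetical, illustrative values."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        h = shannon_entropy(seq[start:start + window])
        if h < threshold:
            hits.append((start, start + window, h))
    return hits


def merge_windows(hits):
    """Merge overlapping or adjacent flagged windows into (start, end) regions.
    Assumes hits are sorted by start, as low_complexity_windows returns them."""
    regions = []
    for start, end, _ in hits:
        if regions and start <= regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [tuple(r) for r in regions]


# Made-up test sequence: distinct flanking residues around a 20-residue Q tract.
seq = "MSTLKEAVDRWIPG" + "Q" * 20 + "FNTHKLAEMRSVDIC"
print(merge_windows(low_complexity_windows(seq)))  # [(9, 39)]
```

On the made-up sequence, the merged region extends five residues beyond each end of the glutamine tract (positions 14 to 33), because any 12-residue window that is mostly glutamine falls below the threshold. Dedicated low-complexity programs use more elaborate procedures to set segment boundaries.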
References
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) Basic local alignment search tool. J. Mol. Biol. 215, 403–410.
Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
Barton, G. J. (1993) An efficient algorithm to locate all locally optimal alignments between two sequences allowing for gaps. Comput. Appl. Biosci. 9, 729–734.
Perutz, M. F. (1999) Glutamine repeats and neurodegenerative diseases: molecular aspects. Trends Biochem. Sci. 24, 58–63.
Wootton, J. C. (1994) Non-globular domains in protein sequences: automated segmentation using complexity measures. Comput. Chem. 18, 269–285.
Lupas, A., Van Dyke, M., and Stock, J. (1991) Predicting coiled coils from protein sequences. Science 252, 1162–1164.
Berger, B., Wilson, D. B., Wolf, E., Tonchev, T., Milla, M., and Kim, P. S. (1995) Predicting coiled coils by use of pairwise residue correlations. Proc. Natl. Acad. Sci. USA 92, 8259–8263.
Wolf, E., Kim, P. S., and Berger, B. (1997) MultiCoil: a program for predicting two- and three-stranded coiled coils. Protein Sci. 6, 1179–1189.
Hofmann, K. and Stoffel, W. (1993) TMbase - a database of membrane spanning proteins segments. Biol. Chem. Hoppe-Seyler 374, 166.
Sonnhammer, E. L. L., von Heijne, G., and Krogh, A. (1998) A hidden Markov model for predicting transmembrane helices in protein sequences, in Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology (Glasgow, J., Littlejohn, T., Major, F., Lathrop, R., Sankoff, D., and Sensen, C., eds.), AAAI Press, Menlo Park, CA, pp. 175–182.
Krogh, A., Larsson, B., von Heijne, G., and Sonnhammer, E. L. L. (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567–580.
Nielsen, H., Engelbrecht, J., Brunak, S., and von Heijne, G. (1997) Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng. 10, 1–6.
Ladunga, I. (1999) PHYSEAN: PHYsical SEquence ANalysis for the identification of protein domains on the basis of physical and chemical properties of amino acids. Bioinformatics 15, 1028–1038.
Nakai, K. (2000) Protein sorting signals and prediction of subcellular localization. Adv. Protein Chem. 54, 277–344.
Eisenhaber, B., Bork, P., and Eisenhaber, F. (1999) Prediction of potential GPI-modification sites in proprotein sequences. J. Mol. Biol. 292, 741–758.
Emanuelsson, O., Nielsen, H., and von Heijne, G. (1999) ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites. Protein Sci. 8, 978–984.
Emanuelsson, O., Nielsen, H., Brunak, S., and von Heijne, G. (2000) Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J. Mol. Biol. 300, 1005–1016.
Hofmann, K., Bucher, P., Falquet, L., and Bairoch, A. (1999) The PROSITE database, its status in 1999. Nucleic Acids Res. 27, 215–221.
Bateman, A., Birney, E., Durbin, R., Eddy, S. R., Finn, R. D., and Sonnhammer, E. L. L. (1999) Pfam 3.1: 1313 multiple alignments match the majority of proteins. Nucleic Acids Res. 27, 260–262.
Schultz, J., Copley, R. R., Doerks, T., Ponting, C. P., and Bork, P. (2000) SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Res. 28, 231–234.
Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.
Eddy, S. R. (1996) Hidden Markov models. Curr. Opin. Struct. Biol. 6, 361–365.
Eddy, S. R. (1998) Profile hidden Markov models. Bioinformatics 14, 755–763.
Karplus, K., Barrett, C., and Hughey, R. (1998) Hidden Markov models for detecting remote protein homologies. Bioinformatics 14, 846–856.
Brown, N. P., Leroy, C., and Sander, C. (1998) MView: a web-compatible database search or multiple alignment viewer. Bioinformatics 14, 380–381.
Galtier, N., Gouy, M., and Gautier, C. (1996) SeaView and Phylo_win, two graphic tools for sequence alignment and molecular phylogeny. Comput. Appl. Biosci. 12, 543–548.
Gibson, T. J. and Spring, J. (1998) Genetic redundancy in vertebrates: polyploidy and persistence of genes encoding multidomain proteins. Trends Genet. 14, 46–49.
Wilson, C. A., Kreychman, J., and Gerstein, M. (2000) Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores. J. Mol. Biol. 297, 233–249.
Peitsch, M. C. (1996) ProMod and Swiss-Model: Internet-based tools for automated comparative protein modelling. Biochem. Soc. Trans. 24, 274–279.
Rost, B. and Sander, C. (1993) Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol. 232, 584–599.
Jones, D. T. (1999) Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292, 195–202.
Cuff, J. A. and Barton, G. J. (2000) Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins 40, 502–511.
Jones, D. T., Taylor, W. R., and Thornton, J. M. (1992) A new approach to protein fold recognition. Nature 358, 86–89.
Rost, B. (1995) TOPITS: threading one-dimensional predictions into three-dimensional structures. Proc. Int. Conf. Intell. Syst. Mol. Biol. 3, 314–321.
Kelley, L. A., MacCallum, R. M., and Sternberg, M. J. (2000) Enhanced genome annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol. 299, 499–520.
White, J. H., Wise, A., Main, M. J., Green, A., Fraser, N. J., Disney, G. H., et al. (1998) Heterodimerization is required for the formation of a functional GABA(B) receptor. Nature 396, 679–682.
Galvez, T., Parmentier, M. L., Joly, C., Malitschek, B., Kaupmann, K., Kuhn, R., et al. (1999) Mutagenesis and modeling of the GABAB receptor extracellular domain support a venus flytrap mechanism for ligand binding. J. Biol. Chem. 274, 13362–13369.
Dekel, I., Russek, N., Jones, T., Mortin, M. A., and Katzav, S. (2000) Identification of the Drosophila melanogaster homologue of the mammalian signal transducer protein, Vav. FEBS Lett. 472, 99–104.
Kraulis, P. J. (1991) Molscript: a program to produce both detailed and schematic plots of protein structures. J. Appl. Crystallogr. 24, 946–950.
Copyright information
© 2003 Humana Press Inc.
About this protocol
Cite this protocol
Copley, R.R., Russell, R.B. (2003). Getting the Most from Your Protein Sequence. In: Smith, B.J. (eds) Protein Sequencing Protocols. Methods in Molecular Biology™, vol 211. Humana Press. https://doi.org/10.1385/1-59259-342-9:411
DOI: https://doi.org/10.1385/1-59259-342-9:411
Publisher Name: Humana Press
Print ISBN: 978-0-89603-975-9
Online ISBN: 978-1-59259-342-2
eBook Packages: Springer Protocols