Summary
This chapter gives an overview of bioinformatic techniques that are important in protein analysis: database searches, sequence comparisons and structural predictions. Links to useful World Wide Web (WWW) pages are given for each topic.
Databases of biological information are reviewed, with emphasis on databases for nucleotide sequences (EMBL, GenBank, DDBJ), genomes, amino acid sequences (SWISS-PROT, PIR, TrEMBL, GenPept) and three-dimensional structures (PDB). Integrated user interfaces for databases (SRS and Entrez) are described. Databases of sequence patterns and protein families (Prosite, Pfam, Blocks) are also introduced.
The chapter also describes the widely used sequence comparison methods FASTA and BLAST, together with the corresponding WWW services. Techniques based on multiple sequence alignments are reviewed as well: alignment creation with the Clustal programs, phylogenetic tree calculation with the Clustal or Phylip packages, and tree display using Drawtree, njplot or phylo_win.
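As a rough illustration of what a pairwise sequence comparison computes, the sketch below scores a global alignment by dynamic programming in the style of Needleman and Wunsch. It is not FASTA or BLAST, which are faster heuristic search methods; it only shows the underlying idea of scoring matches, mismatches and gaps. The function name and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for the example, not parameters taken from the chapter.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b.

    Illustrative linear-gap scoring; the default values are assumptions.
    """
    # Row 0: aligning an empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against an empty prefix of b.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Gap in b (skip a[i-1]) or gap in a (skip b[j-1]).
            up = prev[j] + gap
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATACA"))
```

Only two rows of the score matrix are kept, which is enough for the score itself; recovering the alignment would require storing the full matrix or traceback pointers.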
Finally, the chapter treats structural prediction. Different methods for secondary structure prediction are described (Chou-Fasman, Garnier-Osguthorpe-Robson, Predator, PHD). Techniques for predicting membrane proteins, antigenic sites and post-translational modifications are also reviewed.
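One simple approach to membrane protein prediction is a sliding-window hydropathy profile. The sketch below uses the Kyte-Doolittle hydropathy scale and averages it over a window of residues; stretches with a high average are candidate transmembrane segments. The function names, the default window of 19 residues and the cutoff of 1.6 are assumptions made for this example and are not taken from the chapter.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return the mean hydropathy of every full window along seq."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    # Running sum: add the residue entering the window, drop the one leaving.
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


def candidate_window_starts(seq, window=19, cutoff=1.6):
    """Return 0-based start positions of windows whose mean exceeds cutoff.

    The cutoff is an illustrative assumption, not a validated threshold.
    """
    profile = hydropathy_profile(seq, window)
    return [i for i, value in enumerate(profile) if value > cutoff]


if __name__ == "__main__":
    example = "KKKK" + "L" * 20 + "DDDD"  # hypothetical sequence
    print(candidate_window_starts(example))
```

A profile like this only flags hydrophobic stretches; the topology-oriented and alignment-based methods mentioned in the chapter use additional information.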
Copyright information
© 2000 Springer Basel AG
About this chapter
Cite this chapter
Persson, B. (2000). Bioinformatics in protein analysis. In: Jollès, P., Jörnvall, H. (eds) Proteomics in Functional Genomics. EXS, vol 88. Birkhäuser, Basel. https://doi.org/10.1007/978-3-0348-8458-7_14
DOI: https://doi.org/10.1007/978-3-0348-8458-7_14
Publisher Name: Birkhäuser, Basel
Print ISBN: 978-3-0348-9576-7
Online ISBN: 978-3-0348-8458-7
eBook Packages: Springer Book Archive