Bioinformatics in protein analysis

Persson, Bengt

doi:10.1007/978-3-0348-8458-7_14

Part of the book series: EXS (volume 88)

Summary

The chapter gives an overview of bioinformatic techniques of importance in protein analysis. These include database searches, sequence comparisons and structural predictions. Links to useful World Wide Web (WWW) pages are given in relation to each topic.

Databases with biological information are reviewed, with emphasis on databases for nucleotide sequences (EMBL, GenBank, DDBJ), genomes, amino acid sequences (SWISS-PROT, PIR, TrEMBL, GenPept), and three-dimensional structures (PDB). Integrated user interfaces for databases (SRS and Entrez) are described. An introduction to databases of sequence patterns and protein families (PROSITE, Pfam, Blocks) is also given.
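As an illustration of how a sequence-pattern database entry can be used, the sketch below converts a pattern written in PROSITE notation into a regular expression and scans a sequence with it. It is a minimal sketch, not the chapter's own code: the function names `prosite_to_regex` and `find_sites` are hypothetical, and only the basic notation is handled (`x` for any residue, `[..]` for allowed residues, `{..}` for excluded residues, `(n)` or `(n,m)` for repeats, `<` and `>` as terminal anchors). The example pattern `N-{P}-[ST]-{P}` is the PROSITE N-glycosylation site pattern.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into Python regex syntax (simplified)."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+)(?:,(\d+))?\)", element)
        if m:                                # repetition such as x(3) or x(2,4)
            element = m.group(1)
            upper = "," + m.group(3) if m.group(3) else ""
            repeat = "{" + m.group(2) + upper + "}"
        if element == "x":                   # any residue
            core = "."
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"  # excluded residues
        else:
            core = element                   # single residue or [ABC] class
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

def find_sites(pattern: str, sequence: str) -> list[int]:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    regex = "(?=(?:" + prosite_to_regex(pattern) + "))"
    return [m.start() for m in re.finditer(regex, sequence)]

if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}"))
    print(find_sites("N-{P}-[ST]-{P}", "AANGSAA"))
```

A lookahead is used in `find_sites` so that overlapping occurrences are reported; short patterns like this one match frequently by chance, so hits are candidates rather than confirmed sites.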

Furthermore, the chapter describes the widely used methods for sequence comparison, FASTA and BLAST, and the corresponding WWW services. Techniques involving multiple sequence alignments are also reviewed: alignment creation with the Clustal programs, phylogenetic tree calculation with the Clustal or Phylip packages, and tree display using Drawtree, njplot or phylo_win.
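FASTA and BLAST are fast heuristics for finding local similarities that a rigorous dynamic-programming search would also find. The sketch below shows the rigorous counterpart, a Smith-Waterman style local alignment score. It is an illustration only: the function name `smith_waterman_score` and the scoring values (match 2, mismatch -1, linear gap -2) are assumptions chosen for readability, whereas real protein searches use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)   # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

if __name__ == "__main__":
    print(smith_waterman_score("GGACGTGG", "TTACGTTT"))
```

Only two rows of the matrix are kept because the sketch returns the score alone; recovering the alignment itself would require the full matrix or traceback pointers.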

Finally, the chapter covers structure prediction. Different methods for secondary structure prediction are described (Chou-Fasman, Garnier-Osguthorpe-Robson, Predator, PHD). Techniques for predicting membrane proteins, antigenic sites and post-translational modifications are also reviewed.
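One common starting point for membrane protein prediction is a hydropathy profile: residue hydrophobicity averaged over a sliding window, with strongly hydrophobic stretches flagged as candidate transmembrane segments. The sketch below uses the Kyte-Doolittle scale. It is a simplified illustration rather than any of the specific methods reviewed in the chapter; the function names are hypothetical, and the window of 19 residues and threshold of 1.6 are conventional choices treated here as assumptions.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy for every full window; entry i covers seq[i:i+window]."""
    if len(seq) < window:
        return []
    scores = [KD[residue] for residue in seq]
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(seq)):
        total += scores[i] - scores[i - window]   # slide the window by one
        profile.append(total / window)
    return profile

def candidate_windows(seq: str, window: int = 19,
                      threshold: float = 1.6) -> list[int]:
    """0-based start positions of windows whose mean exceeds the threshold."""
    profile = hydropathy_profile(seq, window)
    return [i for i, value in enumerate(profile) if value > threshold]

if __name__ == "__main__":
    toy = "D" * 10 + "L" * 19 + "D" * 10   # artificial sequence, not a real protein
    print(candidate_windows(toy))
```

Adjacent flagged windows would normally be merged into segments, and more refined approaches add further evidence such as charge distribution or multiple sequence alignments.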

Author information

Authors and Affiliations

Stockholm Bioinformatic Centre and Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, SE-17177, Sweden
Bengt Persson

Editor information

Editors and Affiliations

Laboratoire de Chimie des Substances Naturelles URA C.N.R.S. No. 401, Muséum National d’Histoire Naturelle, 63, rue Buffon, F-75005, Paris, France
P. Jollès
Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE-171 77, Stockholm, Sweden
H. Jörnvall

About this chapter

Cite this chapter

Persson, B. (2000). Bioinformatics in protein analysis. In: Jollès, P., Jörnvall, H. (eds) Proteomics in Functional Genomics. EXS, vol 88. Birkhäuser, Basel. https://doi.org/10.1007/978-3-0348-8458-7_14

DOI: https://doi.org/10.1007/978-3-0348-8458-7_14
Publisher Name: Birkhäuser, Basel
Print ISBN: 978-3-0348-9576-7
Online ISBN: 978-3-0348-8458-7
eBook Packages: Springer Book Archive
