
Gene/Protein Sequence Analysis

A Compilation of Bioinformatic Tools
  • Bernd H. A. Rehm
  • Frank Reinecke
Protocol
Part of the Springer Protocols Handbooks book series (SPH)

1. Introduction

The advent of automated high-throughput DNA sequencing methods has made large-scale genome sequencing strategies feasible, culminating in the determination of the entire human genome sequence (1, 2). An enormous amount of DNA sequence data is available, and databases continue to grow exponentially (see Fig. 22.1). Analysis of this overwhelming amount of data, including hundreds of genomes from both prokaryotes and eukaryotes, has given rise to the field of bioinformatics. Bioinformatic tools have developed rapidly, in particular for identifying genes that encode functional proteins or RNA. This is an important task, considering that even in the best-studied bacterium, Escherichia coli, more than 30% of the identified open reading frames (ORFs) represent hypothetical genes with no known function. Future challenges of genome-sequence analysis will include understanding diseases and gene regulation and reconstructing metabolic pathways. In addition, a set of methods for protein analysis...
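
To make the notion of an ORF concrete, here is a minimal Python sketch (an illustration, not a method taken from this chapter) that scans both strands of a DNA sequence in all three reading frames for stretches that start with ATG and end at the first in-frame stop codon. It assumes the standard genetic code (stop codons TAA, TAG, TGA), ATG as the only start codon, and a hypothetical minimum length cutoff `min_codons`; the names `find_orfs` and `reverse_complement` are likewise illustrative. Dedicated gene finders such as GLIMMER (14) or GeneMark.hmm (33) score candidate ORFs with Markov models rather than relying on a length threshold alone.

```python
from typing import List, Tuple

# Assumption: standard genetic code (NCBI translation table 1).
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"  # alternative starts such as GTG/TTG are ignored here
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string; unknown bases become N."""
    return "".join(_COMPLEMENT.get(base, "N") for base in reversed(seq))


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[str, int, int, int]]:
    """Naive scan for ATG-initiated ORFs on both strands in all three frames.

    Returns (strand, frame, start, end) tuples. `end` is exclusive and includes
    the stop codon; for the "-" strand, coordinates refer to the
    reverse-complement string. The hypothetical cutoff `min_codons` counts the
    stop codon as well.
    """
    seq = seq.upper()
    orfs: List[Tuple[str, int, int, int]] = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG of the currently open ORF
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # close the ORF and look for the next ATG
            # An ORF still open at the end of the sequence has no stop codon
            # and is discarded.
    return orfs


if __name__ == "__main__":
    print(find_orfs("ATGAAATAG", min_codons=2))  # [('+', 0, 0, 9)]
```

Because the scan keeps the first in-frame ATG and ignores later ATGs until a stop codon is reached, each reported ORF is the longest ATG-initiated ORF ending at that stop; nested alternative start sites would need separate handling.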

Keywords

Alignment Score, Unrooted Tree, Remote Homolog, Content Sensor, PROSITE Pattern

References

  1. Venter JC, et al (2001) The sequence of the human genome. Science 291:1304–1351
  2. Lander ES, et al (2001) Initial sequencing and analysis of the human genome. Nature 409:860–921
  3. Rehm BH (2001) Bioinformatic tools for DNA/protein sequence analysis, functional assignment of genes and protein classification. Appl Microbiol Biotechnol 57:579–592
  4. Ewing B, Hillier L, Wendl MC, Green P (1998) Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res 8:175–185
  5. Ewing B, Green P (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 8:186–194
  6. Huang X, Madan A (1999) CAP3: A DNA sequence assembly program. Genome Res 9:868–877
  7. Gordon D, Abajian C, Green P (1998) Consed: a graphical tool for sequence finishing. Genome Res 8:195–202
  8. Staden R (1996) The Staden Sequence Analysis Package. Mol Biotech 5:233–241
  9. Staden R (1984) Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Res 12:505–519
  10. Claverie JM (1997) Computational methods for the identification of genes in vertebrate genomic sequences. Hum Mol Genet 6:1735–1744
  11. Guigo R (1997) Computational gene identification: an open problem. Comput Chem 21:215–222
  12. Krogh A (1998) In: Salzberg SL, Searls D, Kasif S (eds) Computational methods in molecular biology. Elsevier, Amsterdam
  13. Krogh A (1998) In: Bishop MJ (ed) Guide to human genome computing, 2nd edn. Academic, New York, pp 261–274
  14. Delcher AL, Harmon D, Kasif S, White O, Salzberg SL (1999) Improved microbial gene identification with GLIMMER. Nucleic Acids Res 27:4636–4641
  15. Guigo R, Agarwal P, Abril JF, Burset M, Fickett JW (2000) An assessment of gene prediction accuracy in large DNA sequences. Genome Res 10:1631–1642
  16. Krogh A (2000) Using database matches with for HMMGene for automated gene detection in Drosophila. Genome Res 10:523–528
  17. Shibuya T, Rigoutsos I (2002) Dictionary-driven prokaryotic gene finding. Nucleic Acids Res 30:2710–2725
  18. Pedersen JS, Hein J (2003) Gene finding with a hidden Markov model of genome structure and evolution. Bioinformatics 19:219–227
  19. Guo FB, Ou HY, Zhang CT (2003) ZCURVE: a new system for recognizing protein-coding genes in bacterial and archaeal genomes. Nucleic Acids Res 31:1780–1789
  20. Larsen TS, Krogh A (2003) EasyGene – a prokaryotic gene finder that ranks ORFs by statistical significance. BMC Bioinformatics 4:21
  21. Gelfand MS (1995) Prediction of function in DNA sequence analysis. J Comput Biol 2:87–115
  22. Sherriff A, Ott J (2001) Applications of neural networks for gene finding. Adv Genet 42:287–297
  23. Fickett JW (1996) Finding genes by computer: the state of the art. Trends Genet 12:316–320
  24. Zhang CT, Wang J, Zhang R (2002) Using a Euclid distance discriminant method to find protein coding genes in the yeast genome. Comput Chem 26:195–206
  25. Bajic VB, Seah SH (2003) Dragon gene start finder: an advanced system for finding approximate locations of the start of gene transcriptional units. Genome Res 13:1923–1929
  26. Zhang MQ (1998) Statistical features of human exons and their flanking regions. Hum Mol Genet 7:919–932
  27. Searls DB (1992) The linguistics of DNA. Am Sci 80:579–591
  28. Durbin R, Eddy S, Krogh A, Mitchison G (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge
  29. Krogh A, Mian IS, Haussler D (1994) A hidden Markov model that finds genes in E. coli DNA. Nucleic Acids Res 22:4768–4778
  30. Cole ST, Brosch R, Parkhill J, et al (1998) Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 393:537–544
  31. Thomas A, Skolnick M (1994) A probabilistic model for detecting coding regions in DNA sequences. IMA J Math Appl Med Biol 11:149–160
  32. Henderson J, Salzberg S, Fasman K (1997) Finding genes in DNA with a hidden Markov model. J Comput Biol 4:127–141
  33. Lukashin AV, Borodovsky M (1998) GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res 26:1107–1115
  34. Salzberg SL, Pertea M, Delcher AL, Gardner MJ, Tettelin H (1999) Interpolated Markov models for eukaryotic gene finding. Genomics 59:24–31
  35. Badger JH, Olsen GJ (1999) CRITICA: coding region identification tool invoking comparative analysis. Mol Biol Evol 16:512–524
  36. Bocs S, Cruveiller S, Vallenet D, Nuel G, Medigue C (2003) AMIGene: annotation of microbial genes. Nucleic Acids Res 31:3723–3726
  37. Besemer J, Lomsadze A, Borodovsky M (2001) GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res 29:2607–2618
  38. Yeramian E, Jones L (2003) GeneFizz: a web tool to compare genetic (coding/non-coding) and physical (helix/coil) segmentations of DNA sequences. Gene discovery and evolutionary perspectives. Nucleic Acids Res 31:3843–3849
  39. Kotlar D, Lavner Y (2003) Gene prediction by spectral rotation measure: a new method for identifying protein-coding regions. Genome Res 13:1930–1937
  40. Snyder E, Stormo G (1995) Identification of protein coding regions in genomic DNA. J Mol Biol 248:1–18
  41. Reese MG, Eeckman FH, Kulp D, Haussler D (1997) Improved splice site detection in Genie. J Comput Biol 4:311–323
  42. Burge C, Karlin S (1997) Prediction of complete gene structures in human genomic DNA. J Mol Biol 268:78–94
  43. Xu Y, Uberbacher EC (1997) Automated gene identification in large-scale genomic sequences. J Comput Biol 4:325–338
  44. Gelfand MS, Mironov AA, Pevzner PA (1996) Gene recognition via spliced sequence alignment. Proc Natl Acad Sci USA 93:9061–9066
  45. Foissac S, Bardou P, Moisan A, Cros MJ, Schiex T (2003) EUGENE'HOM: a generic similarity-based gene finder using multiple homologous sequences. Nucleic Acids Res 31:3742–3745
  46. Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147:195–197
  47. Yada T, Takagi T, Totoki Y, Sakaki Y, Takaeda Y (2003) DIGIT: a novel gene finding program by combining gene-finders. Pac Symp Biocomput 8:375–387
  48. Quandt K, Frech K, Karas H, Wingender E, Werner T (1995) MatInd and MatInspector – new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res 23:4878–4884
  49. Prestridge DS (1991) SIGNAL SCAN: a computer program that scans DNA sequences for eukaryotic transcriptional elements. CABIOS 7:203–206
  50. Wingender E, Chen X, Hehl R, et al (2000) TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res 28:316–319
  51. Prestridge DS (1995) Predicting Pol II Promoter Sequences Using Transcription Factor Binding Sites. J Mol Biol 249:923–932
  52. Eddy SR (1996) Hidden Markov models. Curr Opin Struct Biol 6:361–365
  53. Eddy SR (1998) Profile hidden Markov models. Bioinformatics 14:755–763
  54. Baldi P, Brunak S (1998) Bioinformatics: the machine learning approach. MIT Press, Boston, MA
  55. Korenberg MJ, David R, Hunter IW, Solomon JE (2000) Automatic classification of protein sequences into structure/function groups via parallel cascade identification: a feasibility study. Ann Biomed Eng 28:803–811
  56. Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680
  57. Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG (1997) The CLUSTAL X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25:4876–4882
  58. Nicholas KB, Nicholas HB Jr, Deerfield DW II (1997) GeneDoc: analysis and visualization of genetic variation. EMBNEW NEWS 4:14
  59. Lake JA (1994) Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances. Proc Natl Acad Sci USA 91:1451–1459
  60. Lockhart PJ, Steel MA, Hendy MD, Penny D (1994) Recovering evolutionary trees under a more realistic model of sequence. Mol Biol Evol 11:605–612
  61. Brocchieri L (2001) Phylogenetic inferences from molecular sequences: review and critique. Theor Popul Biol 59:27–40
  62. Stewart CB (1993) The powers and pitfalls of parsimony. Nature 361:603–607
  63. Attwood TK, Beck ME, Flower DR, Scordis P, Selley JN (1998) The PRINTS protein fingerprint database in its fifth year. Nucleic Acids Res 26:304–308
  64. Page RD (1996) TreeView: an application to display phylogenetic trees on personal computers. Comput Appl Biosci 12:357–358
  65. Gasteiger E, Gattiker A, Hoogland C, Ivanyi I, Appel RD, Bairoch A (2003) ExPASy: the proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res 31:3784–3788
  66. Rost B (1996) PHD: predicting one-dimensional protein structure by profile based neural networks. Methods Enzymol 266:525–539
  67. Eyrich VA, Rost B (2003) META-PP: single interface to crucial prediction servers. Nucleic Acids Res 31:3308–3310
  68. Nielsen H, Engelbrecht J, Brunak S, von Heijne G (1997) Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng 10:1–6
  69. Hansen JE, Lund O, Tolstrup N, Gooley AA, Williams KL, Brunak S (1998) NetOglyc: Prediction of mucin type O-glycosylation sites based on sequence context and surface accessibility. Glycoconjugate J 15:115–130
  70. Hansen JE, Lund O, Rapacki K, Brunak S (1997) O-glycbase version 2.0 – a revised database of O-glycosylated proteins. Nucleic Acids Res 25:278–282
  71. Hansen JE, Lund O, Rapacki K, et al (1995) Prediction of O-glycosylation of mammalian proteins: specificity patterns of UDP-GalNAc:polypeptide N-acetyl-galactosaminyltransferase. Biochem J 308:801–813
  72. Blom N, Gammeltoft S, Brunak S (1999) Sequence- and structure-based prediction of eukaryotic protein phosphorylation sites. J Mol Biol 294:1351–1362
  73. Blom N, Hansen J, Blaas D, Brunak S (1996) Cleavage site analysis in picornaviral polyproteins: discovering cellular targets by neural networks. Protein Sci 5:2203–2216
  74. Emanuelsson O, Nielsen H, von Heijne G (1999) ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites. Protein Sci 8:978–984
  75. Cuff JA, Barton GJ (1999) Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins 34:508–519
  76. Sonnhammer ELL, von Heijne G, Krogh A (1998) A hidden Markov model for predicting transmembrane helices in protein sequences. In: Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology (ISMB98), pp 175–182
  77. von Heijne G (1992) Membrane protein structure prediction, hydrophobicity analysis and the positive-inside rule. J Mol Biol 225:487–494
  78. Karplus K, Barrett C, Hughey R (1998) Hidden Markov models for detecting remote protein homologies. Bioinformatics 14:846–856
  79. Cserzo M, Wallin E, Simon I, von Heijne G, Elofsson A (1997) Prediction of transmembrane alpha-helices in prokaryotic membrane proteins: the dense alignment surface method. Protein Eng 10:673–676
  80. Fischer D, Eisenberg DA (1996) Fold recognition using sequence-derived properties. Protein Sci 5:947–955
  81. Elofsson A, Fischer D, Rice DW, LeGrand S, Eisenberg DA (1996) Study of combined structure-sequence profiles. Folding Design 1:451–461
  82. Karplus K, Karchin R, Draper J, et al (2003) Combining local-structure, fold-recognition, and new-fold methods for protein structure prediction. Proteins 53(Suppl 6):491–496
  83. Peitsch MC (1995) Protein modelling by E-mail. BioTechnology 13:658–660
  84. Peitsch MC (1996) ProMod and Swiss-Model: internet-based tools for automated comparative protein modelling. Biochem Soc Trans 24:274–279
  85. Guex N, Peitsch MC (1997) SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modelling. Electrophoresis 18:2714–2723
  86. Lund O, Frimand K, Gorodkin J, et al (1997) Protein distance constraints predicted by neural networks and probability density functions. Protein Eng 10:1241–1248
  87. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410
  88. Altschul SF (1991) Amino acid substitution matrices from an information theoretic perspective. J Mol Biol 219:555–565
  89. Altschul SF, Gish W (1996) Local alignment statistics. Methods Enzymol 266:460–480
  90. Rost B, Schneider R, Sander C (1997) Protein fold recognition by prediction-based threading. J Mol Biol 270:471–480
  91. Dayhoff MO, Barker WC, Hunt LT (1983) Establishing homologies in protein sequences. Methods Enzymol 91:524–545
  92. Henikoff S, Henikoff JG (1992) Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci USA 89:10915–10919
  93. Pearson WR (1995) Comparison of methods for searching protein sequence databases. Protein Sci 4:1145–1160
  94. Karlin S, Altschul SF (1990) Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc Natl Acad Sci USA 87:2264–2268
  95. Wootton JC (1994) Non-globular domains in protein sequences: automated segmentation using complexity measures. Comput Chem 18:269–285
  96. Altschul SF, Madden TL, Schäffer AA, et al (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402
  97. Pearson WR, Lipman DJ (1988) Improved tools for biological sequence comparison. Proc Natl Acad Sci USA 85:2444–2448
  98. Martin AC, Orengo CA, Hutchinson EG, et al (1998) Protein folds and functions. Structure 6:875–884
  99. McGuffin LJ, Bryson K, Jones DT (2001) What are the baselines for protein fold recognition? Bioinformatics 17:63–72
  100. Bairoch A (1991) PROSITE: a dictionary of sites and patterns in proteins. Nucleic Acids Res 19:2241–2245
  101. Bairoch A, Bucher P, Hofmann K (1997) The PROSITE database, its status in 1997. Nucleic Acids Res 25:217–221
  102. Bucher P, Karplus K, Moeri N, Hofmann K (1996) A flexible motif search technique based on generalized profiles. Comput Chem 20:3–23
  103. Sonnhammer EL, Kahn D (1994) Modular arrangement of proteins as inferred from analysis of homology. Protein Sci 3:482–492
  104. Corpet F, Gouzy J, Kahn D (1998) The ProDom database of protein domain families. Nucleic Acids Res 26:323–326
  105. Sonnhammer EL, Eddy SR, Durbin R (1997) Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins 28:405–420
  106. Bateman A, Birney E, Cerruti L, et al (2002) The Pfam protein families database. Nucleic Acids Res 30:276–280
  107. Apweiler R, Attwood TK, Bairoch A, et al (2001) The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res 29:37–40
  108. Mulder NJ, Apweiler R, Attwood TK, et al (2003) The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res 31:315–318
  109. Rawlings ND, O'Brien E, Barrett AJ (2002) MEROPS: the protease database. Nucleic Acids Res 30:343–346
  110. Storm CE, Sonnhammer EL (2001) NIFAS: visual analysis of domain evolution in proteins. Bioinformatics 17:343–348
  111. Schultz J, Milpetz F, Bork P, Ponting CP (1998) SMART, a simple modular architecture research tool: identification of signaling domains. Proc Natl Acad Sci USA 95:5857–5864
  112. Schultz J, Copley RR, Doerks T, Ponting CP, Bork P (2000) SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Res 28:231–234
  113. Letunic I, Goodstadt L, Dickens NJ, et al (2002) Recent improvements to the SMART domain-based sequence annotation resource. Nucleic Acids Res 30:242–244
  114. Pietrokovski S, Henikoff JG, Henikoff S (1996) The Blocks database – a system for protein classification. Nucleic Acids Res 24:197–200
  115. Attwood TK, Flower DR, Lewis AP, et al (1999) PRINTS prepares for the new millennium. Nucleic Acids Res 27:220–225
  116. Silverstein KA, Shoop E, Johnson JE, Retzel EF (2001) MetaFam: a unified classification of protein families. I. Overview and statistics. Bioinformatics 17:249–261
  117. Yuan YP, Eulenstein O, Vingron M, Bork P (1998) Towards detection of orthologues in sequence databases. Bioinformatics 14:285–289
  118. Bernstein FC, Koetzle TF, Williams GJ, et al (1977) The Protein Data Bank. A computer-based archival file for macromolecular structures. Eur J Biochem 80:319–324
  119. Berman HM, Westbrook J, Feng Z, et al (2000) The Protein Data Bank. Nucleic Acids Res 28:235–242
  120. Murzin AG, Brenner SE, Hubbard T, Chothia C (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 247:536–540
  121. Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM (1997) CATH – a hierarchic classification of protein domain structures. Structure 5:1093–1108
  122. Pearl FMG, Lee D, Bray JE, Sillitoe I, Todd AE, Harrison AP, Thornton JM, Orengo CA (2000) Assigning genomic sequences to CATH. Nucleic Acids Res 28:277–282
  123. Peitsch MC, Jongeneel V (1993) A 3-dimensional model for the CD40 ligand predicts that it is a compact trimer similar to the tumor necrosis factors. Int Immunol 5:233–238
  124. Schwede T, Kopp J, Guex N, Peitsch MC (2003) SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Res 31:3381–3385
  125. Guex N, Peitsch MC (1997) SWISS-MODEL and the Swiss-Pdb Viewer: an environment for comparative protein modeling. Electrophoresis 18:2714–2723
  126. Combet C, Jambon M, Deleage G, Geourjon C (2002) Geno3D: automatic comparative molecular modelling of protein. Bioinformatics 18:213–214
  127. Lambert C, Leonard N, De Bolle X, Depiereux E (2002) ESyPred3D: prediction of proteins 3D structures. Bioinformatics 18:1250–1256
  128. Bader GD, Betel D, Hogue CW (2003) BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res 31:248–250
  129. Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM, Eisenberg D (2000) DIP: The Database of Interacting Proteins. Nucleic Acids Res 28:289–291
  130. Levinthal C, Wodak SJ, Kahn P, Dadivanian AK (1975) Hemoglobin interaction in sickle cell fibers. I. Theoretical approaches to the molecular contacts. Proc Natl Acad Sci USA 72:1330–1334
  131. Wodak SJ, Janin J (1978) Computer analysis of protein-protein interaction. J Mol Biol 124:323–342
  132. Janin J, Henrick K, Moult J, et al (2003) CAPRI: a Critical Assessment of PRedicted Interactions. Proteins 52:2–9
  133. Taylor RD, Jewsbury PJ, Essex JW (2002) A review of protein-small molecule docking methods. J Comput Aided Mol Des 16:151–166
  134. Read TD, Peterson SN, Tourasse N, et al (2003) The genome sequence of Bacillus anthracis Ames and comparison to closely related bacteria. Nature 423:81–86
  135. Ivanova N, Sorokin A, Anderson I, et al (2003) Genome sequence of Bacillus cereus and comparative analysis with Bacillus anthracis. Nature 423:87–91
  136. Smith DR (1996) Microbial pathogen genomes – new strategies for identifying therapeutics and vaccine targets. Trends Biotechnol 14:290–293
  137. Tatusov RL, Koonin EV, Lipman DJ (1997) A genomic perspective on protein families. Science 278:631–637
  138. Tatusov RL, Natale DA, Garkavtsev IV, et al (2001) The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res 29:22–28
  139. Wheeler DL, Church DM, Federhen S, et al (2003) Database resources of the National Center for Biotechnology. Nucleic Acids Res 31:28–33
  140. Edgar R, Domrachev M, Lash AE (2002) Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30:207–210
  141. Rehm BHA, Reinecke F (2004) Evaluation of proteomic techniques: applications and potential. Curr Proteomics 1:103–111

Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Bernd H. A. Rehm (1)
  • Frank Reinecke (2)
  1. Institute of Molecular BioSciences, Massey University, Palmerston North, New Zealand
  2. Institut für Medizinische Physik und Biophysik, Elektronenmikroskopie und Analytik, Universitätsklinikum Münster, Westfälische Wilhelms-Universität, Münster, Germany
