Abstract
Several studies based on known three-dimensional (3-D) protein structures show that two homologous proteins with insignificant sequence similarity can adopt a common fold and may perform the same or similar biochemical functions. Hence, it is appropriate to use similarities in the 3-D structures of proteins, rather than amino acid sequence similarities, in modelling the evolution of distantly related proteins. Here we present an assessment of the use of 3-D structures in modelling the evolution of homologous proteins. Using a dataset of 108 protein domain families of known structure, with at least 10 members per family, we compare the extent of structural and sequence dissimilarity among pairs of proteins; these pairwise dissimilarities are the inputs to the construction of phylogenetic trees. We find that the correlation between the structure-based dissimilarity measures and the sequence-based dissimilarity measures is usually good if the sequence similarity among the homologues is about 30% or more. For protein families with low sequence similarity among the members, the correlation between the sequence-based and the structure-based dissimilarities is poor. In these cases the structure-based dendrogram clusters proteins with the most similar biochemical functional properties better than the sequence-similarity-based dendrogram does. In multi-domain protein families and disulphide-rich protein families, the correlation coefficient between the sequence-based and structure-based dissimilarity (SDM) measures can be poor even though the sequence identity may be higher than 30%. Hence it is suggested that protein evolution is best modelled using 3-D structures if the sequence similarities (SSM) of the homologues are very low.
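The family-level comparison described in the abstract can be illustrated with a minimal sketch. It assumes that each family is summarised by two symmetric pairwise dissimilarity matrices, one sequence-based and one structure-based, and that agreement is measured with a Pearson correlation over the unordered protein pairs. The abstract does not specify the exact dissimilarity definitions or the type of correlation coefficient used, so the function names, the choice of Pearson correlation, and the four-member toy matrices below are all illustrative assumptions, not the authors' procedure or data.

```python
from itertools import combinations
from math import sqrt


def upper_triangle(matrix):
    """Return the dissimilarity for each unordered pair (i < j) of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical 4-member family: made-up values, not data from the paper.
seq_dissim = [
    [0.0, 0.2, 0.6, 0.7],
    [0.2, 0.0, 0.5, 0.8],
    [0.6, 0.5, 0.0, 0.3],
    [0.7, 0.8, 0.3, 0.0],
]
struct_dissim = [
    [0.0, 0.10, 0.40, 0.50],
    [0.10, 0.0, 0.35, 0.55],
    [0.40, 0.35, 0.0, 0.20],
    [0.50, 0.55, 0.20, 0.0],
]

# Compare only the off-diagonal pairs; the diagonal (self-comparison) is always 0.
r = pearson(upper_triangle(seq_dissim), upper_triangle(struct_dissim))
print(f"correlation between sequence and structure dissimilarities: {r:.3f}")
```

In this toy family the two matrices rank the pairs almost identically, so the coefficient is high. A family in which sequence and structure disagree, as the abstract reports for low-similarity, multi-domain and disulphide-rich families, would give a lower value.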
Cite this article
Balaji, S., Srinivasan, N. Comparison of sequence-based and structure-based phylogenetic trees of homologous proteins: Inferences on protein evolution. J Biosci 32, 83–96 (2007). https://doi.org/10.1007/s12038-007-0008-1