Abstract
Aligning protein sequences using a score matrix has become a routine but valuable method in modern biological research. However, alignment in the ‘twilight zone’ remains an open issue. It is feasible and necessary to construct new score matrices as more protein structures are resolved. Three structural class-specific score matrices (all-alpha, all-beta and alpha/beta) were constructed based on the structure alignment of low-identity proteins of the corresponding structural classes. The class-specific score matrices were significantly better than a structure-derived matrix (HSDM) and three other generalized matrices (BLOSUM30, BLOSUM60 and Gonnet250) in alignment performance tests. The optimized gap penalties presented here also improve alignment performance. The results indicate that different protein classes have distinct amino acid substitution patterns, and that amino acid score matrices should be constructed separately for different structural classes. The class-specific score matrices could also be used in profile construction to improve homology detection.
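To make the idea of deriving a score matrix from aligned residue pairs concrete, the following is a minimal sketch of a generic BLOSUM-style log-odds calculation. It is not the authors' procedure: the function name `log_odds_matrix`, the `scale` and `pseudocount` parameters, and the toy alignments are all assumptions made for illustration. A class-specific matrix would, under this sketch, simply be the result of feeding in only alignments from one structural class.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs, scale=2.0, pseudocount=1.0):
    """Return a symmetric log-odds score dict from gapped pairwise alignments.

    aligned_pairs: iterable of (seq_a, seq_b) strings of equal length,
    with '-' marking gaps. All names here are hypothetical.
    """
    # Count unordered residue pairs in aligned, ungapped columns.
    pair_counts = Counter()
    for seq_a, seq_b in aligned_pairs:
        for a, b in zip(seq_a, seq_b):
            if a == "-" or b == "-":
                continue
            pair_counts[tuple(sorted((a, b)))] += 1

    residues = sorted({r for pair in pair_counts for r in pair})

    # Observed pair frequencies q_ab, smoothed with a pseudocount.
    q = {}
    for i, a in enumerate(residues):
        for b in residues[i:]:
            q[(a, b)] = pair_counts[(a, b)] + pseudocount
    total = sum(q.values())
    q = {pair: count / total for pair, count in q.items()}

    # Background frequency of each residue implied by the pair frequencies.
    p = {a: 0.0 for a in residues}
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2

    # Score = scaled log2 ratio of observed to expected pair frequency.
    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        value = round(scale * math.log2(freq / expected))
        scores[(a, b)] = scores[(b, a)] = value
    return scores


# Toy alignments (invented data, for illustration only).
toy = [("ACDA-A", "ACEAAA"), ("AACD", "AACE")]
scores = log_odds_matrix(toy)
print(scores[("A", "A")], scores[("A", "C")], scores[("D", "E")])
```

Pairs that co-occur in aligned columns more often than their background frequencies predict receive positive scores, and rarely observed pairs receive negative ones.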
Abbreviations
- DSSP: Database of Secondary Structure assignment Program
- GEP: gap extension penalty
- GOP: gap opening penalty
- HMM: hidden Markov model
- PD: pattern distance
- SCOP: structural classification of proteins
Additional information
Authors’ contributions: XQL and XHL did preliminary work on sequence data collection and classification. WKR participated in part of the programming work. HSX conducted most of the programming work, the statistical analysis and the analysis of the results, and drafted the manuscript. The work was conceived by HSX, WKR, XHL and XQL and was supervised by XQL. All authors read and approved the final manuscript.
Supplementary tables and figures pertaining to this article are available on the Journal of Biosciences Website at http://www.ias.ac.in/jbiosci/June2010/pp295-314/suppl.pdf
Cite this article
Xu, H.S., Ren, W.K., Liu, X.H. et al. Aligning protein sequence and analysing substitution pattern using a class-specific matrix. J Biosci 35, 295–314 (2010). https://doi.org/10.1007/s12038-010-0033-3
DOI: https://doi.org/10.1007/s12038-010-0033-3