Aligning protein sequence and analysing substitution pattern using a class-specific matrix

Journal of Biosciences

Abstract

Aligning protein sequences with a score matrix has become a routine but valuable method in modern biological research. However, alignment in the ‘twilight zone’ of low sequence identity remains an open problem. As more protein structures are resolved, constructing new score matrices becomes both feasible and necessary. Three structural class-specific score matrices (all-alpha, all-beta and alpha/beta) were constructed from structure alignments of low-identity proteins of the corresponding structural classes. In alignment performance tests, the class-specific score matrices performed significantly better than a structure-derived matrix (HSDM) and three other general-purpose matrices (BLOSUM30, BLOSUM60 and Gonnet250). The optimized gap penalties presented here also improve alignment performance. The results indicate that different protein structural classes have distinct amino acid substitution patterns, and that amino acid score matrices should therefore be constructed separately for each structural class. The class-specific score matrices could also be used in profile construction to improve homology detection.
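To make the idea of deriving a score matrix from aligned residues concrete, the following is a minimal sketch that assumes a BLOSUM-style log-odds formulation. The abstract does not describe the authors' actual counting, weighting or scaling, so this should not be read as their procedure. The function names `pairs_from_alignment` and `log_odds_matrix`, the `scale` and `pseudocount` parameters, and the toy alignment are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement


def pairs_from_alignment(seq_a, seq_b, gap="-"):
    """Return aligned residue pairs, skipping columns that contain a gap."""
    return [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]


def log_odds_matrix(aligned_pairs, scale=2.0, pseudocount=1.0):
    """Build a symmetric log-odds score matrix from aligned residue pairs.

    Assumed (BLOSUM-style) formula: s(a, b) = scale * log2(q_ab / e_ab),
    where q_ab is the observed unordered pair frequency and e_ab is the
    frequency expected from the residue background frequencies.
    """
    alphabet = sorted({r for pair in aligned_pairs for r in pair})

    # Count unordered pairs so that (L, I) and (I, L) are pooled.
    counts = Counter(tuple(sorted(pair)) for pair in aligned_pairs)

    # Pseudocounts keep unobserved pairs from producing log(0).
    keys = list(combinations_with_replacement(alphabet, 2))
    total = sum(counts[k] + pseudocount for k in keys)
    q = {k: (counts[k] + pseudocount) / total for k in keys}

    # Background frequency of each residue, derived from the pair frequencies.
    p = {a: 0.0 for a in alphabet}
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2

    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        value = round(scale * math.log2(freq / expected))
        scores[(a, b)] = scores[(b, a)] = value
    return scores


# Toy example: one made-up pairwise alignment standing in for a structure alignment.
pairs = pairs_from_alignment("LLIV-A", "LIIVKA")
matrix = log_odds_matrix(pairs)
```

Under this sketch, a class-specific matrix would be obtained by calling `log_odds_matrix` separately on the pairs collected from each structural class (for example all-alpha only), so that each matrix reflects only the substitutions observed within that class.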

Abbreviations

DSSP: Database of Secondary Structure assignment Program

GEP: gap extension penalty

GOP: gap opening penalty

HMM: hidden Markov model

PD: pattern distance

SCOP: structural classification of proteins

Author information

Corresponding author

Correspondence to Xiao Qin Li.

Additional information

Authors’ contributions: XQL and XHL carried out preliminary work on sequence data collection and classification. WKR contributed to part of the programming. HSX did most of the programming, performed the statistical analysis and the analysis of the results, and drafted the manuscript. The work was conceived by HSX, WKR, XHL and XQL and was supervised by XQL. All authors read and approved the final manuscript.

Supplementary tables and figures pertaining to this article are available on the Journal of Biosciences website at http://www.ias.ac.in/jbiosci/June2010/pp295-314/suppl.pdf

About this article

Cite this article

Xu, H.S., Ren, W.K., Liu, X.H. et al. Aligning protein sequence and analysing substitution pattern using a class-specific matrix. J Biosci 35, 295–314 (2010). https://doi.org/10.1007/s12038-010-0033-3


  • DOI: https://doi.org/10.1007/s12038-010-0033-3
