Aligning protein sequence and analysing substitution pattern using a class-specific matrix

Xu, Hai Song; Ren, Wen Ke; Liu, Xiao Hui; Li, Xiao Qin

doi:10.1007/s12038-010-0033-3

Published: 04 May 2010

Volume 35, pages 295–314, (2010)

Journal of Biosciences



Abstract

Aligning protein sequences using a score matrix has become a routine but valuable method in modern biological research. However, alignment in the ‘twilight zone’ remains an open issue. It is feasible and necessary to construct a new score matrix as more protein structures are resolved. Three structural class-specific score matrices (all-alpha, all-beta and alpha/beta) were constructed based on the structure alignment of low-identity proteins of the corresponding structural classes. The class-specific score matrices were significantly better than a structure-derived matrix (HSDM) and three other generalized matrices (BLOSUM30, BLOSUM60 and Gonnet250) in alignment performance tests. The optimized gap penalties presented here also improve alignment performance. The results indicate that different protein classes have distinct amino acid substitution patterns, and that an amino acid score matrix should be constructed separately for each structural class. The class-specific score matrices could also be used in profile construction to improve homology detection.
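To make the idea of a score matrix derived from aligned residues concrete, the sketch below builds a log-odds table from a list of aligned residue pairs. It is a generic BLOSUM-style calculation and not the authors' procedure, which is not described in this preview. The function name `log_odds_matrix`, the `pseudocount` and `scale` parameters, and the toy input are all assumptions made for illustration.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs, pseudocount=1.0, scale=2.0):
    """Build a symmetric log-odds score table from aligned residue pairs.

    aligned_pairs: iterable of (residue_a, residue_b) tuples taken from
    alignment columns. Hypothetical helper; generic log-odds scoring only.
    """
    aligned_pairs = list(aligned_pairs)
    alphabet = sorted({r for pair in aligned_pairs for r in pair})

    # Count every pair in both orientations so the table is symmetric.
    pair_counts = Counter()
    for a, b in aligned_pairs:
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1

    n = len(alphabet)
    total = sum(pair_counts.values()) + pseudocount * n * n

    # q[a, b]: smoothed observed frequency of the pair (a, b).
    q = {
        (a, b): (pair_counts[(a, b)] + pseudocount) / total
        for a in alphabet
        for b in alphabet
    }
    # p[a]: background frequency of residue a (marginal of q).
    p = {a: sum(q[(a, b)] for b in alphabet) for a in alphabet}

    # Score = scaled log2 ratio of observed to expected-by-chance frequency.
    return {
        (a, b): scale * math.log2(q[(a, b)] / (p[a] * p[b]))
        for a in alphabet
        for b in alphabet
    }


# Toy input: mostly conserved columns, a few A/L substitutions.
pairs = [("A", "A")] * 8 + [("L", "L")] * 8 + [("A", "L")] * 2
matrix = log_odds_matrix(pairs)
```

Under this scheme, a class-specific matrix would simply be the table obtained by feeding in only the aligned pairs that come from structure alignments of one structural class, for example one call for all-alpha proteins and another for all-beta proteins.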

Abbreviations

DSSP: Database of Secondary Structure assignment Program
GEP: gap extension penalty
GOP: gap opening penalty
HMM: hidden Markov model
PD: pattern distance
SCOP: structural classification of proteins
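
GOP and GEP are the two parameters of an affine gap penalty. As a minimal sketch of how they combine with a score matrix, the function below scores an already-aligned pair of sequences. It assumes one common convention, in which the first position of a gap costs GOP and every further position costs GEP; other conventions exist, and the one used in the article is not stated in this preview. The function name `score_alignment`, the toy matrix and the penalty values are hypothetical and are not the optimized values reported by the authors.

```python
def score_alignment(row_a, row_b, matrix, gop=10.0, gep=1.0):
    """Score two aligned rows ('-' marks a gap) with affine gap penalties.

    Assumed convention: the first gap position costs `gop`, each
    following position of the same gap costs `gep`.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")

    score = 0.0
    gap_side = None  # which row currently has an open gap, if any
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            continue  # column with no residues contributes nothing
        if a == "-" or b == "-":
            side = "a" if a == "-" else "b"
            # Extending the same gap is cheaper than opening a new one.
            score -= gep if gap_side == side else gop
            gap_side = side
        else:
            score += matrix[(a, b)]
            gap_side = None
    return score


# Toy score table (illustrative values only).
toy_matrix = {
    ("A", "A"): 4.0, ("L", "L"): 4.0,
    ("A", "L"): -1.0, ("L", "A"): -1.0,
}
one_long_gap = score_alignment("AL--A", "ALLLA", toy_matrix)
two_short_gaps = score_alignment("A-L", "AL-", toy_matrix)
```

With these toy values, a single two-residue gap is penalized less than two separate one-residue gaps, which is the behaviour an affine penalty is designed to produce.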


Author information

Authors and Affiliations

College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100124, China
Hai Song Xu, Wen Ke Ren, Xiao Hui Liu & Xiao Qin Li

Corresponding author

Correspondence to Xiao Qin Li.

Additional information

Authors’ contributions: XQL and XHL carried out preliminary work on sequence data collection and classification. WKR participated in part of the programming work. HSX conducted most of the programming work, the statistical analysis and the analysis of results, and drafted the manuscript. The work was conceived by HSX, WKR, XHL and XQL and was supervised by XQL. All the authors read and approved the final manuscript.

Supplementary tables and figures pertaining to this article are available on the Journal of Biosciences Website at http://www.ias.ac.in/jbiosci/June2010/pp295-314/suppl.pdf

About this article

Cite this article

Xu, H.S., Ren, W.K., Liu, X.H. et al. Aligning protein sequence and analysing substitution pattern using a class-specific matrix. J Biosci 35, 295–314 (2010). https://doi.org/10.1007/s12038-010-0033-3

Received: 23 September 2009
Accepted: 01 April 2010
Published: 04 May 2010
Issue Date: June 2010
DOI: https://doi.org/10.1007/s12038-010-0033-3
