A seqlet-based maximum entropy Markov approach for protein secondary structure prediction

Dong, Qiwen; Wang, Xiaolong; Lin, Lei; Guan, Yi

doi:10.1360/062004-53

Published: July 2005

Volume 48, pages 394–405 (2005)
Science in China Series C: Life Sciences

Abstract

A novel method for predicting the secondary structure of proteins from their amino acid sequences is presented. Protein secondary structure seqlets, which are analogous to words in natural language, are extracted. These seqlets capture the relationship between amino acid sequence and secondary structure and together form a protein secondary structure dictionary. Notably, the dictionary is organism-specific. Protein secondary structure prediction is formulated as an integrated word segmentation and part-of-speech tagging problem. A word lattice represents the results of word segmentation, and a maximum entropy model calculates the probability that a seqlet is tagged with a given secondary structure type. The method is Markovian in the seqlets, which permits efficient, exact computation over all possible word segmentations and their tags; the optimal segmentation and tagging, found with the Viterbi algorithm, is taken as the predicted secondary structure. The method is applied separately to proteins from four organisms and compared with the PHD method. The results show that it outperforms PHD by about 3.9% in Q3 accuracy and 4.6% in SOV accuracy. Incorporating locally similar protein sequences retrieved by BLAST further improves the prediction. On 50 CASP5 target proteins, the method achieves a Q3 accuracy of 78.9% and an SOV accuracy of 77.1%. A web server for protein secondary structure prediction has been constructed and is available at http://www.insun.hit.edu.cn:81/demos/biology/index.html.
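The decoding step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the seqlet dictionary, its weights, the "same tag as previous" feature, and the names `DICTIONARY`, `tag_log_probs`, `lattice_edges`, `viterbi`, and `to_residue_labels` are all hypothetical, and the hand-set log-linear score merely stands in for a trained maximum entropy model. Each lattice edge is a candidate seqlet; as an added assumption, single residues are always allowed as words so that every sequence has at least one segmentation. Viterbi then runs over (position, tag) states in the style of a maximum entropy Markov model.

```python
import math
from typing import Dict, List, Tuple

TAGS = ("H", "E", "C")  # helix, strand, coil

# Hypothetical toy seqlet dictionary: seqlet -> feature weight per tag.
# Entries and weights are invented for illustration only; the paper builds
# its organism-specific dictionary from data.
DICTIONARY: Dict[str, Dict[str, float]] = {
    "AEEL": {"H": 2.0, "E": -1.0, "C": 0.0},
    "LKKA": {"H": 1.5, "E": -0.5, "C": 0.0},
    "VIV": {"H": -1.0, "E": 2.0, "C": 0.0},
    "GP": {"H": -1.5, "E": -1.0, "C": 1.5},
}
SAME_TAG_WEIGHT = 1.0  # hypothetical weight of a "tag equals previous tag" feature
MAX_LEN = max(len(w) for w in DICTIONARY)


def tag_log_probs(word: str, prev_tag: str) -> Dict[str, float]:
    """log P(tag | prev_tag, word) under a toy log-linear (maximum entropy) model."""
    scores = {}
    for t in TAGS:
        s = DICTIONARY.get(word, {}).get(t, 0.0)  # seqlet-tag feature
        if t == prev_tag:
            s += SAME_TAG_WEIGHT  # transition feature
        scores[t] = s
    log_z = math.log(sum(math.exp(s) for s in scores.values()))  # normalizer
    return {t: s - log_z for t, s in scores.items()}


def lattice_edges(seq: str) -> List[Tuple[int, int]]:
    """Word-lattice edges: spans [i, j) that are dictionary seqlets, plus single residues."""
    edges = set()
    for i in range(len(seq)):
        edges.add((i, i + 1))  # single-residue fallback keeps the lattice connected
        for j in range(i + 2, min(len(seq), i + MAX_LEN) + 1):
            if seq[i:j] in DICTIONARY:
                edges.add((i, j))
    return sorted(edges)  # sorted by start position


def viterbi(seq: str) -> List[Tuple[str, str]]:
    """Best joint segmentation and tagging of seq, as (seqlet, tag) pairs."""
    n = len(seq)
    # best[j][tag] = (log score, (start of last word, previous tag))
    best: List[Dict[str, Tuple[float, Tuple[int, str]]]] = [{} for _ in range(n + 1)]
    best[0]["START"] = (0.0, (-1, ""))
    # Edges are visited by start position, so best[i] is complete when it is read.
    for i, j in lattice_edges(seq):
        word = seq[i:j]
        for prev_tag, (prev_score, _) in best[i].items():
            for t, lp in tag_log_probs(word, prev_tag).items():
                cand = prev_score + lp
                if t not in best[j] or cand > best[j][t][0]:
                    best[j][t] = (cand, (i, prev_tag))
    # Backtrack from the highest-scoring tag at the end of the sequence.
    tag = max(best[n], key=lambda t: best[n][t][0])
    path: List[Tuple[str, str]] = []
    j = n
    while j > 0:
        _, (i, prev_tag) = best[j][tag]
        path.append((seq[i:j], tag))
        j, tag = i, prev_tag
    return path[::-1]


def to_residue_labels(path: List[Tuple[str, str]]) -> str:
    """Expand (seqlet, tag) pairs into one H/E/C label per residue."""
    return "".join(tag * len(word) for word, tag in path)
```

With these toy weights, `viterbi("AEELGPVIV")` segments the sequence into `AEEL`, `GP`, and `VIV` tagged H, C, and E, giving the per-residue labels `HHHHCCEEE`. Because each word contributes one log-probability term, this toy scoring tends to favor segmentations into fewer, longer dictionary seqlets.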

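The Q3 figures quoted in the abstract are conventionally the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. The minimal sketch below computes it for a single chain; it assumes both strings already use the same three-state alphabet (the reduction from eight DSSP states to three used in the paper is not stated in this abstract), and the function name `q3` is illustrative. SOV is a segment-overlap measure with a more involved definition and is not sketched here.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label (H, E, C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

For example, `q3("HHHHCCEEE", "HHHCCCEEE")` scores 8 of 9 residues as correct, about 88.9. Whether a data set's Q3 is averaged per protein or pooled over all residues is a separate choice that this sketch does not make.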
References

Thornton, J. M., From genome to function, Science, 2001, 292: 2095–2097.
Cheng, L. P., Chen, S. X., Jenifer, M. B. et al., Three-dimensional structure determination of capsid of Aedes albopictus C6/36 cell densovirus, Science in China, Ser. C, 2004, 47(3): 224–228.
Liu, Z. Z., Wang, J. L., Wang, Q. et al., Structure expression pattern and chromosomal localization of the rice Osgrp-2 gene, Science in China, Ser. C, 2003, 46(6): 584–594.
Chou, P., Fasman, G., Empirical predictions of protein conformation, Annu. Rev. Biochem., 1978, 47(1): 251–276.
Ptitsyn, O. B., Finkelstein, A. V., Theory of protein secondary structure and algorithm of its prediction, Biopolymers, 1983, 22(1): 15–22.
Solovyev, V. V., Salamov, A. A., Method of calculation of discrete secondary structures in globular proteins, J. Mol. Biol., 1991, 25(3): 810–824.
Rost, B., Sander, C., Prediction of protein secondary structure at better than 70% accuracy, J. Mol. Biol., 1993, 232(2): 584–599.
Hua, S., Sun, Z., A novel method of protein secondary structure prediction with high segment overlap measure: Support vector machine approach, J. Mol. Biol., 2001, 308(2): 397–407.
Rost, B., Sander, C., Combining evolutionary information and neural networks to predict protein secondary structure, Proteins: Struct. Funct. Genet., 1994, 19(1): 55–72.
Salzberg, S., Cost, S., Predicting protein secondary structure with a nearest-neighbor algorithm, J. Mol. Biol., 1992, 227(2): 371–374.
Frishman, D., Argos, P., Seventy-five percent accuracy in protein secondary structure prediction, Proteins: Struct. Funct. Genet., 1997, 27(3): 329–335.
Salamov, A. A., Solovyev, V. V., Protein secondary structure prediction using local alignments, J. Mol. Biol., 1997, 268(1): 31–36.
Schmidler, S. C., Liu, J. S., Brutlag, D. L., Bayesian protein structure prediction, Case Studies in Bayesian Statistics, 2001, 5: 363–378.
Schmidler, S. C., Liu, J. S., Brutlag, D. L., Bayesian segmentation of protein secondary structure, J. Comp. Biol., 2000, 7(1/2): 233–248.
Language Modeling of Biological Data Workshop, ed. Searles, D., University of Pennsylvania, http://www.ircs.upenn.edu/modeling 2001/modeling.shtml, 2001.
Rigoutsos, I., Floratos, A., Combinatorial pattern discovery in biological sequences: The TEIRESIAS algorithm, Bioinformatics, 1998, 14(1): 55–67.
Pisanti, N., Crochemore, M., Grossi, R., Sagot, M. F., A Basis for Repeated Motifs in Pattern Discovery and Text Mining, Institut Gaspard Monge, University of Marne-la-Vallée, IGM 2002–10, Juillet 2002.
Rigoutsos, I., Huynh, T., Floratos, A., Parida, L., Platt, D., Dictionary-driven protein annotation, Nucleic Acids Research, 2002, 30(17): 3901–3916.
Ganapathiraju, M., Weisser, D., Rosenfeld, R. et al., Comparative n-gram analysis of whole-genome protein sequences, in Proceedings of the Human Language Technologies Conference, San Diego, 2002.
McCallum, A., Freitag, D., Pereira, F., Maximum Entropy Markov Models for information extraction and segmentation, in Proceedings of the Seventeenth International Conference on Machine Learning, Stanford, CA, 2000, 591–598.
Rabiner, L. R., Juang, B. H., An introduction to hidden Markov models, IEEE ASSP Magazine, 1986, 3(1): 4–16.
Berger, A. L., Della Pietra, S. A., Della Pietra, V. J., A maximum entropy approach to natural language processing, Computational Linguistics, 1996, 22(1): 39–71.
Darroch, J. N., Ratcliff, D., Generalized iterative scaling for log-linear models, The Annals of Mathematical Statistics, 1972, 43(5): 1470–1480.
Kabsch, W., Sander, C., Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features, Biopolymers, 1983, 22(12): 2577–2637.
Frishman, D., Argos, P., Knowledge-based secondary structure assignment, Proteins: Struct. Funct. Genet., 1995, 23(4): 566–579.
Richards, F. M., Kundrot, C. E., Identification of structural motifs from protein coordinate data: Secondary structure and first-level super-secondary structure, Proteins: Struct. Funct. Genet., 1988, 3(2): 71–84.
Cuff, J. A., Barton, G. J., Evaluation and improvement of multiple sequence methods for protein secondary structure prediction, Proteins: Struct. Funct. Genet., 1999, 34(4): 508–519.
Zemla, A., Venclovas, C., Fidelis, K., Rost, B., A modified definition of Sov, a segment-based measure for protein secondary structure prediction assessment, Proteins: Struct. Funct. Genet., 1999, 34(2): 220–223.
Rost, B., Sander, C., Schneider, R., Redefining the goals of protein secondary structure prediction, J. Mol. Biol., 1994, 235(1): 13–26.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N., Bourne, P. E., The Protein Data Bank, Nucleic Acids Research, 2000, 28(1): 235–242.
Wang, G., Dunbrack, R. L. Jr., PISCES: A protein sequence culling server, Bioinformatics, 2003, 19: 1589–1591.
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., Lipman, D. J., Basic local alignment search tool, J. Mol. Biol., 1990, 215: 403–410.
Jones, D. T., Protein secondary structure prediction based on position-specific scoring matrices, J. Mol. Biol., 1999, 292: 195–202.
Karplus, K., Karchin, R., Barrett, C. et al., What is the value added by human intervention in protein structure prediction?, Proteins: Struct. Funct. Genet., 2001, (Suppl. 5): 86–91.
Pollastri, G., Przybylski, D., Rost, B., Baldi, P., Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles, Proteins: Struct. Funct. Genet., 2002, 47: 228–235.
Yan, H. L., Song, Y. L., Liu, F. et al., Homology modeling three-dimensional structure of AnxB1 and reducing its immunogenicity by sequence-deleted mutagenesis, Science in China, Ser. C, 2004, 47(4): 359–367.
Cohen, F. E., Abarbanel, R. M., Kuntz, I. D. et al., Turn prediction in proteins using a pattern matching approach, Biochemistry, 1986, 25(1): 266–275.
Presnell, S. R., Cohen, B. I., Cohen, F. E., A segment-based approach to protein secondary structure prediction, Biochemistry, 1992, 31(4): 983–993.
Crooks, G. E., Brenner, S. E., Protein secondary structure: Entropy, correlations and prediction, Bioinformatics, 2004, 20(10): 1603–1611.
Zhou, P., Xie, M. Y., Nie, S. P. et al., Primary structure and configuration of tea polysaccharide, Science in China, Ser. C, 2004, 47(5): 416–424.
Rader, A. J., Anderson, G., Isin, B. et al., Identification of core amino acids stabilizing rhodopsin, Proc. Natl. Acad. Sci. USA, 2004, 101(19): 7246–725

Author information

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, 150006, Harbin, China
Qiwen Dong, Xiaolong Wang, Lei Lin & Yi Guan

Corresponding author

Correspondence to Qiwen Dong.

About this article

Cite this article

Dong, Q., Wang, X., Lin, L. et al. A seqlet-based maximum entropy Markov approach for protein secondary structure prediction. Sci. China Ser. C.-Life Sci. 48, 394–405 (2005). https://doi.org/10.1360/062004-53

Received: 14 December 2004
Issue Date: July 2005
DOI: https://doi.org/10.1360/062004-53
