A seqlet-based maximum entropy Markov approach for protein secondary structure prediction

Science in China Series C: Life Sciences

Abstract

A novel method for predicting the secondary structures of proteins from amino acid sequences is presented. Protein secondary structure seqlets, which are analogous to words in natural language, are extracted. These seqlets capture the relationship between amino acid sequence and protein secondary structure, and together they form a protein secondary structure dictionary. The dictionary is organism-specific. Protein secondary structure prediction is formulated as an integrated word segmentation and part-of-speech tagging problem. A word lattice represents the results of word segmentation, and a maximum entropy model calculates the probability that a seqlet is tagged as a certain secondary structure type. The method is Markovian in the seqlets, which permits efficient, exact calculation of the posterior probability distribution over all possible word segmentations and their tags by the Viterbi algorithm. The optimal segmentation and its tags are taken as the secondary structure prediction. The method is applied separately to the proteins of four organisms and compared with the PHD method. The results show that it outperforms PHD by about 3.9% in Q3 accuracy and 4.6% in SOV accuracy. Combining the method with locally similar protein sequences obtained by BLAST gives better predictions. The method is also tested on the 50 CASP5 target proteins, reaching a Q3 accuracy of 78.9% and an SOV accuracy of 77.1%. A web server for protein secondary structure prediction is available at http://www.insun.hit.edu.cn:81/demos/biology/index.html.
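The segmentation-and-tagging formulation in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The seqlet dictionary, the tag probabilities, and the uniform transition table are invented toy values, and the score is simplified to a product of a per-seqlet tag probability and a tag transition probability, whereas the paper obtains tag probabilities from a maximum entropy model. The function names segment_and_tag and q3_accuracy are hypothetical. The sketch shows a Viterbi search over a lattice of dictionary seqlets, plus the per-residue Q3 score (the percentage of residues whose predicted state matches the observed one).

```python
import math

TAGS = "HEC"  # helix, strand, coil

# Toy dictionary (assumption): seqlet -> {tag: P(tag | seqlet)}.
SEQLET_TAG_PROB = {
    "AEEL": {"H": 0.9, "C": 0.1},
    "KV": {"E": 0.8, "C": 0.2},
    "K": {"C": 1.0},
    "V": {"E": 0.6, "C": 0.4},
    "G": {"C": 0.9, "E": 0.1},
}

# Toy uniform transitions (assumption): (previous tag, tag) -> probability.
TRANS = {(p, t): 1.0 / 3 for p in ("<s>",) + tuple(TAGS) for t in TAGS}


def segment_and_tag(sequence, seqlet_probs, trans, max_len=4):
    """Return the best per-residue tag string, or None if no segmentation exists."""
    n = len(sequence)
    # best[i][tag] = (log score, start of last seqlet, previous tag)
    best = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, None, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            seqlet = sequence[start:end]
            if seqlet not in seqlet_probs or not best[start]:
                continue  # not a lattice edge, or its start is unreachable
            for tag, p_tag in seqlet_probs[seqlet].items():
                for prev_tag, (prev_score, _, _) in best[start].items():
                    p_trans = trans.get((prev_tag, tag), 0.0)
                    if p_tag <= 0.0 or p_trans <= 0.0:
                        continue
                    score = prev_score + math.log(p_tag) + math.log(p_trans)
                    if tag not in best[end] or score > best[end][tag][0]:
                        best[end][tag] = (score, start, prev_tag)
    if not best[n]:
        return None
    # Backtrack from the best final tag, expanding each seqlet to its residues.
    tag = max(best[n], key=lambda t: best[n][t][0])
    end, pieces = n, []
    while end > 0:
        _, start, prev_tag = best[end][tag]
        pieces.append(tag * (end - start))
        end, tag = start, prev_tag
    return "".join(reversed(pieces))


def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    prediction = segment_and_tag("AEELKVG", SEQLET_TAG_PROB, TRANS)
    print(prediction)
    print(q3_accuracy(prediction, "HHHHECC"))
```

With these toy numbers the search prefers the segmentation AEEL | KV | G over AEEL | K | V | G, because the longer seqlet KV carries a higher combined score than K followed by V. A real system would replace the toy tables with an organism-specific dictionary and trained model probabilities.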

Author information

Corresponding author

Correspondence to Qiwen Dong.

About this article

Cite this article

Dong, Q., Wang, X., Lin, L. et al. A seqlet-based maximum entropy Markov approach for protein secondary structure prediction. Sci. China Ser. C.-Life Sci. 48, 394–405 (2005). https://doi.org/10.1360/062004-53

  • DOI: https://doi.org/10.1360/062004-53