Abstract
Most state-of-the-art methods for protein secondary structure prediction are complex combinations of discriminant models. They take a local approach to prediction, which is known to impose a limit on the attainable accuracy. In principle, the use of generative models should make it possible to overcome this limitation. However, among the numerous hidden Markov models dedicated to this task over more than two decades, none has come close to providing comparable performance. A major reason lies in the nature of the relevant information. Indeed, it is well known that, irrespective of the model implemented, the prediction should benefit significantly from the availability of evolutionary information. Currently, this knowledge is embedded in position-specific scoring matrices, which cannot be processed easily with hidden Markov models. Given this observation, the next significant advance should come from making the best of the two approaches, i.e., using a generative model on top of discriminant models. This article introduces the first hybrid architecture of this kind with state-of-the-art performance. Combining the two levels of treatment makes it possible to optimize the recognition rate at both the residue level and the segment level.
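The idea of placing a generative model on top of discriminant models can be illustrated with a minimal sketch. This is not the architecture introduced in the article, whose details are not given in the abstract. It assumes that some discriminant classifier has already produced per-residue class posterior estimates, and it decodes them with a plain three-state first-order hidden Markov model (helix H, strand E, coil C), using posteriors divided by class priors as emission scores. The function name, the toy posteriors, and the transition values are all hypothetical and chosen only for illustration.

```python
import math

STATES = ("H", "E", "C")  # helix, strand, coil (three-class assumption)


def viterbi_on_posteriors(posteriors, priors, transitions, initial):
    """Most probable state path given per-residue class posteriors.

    posteriors:  list of dicts, one per residue, state -> P(state | local window)
    priors:      dict state -> P(state); must be strictly positive
    transitions: dict (prev, cur) -> P(cur | prev)
    initial:     dict state -> P(first state)
    """
    def log(x):
        return math.log(x) if x > 0 else float("-inf")

    def emission(t, s):
        # Scaled likelihood: posterior / prior, computed in log space.
        return log(posteriors[t][s]) - math.log(priors[s])

    if not posteriors:
        return ""

    score = {s: log(initial[s]) + emission(0, s) for s in STATES}
    back = []  # back[t-1][s] = best predecessor of state s at position t
    for t in range(1, len(posteriors)):
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev = max(
                STATES, key=lambda p: score[p] + log(transitions[(p, s)])
            )
            new_score[s] = (
                score[best_prev] + log(transitions[(best_prev, s)]) + emission(t, s)
            )
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    path = [max(STATES, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))


# Toy input (hypothetical values): a helix with one ambiguous residue.
toy_posteriors = [
    {"H": 0.7, "E": 0.2, "C": 0.1},
    {"H": 0.7, "E": 0.2, "C": 0.1},
    {"H": 0.4, "E": 0.5, "C": 0.1},
    {"H": 0.7, "E": 0.2, "C": 0.1},
    {"H": 0.7, "E": 0.2, "C": 0.1},
]
uniform = {s: 1.0 / 3.0 for s in STATES}
# "Sticky" transitions: staying in a state is more likely than switching.
sticky = {(p, s): (0.8 if p == s else 0.1) for p in STATES for s in STATES}

local_prediction = "".join(max(STATES, key=p.get) for p in toy_posteriors)
segment_prediction = viterbi_on_posteriors(toy_posteriors, uniform, sticky, uniform)
print(local_prediction)    # HHEHH
print(segment_prediction)  # HHHHH
```

In this toy case the purely local decision yields an isolated one-residue strand inside a helix, while the sequence-level decoding removes it. This is the kind of segment-level coherence that a generative layer is meant to contribute on top of residue-level classifiers.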
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Thomarat, F., Lauer, F., Guermeur, Y. (2012). Cascading Discriminant and Generative Models for Protein Secondary Structure Prediction. In: Shibuya, T., Kashima, H., Sese, J., Ahmad, S. (eds) Pattern Recognition in Bioinformatics. PRIB 2012. Lecture Notes in Computer Science, vol 7632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34123-6_15
DOI: https://doi.org/10.1007/978-3-642-34123-6_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34122-9
Online ISBN: 978-3-642-34123-6
eBook Packages: Computer Science, Computer Science (R0)