Faster Gradient Descent Training of Hidden Markov Models, Using Individual Learning Rate Adaptation
Hidden Markov Models (HMMs) are probabilistic models suitable for a wide range of pattern recognition tasks. In this work, we propose a new gradient descent method for Conditional Maximum Likelihood (CML) training of HMMs that significantly outperforms traditional gradient descent. Instead of using a fixed learning rate for every adjustable parameter of the HMM, we propose independent learning rate/step-size adaptation, a strategy that has proved valuable in Artificial Neural Network training. Compared to standard gradient descent, our approach increases convergence speed by up to a factor of five while making the training procedure more robust, as tested on applications from molecular biology. This is accomplished without additional computational complexity or the need for parameter tuning.
Keywords: Hidden Markov Model · Learning Rate · Gradient Descent · Emission Probability · Gradient Descent Method
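To illustrate the kind of individual step-size adaptation the abstract refers to, the sketch below implements a simplified RPROP-style update (without weight backtracking): each parameter keeps its own step size, which grows while the gradient keeps its sign and shrinks when the sign flips. This is a minimal illustration, not the paper's implementation; the function name, the hyperparameter values, the toy objective, and the softmax reparameterization of HMM transition scores are all assumptions made for the example.

```python
import numpy as np

def rprop_update(theta, grad, prev_grad, step,
                 eta_plus=1.2, eta_minus=0.5,
                 step_min=1e-6, step_max=50.0):
    """One simplified RPROP step (illustrative, not the paper's code):
    each parameter carries its own step size, grown when its gradient
    keeps the same sign and shrunk when the sign flips; the parameter
    then moves against the gradient's sign by its own step size."""
    sign_change = grad * prev_grad
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    return theta - np.sign(grad) * step, step

def softmax_rows(z):
    """Row-wise softmax: keeps each row of transition probabilities
    valid (non-negative, summing to one) under unconstrained updates."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy demonstration: drive unconstrained HMM transition scores z so that
# softmax(z) matches a target row-stochastic matrix. A real CML trainer
# would instead supply gradients of the conditional log-likelihood.
rng = np.random.default_rng(0)
target = rng.dirichlet(np.ones(4), size=4)   # target transition matrix
z = np.zeros((4, 4))                         # unconstrained scores
step = np.full_like(z, 0.1)                  # per-parameter step sizes
prev_grad = np.zeros_like(z)

for _ in range(200):
    p = softmax_rows(z)
    d = p - target
    # Gradient of 0.5 * ||p - target||^2 w.r.t. z through the softmax.
    grad = p * (d - (d * p).sum(axis=1, keepdims=True))
    z, step = rprop_update(z, grad, prev_grad, step)
    prev_grad = grad

# Small residual: near the optimum the per-parameter steps have shrunk.
print(np.abs(softmax_rows(z) - target).max())
```

Because the update uses only the sign of each partial derivative, the scheme is insensitive to the scale of the gradient, which is why no global learning rate needs to be tuned; the per-parameter step sizes adapt on their own.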