Neural Processing Letters

, Volume 17, Issue 2, pp 191–203 | Cite as

Accelerating Cyclic Update Algorithms for Parameter Estimation by Pattern Searches

  • Antti Honkela
  • Harri Valpola
  • Juha Karhunen


A popular strategy for dealing with large parameter estimation problems is to split the problem into manageable subproblems and solve them cyclically one by one until convergence. A well-known drawback of this strategy is slow convergence in low noise conditions. We propose using so-called pattern searches which consist of an exploratory phase followed by a line search. During the exploratory phase, a search direction is determined by combining the individual updates of all subproblems. The approach can be used to speed up several well-known learning methods such as variational Bayesian learning (ensemble learning) and expectation-maximization algorithm with modest algorithmic modifications. Experimental results show that the proposed method is able to reduce the required convergence time by 60–85% in realistic variational Bayesian learning problems.

alternating optimization Bayesian methods cyclic updating line search parameter estimation pattern search variational Bayesian learning 


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Attias, H.: Independent factor analysis, Neural Computation, 11(4) (1999),803–851.Google Scholar
  2. 2.
    Bazaraa, M. S., Sherali, H. D. and Shetty, C. M.: Nonlinear Programming: Theory and Algorithms, J. Wiley, 1993, Second Edition.Google Scholar
  3. 3.
    Bezdek, J. C. and Hathaway, R. J.: Some notes on alternating optimization, In: N. R. Pal and M. Sugeno,(eds), Advances in Soft Computing-AFSS 2002, Vol: 2275 of Lecture Notes in Artificial Intelligence, Springer-Verlag,pp. 288–300, 2002.Google Scholar
  4. 4.
    Bishop, C.: Neural Networks for Pattern Recognition, Clarendon Press, 1995.Google Scholar
  5. 5.
    Dempster, A. P., Laird, N. M. and Rubin, D. B.: Maximum likelihood from incomplete data via the EM algorithm, J. of the Royal Statistical Society, Series B (Methodological), 39(1) (1977), 1–38.Google Scholar
  6. 6.
    Fletcher, R.: Practical Methods of Optimization, J. Wiley, 1987,Second Edition.Google Scholar
  7. 7.
    Ghahramani, Z. and Hinton, G. E.: Variational learning for switching state-space models, Neural Computation, 12(4) (2000), 963–996.Google Scholar
  8. 8.
    Hinton, G. E. and van Camp, D.: Keeping Neural Networks Simple by Minimizing the Description Length of the Weights,In: Proc. of the 6th Ann. ACM Conf. on Computational Learning Theory,Santa Cruz, CA,USA, pp. 5–13, 1993.Google Scholar
  9. 9.
    Hooke, R. and Jeeves, T. A.: “Direct search” solution of numerical and statistical problems, J. of the ACM, 8(2) (1961), 212–229.Google Scholar
  10. 10.
    Hyvärinen, A., Karhunen, J. and Oja, E.: Independent Component Analysis,J. Wiley, 2001.Google Scholar
  11. 11.
    Jamshidian, M. and Jennrich, R. I.: Conjugate gradient acceleration of the EM algorithm, J. of the American Statistical Association, 88(421) (1993), 221–228.Google Scholar
  12. 12.
    Jordan, M., Ghahramani, Z., Jaakkola, T. and Saul, L.: An introduction to variational methods for graphical models, In: M. Jordan,(ed), Learning in Graphical Models, Cambridge,MA,USA: The MIT Press, pp. 105–161, 1999.Google Scholar
  13. 13.
    Lappalainen, H. and Honkela, A.: Bayesian nonlinear independent component analysis by multi-layer perceptrons, In: M. Girolami,(ed), Advances in Independent Component Analysis, Berlin: Springer-Verlag, pp. 93–121, 2000.Google Scholar
  14. 14.
    Lappalainen, H. and Miskin, J.: Ensemble learning, In: M. Girolami, (ed), Advances in Independent Component Analysis, Berlin: Springer-Verlag, pp. 75–92, 2000.Google Scholar
  15. 15.
    MacKay, D. J. C.: Developments in Probabilistic Modelling with Neural Networks -Ensemble Learning, In: Neural Networks: Artificial Intelligence and Industrial Applications. Proc. of the 3rd Annual Symposium on Neural Networks, pp. 191–198, 1995.Google Scholar
  16. 16.
    MacKay, D. J. C.: Ensemble learning for hidden Markov models, Available from http:// 1997.Google Scholar
  17. 17.
    Neal, R. M. and Hinton, G. E.: A view of the EM algorithm that justifies incremental, sparse,and other variants, In: M. I. Jordan (ed), Learning in Graphical Models, Cambridge, MA, USA: The MIT Press, pp. 355–368, 1999.Google Scholar
  18. 18.
    Valpola, H. and Karhunen, J.: An unsupervised ensemble learning method for nonlinear dynamic state-space models, Neural Computation, 14(11) (2002), 2647–2692.Google Scholar
  19. 19.
    Valpola, H., Östman, T. and Karhunen, J.: Nonlinear Independent Factor Analysis by Hierarchical Models, In: Proc. 4th Int. Symp. on Independent Component Analysis and Blind Signal Separation (ICA2003), 2003, to appear.Google Scholar
  20. 20.
    Valpola, H., Raiko, T. and Karhunen, J.: Building Blocks for Hierarchical Latent Variable Models, In: Proc. 3rd Int. Conf. on Independent Component Analysis and Signal Separation (ICA2001), San Diego, USA, pp. 710–715, 2001.Google Scholar

Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Antti Honkela
    • 1
  • Harri Valpola
    • 1
  • Juha Karhunen
    • 1
  1. 1.Helsinki University of Technology, Neural Networks Research CentreHUTFinland

Personalised recommendations