
Machine Learning, Volume 107, Issue 12, pp. 1947–1986

Stochastic variational hierarchical mixture of sparse Gaussian processes for regression

  • Thi Nhat Anh Nguyen
  • Abdesselam Bouzerdoum
  • Son Lam Phung

Abstract

In this article, we propose a scalable Gaussian process (GP) regression method that combines the advantages of both global and local GP approximations through a two-layer hierarchical model using a variational inference framework. The upper layer consists of a global sparse GP to coarsely model the entire data set, whereas the lower layer comprises a mixture of sparse GP experts which exploit local information to learn a fine-grained model. A two-step variational inference algorithm is developed to learn the global GP, the GP experts and the gating network simultaneously. Stochastic optimization can be employed to allow the application of the model to large-scale problems. Experiments on a wide range of benchmark data sets demonstrate the flexibility, scalability and predictive power of the proposed method.
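
To make the two-layer architecture concrete, the sketch below is a minimal NumPy illustration of the idea, not the authors' algorithm: here the inducing inputs are chosen by random subsampling, the experts are assigned by a fixed partition of the inputs, and the gating weights are a simple distance-based softmax, whereas the paper learns all of these quantities jointly with a two-step stochastic variational inference scheme. All names (rbf, SparseGP, HierarchicalMixtureGP) are invented for this example.

    import numpy as np

    def rbf(X1, X2, lengthscale=1.0, variance=1.0):
        # Squared-exponential kernel matrix between two sets of inputs.
        d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
        return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

    class SparseGP:
        # Subset-of-regressors-style sparse GP with fixed inducing inputs Z.
        def __init__(self, Z, lengthscale=1.0, noise=0.1):
            self.Z, self.ls, self.noise = Z, lengthscale, noise

        def fit(self, X, y):
            Kzz = rbf(self.Z, self.Z, self.ls)
            Kzx = rbf(self.Z, X, self.ls)
            # Posterior mean weights of the inducing-point approximation:
            # alpha = (sigma^2 Kzz + Kzx Kxz)^{-1} Kzx y, with a small jitter
            # added for numerical conditioning.
            A = self.noise ** 2 * Kzz + Kzx @ Kzx.T
            self.alpha = np.linalg.solve(A + 1e-8 * np.eye(len(A)), Kzx @ y)
            return self

        def predict(self, Xs):
            return rbf(Xs, self.Z, self.ls) @ self.alpha

    class HierarchicalMixtureGP:
        # Upper layer: one global sparse GP fit coarsely to all data.
        # Lower layer: local sparse GP experts fit to the global residuals,
        # combined by a distance-based softmax gating function.
        def __init__(self, n_inducing=20, n_experts=4, tau=1.0):
            self.M, self.K, self.tau = n_inducing, n_experts, tau

        def fit(self, X, y, seed=0):
            rng = np.random.default_rng(seed)
            Zg = X[rng.choice(len(X), self.M, replace=False)]
            self.global_gp = SparseGP(Zg).fit(X, y)
            resid = y - self.global_gp.predict(X)
            # Crude fixed partition of the inputs into expert regions; the
            # paper instead infers soft assignments via a learned gating
            # network inside the variational objective.
            parts = np.array_split(np.argsort(X[:, 0]), self.K)
            self.centers, self.experts = [], []
            for idx in parts:
                Xi, ri = X[idx], resid[idx]
                Zi = Xi[rng.choice(len(Xi), min(self.M, len(Xi)), replace=False)]
                self.centers.append(Xi.mean(axis=0))
                self.experts.append(SparseGP(Zi).fit(Xi, ri))
            self.centers = np.stack(self.centers)
            return self

        def predict(self, Xs):
            # Gating weights: softmax over negative squared distances to the
            # expert centers (shifted by the row minimum for stability).
            d2 = ((Xs[:, None, :] - self.centers[None, :, :]) ** 2).sum(-1)
            gate = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / self.tau)
            gate /= gate.sum(axis=1, keepdims=True)
            local = np.stack([e.predict(Xs) for e in self.experts], axis=1)
            return self.global_gp.predict(Xs) + (gate * local).sum(axis=1)

    # Toy usage: the global GP captures the coarse trend; the gated experts
    # refine it from local residuals.
    rng = np.random.default_rng(1)
    X = np.linspace(0.0, 10.0, 200)[:, None]
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
    model = HierarchicalMixtureGP().fit(X, y)
    print(model.predict(X[:5]))

The additive global-plus-local decomposition mirrors the paper's motivation: the upper layer supplies a cheap coarse fit over the whole data set, while each expert only needs to model small local deviations.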

Keywords

Gaussian processes · Variational inference · Hierarchical structure · Graphical model


Copyright information

© The Author(s) 2018

Authors and Affiliations

  1. School of Electrical, Computer and Telecommunication Engineering, University of Wollongong, Wollongong, Australia
  2. College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
