Baldi, P., Sadowski, P., Lu, Z.: Learning in the machine: random backpropagation and the deep learning channel. arXiv:1612.02734 [cs.LG] (2016)
Bauckhage, C., Thurau, C.: Making archetypal analysis practical. In: Denzler, J., Notni, G., Süße, H. (eds.) DAGM 2009. LNCS, vol. 5748, pp. 272–281. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03798-6_28
Bengio, Y., Lamblin, P., Popovic, D., Larochelle, H.: Greedy layer-wise training of deep networks. In: Proceedings NIPS (2006)
Choy, M., Srinivasan, D., Cheu, R.: Neural networks for continuous online learning and control. IEEE Trans. Neural Netw. 17(6), 2006 (2006)
Courbariaux, M., Bengio, Y., David, J.P.: Training deep neural networks with low precision multiplications. arXiv:1412.7024 [cs.LG] (2014)
Garipov, T., Izmailov, P., Podoprikhin, D., Vetrov, D., Wilson, A.: Loss surfaces, mode connectivity, and fast ensembling of DNNs. arXiv:1802.10026 [stat.ML] (2018)
Hinton, G., Osindero, S., Teh, Y.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)
Hochreiter, S., Schmidhuber, J.: Flat minima. Neural Comput. 9(1), 1–42 (1997)
Hooke, R., Jeeves, T.: Direct search solution of numerical and statistical problems. J. ACM 8(2), 212–229 (1961)
Izmailov, P., Podoprikhin, D., Garipov, T., Vetrov, D., Wilson, A.: Averaging weights leads to wider optima and better generalization. arXiv:1803.05407 [cs.LG] (2018)
Jaderberg, M., et al.: Decoupled neural interfaces using synthetic gradients. arXiv:1608.05343 [cs.LG] (2016)
Kiefer, J., Wolfowitz, J.: Stochastic estimation of the maximum of a regression function. Ann. Math. Stat. 23(3), 462–466 (1952)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998)
Lillicrap, T., Cownden, D., Tweed, D., Akerman, J.: Random synaptic feedback weights support error backpropagation for deep learning. Nat. Commun. 7(13276) (2016)
Loshchilov, I., Hutter, F.: SGDR: stochastic gradient descent with warm restarts. In: Proceedings ICLR (2017)
Nelder, J., Mead, R.: A simplex method for function minimization. Comput. J. 7(4), 308–313 (1965)
Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)
Rosenfeld, A., Tsotsos, J.: Intriguing properties of randomly weighted networks: generalizing while learning next to nothing. arXiv:1802.00844 [cs.LG] (2018)
Rumelhart, D., Hinton, G., Williams, R.: Learning representations by back-propagating errors. Nature 323(6088), 533–536 (1986)
Sehnke, F., Osendorfer, C., Rückstieß, T., Graves, A., Peters, J., Schmidhuber, J.: Policy gradients with parameter-based exploration for control. In: Kůrková, V., Neruda, R., Koutník, J. (eds.) ICANN 2008. LNCS, vol. 5163, pp. 387–396. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-87536-9_40
Smith, L.: Cyclical learning rates for training neural networks. In: Proceedings Winter Conference on Applications of Computer Vision. IEEE (2017)
Song, Q., Spall, J., Soh, Y.C., Nie, J.: Robust neural network tracking controller using simultaneous perturbation stochastic approximation. IEEE Trans. Neural Netw. 19(5), 817–835 (2008)
Spall, J.: Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Trans. Autom. Control 37(3), 332–341 (1992)
Spall, J.: Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control. Wiley, Hoboken (2003)
Taylor, G., Burmeister, R., Xu, Z., Singh, B., Patel, A., Goldstein, T.: Training neural networks without gradients: a scalable ADMM approach. In: Proceedings ICML (2016)
Thurau, C., Kersting, K., Wahabzada, M., Bauckhage, C.: Convex non-negative matrix factorization for massive datasets. Knowl. Inf. Syst. 29(2), 457–478 (2011)
Vande Wouwer, A., Renotte, C., Remy, M.: On the use of simultaneous perturbation stochastic approximation for neural network training. In: Proceedings American Control Conference. IEEE (1999)
Williams, R.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8(3–4), 229–256 (1992)