Determining Adaptive Loss Functions and Algorithms for Predictive Models

  • Michael C. Burkhart
  • Kourosh Modarresi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11537)


Abstract

We consider the problem of training models to predict sequential processes. We use two econometric datasets to demonstrate how different losses and learning algorithms alter the predictive power for a variety of state-of-the-art models. We investigate how the choice of loss function impacts model training and find that no single algorithm or loss function results in optimal predictive performance. For small datasets, neural models prove especially sensitive to training parameters, including the choice of loss function and pre-processing steps. We find that a recursively-applied artificial neural network trained under \(L_1\) loss performs best under many different metrics on a national retail sales dataset, whereas a differenced autoregressive model trained under \(L_1\) loss performs best under a variety of metrics on an e-commerce dataset. We note that different training metrics and processing steps result in appreciably different performance across all model classes and argue for an adaptive approach to model fitting.
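As an illustration of one configuration the abstract mentions, the following is a minimal sketch of a differenced autoregressive model trained under \(L_1\) loss. It is not the authors' implementation: the function name, the toy series, and the use of plain subgradient descent are assumptions chosen to keep the example self-contained.

```python
import numpy as np

def fit_ar_l1(series, p=2, lr=0.01, epochs=2000):
    """Fit an AR(p) model on the first-differenced series by
    minimizing mean absolute (L1) error via subgradient descent."""
    d = np.diff(series)  # first differencing removes a linear trend
    # Lag matrix: row k holds (d[k], ..., d[k+p-1]) to predict d[k+p]
    X = np.column_stack([d[i:len(d) - p + i] for i in range(p)])
    y = d[p:]
    w = np.zeros(p)
    b = 0.0
    for _ in range(epochs):
        resid = X @ w + b - y
        g = np.sign(resid)          # subgradient of |resid|
        w -= lr * X.T @ g / len(y)  # averaged subgradient step
        b -= lr * g.mean()
    return w, b

# Toy series: linear trend plus small noise
rng = np.random.default_rng(0)
t = np.arange(200)
series = 0.5 * t + rng.normal(0.0, 0.1, size=200)
w, b = fit_ar_l1(series)
```

Because the \(L_1\) objective penalizes the absolute value of each residual, the fit tracks the conditional median of the differenced series rather than its mean, which is one reason the paper compares it against squared-error training.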



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Adobe Inc., San José, USA