Abstract
This work introduces a new online regression method that extends shrinkage via limits of Gibbs samplers (SLOG) to the online learning setting. In particular, we show theoretically how the proposed online SLOG (OSLOG) is derived within the Bayesian framework without resorting to the Gibbs sampler or to a hierarchical representation. Moreover, to establish a performance guarantee for OSLOG, we derive an upper bound on its cumulative squared loss; OSLOG is the only online regression algorithm with sparsity that attains logarithmic regret. Finally, we compare OSLOG empirically with two state-of-the-art algorithms on three aspects, namely normality, sparsity and multicollinearity, and show that it achieves an excellent trade-off between these properties.
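The abstract describes OSLOG only at a high level. As a purely illustrative sketch, and not the authors' OSLOG update rule, the snippet below shows the kind of per-round online Bayesian regression update such a method builds on: a Gaussian linear model whose posterior mean is recomputed after every observation, with the prior precision acting as a shrinkage parameter. All names in the snippet are hypothetical.

```python
import numpy as np


class OnlineBayesianRidge:
    """Minimal online Bayesian linear regression sketch.

    NOTE: this is NOT the OSLOG algorithm of the paper; it is a generic
    recursive Bayesian (ridge-type) update shown only to illustrate the
    per-round predict/update cycle of an online shrinkage regressor.
    """

    def __init__(self, dim, prior_precision=1.0):
        self.A = prior_precision * np.eye(dim)  # posterior precision matrix
        self.b = np.zeros(dim)                  # accumulated X^T y
        self.w = np.zeros(dim)                  # posterior mean (weights)

    def predict(self, x):
        # Point prediction with the current posterior mean.
        return float(self.w @ x)

    def update(self, x, y):
        # Rank-one update of the precision matrix, then refresh the mean.
        self.A += np.outer(x, x)
        self.b += y * x
        self.w = np.linalg.solve(self.A, self.b)
        return self.w


# Example round-by-round usage on synthetic data.
rng = np.random.default_rng(0)
model = OnlineBayesianRidge(dim=3)
for _ in range(100):
    x = rng.normal(size=3)
    y = 2.0 * x[0] - 1.0 * x[2] + 0.1 * rng.normal()
    y_hat = model.predict(x)   # predict before seeing the label
    model.update(x, y)         # then update with the revealed label
```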
Notes
This is a mild assumption that is always satisfied in practice. Dropping it can lead to counter-intuitive results such as the Banach–Tarski paradox; for details, see, for example, [18].
All algorithms are available from SOLMA library: https://github.com/proteus-h2020/proteus-solma.
References
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol) 58:267–288
Park T, Casella G (2008) The Bayesian lasso. J Am Stat Assoc 103(482):681–686
Rajaratnam B, Roberts S, Sparks D, Dalal O (2016) Lasso regression: estimation and shrinkage via the limit of Gibbs sampling. J R Stat Soc Ser B (Stat Methodol) 78(1):153–174
Sambasivan R, Das S, Saha SK (2018) A Bayesian perspective of statistical machine learning for big data. arXiv preprint arXiv:1811.04788
Langford J, Li L, Zhang T (2009) Sparse online learning via truncated gradient. J Mach Learn Res 10(Mar):777–801
Gerchinovitz S (2013) Sparsity regret bounds for individual sequences in online linear regression. J Mach Learn Res 14(Mar):729–769
Duchi J, Singer Y (2009) Efficient online and batch learning using forward backward splitting. J Mach Learn Res 10(Dec):2899–2934
Shalev-Shwartz S, Tewari A (2011) Stochastic methods for l1-regularized loss minimization. J Mach Learn Res 12(Jun):1865–1892
Zinkevich M (2003) Online convex programming and generalized infinitesimal gradient ascent. Technical Report CMU-CS-03-110, School of Computer Science, Carnegie Mellon University
Hazan E, Agarwal A, Kale S (2007) Logarithmic regret algorithms for online convex optimization. Mach Learn 69(2–3):169–192
Orabona F, Cesa-Bianchi N, Gentile C (2012) Beyond logarithmic bounds in online learning. In: Artificial intelligence and statistics, pp 823–831
Tibshirani RJ et al (2013) The lasso problem and uniqueness. Electron J Stat 7:1456–1490
Andrews DF, Mallows CL (1974) Scale mixtures of normal distributions. J R Stat Soc Ser B (Methodol) 36:99–102
Murphy K (2014) Machine learning: a probabilistic perspective. Taylor & Francis, London
Kotowicz J (1990) Convergent real sequences. Upper and lower bound of sets of real numbers. Formaliz Math 1(3):477–481
Abbott S (2001) Understanding analysis. Springer, Berlin
Rudin W (1976) Principles of mathematical analysis, 3rd edn. McGraw-Hill, New York
Tao T (2011) An introduction to measure theory. American Mathematical Society, Providence
Kivinen J, Warmuth M (1999) Averaging expert predictions. In: Computational learning theory. Springer, p 638
Kakade SM, Ng AY (2005) Online bounds for Bayesian algorithms. In: Advances in neural information processing systems, pp 641–648
Beckenbach EF, Bellman R (2012) Inequalities, vol 30. Springer, Berlin
Vovk V (2001) Competitive on-line statistics. Int Stat Rev/Revue Internationale de Statistique 69:213–248
Quinonero-Candela J, Dagan I, Magnini B, d’Alché BF (2006) Machine learning challenges: evaluating predictive uncertainty, visual object classification, and recognizing textual entailment. In: First Pascal Machine Learning Challenges Workshop, MLCW 2005, Southampton, UK, 11–13 April 2005, Revised Selected Papers, vol 3944. Springer
Akbilgic O, Bozdogan H, Balaban ME (2014) A novel hybrid RBF neural networks model as a forecaster. Stat Comput 24(3):365–375
Ethics declarations
Conflicts of interest
The authors have no conflict of interest.
About this article
Cite this article
Jamil, W., Bouchachia, A. Online Bayesian shrinkage regression. Neural Comput & Applic 32, 17759–17767 (2020). https://doi.org/10.1007/s00521-020-04947-y