Relevant regressors selection by continuous AIC
In this paper we propose an algorithm for the selection of regressors (features, basis functions) in linear regression problems. To this end we use a continuous generalization of the well-known Akaike information criterion (AIC). We develop a method for optimizing AIC with respect to individual regularization coefficients, where each coefficient defines the relevance degree of the corresponding regressor. We provide experimental results showing that the proposed approach can be considered a non-Bayesian analog of the automatic relevance determination (ARD) approach and of the marginal likelihood optimization used in Relevance Vector Regression (RVR). The key difference of the new approach is its ability to find zero regularization coefficients. We expect this to help avoid the type-II overfitting (underfitting) reported for RVR. We also show that in a special case both methods become identical.
Keywords: Akaike Information Criterion · Ridge Regression · Neural Information Processing System · Feature Selection Problem · Linear Regression Problem
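To make the idea concrete, below is a minimal sketch of AIC-based regressor selection with individual regularization coefficients. It is not the paper's own algorithm: it assumes the common generalization in which the discrete parameter count in AIC is replaced by the effective degrees of freedom tr(X(XᵀX + A)⁻¹Xᵀ) with A = diag(α₁, …, α_d), and it substitutes a generic quasi-Newton optimizer for the optimization method developed in the paper. All names and the toy data are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def continuous_aic(alpha, X, y):
    """Gaussian AIC, n*log(sigma^2) + 2*df, with the discrete parameter
    count replaced by the effective degrees of freedom
    df = tr(X (X^T X + A)^{-1} X^T),  A = diag(alpha)."""
    n, d = X.shape
    G = X.T @ X + np.diag(alpha)
    w = np.linalg.solve(G, X.T @ y)             # ridge fit with per-feature penalties
    resid = y - X @ w
    sigma2 = resid @ resid / n                  # ML estimate of noise variance
    df = np.trace(np.linalg.solve(G, X.T @ X))  # tr(X G^{-1} X^T) = tr(G^{-1} X^T X)
    return n * np.log(sigma2) + 2.0 * df

# Toy problem: 10 regressors, only the first three are relevant.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = X[:, :3] @ np.array([1.5, -2.0, 0.7]) + 0.3 * rng.standard_normal(200)

# Box constraints alpha_i >= 0 allow exactly zero regularization
# coefficients, the property the abstract emphasizes.
res = minimize(continuous_aic, x0=np.ones(10), args=(X, y),
               method="L-BFGS-B", bounds=[(0.0, None)] * 10)
print(np.round(res.x, 2))  # large alpha_i flag irrelevant regressors
```

In this sketch, large optimized α_i indicate irrelevant regressors, while α_i = 0 leaves a regressor entirely unpenalized; RVR, by contrast, prunes regressors by driving the corresponding hyperparameters to infinity during marginal likelihood maximization.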