Pattern Recognition and Image Analysis, Volume 19, Issue 3, pp 456–464

Relevant regressors selection by continuous AIC

  • D. Kropotov
  • N. Ptashko
  • D. Vetrov
Mathematical Theory of Pattern Recognition


We propose an algorithm for selecting regressors (features, basis functions) in linear regression problems. To do this, we use a continuous generalization of the well-known Akaike information criterion (AIC). We develop a method for optimizing AIC with respect to individual regularization coefficients, each of which defines the degree of relevance of the corresponding regressor. We provide experimental results which show that the proposed approach can be considered a non-Bayesian analog of the automatic relevance determination (ARD) approach and of the marginal likelihood optimization used in Relevance Vector Regression (RVR). The key difference of the new approach is its ability to find zero regularization coefficients. We hope that this helps to avoid the type-II overfitting (underfitting) that has been reported for RVR. We also show that in a certain special case the two methods become identical.
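
The idea of scoring a set of individual regularization coefficients with an AIC-like criterion can be illustrated with a minimal sketch. The abstract does not give the authors' exact definition of continuous AIC, so the code below assumes one common construction: a generalized ridge fit with one coefficient per regressor, and an AIC in which the parameter count is replaced by the trace of the hat matrix (the effective number of parameters). The function names, the synthetic data, and the specific formula are illustrative assumptions, not the method from the paper.

```python
import numpy as np


def fit_generalized_ridge(X, y, lam):
    """Minimize ||y - Xw||^2 + sum_j lam[j] * w[j]^2.

    Returns the weights w and the hat matrix H with y_hat = H @ y.
    """
    A = X.T @ X + np.diag(lam)
    # pinv guards against a singular matrix when some lam[j] are exactly zero
    A_inv = np.linalg.pinv(A)
    w = A_inv @ X.T @ y
    H = X @ A_inv @ X.T
    return w, H


def continuous_aic(X, y, lam):
    """Assumed AIC-like score: n * log(RSS / n) + 2 * trace(H).

    trace(H) varies continuously with lam, unlike a count of regressors.
    """
    n = X.shape[0]
    w, H = fit_generalized_ridge(X, y, lam)
    rss = float(np.sum((y - X @ w) ** 2))
    eff_params = float(np.trace(H))
    return n * np.log(rss / n) + 2.0 * eff_params


# Synthetic data (hypothetical): only the first regressor is relevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=50)

lam_none = np.zeros(3)                  # no regularization on any regressor
lam_prune = np.array([0.0, 1e6, 1e6])   # regressors 2 and 3 effectively removed

print("score, no regularization:", continuous_aic(X, y, lam_none))
print("score, two regressors suppressed:", continuous_aic(X, y, lam_prune))
```

In this sketch a zero coefficient leaves a regressor unregularized, while a very large coefficient drives its weight toward zero and removes roughly one unit from the effective parameter count. A selection procedure would search over `lam` for the lowest score; the optimization scheme itself is not described in the abstract and is not attempted here.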


Akaike Information Criterion; Ridge Regression; Neural Information Processing System; Feature Selection Problem; Linear Regression Problem
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Pleiades Publishing, Ltd. 2009

Authors and Affiliations

  1. Dorodnicyn Computing Centre of the Russian Academy of Sciences, Moscow, Russia
  2. CMC Department, Lomonosov Moscow State University, Moscow, Russia
