Sample Complexity of Linear Learning Machines with Different Restrictions over Weights

  • Marcin Korzeń
  • Przemysław Klęsk
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7268)


Many different capacity measures are known for learning machines, such as the Vapnik-Chervonenkis dimension, covering numbers, or the fat-shattering dimension. In this paper we present experimental results on sample complexity estimation, focusing on rather simple learning machines that are linear in their parameters. We show that sample complexity can differ considerably even for learning machines having the same VC dimension. Moreover, independently of the capacity of a learning machine, the distribution of the data is also significant. The experimental results are compared with known theoretical sample-complexity and generalization bounds.
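The kind of experiment the abstract describes can be sketched as follows: a minimal, hypothetical simulation (not the authors' actual code) that empirically estimates sample complexity for two linear-in-parameters learners with different restrictions over weights, here L1- versus L2-regularised logistic regression from scikit-learn. Both hypothesis classes are linear in the same number of parameters (so they share a VC dimension), yet the smallest training-set size reaching a target error can differ; the data generator, sizes, and epsilon threshold below are illustrative assumptions.

```python
# Hedged sketch: empirically estimating sample complexity for
# linear-in-parameters learners under different weight restrictions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, d=20, relevant=3):
    # Sparse ground truth: only a few coordinates determine the label.
    w = np.zeros(d)
    w[:relevant] = 1.0
    X = rng.normal(size=(n, d))
    y = (X @ w > 0).astype(int)
    return X, y

def sample_complexity(penalty, eps=0.1, sizes=(25, 50, 100, 200, 400, 800)):
    # Smallest training-set size whose held-out error drops below eps.
    X_test, y_test = make_data(5000)
    for n in sizes:
        X, y = make_data(n)
        clf = LogisticRegression(penalty=penalty, C=1.0,
                                 solver="liblinear", max_iter=1000)
        clf.fit(X, y)
        err = 1.0 - clf.score(X_test, y_test)
        if err < eps:
            return n
    return None

n_l1 = sample_complexity("l1")
n_l2 = sample_complexity("l2")
print("L1-restricted learner reaches eps=0.1 at n =", n_l1)
print("L2-restricted learner reaches eps=0.1 at n =", n_l2)
```

When the target concept is sparse, the L1-restricted class typically attains the error threshold with fewer samples even though both classes have identical VC dimension, which is consistent with the data-distribution effect the abstract points to.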


Keywords: Mean Square Error, Sample Complexity, True Error, Bayesian Regularisation, Capacity Concept





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Marcin Korzeń (1)
  • Przemysław Klęsk (1)
  1. Faculty of Computer Science and Information Technology, West Pomeranian University of Technology, Szczecin, Poland
