Leaving the Span

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 3559)

Abstract

We discuss a simple sparse linear problem that is hard to learn with any algorithm that uses a linear combination of the training instances as its weight vector. The hardness holds even if we allow the learner to embed the instances into any higher dimensional feature space (and use a kernel function to define the dot product between the embedded instances). These algorithms are inherently limited by the fact that after seeing k instances only a weight space of dimension k can be spanned.
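The span limitation described above can be illustrated numerically. The following is a minimal sketch (assuming plain gradient descent on squared loss; the data, dimensions, and step size are illustrative, not from the paper): every gradient of a loss that depends on the data only through dot products is itself a linear combination of the training instances, so the learned weight vector never leaves their span.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 5                      # ambient dimension n, number of instances k < n
X = rng.standard_normal((k, n))   # k training instances as rows
y = rng.standard_normal(k)

# Gradient descent on squared loss: each update X.T @ residual is a
# linear combination of the rows of X, so w stays in their span.
w = np.zeros(n)
for _ in range(200):
    grad = X.T @ (X @ w - y)      # gradient lies in the row space of X
    w -= 0.01 * grad

# Verify: appending w to X does not increase the rank.
rank_X = np.linalg.matrix_rank(X)
rank_Xw = np.linalg.matrix_rank(np.vstack([X, w]))
print(rank_X == rank_Xw)          # True: w is in the span of the instances
```

The same argument applies after any feature-space embedding: the kernel expansion only changes which instances are combined, not the fact that the weight vector is confined to their span.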

Our hardness result is surprising because the same problem can be efficiently learned using the exponentiated gradient (EG) algorithm: now the component-wise logarithm of the weight vector is essentially a linear combination of the training instances. This algorithm enforces additional constraints on the weights (all must be non-negative and sum to one), and in some cases these constraints alone force the rank of the weight space to grow as fast as 2^k.
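One EG step can be sketched as follows (assuming squared loss on a single instance; the step size and data are illustrative, not from the paper). The update is multiplicative in the weights and additive in their logarithms, and the renormalization keeps the weight vector on the probability simplex.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
x = rng.standard_normal(n)        # one training instance
y_target = 0.5

w = np.full(n, 1.0 / n)           # start at the uniform distribution
eta = 0.1                         # learning rate (illustrative)

# EG update for squared loss: multiplicative in w, additive in log w.
grad = 2.0 * (w @ x - y_target) * x
w = w * np.exp(-eta * grad)
w = w / w.sum()                   # renormalize onto the probability simplex

print(w.min(), w.sum())
```

Because the update multiplies by exponentials of (scaled) instances, log w is, up to the normalization constant, a linear combination of the instances seen so far; the simplex constraints are what let the reachable weight set escape the k-dimensional span.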

Keywords

  • Weight Vector
  • Singular Value Decomposition
  • Weight Space
  • Reproducing Kernel Hilbert Space
  • Training Instance

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Warmuth, M.K., Vishwanathan, S.V.N. (2005). Leaving the Span. In: Auer, P., Meir, R. (eds) Learning Theory. COLT 2005. Lecture Notes in Computer Science, vol 3559. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11503415_25

  • DOI: https://doi.org/10.1007/11503415_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26556-6

  • Online ISBN: 978-3-540-31892-7

  • eBook Packages: Computer Science (R0)