
Evidence that Incremental Delta-Bar-Delta Is an Attribute-Efficient Linear Learner

  • Harlan D. Harris
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2430)

Abstract

The Winnow class of on-line linear learning algorithms [10], [11] was designed to be attribute-efficient: when learning with many irrelevant attributes, Winnow makes a number of errors that is only logarithmic in the total number of attributes, whereas the Perceptron algorithm makes a nearly linear number of errors. This paper presents empirical evidence that the Incremental Delta-Bar-Delta (IDBD) second-order gradient-descent algorithm [14] is attribute-efficient: it performs similarly to Winnow on tasks with many irrelevant attributes, and it also does better than Winnow on a task where Winnow does poorly. Preliminary analysis supports this empirical claim by showing that IDBD, like Winnow and other attribute-efficient algorithms, and unlike the Perceptron algorithm, has weights that can grow exponentially quickly. By virtue of its more flexible approach to weight updates, however, IDBD may be a more practically useful learning algorithm than Winnow.
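
The claim that IDBD's weights can grow exponentially quickly is easiest to see from the update rule itself: each weight carries its own learning rate of the form e^beta, adapted by a meta-level gradient step. Below is a minimal sketch of one IDBD step for a linear unit, following the algorithm in Sutton [14]; the symbol names (theta for the meta learning rate, h for the memory trace) follow that paper, but this NumPy rendering is our own illustration, not code from the paper.

```python
import numpy as np

def idbd_step(w, beta, h, x, y, theta=0.01):
    """One IDBD update (Sutton, 1992) for a linear predictor w.x.

    Each weight w[i] carries its own log step size beta[i], so its
    learning rate alpha[i] = exp(beta[i]) -- and hence the weight
    itself -- can grow or shrink exponentially quickly, unlike the
    Perceptron's fixed-rate additive updates.
    """
    delta = y - np.dot(w, x)           # prediction error
    beta += theta * delta * x * h      # meta-level step on log learning rates
    alpha = np.exp(beta)               # per-weight learning rates
    w += alpha * delta * x             # base-level LMS step
    # h is a decaying trace of recent weight changes; the clipped factor
    # discounts the trace where the effective step size is already large
    h = h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x
    return w, beta, h

# Typical initialization (an assumption for illustration):
# n = number of attributes, alpha_0 = initial common learning rate
# w = np.zeros(n); beta = np.full(n, np.log(0.05)); h = np.zeros(n)
```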

Keywords

Learning Rate · Irrelevant Attribute · Target Concept · Cumulative Error · Perceptron Algorithm

References

  1. P. Auer and M. K. Warmuth. Tracking the best disjunction. Machine Learning, 32:127–150, 1998.
  2. N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, 2000.
  3. Claudio Gentile. A new approximate maximal margin classification algorithm. Journal of Machine Learning Research, 2:213–242, December 2001.
  4. Adam J. Grove, Nick Littlestone, and Dale Schuurmans. General convergence results for linear discriminant updates. Machine Learning, 43(3):173–210, 2001.
  5. Harlan D. Harris and Jesse A. Reichler. Learning in the cerebellum with sparse conjunctions and linear separator algorithms. In Proceedings of the International Joint Conference on Neural Networks 2001, 2001.
  6. Robert A. Jacobs. Increased rates of convergence through learning rate adaptation. Neural Networks, 1:295–307, 1988.
  7. Roni Khardon, Dan Roth, and Rocco Servedio. Efficiency versus convergence of Boolean kernels for on-line learning algorithms. In Proceedings of Neural Information Processing Systems 2001, 2001.
  8. J. Kivinen, M. K. Warmuth, and P. Auer. The Perceptron algorithm versus Winnow: Linear versus logarithmic mistake bounds when few input variables are relevant. Artificial Intelligence, 97:325–343, 1997.
  9. N. Littlestone and M. K. Warmuth. The weighted majority algorithm. Information and Computation, 108:212–261, 1994.
  10. Nick Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2:285–318, 1988.
  11. Nick Littlestone. Mistake bounds and logarithmic linear-threshold learning algorithms. PhD thesis, University of California, Santa Cruz, Technical Report UCSC-CRL-89-11, March 1989.
  12. Dan Roth. Learning to resolve natural language ambiguities: A unified approach. In Proceedings of AAAI-98, 15th Conference of the American Association for Artificial Intelligence, pages 806–813, 1998.
  13. Nicolas Schweighofer and Michael A. Arbib. A model of cerebellar metaplasticity. Learning and Memory, 4:421–428, 1998.
  14. Richard S. Sutton. Adapting bias by gradient descent: An incremental version of Delta-Bar-Delta. In Proceedings of the Tenth National Conference on Artificial Intelligence, pages 171–176. MIT Press, 1992.
  15. Leslie G. Valiant. Projection learning. Machine Learning, 37:115–130, 1999.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Harlan D. Harris
  1. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, USA
