Maximum Margin Algorithms with Boolean Kernels

  • Roni Khardon
  • Rocco A. Servedio
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2777)

Abstract

Recent work has introduced Boolean kernels with which one can learn over a feature space containing all conjunctions of length up to k (for any 1 ≤ k ≤ n) over the original n Boolean features in the input space. This motivates the question of whether maximum margin algorithms such as support vector machines can learn Disjunctive Normal Form expressions in the PAC learning model using this kernel. We study this question, as well as a variant in which structural risk minimization (SRM) is performed where the class hierarchy is taken over the length of conjunctions.
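
As an illustration of the kind of kernel described above, the following is a minimal sketch, not the authors' code. It assumes the feature space consists of monotone conjunctions (no negated variables) of length 1 to k, so that the inner product of two inputs equals the number of such conjunctions satisfied by both. The function names `boolean_kernel`, `explicit_features`, and `explicit_inner_product` are hypothetical and chosen only for this example.

```python
from itertools import combinations
from math import comb


def boolean_kernel(x, y, k):
    """Kernel value for monotone conjunctions of length 1..k (assumed setting).

    A monotone conjunction is satisfied by both x and y exactly when all of
    its variables are 1 in both inputs, so it suffices to count the shared
    1-positions and sum binomial coefficients.
    """
    shared = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    return sum(comb(shared, length) for length in range(1, k + 1))


def explicit_features(x, k):
    """Expand x into one 0/1 feature per monotone conjunction of length 1..k."""
    n = len(x)
    return [
        int(all(x[i] == 1 for i in subset))
        for length in range(1, k + 1)
        for subset in combinations(range(n), length)
    ]


def explicit_inner_product(x, y, k):
    """Inner product computed directly in the expanded feature space."""
    fx = explicit_features(x, k)
    fy = explicit_features(y, k)
    return sum(a * b for a, b in zip(fx, fy))


x = (1, 1, 0, 1, 0)
y = (1, 1, 1, 1, 0)

# The closed form and the brute-force expansion should agree for every k.
for k in range(1, len(x) + 1):
    assert boolean_kernel(x, y, k) == explicit_inner_product(x, y, k)
```

The closed form needs only the count of shared 1-positions, while the explicit expansion grows with the number of conjunctions of length up to k; this gap is what makes the kernel view attractive computationally.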

We show that such maximum margin algorithms do not PAC learn t(n)-term DNF for any t(n) = ω(1), even when used with such an SRM scheme. We also consider PAC learning under the uniform distribution and show that if the kernel uses conjunctions of length \(\tilde{\omega}(\sqrt{n})\) then the maximum margin hypothesis will fail on the uniform distribution as well. Our results concretely illustrate that margin-based algorithms may overfit when learning simple target functions with natural kernels.

Keywords

Support Vector Machine, Boolean Function, Threshold Function, Maximum Margin, Structural Risk Minimization
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Roni Khardon (1)
  • Rocco A. Servedio (2)
  1. Department of Computer Science, Tufts University, Medford, USA
  2. Department of Computer Science, Columbia University, New York, USA
