# A new PAC bound for intersection-closed concept classes

## Abstract

For hyper-rectangles in \(\mathbb{R}^{d}\), Auer (1997) proved a PAC bound of \(O(\frac{1}{\varepsilon}(d+\log \frac{1}{\delta}))\), where \(\varepsilon\) and \(\delta\) are the accuracy and confidence parameters. It is still an open question whether the same bound can be obtained for intersection-closed concept classes of VC-dimension \(d\) in general. We present a step towards a solution of this problem, showing on one hand a new PAC bound of \(O(\frac{1}{\varepsilon}(d\log d + \log \frac{1}{\delta}))\) for arbitrary intersection-closed concept classes, complementing the well-known bounds \(O(\frac{1}{\varepsilon}(\log \frac{1}{\delta}+d\log \frac{1}{\varepsilon}))\) and \(O(\frac{d}{\varepsilon}\log \frac{1}{\delta})\) of Blumer et al. (1989) and Haussler, Littlestone and Warmuth (1994). Our bound is established using the *closure algorithm*, which generates as its hypothesis the intersection of all concepts that are consistent with the positive training examples. On the other hand, we show that many intersection-closed concept classes, including, e.g., maximum intersection-closed classes, satisfy an additional combinatorial property that allows a proof of the optimal bound of \(O(\frac{1}{\varepsilon}(d+\log \frac{1}{\delta}))\). For such improved bounds the choice of the learning algorithm is crucial, as there are consistent learning algorithms that need \(\Omega(\frac{1}{\varepsilon}(d\log\frac{1}{\varepsilon} +\log\frac{1}{\delta}))\) examples to learn some particular maximum intersection-closed concept classes.
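To make the closure algorithm concrete, the following is a minimal sketch (not code from the paper) for one intersection-closed class, axis-aligned hyper-rectangles in \(\mathbb{R}^{d}\). For this class, the intersection of all rectangles consistent with the positive examples is simply the smallest enclosing box, given by coordinate-wise minima and maxima; the function names are hypothetical.

```python
# Closure algorithm sketch for axis-aligned hyper-rectangles in R^d:
# the closure of the positive examples is the tightest box containing
# them, i.e. the intersection of all consistent rectangles.

def closure_hypothesis(positives):
    """Return (lower, upper) corners of the tightest box enclosing the positives."""
    d = len(positives[0])
    lower = tuple(min(p[i] for p in positives) for i in range(d))
    upper = tuple(max(p[i] for p in positives) for i in range(d))
    return lower, upper

def predict(hypothesis, x):
    """Classify x as positive iff it lies inside the hypothesis box."""
    lower, upper = hypothesis
    return all(lo <= xi <= hi for lo, xi, hi in zip(lower, x, upper))
```

Because the hypothesis is the smallest consistent concept, it can only err on positive points of the target, which is the structural property the bound for the closure algorithm exploits.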

## Keywords

PAC bounds; Intersection-closed classes

## References

- Auer, P. (1997). Learning nested differences in the presence of malicious noise. *Theor. Comput. Sci., 185*(1), 159–175.
- Auer, P., & Cesa-Bianchi, N. (1998). On-line learning with malicious noise and the closure algorithm. *Ann. Math. Artif. Intell., 23*(1–2), 83–99.
- Auer, P., Long, P. M., & Srinivasan, A. (1998). Approximating hyper-rectangles: Learning and pseudorandom sets. *J. Comput. Syst. Sci., 57*(3), 376–388.
- Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. (1989). Learnability and the Vapnik-Chervonenkis dimension. *J. ACM, 36*(4), 929–965.
- Ehrenfeucht, A., Haussler, D., Kearns, M. J., & Valiant, L. G. (1989). A general lower bound on the number of examples needed for learning. *Inf. Comput., 82*(3), 247–261.
- Floyd, S., & Warmuth, M. (1995). Sample compression, learnability, and the Vapnik-Chervonenkis dimension. *Machine Learning, 21*(3), 269–304.
- Haussler, D., Littlestone, N., & Warmuth, M. (1994). Predicting {0,1}-functions on randomly drawn points. *Inf. Comput., 115*(2), 248–292.
- Helmbold, D., Sloan, R., & Warmuth, M. (1990). Learning nested differences of intersection-closed concept classes. *Machine Learning, 5*, 165–196.
- Sauer, N. (1972). On the density of families of sets. *J. Combin. Theory Ser. A, 13*, 145–147.