Learning distributions by their density levels — A paradigm for learning without a teacher
Can we learn from unlabeled examples? We consider here the unsupervised learning scenario in which the examples provided are not labeled (and are not necessarily all positive or all negative). The only information about their membership is disclosed to the student indirectly, through the sampling distribution.
We view this problem as a restricted instance of the fundamental issue of inferring information about a probability distribution from the random samples it generates. We propose a framework, density-level learning, for acquiring partial information about a distribution, and develop a model of unsupervised concept learning based on this framework.
We investigate the basic features of these learning tasks and provide lower and upper bounds on their sample complexity. Our main result is that learnability of a class in this setting is equivalent to the finiteness of its VC-dimension. One direction of the proof reduces density-level learnability to PAC learnability, while the sufficiency direction is proved through the introduction of a generic learning algorithm.
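To make the setting concrete, here is a minimal illustrative sketch (not the paper's actual algorithm) of what a density-level learner might do over a simple hypothesis class. The class of real intervals has VC-dimension 2, so by the main result it should be learnable; the hypothetical learner below scans a finite grid of candidate intervals and returns the one whose empirical mass most exceeds a prescribed density level times its length. All names and the scoring criterion are assumptions made for illustration.

```python
import random

def empirical_mass(sample, a, b):
    """Fraction of sample points falling in the interval [a, b]."""
    return sum(a <= x <= b for x in sample) / len(sample)

def learn_density_level(sample, level, grid):
    """Illustrative learner over a finite grid of candidate intervals
    (a class of VC-dimension 2): return the interval maximizing the
    excess of its empirical mass over level * length."""
    best, best_score = None, float("-inf")
    for a in grid:
        for b in grid:
            if a < b:
                score = empirical_mass(sample, a, b) - level * (b - a)
                if score > best_score:
                    best, best_score = (a, b), score
    return best

random.seed(0)
# Samples concentrated near 0.5, i.e. the high-density region.
sample = [random.gauss(0.5, 0.05) for _ in range(500)]
grid = [i / 20 for i in range(21)]
print(learn_density_level(sample, level=1.0, grid=grid))
```

The returned interval brackets the high-density region around 0.5; with finitely many candidates and a bounded-VC class, uniform convergence of empirical masses guarantees the empirical winner is close to the best interval in the class.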
Keywords: Learning Theory, PAC, Vapnik-Chervonenkis dimension, ε-approximation, unsupervised learning