Abstract
In the problem of learning a real-valued function from examples, in an extension of the ‘PAC’ model, a learner sees a sequence of values of an unknown function at a number of randomly chosen points. On the basis of these examples, the learner chooses a function—called a hypothesis—from some class H of hypotheses, with the aim that the learner's hypothesis is close to the target function on future random examples. In this paper we require that, for most training samples, with high probability the absolute difference between the values of the learner's hypothesis and the target function on a random point is small. A natural learning algorithm to consider is one that chooses a function in H that is close to the target function on the training examples. This, together with the success criterion described above, leads to the definition of a statistical property which we would wish a class of functions to possess.
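As an illustration only (the finite hypothesis class, the function names, and the tolerance `eta` below are assumptions for this sketch, not taken from the paper), the natural algorithm just described — choose any hypothesis in H that is close to the target on the training sample, then judge it on fresh random points — might look like:

```python
import random

def learn_by_approximate_interpolation(H, sample, eta):
    """Return some h in H with |h(x) - y| <= eta on every training
    example (x, y), or None if no such hypothesis exists."""
    for h in H:
        if all(abs(h(x) - y) <= eta for x, y in sample):
            return h
    return None

# Hypothetical finite class: the linear functions x -> a*x on [0, 1].
H = [lambda x, a=a: a * x for a in (0.0, 0.25, 0.5, 0.75, 1.0)]
target = lambda x: 0.5 * x  # unknown to the learner in the model

random.seed(0)
sample = [(x, target(x)) for x in (random.random() for _ in range(10))]
h = learn_by_approximate_interpolation(H, sample, eta=0.05)

# Success criterion: |h(x) - target(x)| is small on fresh random points.
fresh_error = max(abs(h(x) - target(x))
                  for x in (random.random() for _ in range(1000)))
```

Whether this rule succeeds for *every* target and distribution is exactly what the statistical property of the class H must guarantee.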
We derive a characterization of function classes that have this property, in terms of their ‘fat-shattering function’, a notion that has proven useful in other problems in computational learning theory, such as the learnability of probabilistic concepts and the learnability of functions in the presence of random noise. This work has applications to the learning of functions in the presence of malicious noise.
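For intuition (a brute-force sketch under the standard definition of γ-shattering; the classes `F` and `F2` and the scales below are illustrative assumptions), the fat-shattering function of a small finite class can be computed directly: a point set is γ-shattered when, for some witness levels r_i, every above/below pattern is realized with margin γ, which reduces to requiring a gap of at least 2γ at each point between the chosen "above" and "below" values:

```python
from itertools import combinations, product

def gamma_shattered(F, points, gamma):
    """True if F gamma-shatters `points`: for every sign pattern b there
    is f_b in F with f_b(x_i) >= r_i + gamma when b_i = 1 and
    f_b(x_i) <= r_i - gamma when b_i = 0, for some witnesses r_i.
    Witnesses exist iff, at each point, the smallest 'above' value
    exceeds the largest 'below' value by at least 2*gamma."""
    vals = [[f(x) for x in points] for f in F]
    patterns = list(product((0, 1), repeat=len(points)))
    for choice in product(range(len(F)), repeat=len(patterns)):
        if all(
            min(vals[choice[j]][i] for j, b in enumerate(patterns) if b[i])
            >= max(vals[choice[j]][i] for j, b in enumerate(patterns) if not b[i])
            + 2 * gamma
            for i in range(len(points))
        ):
            return True
    return False

def fat_shattering(F, domain, gamma):
    """Largest d such that some d-subset of `domain` is gamma-shattered."""
    return max(
        (d for d in range(1, len(domain) + 1)
         if any(gamma_shattered(F, pts, gamma) for pts in combinations(domain, d))),
        default=0,
    )

# Linear class x -> a*x: monotone in a at every point, so no pair of
# points can realize both mixed patterns; its value here is at most 1.
F = [lambda x, a=a: a * x for a in (0.0, 0.5, 1.0)]

# A class realizing all four above/below patterns on two points
# gamma-shatters both of them when the value gap exceeds 2*gamma.
F2 = [lambda x, v=v: v[x]
      for v in ({0: 0, 1: 0}, {0: 0, 1: 1}, {0: 1, 1: 0}, {0: 1, 1: 1})]
```

On these toy classes, `fat_shattering(F, [0.5, 1.0], 0.2)` is 1 but drops to 0 at scale γ = 0.6 (no two values of F differ by 1.2 at any point), while `fat_shattering(F2, [0, 1], 0.4)` is 2 — illustrating the scale sensitivity that distinguishes this notion from a single combinatorial dimension.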
This research was carried out while Martin Anthony was visiting the Department of Systems Engineering, ANU. It was supported in part by the Australian Telecommunications and Electronics Research Board. The work of Martin Anthony was also supported in part by the European Union through the "Neurocolt" ESPRIT Working Group.
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
Cite this paper
Anthony, M., Bartlett, P. (1995). Function learning from interpolation (extended abstract). In: Vitányi, P. (eds) Computational Learning Theory. EuroCOLT 1995. Lecture Notes in Computer Science, vol 904. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-59119-2_179
Print ISBN: 978-3-540-59119-1
Online ISBN: 978-3-540-49195-8