Abstract
Active learning differs from “learning from examples” in that the learning algorithm assumes at least some control over what part of the input domain it receives information about. In some situations, active learning is provably more powerful than learning from examples alone, giving better generalization for a fixed number of training examples.
In this article, we consider the problem of learning a binary concept in the absence of noise. We describe a formalism for active concept learning called selective sampling and show how it may be approximately implemented by a neural network. In selective sampling, a learner receives distribution information from the environment and queries an oracle on parts of the domain it considers “useful.” We test our implementation, called an SG-network, on three domains and observe significant improvement in generalization.
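As a rough illustration of the selective sampling protocol described above (a minimal sketch, not the paper's SG-network), the loop below draws unlabeled examples from the environment's distribution for free and pays for oracle labels only on points the learner deems "useful." The names `draw_unlabeled`, `oracle`, `train`, and `uncertainty`, and the uncertainty-threshold filter itself, are illustrative assumptions, not the method of the article.

```python
import random

def selective_sampling(draw_unlabeled, oracle, train, uncertainty,
                       n_queries, threshold=0.2):
    """Illustrative selective-sampling loop (hypothetical names throughout).

    draw_unlabeled() -> x      : sample from the environment's input distribution
    oracle(x) -> 0 or 1        : noise-free label for a queried point
    train(pairs) -> model      : fit a binary classifier to labeled (x, y) pairs
    uncertainty(model, x) -> u : assumed "usefulness" score in [0, 1]
    """
    labeled, model = [], None
    while len(labeled) < n_queries:
        x = draw_unlabeled()                    # distribution information is free
        # Query the oracle only where the current hypothesis is uncertain;
        # before any model exists, every point counts as useful.
        if model is None or uncertainty(model, x) > threshold:
            labeled.append((x, oracle(x)))      # labels are the costly resource
            model = train(labeled)
    return model

# Toy usage: learn a 1-D threshold concept x >= 0.6. The "model" is an
# estimated threshold halfway between the extreme labeled examples.
target = lambda x: int(x >= 0.6)

def fit(pairs):
    lo = max((x for x, y in pairs if y == 0), default=0.0)
    hi = min((x for x, y in pairs if y == 1), default=1.0)
    return (lo + hi) / 2.0

model = selective_sampling(
    draw_unlabeled=random.random,
    oracle=target,
    train=fit,
    uncertainty=lambda m, x: max(0.0, 1.0 - 5.0 * abs(x - m)),
    n_queries=20,
)
```

Under these assumptions, queries concentrate near the current decision boundary, which is where the hedged "usefulness" score is highest; points far from the boundary are observed but never labeled.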