Machine Learning

Volume 15, Issue 2, pp 201–221

Improving generalization with active learning

  • David Cohn
  • Les Atlas
  • Richard Ladner


Active learning differs from “learning from examples” in that the learning algorithm assumes at least some control over what part of the input domain it receives information about. In some situations, active learning is provably more powerful than learning from examples alone, giving better generalization for a fixed number of training examples.

In this article, we consider the problem of learning a binary concept in the absence of noise. We describe a formalism for active concept learning called selective sampling and show how it may be approximately implemented by a neural network. In selective sampling, a learner receives distribution information from the environment and queries an oracle on parts of the domain it considers “useful.” We test our implementation, called an SG-network, on three domains and observe significant improvement in generalization.
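As a rough illustration of the selective sampling idea described above, the sketch below learns a one-dimensional threshold concept without noise. It is not the SG-network from the article. The threshold domain, the function names (`selective_sampling`, `passive_sampling`), and the parameters are all assumptions chosen for illustration. The learner draws points from the input distribution and queries the oracle only when a point falls in the region where hypotheses consistent with the labels seen so far could still disagree.

```python
import random


def selective_sampling(oracle, draw, n_queries, max_draws=100_000):
    """Toy selective sampler for a threshold concept on [0, 1].

    Assumed setting (not from the article): x is positive iff x >= t for an
    unknown threshold t. The interval (lo, hi) is the region of uncertainty.
    """
    lo, hi = 0.0, 1.0
    queries = 0
    for _ in range(max_draws):
        if queries >= n_queries:
            break
        x = draw()  # distribution information: an unlabeled draw
        if not (lo < x < hi):
            continue  # all consistent hypotheses agree on x, so skip it
        queries += 1
        if oracle(x):
            hi = x  # x is positive, so the threshold is at most x
        else:
            lo = x  # x is negative, so the threshold is above x
    return lo, hi


def passive_sampling(oracle, draw, n_examples):
    """Baseline: label the first n_examples draws, useful or not."""
    lo, hi = 0.0, 1.0
    for _ in range(n_examples):
        x = draw()
        if oracle(x):
            hi = min(hi, x)
        else:
            lo = max(lo, x)
    return lo, hi


if __name__ == "__main__":
    t = 0.37  # hypothetical target threshold
    oracle = lambda x: x >= t
    active = selective_sampling(oracle, random.Random(0).random, 20)
    passive = passive_sampling(oracle, random.Random(0).random, 20)
    print("active interval width: ", active[1] - active[0])
    print("passive interval width:", passive[1] - passive[0])
```

In this toy setting every query lands inside the current region of uncertainty, so each of the labeled examples narrows the set of consistent hypotheses, whereas the passive learner spends labels on points whose classification is already determined.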


Keywords: queries, active learning, generalization, version space, neural networks



Copyright information

© Kluwer Academic Publishers 1994

Authors and Affiliations

  • David Cohn (1)
  • Les Atlas (2)
  • Richard Ladner (3)

  1. Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge
  2. Department of Electrical Engineering, University of Washington, Seattle
  3. Department of Computer Science and Engineering, University of Washington, Seattle
