
Machine Learning, Volume 2, Issue 4, pp 285–318

Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm

  • Nick Littlestone

Abstract

Valiant (1984) and others have studied the problem of learning various classes of Boolean functions from examples. Here we discuss incremental learning of these functions. We consider a setting in which the learner responds to each example according to a current hypothesis. Then the learner updates the hypothesis, if necessary, based on the correct classification of the example. One natural measure of the quality of learning in this setting is the number of mistakes the learner makes. For suitable classes of functions, learning algorithms are available that make a bounded number of mistakes, with the bound independent of the number of examples seen by the learner. We present one such algorithm that learns disjunctive Boolean functions, along with variants for learning other classes of Boolean functions. The basic method can be expressed as a linear-threshold algorithm. A primary advantage of this algorithm is that the number of mistakes grows only logarithmically with the number of irrelevant attributes in the examples. At the same time, the algorithm is computationally efficient in both time and space.

Keywords: learning from examples, prediction, incremental learning, mistake bounds, learning Boolean functions, linear-threshold algorithms
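
To make the abstract's description concrete, here is a minimal Python sketch of the paper's Winnow1 variant for learning monotone disjunctions. The parameter choices (promotion factor alpha = 2, threshold theta = n/2) are one standard instantiation consistent with the algorithm's analysis; the function names and the demo at the bottom are illustrative assumptions, not code from the paper.

```python
import random

def winnow1(n, examples, alpha=2.0):
    """Sketch of Winnow1: a linear-threshold learner with multiplicative
    updates, run over a stream of (x, y) pairs where x is a 0/1 tuple of
    length n. Returns the number of mistakes made."""
    theta = n / 2.0      # fixed threshold (one common choice)
    w = [1.0] * n        # all weights start at 1
    mistakes = 0
    for x, y in examples:
        prediction = 1 if sum(w[i] * x[i] for i in range(n)) > theta else 0
        if prediction == y:
            continue
        mistakes += 1
        if y == 1:
            # False negative: promote the weights of active attributes.
            for i in range(n):
                if x[i] == 1:
                    w[i] *= alpha
        else:
            # False positive: eliminate the weights of active attributes.
            # Safe for disjunctions: y == 0 means no relevant attribute
            # was active, so relevant weights are never zeroed.
            for i in range(n):
                if x[i] == 1:
                    w[i] = 0.0
    return mistakes

# Hypothetical demo: learn the disjunction x0 OR x2 among n = 100
# attributes, 98 of which are irrelevant.
n = 100
target = {0, 2}
stream = []
for _ in range(1000):
    x = tuple(random.randint(0, 1) for _ in range(n))
    stream.append((x, 1 if any(x[i] for i in target) else 0))
print(winnow1(n, stream))
```

The demo illustrates the abstract's headline property: because updates are multiplicative, the number of mistakes scales roughly with k log n (here k = 2 relevant attributes) rather than with the 98 irrelevant ones.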

References

  1. Angluin, D. (1987). Queries and concept learning. Machine Learning, 2, 319–342.
  2. Angluin, D., & Smith, C. H. (1983). Inductive inference: Theory and methods. Computing Surveys, 15, 237–269.
  3. Banerji, R. B. (1985). The logic of learning: A basis for pattern recognition and for improvement of performance. Advances in Computers, 24, 177–216.
  4. Barzdin, J. M., & Freivald, R. V. (1972). On the prediction of general recursive functions. Soviet Mathematics Doklady, 13, 1224–1228.
  5. Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. (1987a). Learnability and the Vapnik-Chervonenkis dimension (Technical Report UCSC-CRL-87-20). Santa Cruz: University of California, Computer Research Laboratory.
  6. Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. (1987b). Occam's razor. Information Processing Letters, 24, 377–380.
  7. Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. New York: John Wiley.
  8. Hampson, S. E., & Volper, D. J. (1986). Linear function neurons: Structure and training. Biological Cybernetics, 53, 203–217.
  9. Haussler, D. (1985). Space efficient learning algorithms. Unpublished manuscript, University of California, Department of Computer and Information Sciences, Santa Cruz.
  10. Haussler, D. (1986). Quantifying the inductive bias in concept learning. Proceedings of the Fifth National Conference on Artificial Intelligence (pp. 485–489). Philadelphia, PA: Morgan Kaufmann.
  11. Haussler, D., Littlestone, N., & Warmuth, M. (1987). Predicting 0,1-functions on randomly drawn points. Unpublished manuscript, University of California, Department of Computer and Information Sciences, Santa Cruz.
  12. Kearns, M., Li, M., Pitt, L., & Valiant, L. (1987a). On the learnability of Boolean formulae. Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing (pp. 285–295). New York: The Association for Computing Machinery.
  13. Kearns, M., Li, M., Pitt, L., & Valiant, L. G. (1987b). Recent results on Boolean concept learning. Proceedings of the Fourth International Workshop on Machine Learning (pp. 337–352). Irvine, CA: Morgan Kaufmann.
  14. Mitchell, T. M. (1982). Generalization as search. Artificial Intelligence, 18, 203–226.
  15. Muroga, S. (1971). Threshold logic and its applications. New York: John Wiley.
  16. Nilsson, N. J. (1965). Learning machines. New York: McGraw-Hill.
  17. Rumelhart, D. E., & McClelland, J. L. (1986). Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.
  18. Slade, S. (1987). The programmer's guide to the Connection Machine (Technical Report). New Haven, CT: Yale University, Department of Computer Science.
  19. Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27, 1134–1142.
  20. Valiant, L. G. (1985). Learning disjunctions of conjunctions. Proceedings of the Ninth International Joint Conference on Artificial Intelligence (pp. 560–566). Los Angeles, CA: Morgan Kaufmann.
  21. Vapnik, V. N. (1982). Estimation of dependences based on empirical data. New York: Springer-Verlag.
  22. Vapnik, V. N., & Chervonenkis, A. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications, 16, 264–280.

Copyright information

© Kluwer Academic Publishers 1988

Authors and Affiliations

  • Nick Littlestone
    1. Department of Computer and Information Sciences, University of California, Santa Cruz, USA
