Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm
 Nick Littlestone
Abstract
Valiant (1984) and others have studied the problem of learning various classes of Boolean functions from examples. Here we discuss incremental learning of these functions. We consider a setting in which the learner responds to each example according to a current hypothesis. Then the learner updates the hypothesis, if necessary, based on the correct classification of the example. One natural measure of the quality of learning in this setting is the number of mistakes the learner makes. For suitable classes of functions, learning algorithms are available that make a bounded number of mistakes, with the bound independent of the number of examples seen by the learner. We present one such algorithm that learns disjunctive Boolean functions, along with variants for learning other classes of Boolean functions. The basic method can be expressed as a linear-threshold algorithm. A primary advantage of this algorithm is that the number of mistakes grows only logarithmically with the number of irrelevant attributes in the examples. At the same time, the algorithm is computationally efficient in both time and space.
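The abstract does not spell out the update rule, but the basic algorithm the paper presents (Winnow) is a multiplicative-update linear-threshold learner. The following is only an illustrative sketch of the Winnow1 variant for monotone disjunctions, under assumed parameter choices (promotion factor alpha = 2, threshold theta = n/2); the function and variable names are ours, not the paper's.

```python
def winnow1(examples, n, alpha=2.0, theta=None):
    """Sketch of Winnow1 for learning a monotone disjunction over n
    Boolean attributes.  Weights start at 1; a mistaken 0-prediction
    doubles the weights of the active attributes (promotion), and a
    mistaken 1-prediction zeroes them (elimination).

    examples: sequence of (x, label) pairs, x a length-n 0/1 tuple.
    Returns the final weight vector and the total number of mistakes.
    """
    if theta is None:
        theta = n / 2.0  # one common threshold choice (an assumption here)
    w = [1.0] * n
    mistakes = 0
    for x, label in examples:
        # Predict 1 iff the weighted sum of active attributes exceeds theta.
        prediction = 1 if sum(w[i] for i in range(n) if x[i]) > theta else 0
        if prediction != label:
            mistakes += 1
            for i in range(n):
                if x[i]:
                    # Promotion on a missed positive, elimination on a
                    # missed negative.
                    w[i] = w[i] * alpha if label == 1 else 0.0
    return w, mistakes
```

Because the updates are multiplicative, the number of mistakes grows only logarithmically with the number of irrelevant attributes, which is the property the abstract highlights.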
References
 Angluin, D. (1987). Queries and concept learning. Machine Learning, 2, 319–342.
 Angluin, D., & Smith, C. H. (1983). Inductive inference: Theory and methods. Computing Surveys, 15, 237–269.
 Banerji, R. B. (1985). The logic of learning: A basis for pattern recognition and for improvement of performance. Advances in Computers, 24, 177–216.
 Barzdin, J. M., & Freivald, R. V. (1972). On the prediction of general recursive functions. Soviet Mathematics Doklady, 13, 1224–1228.
 Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. (1987a). Learnability and the Vapnik-Chervonenkis dimension (Technical Report UCSC-CRL-87-20). Santa Cruz: University of California, Computer Research Laboratory.
 Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. (1987b). Occam's Razor. Information Processing Letters, 24, 377–380.
 Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. New York: John Wiley.
 Hampson, S. E., & Volper, D. J. (1986). Linear function neurons: Structure and training. Biological Cybernetics, 53, 203–217.
 Haussler, D. (1985). Space efficient learning algorithms. Unpublished manuscript, University of California, Department of Computer and Information Sciences, Santa Cruz.
 Haussler, D. (1986). Quantifying the inductive bias in concept learning. Proceedings of the Fifth National Conference on Artificial Intelligence (pp. 485–489). Philadelphia, PA: Morgan Kaufmann.
 Haussler, D., Littlestone, N., & Warmuth, M. (1987). Predicting {0,1}-functions on randomly drawn points. Unpublished manuscript, University of California, Department of Computer and Information Sciences, Santa Cruz.
 Kearns, M., Li, M., Pitt, L., & Valiant, L. (1987a). On the learnability of Boolean formulae. Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing (pp. 285–295). New York: The Association for Computing Machinery.
 Kearns, M., Li, M., Pitt, L., & Valiant, L. G. (1987b). Recent results on Boolean concept learning. Proceedings of the Fourth International Workshop on Machine Learning (pp. 337–352). Irvine, CA: Morgan Kaufmann.
 Mitchell, T. M. (1982). Generalization as search. Artificial Intelligence, 18, 203–226.
 Muroga, S. (1971). Threshold logic and its applications. New York: John Wiley.
 Nilsson, N. J. (1965). Learning machines. New York: McGraw-Hill.
 Rumelhart, D. E., & McClelland, J. L. (1986). Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.
 Slade, S. (1987). The programmer's guide to the Connection Machine (Technical Report). New Haven, CT: Yale University, Department of Computer Science.
 Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27, 1134–1142.
 Valiant, L. G. (1985). Learning disjunctions of conjunctions. Proceedings of the Ninth International Joint Conference on Artificial Intelligence (pp. 560–566). Los Angeles, CA: Morgan Kaufmann.
 Vapnik, V. N. (1982). Estimation of dependencies based on empirical data. New York: Springer-Verlag.
 Vapnik, V. N., & Chervonenkis, A. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16, 264–280.
 Journal

Machine Learning
Volume 2, Issue 4, pp. 285–318
 Cover Date
 1988-04-01
 DOI
 10.1023/A:1022869011914
 Print ISSN
 0885-6125
 Online ISSN
 1573-0565
 Publisher
 Kluwer Academic Publishers
 Keywords

 Learning from examples
 prediction
 incremental learning
 mistake bounds
 learning Boolean functions
 linear-threshold algorithms
 Authors

 Nick Littlestone (1)
 Author Affiliations

 1. Department of Computer and Information Sciences, University of California, Santa Cruz, CA, 95064, USA