A memory-based learning system is an extended memory management system that decomposes the input space, either statically or dynamically, into subregions for the purpose of storing and retrieving functional information. The main generalization techniques employed by memory-based learning systems are nearest-neighbor search, space decomposition, and clustering. Research on memory-based learning is still in its early stage. In particular, there are very few rigorous theoretical results regarding memory requirements, sample size, expected performance, and computational complexity. In this paper, we propose a model for memory-based learning and use it to analyze several methods (ε-covering, hashing, clustering, tree-structured clustering, and receptive fields) for learning smooth functions. The sample size and system complexity are derived for each method. Our model is built upon the generalized PAC learning model of Haussler (Haussler, 1989) and is closely related to the method of vector quantization in data compression. Our main result is that we can build memory-based learning systems using new clustering algorithms (Lin & Vitter, 1992a) that PAC-learn in polynomial time using only polynomial storage in typical situations.
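The core idea above — store samples of a smooth function and generalize to unseen queries by nearest-neighbor lookup — can be illustrated with a minimal sketch. This is a hypothetical toy implementation for exposition only, not the paper's construction; the class and method names are invented, and no ε-covering or clustering machinery is included.

```python
import math

# Toy memory-based learner (illustrative only, not the paper's algorithm):
# store (x, f(x)) pairs and answer queries by 1-nearest-neighbor lookup.
class NearestNeighborMemory:
    def __init__(self):
        self.samples = []  # list of (x, y) pairs; x is a tuple of floats

    def store(self, x, y):
        self.samples.append((tuple(x), y))

    def predict(self, x):
        # Return the stored value whose input is closest in Euclidean distance.
        q = tuple(x)

        def dist(a):
            return math.sqrt(sum((ai - qi) ** 2 for ai, qi in zip(a, q)))

        nearest = min(self.samples, key=lambda s: dist(s[0]))
        return nearest[1]

# For a smooth target function, the prediction error at a query point is
# bounded by the function's modulus of continuity over the gap to the
# nearest stored sample — the quantity the paper's analysis controls.
mem = NearestNeighborMemory()
for i in range(11):
    t = i / 10.0
    mem.store((t,), 2 * t)   # learn f(x) = 2x on a uniform grid

print(mem.predict((0.32,)))  # nearest stored input is x = 0.3
```

The brute-force linear scan in `predict` is the naive baseline; the methods analyzed in the paper (hashing, tree-structured clustering, receptive fields) can be viewed as ways to organize this memory so that lookup and storage stay polynomial.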
Albus, J. S. (1975a). Data storage in the cerebellar model articulation controller (CMAC). Journal of Dynamic Systems, Measurement, and Control, 228–233.
Albus, J. S. (1975b). A new approach to manipulator control: The cerebellar model articulation controller (CMAC). Journal of Dynamic Systems, Measurement, and Control, 220–227.
Albus, J. S. (1981). Brains, Behavior, and Robotics. Byte Books, Peterborough, NH.
Carter, J. L., & Wegman, M. N. (1979). Universal classes of hash functions. Journal of Computer and System Sciences, 18 (2):143–154.
Chvátal, V. (1979). A greedy heuristic for the set-covering problem. Mathematics of Operations Research, 4 (3):233–235.
Cover, T. M., & Hart, P. E. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13:21–27.
Dantzig, G. (1951). Programming of interdependent activities, II, mathematical models. In Activity Analysis of Production and Allocation, 19–32. John Wiley & Sons, Inc, New York.
Dean, T. L., & Wellman, M. P. (1991). Planning and Control. Morgan Kaufmann Publishers.
Devroye, L. (1988). Automatic pattern recognition: A study of the probability of error. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10 (4):530–543.
Duda, R. O., & Hart, P. E. (1973). Pattern Classification and Scene Analysis. Wiley, New York.
Dudley, R. M. (1978). Central limit theorems for empirical measures. Annals of Probability, 6 (6):899–929.
Dudley, R. M. (1984). A course on empirical processes. In Lecture Notes in Mathematics 1097. Springer-Verlag.
Friedman, J. H. (1988). Multivariate Adaptive Regression Splines. Technical Report 102, Stanford University, Lab for Computational Statistics.
Garey, M. R., & Johnson, D. S. (1979). Computers and intractability: A Guide to the Theory of NP-completeness. W. H. Freeman and Co., San Francisco, CA.
Gersho, A. (1982). On the structure of vector quantizers. IEEE Transactions on Information Theory, 28 (2):157–166.
Gersho, A., & Gray, R. M. (1991). Vector Quantization and Signal Compression. Kluwer Academic Publishers, Massachusetts.
Gray, R. M. (1984). Vector quantization. IEEE ASSP Magazine, 4–29.
Haussler, D. (1989). Generalizing the PAC model: Sample size bounds from metric dimension-based uniform convergence results. In Proceedings of the 30th Annual IEEE Symposium on Foundations of Computer Science, 40–45.
Haussler, D., Kearns, M., Littlestone, N., & Warmuth, M. K. (1991). Equivalence of models for polynomial learnability. Information and Computation, 95:129–161.
Haussler, D., & Long, P. (1990). A generalization of Sauer's lemma. Technical Report UCSC-CRL-90-15, Dept. of Computer Science, UCSC.
Johnson, D. S. (1974). Approximation algorithms for combinatorial problems. Journal of Computer and System Sciences, 9:256–278.
Kariv, O., & Hakimi, S. L. (1979). An algorithmic approach to network location problems. II: The p-medians. SIAM Journal on Applied Mathematics, 37 (3):539–560.
Karmarkar, N. (1984). A new polynomial-time algorithm for linear programming. Combinatorica, 4:373–395.
Khachiyan, L. G. (1979). A polynomial algorithm in linear programming. Soviet Math. Doklady, 20:191–194.
Lin, J.-H., & Vitter, J. S. (1992a). ɛ-approximations with minimum packing constraint violation. In Proceedings of the 24th Annual ACM Symposium on Theory of Computing, 771–782, Victoria, BC, Canada.
Lin, J.-H., & Vitter, J. S. (1992b). Nearly optimal vector quantization via linear programming. In Proceedings of the IEEE Data Compression Conference, 22–31, Snowbird, Utah.
Lovász, L. (1975). On the ratio of optimal integral and fractional covers. Discrete Mathematics, 13:383–390.
Megiddo, N., & Supowit, K. J. (1984). On the complexity of some common geometric location problems. SIAM Journal on Computing, 13 (1):182–196.
Miller, W. T. (1987). Sensor-based control of robotic manipulators using a general learning algorithm. IEEE Journal of Robotics and Automation, 3 (2):157–165.
Miller, W. T., Glanz, F. H., & Kraft, L. G. (1987a). Application of a general learning algorithm to the control of robotic manipulators. International Journal of Robotics Research, 6 (2):84–98.
Moody, J. (1989). Fast learning in multi-resolution hierarchies. In Advances in Neural Information Processing Systems I, 29–39. Morgan Kaufmann Publishers.
Moody, J., & Darken, C. (1988). Learning with localized receptive fields. In Proceedings of the 1988 Connectionist Models Summer School, 133–143. Morgan Kaufmann Publishers.
Moore, A. W. (1989). Acquisition of Dynamic Control Knowledge for Robotic Manipulator. Manuscript.
Papadimitriou, C. H. (1981). Worst-case and probabilistic analysis of a geometric location problem. SIAM Journal on Computing, 10:542–557.
Poggio, T., & Girosi, F. (1989). A theory of networks for approximation and learning. A. I. Memo No. 1140, MIT Artificial Intelligence Laboratory, Cambridge, MA.
Poggio, T., & Girosi, F. (1990). Extensions of a theory of networks for approximation and learning: Dimensionality reduction and clustering. A. I. Memo No. 1167, MIT Artificial Intelligence Laboratory, Cambridge, MA.
Pollard, D. (1984). Convergence of Stochastic Processes. Springer-Verlag, New York.
Pollard, D. (1990). Empirical Processes: Theory and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics Volume 2.
Ramakrishna, M. V., & Awasthi, V. (1991). A Survey of Perfect Hashing. Manuscript.
Riskin, E. A. (1990). Variable Rate Vector Quantization of Images. Ph.D. Dissertation, Stanford University.
Sauer, N. (1972). On the density of families of sets. Journal of Combinatorial Theory (A), 13:145–147.
Siegel, A. (1991). Coalesced Hashing is Computably Good. Manuscript.
Vapnik, V. N. (1982). Estimation of Dependences Based on Empirical Data. Springer-Verlag, New York.
Vapnik, V. N., & Chervonenkis, A. Y. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16 (2):264–280.
Vitter, J. S., & Chen, W.-C. (1987). Design and Analysis of Coalesced Hashing. Oxford University Press.
Wilson, D. L. (1972). Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man, and Cybernetics, 2 (3):408–421.
Lin, JH., Vitter, J.S. A Theory for Memory-Based Learning. Machine Learning 17, 143–167 (1994). https://doi.org/10.1023/A:1022667616941
- Memory-based learning
- PAC learning
- linear programming