A Theory for Memory-Based Learning

Abstract

A memory-based learning system is an extended memory management system that decomposes the input space, either statically or dynamically, into subregions for the purpose of storing and retrieving functional information. The main generalization techniques employed by memory-based learning systems are nearest-neighbor search, space decomposition, and clustering. Research on memory-based learning is still at an early stage; in particular, there are very few rigorous theoretical results regarding memory requirements, sample size, expected performance, and computational complexity. In this paper, we propose a model for memory-based learning and use it to analyze several methods for learning smooth functions: ε-covering, hashing, clustering, tree-structured clustering, and receptive fields. The sample size and system complexity are derived for each method. Our model is built upon the generalized PAC learning model of Haussler (1989) and is closely related to the method of vector quantization in data compression. Our main result is that memory-based learning systems built with new clustering algorithms (Lin & Vitter, 1992a) can PAC-learn in polynomial time using only polynomial storage in typical situations.
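
To make the abstract's setting concrete, here is a minimal sketch (ours, not from the paper) of a memory-based learner in the spirit of the ε-covering and hashing methods analyzed in the text: the input space is statically decomposed into cells of side eps, and one functional value is stored per occupied cell. The class `GridMemoryLearner`, its parameter `eps`, and the target function are illustrative assumptions, not the paper's construction.

```python
import math

class GridMemoryLearner:
    """Minimal memory-based learner: statically decompose the input
    space into cells of side eps and store one functional value per
    occupied cell. For a Lipschitz-smooth target, the stored value
    approximates f anywhere in its cell, so storage grows with the
    number of occupied cells rather than with the number of samples."""

    def __init__(self, eps):
        self.eps = eps
        self.table = {}  # cell index (a tuple of ints) -> stored value

    def _cell(self, x):
        # Static decomposition of the input space into subregions.
        return tuple(math.floor(xi / self.eps) for xi in x)

    def store(self, x, y):
        # Keep one representative sample per cell (first one wins).
        self.table.setdefault(self._cell(x), y)

    def predict(self, x):
        # Retrieve the value stored for x's cell; None if unseen.
        return self.table.get(self._cell(x))

# Usage: memorize the smooth function f(x1, x2) = sin(x1) + x2.
learner = GridMemoryLearner(eps=0.25)
for i in range(9):
    for j in range(9):
        x = (i * 0.125, j * 0.125)
        learner.store(x, math.sin(x[0]) + x[1])
print(learner.predict((0.6, 0.4)))  # within O(eps) of sin(0.6) + 0.4
```

Replacing the fixed grid with data-dependent cell centers is what the clustering and tree-structured methods in the paper do, trading this static decomposition for one adapted to the sample distribution.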

References

  1. Albus, J. S. (1975a). Data storage in the cerebellar model articulation controller (CMAC). Journal of Dynamic Systems, Measurement, and Control, 97:228–233.

  2. Albus, J. S. (1975b). A new approach to manipulator control: The cerebellar model articulation controller (CMAC). Journal of Dynamic Systems, Measurement, and Control, 97:220–227.

  3. Albus, J. S. (1981). Brains, Behavior, and Robotics. Byte Books, Peterborough, NH.

  4. Carter, J. L., & Wegman, M. N. (1979). Universal classes of hash functions. Journal of Computer and System Sciences, 18 (2):143–154.

  5. Chvátal, V. (1979). A greedy heuristic for the set-covering problem. Mathematics of Operations Research, 4 (3):233–235.

  6. Cover, T. M., & Hart, P. E. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13:21–27.

  7. Dantzig, G. (1951). Programming of interdependent activities, II, mathematical models. In Activity Analysis of Production and Allocation, 19–32. John Wiley & Sons, Inc, New York.

  8. Dean, T. L., & Wellman, M. P. (1991). Planning and Control. Morgan Kaufmann Publishers.

  9. Devroye, L. (1988). Automatic pattern recognition: A study of the probability of error. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10 (4):530–543.

  10. Duda, R. O., & Hart, P. E. (1973). Pattern Classification and Scene Analysis. Wiley.

  11. Dudley, R. M. (1978). Central limit theorems for empirical measures. Annals of Probability, 6 (6):899–929.

  12. Dudley, R. M. (1984). A course on empirical processes. In Lecture Notes in Mathematics 1097. Springer-Verlag.

  13. Friedman, J. H. (1988). Multivariate Adaptive Regression Splines. Technical Report 102, Stanford University, Lab for Computational Statistics.

  14. Garey, M. R., & Johnson, D. S. (1979). Computers and intractability: A Guide to the Theory of NP-completeness. W. H. Freeman and Co., San Francisco, CA.

  15. Gersho, A. (1982). On the structure of vector quantizers. IEEE Transactions on Information Theory, 28 (2):157–166.

  16. Gersho, A., & Gray, R. M. (1991). Vector Quantization and Signal Compression. Kluwer Academic Press, Massachusetts.

  17. Gray, R. M. (1984). Vector quantization. IEEE ASSP Magazine, 4–29.

  18. Haussler, D. (1989). Generalizing the PAC model: Sample size bounds from metric dimension-based uniform convergence results. In Proceedings of the 30th Annual IEEE Symposium on Foundations of Computer Science, 40–45.

  19. Haussler, D., Kearns, M., Littlestone, N., & Warmuth, M. K. (1991). Equivalence of models for polynomial learnability. Information and Computation, 95:129–161.

  20. Haussler, D., & Long, P. (1990). A generalization of Sauer's lemma. Technical Report UCSC-CRL-90-15, Dept. of Computer Science, UCSC.

  21. Johnson, D. S. (1974). Approximation algorithms for combinatorial problems. Journal of Computer and System Sciences, 9:256–278.

  22. Kariv, O., & Hakimi, S. L. (1979). An algorithmic approach to network location problems. II: The p-medians. SIAM Journal on Applied Mathematics, 37 (3):539–560.

  23. Karmarkar, N. (1984). A new polynomial-time algorithm for linear programming. Combinatorica, 4:373–395.

  24. Khachiyan, L. G. (1979). A polynomial algorithm in linear programming. Soviet Math. Doklady, 20:191–194.

  25. Lin, J.-H., & Vitter, J. S. (1992a). ɛ-approximations with minimum packing constraint violation. In Proceedings of the 24th Annual ACM Symposium on Theory of Computing, 771–782, Victoria, BC, Canada.

  26. Lin, J.-H., & Vitter, J. S. (1992b). Nearly optimal vector quantization via linear programming. In Proceedings of the IEEE Data Compression Conference, 22–31, Snowbird, Utah.

  27. Lovász, L. (1975). On the ratio of optimal integral and fractional covers. Discrete Mathematics, 13:383–390.

  28. Megiddo, N., & Supowit, K. J. (1984). On the complexity of some common geometric location problems. SIAM Journal on Computing, 13 (1):182–196.

  29. Miller, W. T. (1987). Sensor-based control of robotic manipulators using a general learning algorithm. IEEE Journal of Robotics and Automation, 3 (2):157–165.

  30. Miller, W. T., Glanz, F. H., & Kraft, L. G. (1987a). Application of a general learning algorithm to the control of robotic manipulators. International Journal of Robotics Research, 6 (2):84–98.

  31. Moody, J. (1989). Fast learning in multi-resolution hierarchies. In Advances in Neural Information Processing Systems I, 29–39. Morgan Kaufmann Publishers.

  32. Moody, J., & Darken, C. (1988). Learning with localized receptive fields. In Proceedings of the 1988 Connectionist Models Summer School, 133–143. Morgan Kaufmann Publishers.

  33. Moore, A. W. (1989). Acquisition of Dynamic Control Knowledge for a Robotic Manipulator. Manuscript.

  34. Papadimitriou, C. H. (1981). Worst-case and probabilistic analysis of a geometric location problem. SIAM Journal on Computing, 10:542–557.

  35. Poggio, T., & Girosi, F. (1989). A theory of networks for approximation and learning. A. I. Memo No. 1140, MIT Artificial Intelligence Laboratory, Cambridge, MA.

  36. Poggio, T., & Girosi, F. (1990). Extensions of a theory of networks for approximation and learning: Dimensionality reduction and clustering. A. I. Memo No. 1167, MIT Artificial Intelligence Laboratory, Cambridge, MA.

  37. Pollard, D. (1984). Convergence of Stochastic Processes. Springer-Verlag, New York.

  38. Pollard, D. (1990). Empirical Processes: Theory and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics Volume 2.

  39. Ramakrishna, M. V., & Awasthi, V. (1991). A Survey of Perfect Hashing. Manuscript.

  40. Riskin, E. A. (1990). Variable Rate Vector Quantization of Images. Ph.D. dissertation, Stanford University.

  41. Sauer, N. (1972). On the density of families of sets. Journal of Combinatorial Theory, Series A, 13:145–147.

  42. Siegel, A. (1991). Coalesced Hashing is Computably Good. Manuscript.

  43. Vapnik, V. N. (1982). Estimation of Dependences Based on Empirical Data. Springer-Verlag, New York.

  44. Vapnik, V. N., & Chervonenkis, A. Y. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16:264–280.

  45. Vitter, J. S., & Chen, W.-C. (1987). The Design and Analysis of Coalesced Hashing. Oxford University Press, New York.

  46. Wilson, D. L. (1972). Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man, and Cybernetics, 2 (3):408–421.

Cite this article

Lin, J.-H., & Vitter, J. S. A Theory for Memory-Based Learning. Machine Learning 17, 143–167 (1994). https://doi.org/10.1023/A:1022667616941

Keywords

  • Memory-based learning
  • PAC learning
  • clustering
  • approximation
  • linear programming
  • relaxation
  • covering
  • hashing