Deterministic Generative Models for Fast Feature Discovery

Published in Data Mining and Knowledge Discovery

Abstract

We propose a vector quantisation method which not only provides a compact description of data vectors in terms of codebook vectors, but also explains the codebook vectors themselves as binary combinations of elementary features. This corresponds to the intuitive notion that real-world patterns can usefully be thought of as compositions of simpler features. The model can be understood as a generative model in which each codebook vector is generated by a hidden binary state vector. The model is non-probabilistic in the sense that it assigns each data vector to a single codebook vector. We describe exact and approximate algorithms for learning deterministic feature representations. In contrast to probabilistic models, the deterministic approach allows the use of message propagation algorithms within the learning scheme; we compare these with standard mean-field and Gibbs-sampling learning. We show that Generative Vector Quantisation performs well on large-scale real-world tasks such as image compression and handwritten digit analysis with up to 400 data dimensions.
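
To make the abstract's model concrete, here is a minimal sketch in Python (NumPy) under the simplest reading of the abstract: each data vector is approximated as W s, with s a hidden binary state vector over elementary features (the columns of W), each data vector is deterministically assigned to its single best state, and W is refit by least squares. The function fit_gvq, the exhaustive enumeration of binary states, and the alternating update scheme are illustrative assumptions, not the paper's algorithm; the paper's message-propagation and mean-field/Gibbs schemes are aimed precisely at avoiding the exponential search used here.

import numpy as np
from itertools import product

def fit_gvq(X, n_features, n_iters=50, seed=0):
    # Hypothetical illustration: approximate each data vector by W @ s,
    # where s is a hidden binary state vector selecting elementary
    # features (columns of W). Assignment is deterministic: every x is
    # mapped to the single binary state whose codebook vector is closest.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=0.1, size=(d, n_features))
    # Enumerate all 2**n_features binary states; feasible only for small
    # models. The paper's message-propagation and mean-field/Gibbs
    # schemes exist precisely to avoid this exponential search.
    S = np.array(list(product([0.0, 1.0], repeat=n_features)))
    for _ in range(n_iters):
        # Assignment step: pick the closest codebook vector for each x.
        codebook = S @ W.T                                 # (2**M, d)
        d2 = ((X[:, None, :] - codebook[None]) ** 2).sum(-1)
        states = S[d2.argmin(axis=1)]                      # (N, M)
        # Update step: refit the elementary features by least squares.
        W = np.linalg.lstsq(states, X, rcond=None)[0].T
    return W, states

# Toy usage: data generated from 3 hidden binary features plus noise.
rng = np.random.default_rng(1)
true_W = rng.normal(size=(10, 3))
s = rng.integers(0, 2, size=(200, 3)).astype(float)
X = s @ true_W.T + 0.05 * rng.normal(size=(200, 10))
W, states = fit_gvq(X, n_features=3)
print("mean squared reconstruction error:", ((X - states @ W.T) ** 2).mean())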

Cite this article

Westerdijk, M., Barber, D. & Wiegerinck, W. Deterministic Generative Models for Fast Feature Discovery. Data Mining and Knowledge Discovery 5, 337–363 (2001). https://doi.org/10.1023/A:1011405212443