Abstract
The goal of a so-called vector quantizer is to compute a compact representation for a set of data vectors. It maps vectors from some input data space onto a finite set of typical reproduction vectors. Ideally, no information that is relevant for the further processing of the data should be lost during this transformation. In this way, the effort for storage and transmission of vector-valued data is reduced by eliminating the redundant information contained therein.
The goal of finding a compact representation for the distribution of some data can also be considered from the viewpoint of statistics. Then the task can be described as trying to find a suitable probability distribution that adequately represents the input data. This is usually achieved by means of mixture densities.
In this chapter we will first formally define the concept of a vector quantizer and derive conditions for its optimality. Subsequently, the most important algorithms for building vector quantizers will be presented. Finally, the unsupervised estimation of mixture densities will be treated as a generalization of the vector quantization problem.
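Before the formal treatment, the basic mapping can be illustrated with a minimal sketch: a quantizer assigns each input vector to its nearest codebook prototype. All names and the toy codebook below are illustrative, not taken from the chapter.

```python
import numpy as np

def quantize(x, codebook):
    """Map input vector x onto the nearest codebook prototype.

    Ties are broken towards the smallest index, since argmin
    returns the first minimizer.
    """
    # Euclidean distance from x to every prototype y_i
    distances = np.linalg.norm(codebook - x, axis=1)
    i = int(np.argmin(distances))
    return i, codebook[i]

# Toy codebook with N = 3 two-dimensional prototypes
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]])
index, reproduction = quantize(np.array([0.9, 1.2]), codebook)
```

Transmitting only the index `index` instead of the full vector is what yields the compression.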
Notes
1. A vector quantizer with a codebook containing N prototypes is sometimes also referred to as an N-level quantizer. This term should, however, not be confused with the notion of a multi-stage vector quantizer, which generates the quantization result in multiple successive processing steps for increased efficiency.
2. Vector quantizers that use a variable data rate also transmit the quantized indices themselves in compressed form, e.g., by applying Huffman coding to them (cf. e.g. [110, Chap. 17, pp. 631–633]).
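The idea behind such variable-rate coding can be sketched as follows: frequently used indices receive short codewords, so the average rate drops below the fixed log2(N) bits per vector. This is a generic Huffman construction, not the specific scheme of [110]; all names are illustrative.

```python
import heapq
from collections import Counter

def huffman_code(indices):
    """Build a Huffman code for a stream of quantizer indices."""
    freq = Counter(indices)
    if len(freq) == 1:  # degenerate case: a single distinct index
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tie-breaker, {index: codeword})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0/1
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

# Index 0 occurs most often and therefore gets the shortest codeword
code = huffman_code([0, 0, 0, 0, 1, 1, 2, 3])
```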
3. If the Euclidean distance is used here instead of a general distance measure, the quantization error \(\epsilon(\boldsymbol{x}|Q) = \lVert \boldsymbol{x} - Q(\boldsymbol{x}) \rVert\) is obtained.
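In this Euclidean case the error for a single vector is simply the distance to its reproduction, as the following one-line sketch shows (names illustrative):

```python
import numpy as np

def quantization_error(x, qx):
    """Euclidean quantization error eps(x|Q) = ||x - Q(x)||."""
    return float(np.linalg.norm(np.asarray(x) - np.asarray(qx)))

# Example: x and Q(x) form the legs of a 3-4-5 right triangle
err = quantization_error([3.0, 0.0], [0.0, 4.0])
```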
4. In [110, p. 350] it is suggested that, in such cases, the mapping be performed onto the codebook vector with the smallest index i.
5. This derivation of a lower bound on the average quantization error is possible because both factors d(⋅,⋅) and p(x) in the integral take on non-negative values. It is, therefore, sufficient to choose d(⋅,⋅) to be locally minimal for every vector x of the input space in order to minimize the integral as a whole.
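In symbols, the argument reads as follows (an assumed reconstruction using prototypes \(\boldsymbol{y}_i\), a codebook of size N, and the chapter's general distance measure d):

```latex
D(Q) = \int d\bigl(\boldsymbol{x}, Q(\boldsymbol{x})\bigr)\, p(\boldsymbol{x})\, d\boldsymbol{x}
\;\geq\; \int \min_{1 \leq i \leq N} d(\boldsymbol{x}, \boldsymbol{y}_i)\, p(\boldsymbol{x})\, d\boldsymbol{x}
```

Because the integrand is non-negative everywhere, choosing \(Q(\boldsymbol{x})\) as the nearest prototype attains the bound pointwise and hence minimizes the integral as a whole.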
6. Depending on the distribution of the vectors within the cell considered, it may happen that the centroid is not uniquely defined. The minimum of the mean distance is then achieved equally by several different prototype vectors. As with the choice of the nearest neighbor, the centroid can then be selected arbitrarily among the candidates without affecting optimality.
7. Experiments of the author have shown that a relative improvement of the quantization error of less than a tenth of a percent usually does not result in an improvement of the codebook which is relevant for practical applications.
8. When using the Euclidean distance measure, the centroid of a cell R is given by the expected value of its data distribution. For empirically defined distributions this boils down to the sample mean: \(\mathrm{cent}(R) \hat{=} \frac{1}{|R|} \sum_{\boldsymbol{x} \in R} \boldsymbol{x}\). The sample mean can easily be computed incrementally: accumulate the sum over all data vectors first, then normalize the result by the cell size.
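The accumulate-then-normalize computation can be sketched as below (a minimal illustration; variable names are not from the chapter). A numerically equivalent alternative would be the running-mean update \(\boldsymbol{m} \leftarrow \boldsymbol{m} + (\boldsymbol{x} - \boldsymbol{m})/n\).

```python
import numpy as np

def centroid(cell):
    """Sample mean of the data vectors assigned to a cell R."""
    total = np.zeros_like(cell[0], dtype=float)
    for x in cell:            # accumulate the sum over all vectors first ...
        total += x
    return total / len(cell)  # ... then normalize by the cell size |R|

R = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 0.0])]
m = centroid(R)
```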
9. In order to speed up the procedure in the starting phase one can also choose a small codebook containing \(N_0 \ll N\) randomly selected vectors without adversely affecting the final result of the algorithm.
10. In contrast to splitting up all existing prototypes, the splitting procedure can be applied to a suitable subset only, e.g., those codebook entries that account for the highest local quantization error on the data set considered.
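One common way to realize such splitting, as in LBG-style codebook growing, is to replace a selected prototype by two copies perturbed by a small vector ±ε. The following is a hedged sketch under that assumption, not the chapter's exact procedure:

```python
import numpy as np

def split_prototype(codebook, index, eps=1e-3):
    """Replace codebook[index] by two perturbed copies y - eps and y + eps.

    Applying this only to selected entries (e.g. those with the highest
    local quantization error) grows the codebook gradually.
    """
    y = codebook[index]
    new = np.array([y - eps, y + eps])
    return np.vstack([np.delete(codebook, index, axis=0), new])

codebook = np.array([[0.0, 0.0], [2.0, 2.0]])
grown = split_prototype(codebook, 1)  # one entry split, codebook grows by one
```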
11. See also the bibliographical remarks in Sect. 4.5.
12. As we have seen from the description of ML estimation (see Sect. 3.6.1), using the log-likelihood instead of the likelihood function is admissible and can lead to considerable simplifications in the further mathematical derivations.
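Besides the analytical simplifications, the benefit is easy to see numerically: a product of many small density values underflows to zero, while the sum of their logarithms stays well-behaved. A small sketch with an illustrative i.i.d. standard-normal sample (not the chapter's data):

```python
import math

def log_phi(x):
    """Log-density of the standard normal distribution."""
    return -0.5 * (x * x + math.log(2.0 * math.pi))

sample = [0.5] * 2000  # 2000 i.i.d. observations

# Direct likelihood: product of 2000 values around 0.35 underflows ...
likelihood = 1.0
for x in sample:
    likelihood *= math.exp(log_phi(x))

# ... while the log-likelihood remains finite and usable for optimization
log_likelihood = sum(log_phi(x) for x in sample)
```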
References
Bilmes, J.: A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical report TR-97-021, International Computer Science Institute, Berkeley (1997)
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)
Bock, H.-H.: Origins and extensions of the k-means algorithm in cluster analysis. J. Électron. Hist. Probab. Stat. 4(2) (2008)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39(1), 1–22 (1977)
Furui, S.: Digital Speech Processing, Synthesis, and Recognition. Signal Processing and Communications Series. Marcel Dekker, New York (2000)
Gauvain, J.-L., Lee, C.-H.: Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Trans. Speech Audio Process. 2(2), 291–298 (1994)
Gersho, A., Gray, R.M.: Vector Quantization and Signal Compression. Communications and Information Theory. Kluwer Academic, Boston (1992)
Huang, X., Acero, A., Hon, H.-W.: Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall, Englewood Cliffs (2001)
Huang, X.D., Ariki, Y., Jack, M.A.: Hidden Markov Models for Speech Recognition. Information Technology Series, vol. 7. Edinburgh University Press, Edinburgh (1990)
Jelinek, F.: Statistical Methods for Speech Recognition. MIT Press, Cambridge (1997)
Linde, Y., Buzo, A., Gray, R.M.: An algorithm for vector quantizer design. IEEE Trans. Commun. 28(1), 84–95 (1980)
Lloyd, S.P.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)
MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Le Cam, L.M., Neyman, J. (eds.) Proc. Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–296 (1967)
Rabiner, L.R., Juang, B.-H.: Fundamentals of Speech Recognition. Prentice-Hall, Englewood Cliffs (1993)
Sabin, M., Gray, R.: Global convergence and empirical consistency of the generalized Lloyd algorithm. IEEE Trans. Inf. Theory 32(2), 148–155 (1986)
© 2014 Springer-Verlag London
Cite this chapter
Fink, G.A. (2014). Vector Quantization and Mixture Estimation. In: Markov Models for Pattern Recognition. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-6308-4_4
Print ISBN: 978-1-4471-6307-7
Online ISBN: 978-1-4471-6308-4