Abstract
The goal of a so-called vector quantizer is to compute a compact representation for a set of data vectors. It maps vectors from some input data space onto a finite set of typical reproduction vectors. Ideally, no information that is relevant for the further processing of the data should be lost during this transformation. In this way, the effort for storage and transmission of vector-valued data is reduced by eliminating the redundant information contained therein.
The goal of finding a compact representation for the distribution of some data can also be considered from the viewpoint of statistics. Then the task can be described as trying to find a suitable probability distribution that adequately represents the input data. This is usually achieved by means of mixture densities.
In this chapter we will first formally define the concept of a vector quantizer and derive conditions for its optimality. Subsequently, the most important algorithms for building vector quantizers will be presented. Finally, the unsupervised estimation of mixture densities will be treated as a generalization of the vector quantization problem.
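Before the formal treatment, the basic mapping can be illustrated with a minimal sketch: a quantizer assigns each input vector to its nearest codebook prototype. All names and the toy codebook below are illustrative, not taken from the chapter.

```python
import numpy as np

def quantize(x, codebook):
    """Map input vector x onto the nearest codebook prototype.

    Ties are broken towards the smallest index, since argmin
    returns the first minimizer.
    """
    # Euclidean distance from x to every prototype y_i
    distances = np.linalg.norm(codebook - x, axis=1)
    i = int(np.argmin(distances))
    return i, codebook[i]

# Toy codebook with N = 3 two-dimensional prototypes
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]])
index, reproduction = quantize(np.array([0.9, 1.2]), codebook)
```

Transmitting only the index `index` instead of the full vector is what yields the compression.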
Notes
1. A vector quantizer with a codebook containing N prototypes is sometimes also referred to as an N-level quantizer. This term should, however, not be confused with the notion of a multi-stage vector quantizer, which generates the quantization result in multiple successive processing steps for increased efficiency.
2. Vector quantizers that use a variable data rate also transmit the quantized indices themselves in compressed form, e.g., by applying Huffman coding to them (cf. e.g. [110, Chap. 17, pp. 631–633]).
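The idea behind such variable-rate coding can be sketched as follows: frequently used indices receive short codewords, so the average rate drops below the fixed log2(N) bits per vector. This is a generic Huffman construction, not the specific scheme of [110]; all names are illustrative.

```python
import heapq
from collections import Counter

def huffman_code(indices):
    """Build a Huffman code for a stream of quantizer indices."""
    freq = Counter(indices)
    if len(freq) == 1:  # degenerate case: a single distinct index
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tie-breaker, {index: codeword})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0/1
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

# Index 0 occurs most often and therefore gets the shortest codeword
code = huffman_code([0, 0, 0, 0, 1, 1, 2, 3])
```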
3. If the Euclidean distance is used here instead of a general distance measure, the quantization error \(\epsilon(\boldsymbol{x}|Q) = \lVert \boldsymbol{x} - Q(\boldsymbol{x}) \rVert\) is obtained.
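In this Euclidean case the error for a single vector is simply the distance to its reproduction, as the following one-line sketch shows (names illustrative):

```python
import numpy as np

def quantization_error(x, qx):
    """Euclidean quantization error eps(x|Q) = ||x - Q(x)||."""
    return float(np.linalg.norm(np.asarray(x) - np.asarray(qx)))

# Example: x and Q(x) form the legs of a 3-4-5 right triangle
err = quantization_error([3.0, 0.0], [0.0, 4.0])
```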
4. In [110, p. 350] it is suggested that, in such cases, the mapping be performed onto the codebook vector with the smallest index i.
5. This derivation of a lower bound on the average quantization error is possible because both factors d(⋅,⋅) and p(x) in the integral take on non-negative values. It is, therefore, sufficient to choose d(⋅,⋅) to be locally minimal for every vector x of the input space in order to minimize the integral as a whole.
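In symbols, the argument reads as follows (an assumed reconstruction using prototypes \(\boldsymbol{y}_i\), a codebook of size N, and the chapter's general distance measure d):

```latex
D(Q) = \int d\bigl(\boldsymbol{x}, Q(\boldsymbol{x})\bigr)\, p(\boldsymbol{x})\, d\boldsymbol{x}
\;\geq\; \int \min_{1 \leq i \leq N} d(\boldsymbol{x}, \boldsymbol{y}_i)\, p(\boldsymbol{x})\, d\boldsymbol{x}
```

Because the integrand is non-negative everywhere, choosing \(Q(\boldsymbol{x})\) as the nearest prototype attains the bound pointwise and hence minimizes the integral as a whole.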
6. Depending on the distribution of the vectors within the cell considered, it may happen that the centroid is not uniquely defined. The minimum of the mean distance is then achieved equally by several different prototype vectors. As with the choice of the nearest neighbor, the centroid can then be selected arbitrarily among the candidates without affecting optimality.
7. Experiments of the author have shown that a relative improvement of the quantization error of less than a tenth of a percent usually does not result in an improvement of the codebook which is relevant for practical applications.
8. When using the Euclidean distance measure, the centroid of a cell R is given by the expected value of its data distribution. For empirically defined distributions this boils down to the sample mean: \(\mathrm{cent}(R) \hat{=} \frac{1}{|R|} \sum_{\boldsymbol{x} \in R} \boldsymbol{x}\). The sample mean can easily be computed incrementally: accumulate the sum over all data vectors first, then normalize the result by the cell size.
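The accumulate-then-normalize computation can be sketched as below (a minimal illustration; variable names are not from the chapter). A numerically equivalent alternative would be the running-mean update \(\boldsymbol{m} \leftarrow \boldsymbol{m} + (\boldsymbol{x} - \boldsymbol{m})/n\).

```python
import numpy as np

def centroid(cell):
    """Sample mean of the data vectors assigned to a cell R."""
    total = np.zeros_like(cell[0], dtype=float)
    for x in cell:            # accumulate the sum over all vectors first ...
        total += x
    return total / len(cell)  # ... then normalize by the cell size |R|

R = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 0.0])]
m = centroid(R)
```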
9. In order to speed up the procedure in the starting phase one can also choose a small codebook containing \(N_0 \ll N\) randomly selected vectors without adversely affecting the final result of the algorithm.
10. In contrast to splitting up all existing prototypes, the splitting procedure can be applied to a suitable subset only, e.g., those codebook entries that account for the highest local quantization error on the data set considered.
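One common way to realize such splitting, as in LBG-style codebook growing, is to replace a selected prototype by two copies perturbed by a small vector ±ε. The following is a hedged sketch under that assumption, not the chapter's exact procedure:

```python
import numpy as np

def split_prototype(codebook, index, eps=1e-3):
    """Replace codebook[index] by two perturbed copies y - eps and y + eps.

    Applying this only to selected entries (e.g. those with the highest
    local quantization error) grows the codebook gradually.
    """
    y = codebook[index]
    new = np.array([y - eps, y + eps])
    return np.vstack([np.delete(codebook, index, axis=0), new])

codebook = np.array([[0.0, 0.0], [2.0, 2.0]])
grown = split_prototype(codebook, 1)  # one entry split, codebook grows by one
```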
11. See also the bibliographical remarks in Sect. 4.5.
12. As we have seen from the description of ML estimation (see Sect. 3.6.1), using the log-likelihood instead of the likelihood function is admissible and can lead to considerable simplifications in the further mathematical derivations.
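Besides the analytical simplifications, the benefit is easy to see numerically: a product of many small density values underflows to zero, while the sum of their logarithms stays well-behaved. A small sketch with an illustrative i.i.d. standard-normal sample (not the chapter's data):

```python
import math

def log_phi(x):
    """Log-density of the standard normal distribution."""
    return -0.5 * (x * x + math.log(2.0 * math.pi))

sample = [0.5] * 2000  # 2000 i.i.d. observations

# Direct likelihood: product of 2000 values around 0.35 underflows ...
likelihood = 1.0
for x in sample:
    likelihood *= math.exp(log_phi(x))

# ... while the log-likelihood remains finite and usable for optimization
log_likelihood = sum(log_phi(x) for x in sample)
```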
References
Bilmes, J.: A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical report TR-97-021, International Computer Science Institute, Berkeley (1997)
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)
Bock, H.-H.: Origins and extensions of the k-means algorithm in cluster analysis. J. Électron. Hist. Probab. Stat. 4(2) (2008)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39(1), 1–22 (1977)
Furui, S.: Digital Speech Processing, Synthesis, and Recognition. Signal Processing and Communications Series. Marcel Dekker, New York (2000)
Gauvain, J.-L., Lee, C.-H.: Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Trans. Speech Audio Process. 2(2), 291–298 (1994)
Gersho, A., Gray, R.M.: Vector Quantization and Signal Compression. Communications and Information Theory. Kluwer Academic, Boston (1992)
Huang, X., Acero, A., Hon, H.-W.: Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall, Englewood Cliffs (2001)
Huang, X.D., Ariki, Y., Jack, M.A.: Hidden Markov Models for Speech Recognition. Information Technology Series, vol. 7. Edinburgh University Press, Edinburgh (1990)
Jelinek, F.: Statistical Methods for Speech Recognition. MIT Press, Cambridge (1997)
Linde, Y., Buzo, A., Gray, R.M.: An algorithm for vector quantizer design. IEEE Trans. Commun. 28(1), 84–95 (1980)
Lloyd, S.P.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)
MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Le Cam, L.M., Neyman, J. (eds.) Proc. Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–296 (1967)
Rabiner, L.R., Juang, B.-H.: Fundamentals of Speech Recognition. Prentice-Hall, Englewood Cliffs (1993)
Sabin, M., Gray, R.: Global convergence and empirical consistency of the generalized Lloyd algorithm. IEEE Trans. Inf. Theory 32(2), 148–155 (1986)
© 2014 Springer-Verlag London
Cite this chapter
Fink, G.A. (2014). Vector Quantization and Mixture Estimation. In: Markov Models for Pattern Recognition. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-6308-4_4
Print ISBN: 978-1-4471-6307-7
Online ISBN: 978-1-4471-6308-4