Abstract
Based on n randomly drawn vectors in a Hilbert space, we study the k-means clustering scheme. Here, clustering is performed by computing the Voronoi partition associated with centers that minimize an empirical criterion, called the distortion. The performance of the method is evaluated by comparing the theoretical distortion of the empirically optimal centers to the optimal theoretical distortion. Our first result states that, provided the underlying distribution satisfies an exponential moment condition, this performance criterion is bounded above by a quantity of order \(O(1/\sqrt{n})\). Then, motivated by a broad range of applications, we focus on the case where the data are real-valued random fields. Assuming that they share a Hölder property in quadratic mean, we construct a numerically simple k-means algorithm based on a discretized version of the data. With a judicious choice of the discretization, we prove that the performance of this algorithm matches that of the classical algorithm.
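To make the discretized scheme concrete, the following is a minimal sketch and not the authors' algorithm. It assumes fields observed on a regular grid of m points in (0, 1], rescales the sampled values by 1/sqrt(m) so that Euclidean distances approximate L2 distances, and replaces the exact empirical minimizer with Lloyd's iteration, a common heuristic that only reaches a local minimum of the empirical distortion. The function names, the simulated Brownian-type paths, and the drift used to create two groups are illustrative assumptions.

```python
import numpy as np


def empirical_distortion(X, centers):
    """Mean squared distance from each row of X to its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()


def lloyd_kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's heuristic: alternate nearest-center assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Initialize with k distinct data points (an arbitrary, assumed choice).
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)  # Voronoi cell of each observation
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cell is empty
                centers[j] = members.mean(axis=0)
    # Final Voronoi partition for the returned centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)


def sample_paths(n, m, seed=0):
    """Simulate n Brownian-type paths on a grid of m points (illustrative data only)."""
    rng = np.random.default_rng(seed)
    steps = rng.normal(scale=np.sqrt(1.0 / m), size=(n, m))
    paths = steps.cumsum(axis=1)
    t = np.arange(1, m + 1) / m
    paths[n // 2:] += 3.0 * t  # second half of the sample gets a linear drift
    return paths


if __name__ == "__main__":
    n, m, k = 200, 50, 2
    # Rescale so that squared Euclidean norms approximate squared L2([0, 1]) norms.
    X = sample_paths(n, m) / np.sqrt(m)
    centers, labels = lloyd_kmeans(X, k)
    print("empirical distortion:", empirical_distortion(X, centers))
    print("cluster sizes:", np.bincount(labels, minlength=k))
```

The grid size m plays the role of the discretization level: a finer grid approximates the L2 distance between fields more closely, at a higher computational cost per distance evaluation.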
Additional information
Communicated by Domingo Morales.
About this article
Cite this article
Cadre, B., Paris, Q. On Hölder fields clustering. TEST 21, 301–316 (2012). https://doi.org/10.1007/s11749-011-0244-4
Keywords
- Clustering
- k-means
- Unsupervised learning
- Random fields
- Hilbert space
- Empirical risk minimization
Mathematics Subject Classification (2000)
- 62H30
- 68T05