On Hölder fields clustering

Abstract

We study the k-means clustering scheme based on n randomly drawn vectors in a Hilbert space. Here, clustering is performed by computing the Voronoi partition associated with centers that minimize an empirical criterion called the distortion. The performance of the method is evaluated by comparing the theoretical distortion of the empirically optimal centers with the optimal theoretical distortion. Our first result states that, provided the underlying distribution satisfies an exponential moment condition, this performance criterion admits an upper bound of order \(O(1/\sqrt{n})\). Then, motivated by a broad range of applications, we focus on the case where the data are real-valued random fields. Assuming that they share a Hölder property in quadratic mean, we construct a numerically simple k-means algorithm based on a discretized version of the data. With a judicious choice of the discretization, we prove that the performance of this algorithm matches that of the classical algorithm.
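
To make the objects named in the abstract concrete, here is a minimal sketch, not the authors' algorithm. It computes the empirical distortion (the mean squared distance from each observation to its nearest center) and the associated Voronoi labels for curves sampled on a regular grid. Several choices are assumptions made for illustration only: the toy data, the grid size `m`, the `1/sqrt(m)` Riemann-sum scaling, and the function names. Lloyd iterations stand in for the exact minimizer of the empirical distortion; they only reach a local minimum.

```python
import numpy as np


def empirical_distortion(X, centers):
    """Mean squared distance from each row of X to its nearest center."""
    # d2[i, j] = squared Euclidean distance between sample i and center j
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()


def voronoi_labels(X, centers):
    """Index of the nearest center for each row of X (its Voronoi cell)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def lloyd(X, k, n_iter=50, seed=0):
    """Lloyd iterations: a heuristic that reaches a local minimizer only."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = voronoi_labels(X, centers)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cell is empty
                centers[j] = members.mean(axis=0)
    return centers


# Toy data (assumption): n noisy curves on [0, 1], observed at m grid points.
rng = np.random.default_rng(1)
n, m, k = 200, 32, 2
t = np.linspace(0.0, 1.0, m)
signs = rng.choice([-1.0, 1.0], size=n)
X = signs[:, None] * np.sin(2 * np.pi * t)[None, :]
X = X + 0.1 * rng.standard_normal((n, m))

# Scale by 1/sqrt(m) so that squared Euclidean norms of the discretized
# curves approximate squared L2([0, 1]) norms (a Riemann-sum assumption).
X_disc = X / np.sqrt(m)

centers = lloyd(X_disc, k)
labels = voronoi_labels(X_disc, centers)
print("empirical distortion:", empirical_distortion(X_disc, centers))
```

The point of the design is that all computations run on the n-by-m array of grid values, so a finer grid costs more per iteration but tracks the functional norm more closely. How fine the grid must be to preserve the rate is what the paper's analysis addresses, and this sketch does not attempt to reproduce that choice.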

Author information

Corresponding author

Correspondence to Benoît Cadre.

Additional information

Communicated by Domingo Morales.

About this article

Cite this article

Cadre, B., Paris, Q. On Hölder fields clustering. TEST 21, 301–316 (2012). https://doi.org/10.1007/s11749-011-0244-4

Keywords

  • Clustering
  • k-means
  • Unsupervised learning
  • Random fields
  • Hilbert space
  • Empirical risk minimization

Mathematics Subject Classification (2000)

  • 62H30
  • 68T05