TEST, Volume 21, Issue 2, pp 301–316

On Hölder fields clustering

  • Benoît Cadre
  • Quentin Paris
Original Paper

Abstract

Based on n randomly drawn vectors in a Hilbert space, we study the k-means clustering scheme. Here, clustering is performed by computing the Voronoi partition associated with centers that minimize an empirical criterion, called distortion. The performance of the method is evaluated by comparing the theoretical distortion of the empirically optimal centers to the optimal theoretical distortion. Our first result states that, provided the underlying distribution satisfies an exponential moment condition, this performance criterion is bounded above by a term of order \(O(1/\sqrt{n})\). Then, motivated by a broad range of applications, we focus on the case where the data are real-valued random fields. Assuming that they share a Hölder property in quadratic mean, we construct a numerically simple k-means algorithm based on a discretized version of the data. With a judicious choice of the discretization, we prove that the performance of this algorithm matches that of the classical algorithm.
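
As a rough illustration of the discretized scheme described above, the following minimal Python sketch clusters random functions observed on a regular grid. Several elements are assumptions made for illustration and do not come from the paper: the fields are one-dimensional paths on [0, 1] built from Brownian increments (which are Hölder in quadratic mean with exponent 1/2), the grid size m = 64 is arbitrary rather than the judicious choice studied by the authors, and Lloyd's iterations are used as a standard heuristic in place of an exact minimizer of the empirical distortion. The names `empirical_distortion` and `lloyd_kmeans` are hypothetical.

```python
import numpy as np


def empirical_distortion(X, centers):
    """Mean squared distance from each row of X to its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()


def lloyd_kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's heuristic; it only approximates the distortion minimizer."""
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: Voronoi cell of the nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center becomes the mean of its cell.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if the cell is empty
                centers[j] = members.mean(axis=0)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)


# Toy data (assumed): n paths on [0, 1], sampled at m grid points.
data_rng = np.random.default_rng(1)
n, m = 200, 64
signs = data_rng.choice([-1.0, 1.0], size=n)  # two groups with means -1 and +1
increments = data_rng.normal(scale=np.sqrt(1.0 / m), size=(n, m))
paths = signs[:, None] + 0.3 * np.cumsum(increments, axis=1)

# Scaling by 1/sqrt(m) makes the squared Euclidean norm of a row a
# Riemann-sum approximation of the squared L2([0, 1]) norm of the path.
X = paths / np.sqrt(m)

centers, labels = lloyd_kmeans(X, k=2)
print("empirical distortion:", empirical_distortion(X, centers))
```

The scaling step is the only place where the Hilbert-space structure enters: after it, ordinary Euclidean k-means on the grid values acts as a discretized stand-in for k-means in L2([0, 1]).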

Keywords

Clustering, k-means, Unsupervised learning, Random fields, Hilbert space, Empirical risk minimization

Mathematics Subject Classification (2000)

62H30 68T05 

Copyright information

© Sociedad de Estadística e Investigación Operativa 2011

Authors and Affiliations

  1. IRMAR, ENS Cachan Bretagne, CNRS, UEB, Bruz, France
