
Data Analysis from Empirical Moments and the Christoffel Function


Abstract

Spectral features of the empirical moment matrix provide a versatile tool for unveiling properties of a cloud of points, among them density, support and latent structures. This matrix is readily computed from an input dataset, and its eigendecomposition can then be used to identify algebraic properties of the support, or to produce density/support estimates via the Christoffel function. It is well known that the empirical moment matrix encodes a great deal of subtle attributes of the underlying measure. Taking this object as the basis of our observations, we combine ideas from statistics, real algebraic geometry, orthogonal polynomials and approximation theory to obtain new insights relevant to machine learning problems with data supported on algebraic sets. Refined concepts and results from real algebraic geometry and approximation theory empower a simple tool (the empirical moment matrix) to solve non-trivial questions in data analysis. We provide (1) theoretical support, (2) numerical experiments and (3) connections to real-world data as validation of the strength of the empirical moment matrix approach.


Notes

  1. Remark 2 provides a justification for the choice of a decrease of order \(1/\sqrt{d}\), which balances two terms in the case of Lipschitz densities.

  2. For the null polynomial, we use the convention that its degree is \(-\infty \).

References

  1. E. Aamari and C. Levrard (2018). Non-Asymptotic Rates for Manifold, Tangent Space, and Curvature Estimation. Annals of Statistics, in press.

  2. N. Aronszajn (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337–404.

  3. W. Van Assche (1987). Asymptotics for orthogonal polynomials. Springer-Verlag, Berlin Heidelberg.

  4. E. Batschelet (1981). Circular statistics in biology. Academic Press, New York.

  5. M. Belkin and P. Niyogi (2002). Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in Neural Information Processing Systems.

  6. M. Belkin and P. Niyogi (2005). Towards a theoretical foundation for Laplacian-based manifold methods. In International Conference on Computational Learning Theory.

  7. R. Bennett (1969). The intrinsic dimensionality of signal collections. IEEE Transactions on Information Theory, 15(5):517–525.

  8. J. Bochnak, M. Coste, and M.F. Roy (1998). Real algebraic geometry, vol. 36. Springer Science & Business Media.

  9. L. Bos (1994). Asymptotics for the Christoffel function for Jacobi like weights on a ball in \({\mathbb{R}}^m\). New Zealand Journal of Mathematics, 23(99):109–116.

  10. L. Bos, B. Della Vecchia, and G. Mastroianni (1998). On the asymptotics of Christoffel functions for centrally symmetric weight functions on the ball in \({\mathbb{R}}^d\). Rend. Circ. Mat. Palermo, 2(52):277–290.

  11. P. Breiding, S.K. Verovsek, B. Sturmfels, and M. Weinstein (2018). Learning Algebraic Varieties from Samples. arXiv preprint arXiv:1802.09436.

  12. P. Bubenik (2015). Statistical topological data analysis using persistence landscapes. The Journal of Machine Learning Research, 16(1):77–102.

  13. G. Carlsson (2009). Topology and data. Bulletin of the American Mathematical Society, 46(2):255–308.

  14. F. Chazal, D. Cohen-Steiner, and Q. Mérigot (2011). Geometric inference for probability measures. Foundations of Computational Mathematics, 11(6):733–751.

  15. F. Chazal, M. Glisse, C. Labruère, and B. Michel (2015). Convergence rates for persistence diagram estimation in topological data analysis. The Journal of Machine Learning Research, 16(1):3603–3635.

  16. D. Cox, J. Little, and D. O'Shea (2007). Ideals, varieties, and algorithms, 3rd edition. Springer, New York.

  17. A. Cuevas, W. González-Manteiga, and A. Rodríguez-Casal (2006). Plug-in estimation of general level sets. Australian & New Zealand Journal of Statistics, 48(1):7–19.

  18. D.L. Donoho and C. Grimes (2003). Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Sciences, 100(10):5591–5596.

  19. J.J. Duistermaat and J.A. Kolk (2004). Multidimensional real analysis II: Integration. Cambridge University Press.

  20. C. Dunkl and Y. Xu (2001). Orthogonal polynomials of several variables. Cambridge University Press.

  21. H. Edelsbrunner and J. Harer (2008). Persistent homology – a survey. Contemporary Mathematics, 453:257–282.

  22. E. Elhamifar and R. Vidal (2011). Sparse manifold clustering and embedding. In Advances in Neural Information Processing Systems.

  23. European Space Agency (1997). The Hipparcos and Tycho catalogues. http://cdsweb.u-strasbg.fr/hipparcos.html.

  24. E. Facco, M. d'Errico, A. Rodriguez, and A. Laio (2017). Estimating the intrinsic dimension of datasets by a minimal neighborhood information. Scientific Reports, 7(1):12140.

  25. C. Fefferman, S. Mitter, and H. Narayanan (2016). Testing the manifold hypothesis. Journal of the American Mathematical Society, 29(4):983–1049.

  26. K. Fukunaga and D.R. Olsen (1971). An algorithm for finding intrinsic dimensionality of data. IEEE Transactions on Computers, C-20(2):176–183.

  27. C.R. Genovese, M. Perone-Pacifico, I. Verdinelli, and L. Wasserman (2012). Manifold estimation and singular deconvolution under Hausdorff loss. The Annals of Statistics, 40(2):941–963.

  28. C. Genovese, M. Perone-Pacifico, I. Verdinelli, and L. Wasserman (2012). Minimax manifold estimation. Journal of Machine Learning Research, 13:1263–1291.

  29. R. Ghrist (2008). Barcodes: the persistent topology of data. Bulletin of the American Mathematical Society, 45(1):61–75.

  30. M. Hein and J.Y. Audibert (2005). Intrinsic dimensionality estimation of submanifolds in \({\mathbb{R}}^d\). In Proceedings of the International Conference on Machine Learning.

  31. M. Hein, J.Y. Audibert, and U. von Luxburg (2005). From graphs to manifolds – weak and strong pointwise consistency of graph Laplacians. In International Conference on Computational Learning Theory.

  32. D. Huybrechts (2006). Complex geometry: An introduction. Springer Science & Business Media.

  33. A.K. Kim and H.H. Zhou (2015). Tight minimax rates for manifold estimation under Hausdorff loss. Electronic Journal of Statistics, 9(1):1562–1582.

  34. M. Korda, M. Putinar, and I. Mezić (2017). Data-driven spectral analysis of the Koopman operator. Applied and Computational Harmonic Analysis, to appear.

  35. A. Kroó and D.S. Lubinsky (2012). Christoffel functions and universality in the bulk for multivariate orthogonal polynomials. Canadian Journal of Mathematics, 65(3):600–620.

  36. M.H. Law and A.K. Jain (2006). Incremental nonlinear dimensionality reduction by manifold learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(3):377–391.

  37. E. Levina and P.J. Bickel (2005). Maximum likelihood estimation of intrinsic dimension. In Advances in Neural Information Processing Systems.

  38. S.C. Lovell, I.W. Davis, W.B. Arendall, P.I. De Bakker, J.M. Word, M.G. Prisant, and D.C. Richardson (2003). Structure validation by C\(\alpha \) geometry: \(\phi \), \(\psi \) and C\(\beta \) deviation. Proteins: Structure, Function, and Genetics, 50(3):437–450.

  39. J.B. Lasserre and E. Pauwels (2016). Sorting out typicality via the inverse moment matrix SOS polynomial. In Advances in Neural Information Processing Systems 29 (NIPS 2016, Barcelona), D.D. Lee, M. Sugiyama, U.V. Luxburg, I. Guyon, and R. Garnett, Eds., Curran Associates, Inc., pp. 190–198.

  40. J.B. Lasserre and E. Pauwels (2019). The empirical Christoffel function with applications in Machine Learning. Advances in Computational Mathematics, 45(3):1439–1468.

  41. M. Laurent and P. Rostalski (2012). The approach of moments for polynomial equations. In Handbook on Semidefinite, Conic and Polynomial Optimization, pp. 25–60. Springer, Boston, MA.

  42. A. Máté and P. Nevai (1980). Bernstein's inequality in \(L^p\) for \(0<p<1\) and \((C, 1)\) bounds for orthogonal polynomials. Annals of Mathematics, 111(1):145–154.

  43. A. Máté, P. Nevai, and V. Totik (1991). Szegő's extremum problem on the unit circle. Annals of Mathematics, 134(2):433–453.

  44. P. Nevai (1986). Géza Freud, orthogonal polynomials and Christoffel functions. A case study. Journal of Approximation Theory, 48(1):3–167.

  45. P. Niyogi, S. Smale, and S. Weinberger (2008). Finding the homology of submanifolds with high confidence from random samples. Discrete & Computational Geometry, 39(1-3):419–441.

  46. P. Niyogi, S. Smale, and S. Weinberger (2011). A topological view of unsupervised learning from noisy data. SIAM Journal on Computing, 40(3):646–663.

  47. K.W. Pettis, T.A. Bailey, A.K. Jain, and R.C. Dubes (1979). An intrinsic dimensionality estimator from near-neighbor information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1(1):25–37.

  48. S.T. Roweis and L.K. Saul (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326.

  49. J.P. Serre (1956). Géométrie algébrique et géométrie analytique. Annales de l'Institut Fourier, 6:1–42.

  50. B. Simon (2008). The Christoffel–Darboux kernel. In Perspectives in PDE, Harmonic Analysis and Applications, a volume in honor of V.G. Maz'ya's 70th birthday, Proceedings of Symposia in Pure Mathematics, vol. 79, pp. 295–335.

  51. G. Szegő (1974). Orthogonal polynomials, 4th edition. American Mathematical Society Colloquium Publications, vol. 23.

  52. J.B. Tenenbaum, V. de Silva, and J.C. Langford (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323.

  53. V. Totik (2000). Asymptotics for Christoffel functions for general measures on the real line. Journal d'Analyse Mathématique, 81(1):283–303.

  54. G.V. Trunk (1968). Statistical estimation of the intrinsic dimensionality of data collections. Information and Control, 12(5):508–525.

  55. E. De Vito, L. Rosasco, and A. Toigo (2014). Learning sets with separating kernels. Applied and Computational Harmonic Analysis, 37(2):185–217.

  56. Y. Xu (1996). Asymptotics for orthogonal polynomials and Christoffel functions on a ball. Methods and Applications of Analysis, 3(2):257–272.

  57. Y. Xu (1999). Asymptotics of the Christoffel Functions on a Simplex in \({\mathbb{R}}^d\). Journal of Approximation Theory, 99(1):122–133.


Acknowledgements

The research of J.B. Lasserre was funded by the European Research Council (ERC) under the European Union Horizon 2020 research and innovation program (grant agreement 666981 TAMING). This work was partially supported by CIMI (Centre International de Mathématiques et d'Informatique). Edouard Pauwels acknowledges the support of the French Agence Nationale de la Recherche (ANR) under grant ANR-PRC-CE23 (Masdol); the AI Interdisciplinary Institute ANITI funding, through the French "Investing for the Future - PIA3" program under grant agreement n\(^{\circ }\)ANR-19-PI3A-0004; and the support of the Air Force Office of Scientific Research, Air Force Material Command, USAF, under grant number FA9550-18-1-0226. Mihai Putinar is grateful to LAAS and Université Paul Sabatier for support. Jean-Bernard Lasserre and Edouard Pauwels would like to thank Yousouf Emin for early discussions on the topic.

Author information


Corresponding author

Correspondence to Edouard Pauwels.

Additional information

Communicated by Felipe Cucker.


Appendices

A. Introduction

We collect below supplementary information to the main body of the present article. Section B provides additional details regarding the notation and constructions employed in the main text. More insight into the numerical simulations is given in Sect. C. Technical lemmas are presented in Sect. D.

B. Polynomials and the Moment Matrix

In this section, \(\mu \) denotes a positive Borel measure on \({\mathbb {R}}^p\) with finite moments, and for any \(d \in {\mathbb {N}}\), \({\mathbf {M}}_{\mu ,d}\) denotes its moment matrix with moments of degree up to 2d. We will soon restrict our attention to probability measures, a minor constraint for the constructions below.

Henceforth, we fix the dimension of the ambient Euclidean space to be p; that is, we consider vectors in \({\mathbb {R}}^p\) as well as p-variate polynomials with real coefficients. We denote by X the tuple of p variables \(X_1, \ldots , X_p\) which appear in mathematical expressions involving polynomials. Monomials from the canonical basis of p-variate polynomials are identified with their exponents in \({\mathbb {N}}^p\): specifically, \(\alpha = (\alpha _i)_{i =1 \ldots p} \in {\mathbb {N}}^p\) is associated with the monomial \(X^\alpha := X_1^{\alpha _1} X_2^{\alpha _2} \ldots X_p^{\alpha _p}\) of degree \(\deg (\alpha ) := \sum _{i=1}^p \alpha _i=\vert \alpha \vert \). The notations \(<_{gl}\) and \(\le _{gl}\) stand for the graded lexicographic order, a well ordering over p-variate monomials. This amounts to, first, using the canonical order on the degree and, second, breaking ties between monomials of the same degree using the lexicographic order with \(X_1 = a, X_2 = b, \ldots \). For example, the monomials in two variables \(X_1, X_2\) of degree less than or equal to 3, listed in this order, are: \(1,\,X_1, \,X_2, \,X_1^2, \,X_1X_2, \,X_2^2,\, X_1^3,\, X_1^2X_2,\, X_1X_2^2,\, X_2^3\). We focus here on the graded lexicographic order to provide a concrete example, but any ordering compatible with the degree would work similarly.

By definition, \({\mathbb {N}}^p_d\) is the set \(\left\{ \alpha \in {\mathbb {N}}^p;\; \deg (\alpha ) \le d \right\} \), while \({\mathbb {R}}[X]\) is the algebra of p-variate polynomials with real coefficients. The degree of a polynomial is the highest of the degrees of its monomials with nonzero coefficients (see Note 2). The notation \(\deg (\cdot )\) applies to a polynomial as well as to an element of \({\mathbb {N}}^p\). For \(d \in {\mathbb {N}}\), \({\mathbb {R}}_d[X]\) stands for the set of p-variate polynomials of degree less than or equal to d. We set \(s(d) = {p+d \atopwithdelims ()d} = \dim {\mathbb {R}}_d[X]\); this is of course the number of monomials of degree less than or equal to d.
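To make the ordering and the dimension count concrete, the short Python sketch below (our illustration, not part of the original article; the function name is ours) enumerates the exponents of \({\mathbb {N}}^p_d\) in graded lexicographic order and checks that their number equals \(s(d)\).

```python
from itertools import product
from math import comb

def graded_lex_exponents(p, d):
    """Exponents of the p-variate monomials of degree <= d, sorted by
    total degree first, then lexicographically with X_1 > X_2 > ..."""
    exps = [a for a in product(range(d + 1), repeat=p) if sum(a) <= d]
    exps.sort(key=lambda a: (sum(a), tuple(-ai for ai in a)))
    return exps

p, d = 2, 3
monos = graded_lex_exponents(p, d)
print(monos)  # [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2), ...]
assert len(monos) == comb(p + d, d)  # s(d) = binom(p+d, d) = 10 here
```

For p = 2 and d = 3, the output reproduces exactly the list \(1, X_1, X_2, X_1^2, X_1X_2, X_2^2, \ldots \) given above.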

From now on, \({\mathbf {v}}_d(X)\) denotes the vector of monomials of degree less than or equal to d (sorted using \(\le _{gl}\)), i.e., \({\mathbf {v}}_d(X) := \left( X^\alpha \right) _{\alpha \in {\mathbb {N}}^p_d} \in {\mathbb {R}}_d[X]^{s(d)}\). With this notation, one can write a polynomial \(P\in {\mathbb {R}}_d[X]\) as \(P(X) = \left\langle {\mathbf {p}}, {\mathbf {v}}_d(X)\right\rangle \) for some real vector of coefficients \({\mathbf {p}}= \left( p_{\alpha } \right) _{\alpha \in {\mathbb {N}}_d^p} \in {\mathbb {R}}^{s(d)}\) ordered using \(\le _{gl}\). Given \({\mathbf {x}}= (x_i)_{i = 1 \ldots p} \in {\mathbb {R}}^p\), \(P({\mathbf {x}})\) denotes the evaluation of P with the assignments \(X_1 = x_1, X_2 = x_2, \ldots , X_p = x_{p}\). Given a Borel probability measure \(\mu \) and \(\alpha \in {\mathbb {N}}^p\), \(y_{\alpha }(\mu )\) denotes the moment of order \(\alpha \) of \(\mu \), i.e., \(y_{\alpha }(\mu ) = \int _{{\mathbb {R}}^p} {\mathbf {x}}^{\alpha } \mathrm{d}\mu ({\mathbf {x}})\). Throughout the paper, we only consider rapidly decaying measures, that is, measures all of whose moments are finite. For a positive Borel measure \(\mu \) on \({\mathbb {R}}^p\), denote by \(\mathrm{supp}(\mu )\) its support, i.e., the smallest closed set \({{\varvec{\Omega }}}\subset {\mathbb {R}}^p\) such that \(\mu ({\mathbb {R}}^p{\setminus }{{\varvec{\Omega }}})=0\).

Moment matrix The moment matrix of \(\mu \), \({\mathbf {M}}_{\mu ,d}\), is a matrix indexed by monomials of degree at most d ordered with respect to \(\le _{gl}\). For \(\alpha ,\beta \in {\mathbb {N}}^p_d\), the corresponding entry in \({\mathbf {M}}_{\mu ,d}\) is defined by \([{\mathbf {M}}_{\mu ,d}]_{\alpha ,\beta } := y_{\alpha +\beta }(\mu )\), the moment \(\int {\mathbf {x}}^{\alpha +\beta }\mathrm{d}\mu \) of \(\mu \). For example, in the case \(p = 2\), letting \(y_{\alpha } = y_{\alpha } (\mu )\) for \(\alpha \in {\mathbb {N}}_4^2\), one finds:

$$\begin{aligned} {\mathbf {M}}_{\mu ,2}: \quad \begin{array}{c|cccccc} & 1 & X_1 & X_2 & X_1^2 & X_1 X_2 & X_2^2 \\ \hline 1 & 1 & y_{10} & y_{01} & y_{20} & y_{11} & y_{02} \\ X_1 & y_{10} & y_{20} & y_{11} & y_{30} & y_{21} & y_{12} \\ X_2 & y_{01} & y_{11} & y_{02} & y_{21} & y_{12} & y_{03} \\ X_1^2 & y_{20} & y_{30} & y_{21} & y_{40} & y_{31} & y_{22} \\ X_1X_2 & y_{11} & y_{21} & y_{12} & y_{31} & y_{22} & y_{13} \\ X_2^2 & y_{02} & y_{12} & y_{03} & y_{22} & y_{13} & y_{04} \end{array}. \end{aligned}$$

The matrix \({\mathbf {M}}_{\mu ,d}\) is positive semidefinite for all \(d \in {\mathbb {N}}\). Indeed, for any \({\mathbf {p}}\in {\mathbb {R}}^{s(d)}\), let \(P \in {\mathbb {R}}_d[X]\) be the polynomial with vector of coefficients \({\mathbf {p}}\); then, \({\mathbf {p}}^T{\mathbf {M}}_{\mu ,d}{\mathbf {p}}= \int _{{\mathbb {R}}^p} P({\mathbf {x}})^2 \mathrm{d}\mu ({\mathbf {x}}) \ge 0\). We also have the identity \({\mathbf {M}}_{\mu ,d} = \int _{{\mathbb {R}}^p} {\mathbf {v}}_d({\mathbf {x}}) {\mathbf {v}}_d({\mathbf {x}})^T \mathrm{d}\mu ({\mathbf {x}})\), where the integral is understood entry-wise. It is useful to interpret the moment matrix as representing the bilinear form

$$\begin{aligned} \left\langle \cdot , \cdot \right\rangle _\mu :{\mathbb {R}}[X] \times {\mathbb {R}}[X]&\rightarrow {\mathbb {R}}\\ (P,Q)&\mapsto \int _{{\mathbb {R}}^p} P({\mathbf {x}})Q({\mathbf {x}})\mathrm{d}\mu ({\mathbf {x}}), \end{aligned}$$

restricted to polynomials of degree up to d. Indeed, if \({\mathbf {p}},{\mathbf {q}}\in {\mathbb {R}}^{s(d)}\) are the vectors of coefficients of any two polynomials P and Q of degree up to d, one has \({\mathbf {p}}^T {\mathbf {M}}_{\mu ,d} {\mathbf {q}}= \left\langle P, Q\right\rangle _\mu \).
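As an illustration (ours, not code from the article; the function names are our own), the following Python sketch forms the empirical moment matrix \({\mathbf {M}}_{\mu ,d} = \frac{1}{n}\sum _{i=1}^n {\mathbf {v}}_d({\mathbf {x}}_i){\mathbf {v}}_d({\mathbf {x}}_i)^T\) of the empirical measure of a sample and checks positive semidefiniteness; it reuses graded_lex_exponents from the previous sketch.

```python
import numpy as np

def v_d(points, exps):
    """Evaluate the monomial vector v_d at each row of `points`;
    returns the (n, s(d)) matrix whose i-th row is v_d(x_i)."""
    E = np.asarray(exps)
    # Broadcast to shape (n, s(d), p), then multiply over the p variables.
    return np.prod(points[:, None, :] ** E[None, :, :], axis=2)

def empirical_moment_matrix(points, d):
    exps = graded_lex_exponents(points.shape[1], d)
    V = v_d(points, exps)
    return V.T @ V / len(points)  # (1/n) * sum_i v_d(x_i) v_d(x_i)^T

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))  # 500 points in the square
M = empirical_moment_matrix(X, d=2)        # 6 x 6, as in the display above
assert np.all(np.linalg.eigvalsh(M) >= -1e-10)  # PSD up to rounding
```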

C. Numerical Experiments

This section provides additional details regarding numerical experiments.

C.1 Rank of the Moment Matrix

We choose four different subsets of \({\mathbb {R}}^3\):

  • The unit cube \(\left\{ (x,y,z) \,:\, |x| \le 1,\; |y| \le 1,\; |z|\le 1 \right\} \).

  • The unit sphere in \({\mathbb {R}}^3\), \(\left\{ (x,y,z) \,:\, x^2 + y^2 + z^2 = 1 \right\} \).

  • The TV screen in \({\mathbb {R}}^3\), \(\left\{ (x,y,z) \,:\, x^6 + y^6 + z^6 - 2x^2y^2z^2 = 1 \right\} \).

  • The torus in \({\mathbb {R}}^3\), \(\Big \{ (x,y,z) \,:\, \left( x^2 + y^2 + z^2 + \frac{9}{16} - \frac{1}{16}\right) ^2 - \frac{9}{16} (x^2 + y^2) = 0 \Big \}\).

Among the above sets, the first one is three-dimensional, while all the others are two-dimensional. The two-dimensional sets are displayed in Fig. 1. For each set, we sample 20,000 points on it and compute the rank of the empirical moment matrix for different values of the degree. To perform this computation, we threshold the singular values of the design matrix consisting of the expansion of each data point in the multivariate Chebyshev polynomial basis; a minimal sketch of this procedure is given below.
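The following Python sketch (our reconstruction under the stated setup, with a smaller sample; the function names and the threshold are our own choices) runs this computation for the sphere: points are expanded in a tensorized Chebyshev basis of bounded total degree, and the numerical rank is the number of singular values above a relative threshold.

```python
import numpy as np
from itertools import product
from numpy.polynomial.chebyshev import chebval

def chebyshev_design(points, d):
    """Expansion of each data point in the multivariate Chebyshev basis
    of total degree <= d (better conditioned than raw monomials)."""
    n, p = points.shape
    exps = [a for a in product(range(d + 1), repeat=p) if sum(a) <= d]
    # T[k, i, j] = T_k evaluated at points[i, j], for k = 0, ..., d.
    T = np.stack([chebval(points, [0] * k + [1]) for k in range(d + 1)])
    cols = [np.prod([T[a[j], :, j] for j in range(p)], axis=0) for a in exps]
    return np.column_stack(cols)

def numerical_rank(points, d, tol=1e-8):
    s = np.linalg.svd(chebyshev_design(points, d), compute_uv=False)
    return int(np.sum(s > tol * s[0]))

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # points on the unit sphere
for d in range(1, 5):
    print(d, numerical_rank(X, d))  # rank deficit from d = 2 on, since
                                    # x^2 + y^2 + z^2 - 1 vanishes on the set
```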

C.2 Density Estimation on Algebraic Sets

The following quantities are used in the literature as divergences, in combination with density estimation techniques, for the examples treated in the main article; a short computational sketch follows the list.

  • The quantity \(\cos (\theta _1 - \theta _2)\) where \(\theta _1\) and \(\theta _2\) are angular coordinates of two points on the circle.

  • The dot product on the sphere, which generalizes the previous situation to higher dimensions, used in [17].

  • The quantity \(\cos \left( \sqrt{(\phi _1 - \phi _2)^2+(\psi _1 - \psi _2)^2 } \right) \) where \(\phi _1, \phi _2, \psi _1, \psi _2\) are angles which correspond to points on the torus in \({\mathbb {R}}^4\), used in [38].
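These quantities are elementary to evaluate; the following lines (our sketch; the function names are hypothetical) spell them out in Python.

```python
import numpy as np

def circle_div(theta1, theta2):
    # cosine of the angular difference between two points on the circle
    return np.cos(theta1 - theta2)

def sphere_div(x, y):
    # dot product of two unit vectors; generalizes the circular case [17]
    return np.dot(x, y)

def torus_div(phi1, psi1, phi2, psi2):
    # cosine of the flat distance between angle pairs on the torus [38]
    return np.cos(np.sqrt((phi1 - phi2) ** 2 + (psi1 - psi2) ** 2))

print(circle_div(0.0, np.pi / 3))           # 0.5
print(torus_div(0.0, 0.0, np.pi / 4, 0.0))  # cos(pi/4), about 0.707
```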

D. Technical Lemmas

We begin with the following simple lemma:

Lemma 9

Let \(\mu \) be a Borel probability measure on \({\mathbb {R}}^p\) whose support S is a bounded subset of \({\mathbb {R}}^p\). Then, for any \(d \in {\mathbb {N}}^*\) and any \(P \in {\mathbb {R}}_d[X]\):

$$\begin{aligned} \sup _{{\mathbf {x}}\in S} |P({\mathbf {x}})|^2 \le \sup _{{\mathbf {z}}\in S} \varLambda _{\mu ,d}^{-1}({\mathbf {z}}) \int (P({\mathbf {x}}))^2 \mathrm{d}\mu ({\mathbf {x}}). \end{aligned}$$

Proof

For any \({\mathbf {x}}\in S\) and \(P \in {\mathbb {R}}_d[X]\) with \(P({\mathbf {x}}) \ne 0\) (the inequality being trivial when \(P({\mathbf {x}}) = 0\)), the definition of the Christoffel function as an infimum, applied to the normalized polynomial \(P/P({\mathbf {x}})\), yields

$$\begin{aligned} \varLambda _{\mu ,d}({\mathbf {x}}) \le \frac{\int (P({\mathbf {z}}))^2 d \mu ({\mathbf {z}})}{P({\mathbf {x}})^2} \end{aligned}$$

and

$$\begin{aligned} P({\mathbf {x}})^2 \le \frac{\int (P({\mathbf {z}}))^2 d \mu ({\mathbf {z}})}{\varLambda _{\mu ,d}({\mathbf {x}})}. \end{aligned}$$

The result follows by considering the supremum over S on both sides. \(\square \)
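For a concrete check, recall the standard identity \(\varLambda _{\mu ,d}({\mathbf {x}})^{-1} = {\mathbf {v}}_d({\mathbf {x}})^T {\mathbf {M}}_{\mu ,d}^{-1} {\mathbf {v}}_d({\mathbf {x}})\), valid when \({\mathbf {M}}_{\mu ,d}\) is invertible. The Python sketch below (ours; it reuses the helpers from the sketches in Sect. B) evaluates the empirical Christoffel function and verifies the inequality of Lemma 9 on a random polynomial, with the empirical measure playing the role of \(\mu \).

```python
import numpy as np

# Assumes graded_lex_exponents, v_d and empirical_moment_matrix from the
# sketches in Sect. B are in scope.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(1000, 2))  # empirical measure on a square
d = 3
exps = graded_lex_exponents(2, d)
Minv = np.linalg.inv(empirical_moment_matrix(X, d))

def christoffel(points):
    V = v_d(points, exps)  # rows are v_d(x)
    return 1.0 / np.einsum('ij,jk,ik->i', V, Minv, V)

coeffs = rng.normal(size=len(exps))  # random P of degree <= d
P_vals = v_d(X, exps) @ coeffs       # P evaluated on the sample
lhs = np.max(P_vals ** 2)                                  # sup of P^2 on S
rhs = np.max(1.0 / christoffel(X)) * np.mean(P_vals ** 2)  # bound of Lemma 9
assert lhs <= rhs + 1e-9
```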

Finally, the following lemma is a quantitative adaptation of [35, Lemma 2.1]:

Lemma 10

For any \(d \in {\mathbb {N}}^*\) and any \(\delta \in (0,1)\), there exists a p-variate polynomial Q of degree 2d such that

$$\begin{aligned} Q(0) \,=\, 1\,;\quad -1\,\le \,Q \,\le \, 1,\text { on } {\mathbf {B}}\,;\quad \vert Q\vert \,\le \, 2^{1-\delta d} \text { on } {\mathbf {B}}{\setminus } {\mathbf {B}}_{\delta }(0). \end{aligned}$$

Proof

Let R be the univariate polynomial of degree 2d, defined by

$$\begin{aligned} R :t \mapsto \frac{T_d(1+\delta ^2 - t^2)}{T_d(1+\delta ^2)}, \end{aligned}$$

where \(T_d\) is the Chebyshev polynomial of the first kind. We obtain

$$\begin{aligned} R(0) = 1. \end{aligned}$$
(18)

Furthermore, for \(t \in [-1,1]\), we have \(0 \le 1+\delta ^2 - t^2 \le 1+\delta ^2\). Since \(T_d\) has absolute value at most 1 on \([-1,1]\) and is increasing on \([1, \infty )\) with \(T_d(1) = 1\), we obtain, for \(t \in [-1,1]\):

$$\begin{aligned} -1 \le R(t) \le 1. \end{aligned}$$
(19)

For \(|t| \in [\delta , 1]\), we have \(\delta ^2 \le 1+\delta ^2-t^2 \le 1\), so

$$\begin{aligned} |R(t)| \le \frac{1}{T_d(1+\delta ^2)}. \end{aligned}$$
(20)

Let us bound the last quantity. Recall that for \(t \ge 1\), we have the following explicit expression:

$$\begin{aligned} T_d(t) = \frac{1}{2}\left( \left( t + \sqrt{t^2-1} \right) ^d + \left( t + \sqrt{t^2-1} \right) ^{-d}\right) . \end{aligned}$$

We have \(1 + \delta ^2 +\sqrt{(1+\delta ^2)^2 - 1} \ge 1 + \sqrt{2} \delta \), which leads to

$$\begin{aligned} T_d(1+\delta ^2)&\ge \frac{1}{2}\left( 1 + \sqrt{2} \delta \right) ^d\nonumber \\&= \frac{1}{2} \exp \left( \log \left( 1+\sqrt{2}\delta \right) d \right) \nonumber \\&\ge \frac{1}{2} \exp \left( \log (1+\sqrt{2}) \delta d \right) \nonumber \\&\ge 2^{\delta d - 1}, \end{aligned}$$
(21)

where we have used concavity of the \(\log \) and the fact that \(1+\sqrt{2} \ge 2\). It follows by combining (18), (19), (20) and (21) that \(Q :{\mathbf {y}}\mapsto R(\Vert {\mathbf {y}}\Vert _2)\) satisfies the claimed properties. \(\square \)
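Since the construction is fully explicit, the bounds can be confirmed numerically. The sketch below (ours) evaluates R on a grid with NumPy's Chebyshev routines and checks (18), (19) and the bound \(2^{1-\delta d}\) obtained from (20) and (21).

```python
import numpy as np
from numpy.polynomial.chebyshev import chebval

def R(t, d, delta):
    """R(t) = T_d(1 + delta^2 - t^2) / T_d(1 + delta^2)."""
    cd = [0] * d + [1]  # coefficient vector selecting T_d
    return chebval(1 + delta**2 - t**2, cd) / chebval(1 + delta**2, cd)

d, delta = 20, 0.3
t = np.linspace(-1.0, 1.0, 10001)
assert np.isclose(R(0.0, d, delta), 1.0)            # (18)
assert np.max(np.abs(R(t, d, delta))) <= 1 + 1e-12  # (19)
outer = t[np.abs(t) >= delta]
assert np.max(np.abs(R(outer, d, delta))) <= 2.0 ** (1 - delta * d)  # (20)+(21)
```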

D.1 Proof of Lemma 8

Proof

The implication (ii) to (i) is trivial since \(\sigma _W\) is supported on W. Let us assume that (i) holds and deduce (ii). This is classically formulated in the language of sheaves; we adopt a more elementary language and describe the details for completeness. The main idea of the proof is to reduce the statement to Rückert's complex analytic Nullstellensatz [32, Proposition 1.1.29], which characterizes the class of analytic functions vanishing locally on the zero set of a family of analytic functions. The first point is precisely of a local nature, and the Nullstellensatz combined with properties of analytic functions allows us to deduce properties of P which extend globally [49]. The irreducibility hypothesis and the definition of the area measure are precisely what allow us to switch from real to complex variables.

First, let \(P_1, \ldots , P_k\) generate the ideal I of polynomials vanishing on W. Since W is irreducible, I is a prime ideal and \(W = \left\{ {\mathbf {x}}\in {\mathbb {R}}^p,\, P_i({\mathbf {x}}) = 0,\, i =1,\ldots ,k \right\} \) (see Propositions 3.3.14 and 3.3.16 in [8]). Point (i) entails that there exist \({\mathbf {x}}_0 \in W\) and a Euclidean neighborhood \(U_1\) of \({\mathbf {x}}_0\) in \({\mathbb {R}}^p\) such that \(W \cap U_1\) is an analytic submanifold; more precisely, the Jacobian matrix \(\left( \frac{\partial P_i}{\partial X_j} \right) _{i=1\ldots k,\, j=1\ldots p}\) has rank k on \(U_1\), and furthermore, \(P({\mathbf {x}}) = 0\) for all \({\mathbf {x}}\in W \cap U_1\).

Consider the complex analytic manifold \(Z = \left\{ {\mathbf {z}}\in {\mathbb {C}}^p,\, P_i({\mathbf {z}}) = 0,\, i =1,\ldots ,k \right\} \) and the polynomial map

$$\begin{aligned} G:(X_1,\ldots ,X_p) \mapsto (P_1(X_1,\ldots ,X_p),\ldots ,P_k(X_1,\ldots ,X_p),X_{k+1},\ldots , X_p). \end{aligned}$$

This map is locally invertible around \({\mathbf {x}}_0\) in \({\mathbb {C}}^p\), and its inverse is analytic. The function \(H :(X_{k+1},\ldots ,X_p)\mapsto P(G^{-1}(0,\ldots ,0,X_{k+1},\ldots ,X_p))\) is analytic and vanishes in a Euclidean neighborhood, in \({\mathbb {R}}^{p-k}\), of \((x_{0,k+1},\ldots ,x_{0,p})\), the last \(p-k\) coordinates of \({\mathbf {x}}_0\). Hence, by the identity principle for analytic functions, H vanishes in a Euclidean neighborhood of \((x_{0,k+1},\ldots ,x_{0,p})\) in \({\mathbb {C}}^{p-k}\).

This proves that there exists a Euclidean neighborhood \(U_2\) of \({\mathbf {x}}_0\) in \({\mathbb {C}}^p\) such that P vanishes on \(Z \cap U_2\). At this point, we can invoke the complex analytic Nullstellensatz [32, Proposition 1.1.29] to obtain k analytic functions \(O_1,\ldots ,O_k\) on \(U_2\) and an integer \(m \ge 1\) such that, for all \({\mathbf {z}}\in U_2\), we have

$$\begin{aligned} (P({\mathbf {z}}))^m = \sum _{i=1}^k P_i({\mathbf {z}}) O_i({\mathbf {z}}). \end{aligned}$$
(22)

We can now use a powerful result related to the completion of local rings. Combining Corollary 1 of Proposition 4 and Proposition 22 in [49], we obtain that identity (22) still holds with the constraint that the \(O_i\), \(i=1,\ldots , k\), are rational functions. In other words, reducing to a common denominator, there exist a Euclidean neighborhood \(U_3\) of \({\mathbf {x}}_0\), k complex polynomials \(Q_1,\ldots ,Q_k\), and a complex polynomial Q such that Q does not vanish on \(U_3\), and

$$\begin{aligned} Q({\mathbf {z}})(P({\mathbf {z}}))^m = \sum _{i=1}^k P_i({\mathbf {z}}) Q_i({\mathbf {z}}). \end{aligned}$$
(23)

From this, we deduce that identity (23) still holds when restricting to real variables and real polynomials. Now, since the ideal I is prime and \(Q \not \in I\) (because \(Q({\mathbf {x}}_0) \ne 0\)), identity (23) yields \(P^m \in I\) and hence \(P \in I\), that is, P vanishes on W. This is what we wanted to prove. \(\square \)


Cite this article

Pauwels, E., Putinar, M. & Lasserre, JB. Data Analysis from Empirical Moments and the Christoffel Function. Found Comput Math 21, 243–273 (2021). https://doi.org/10.1007/s10208-020-09451-2

