Abstract
This paper introduces a new way to extract a set of representative points from a continuous distribution. The selection of points is essentially deterministic, with an emphasis on achieving an accurate approximation when the number of points is small. The points are generated by minimizing the Kullback–Leibler divergence, an information-based measure of the disparity between two probability distributions; we refer to them as Kullback–Leibler points. Using the link between the total variation distance and the Kullback–Leibler divergence, we prove that the empirical distribution of Kullback–Leibler points converges to the target distribution. We also illustrate through simulations that Kullback–Leibler points compare favorably with representative points generated by Monte Carlo or other representative-point methods. In addition, to avoid frequent evaluations of expensive functions, we propose a sequential version of Kullback–Leibler points, which adaptively updates the representative points by learning the expensive or unknown function sequentially. Two potential applications of Kullback–Leibler points, to the simulation of complex probability densities and to the optimization of complex response surfaces, are discussed and demonstrated with examples.
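To make the construction concrete, here is a minimal one-dimensional sketch (illustrative only, not the authors' implementation): points are selected greedily so that the Gaussian kernel density estimate of the chosen set minimizes a grid-based estimate of the Kullback–Leibler divergence to a standard normal target. The bandwidth, grid, and one-point-at-a-time greedy scheme are simplifying assumptions.

```python
import numpy as np
from scipy.stats import norm

def kl_points(n, h=0.4, grid=np.linspace(-4, 4, 401)):
    """Greedily pick n points whose kernel density estimate f_n
    minimizes a grid estimate of D(f_n || f) for f = N(0, 1)."""
    f = norm.pdf(grid)              # target density on the grid
    dx = grid[1] - grid[0]
    points = []
    for _ in range(n):
        best, best_kl = None, np.inf
        for c in grid:              # candidate next point
            cand = np.array(points + [c])
            # Gaussian kernel density estimate of the candidate set
            fn = norm.pdf((grid[:, None] - cand) / h).mean(axis=1) / h
            kl = np.sum(fn * np.log((fn + 1e-12) / (f + 1e-12))) * dx
            if kl < best_kl:
                best, best_kl = c, kl
        points.append(best)
    return np.array(points)

pts = kl_points(5)
print(np.round(np.sort(pts), 2))    # spread roughly symmetrically about 0
```

A full implementation would optimize all n points jointly (or refine the greedy solution) and use a divergence estimator suited to higher dimensions, but the sketch shows the deterministic character of the selection.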
References
Billingsley P (2008) Probability and measure. Wiley, New York
Brooks S, Gelman A, Jones G, Meng XL (2011) Handbook of Markov chain Monte Carlo. Chapman and Hall/CRC, Boca Raton
Chen WY, Mackey L, Gorham J, Briol FX, Oates CJ (2018) Stein points. In: Proceedings of the 35th international conference on machine learning, vol 80, pp 844–853
Cover TM, Thomas JA (2006) Elements of information theory. Wiley, New York
Dick J, Kuo FY, Sloan IH (2013) High-dimensional integration: the quasi-Monte Carlo way. Acta Numer 22:133–573
Dudewicz EJ, van der Meulen EC (1981) Entropy-based tests of uniformity. J Am Stat Assoc 76:967–974
Fang KT, Li RZ, Sudjianto A (2006) Design and modeling for computer experiments. Chapman and Hall/CRC, Boca Raton
Fasshauer G (2007) Meshfree approximation methods with MATLAB. World Scientific, Singapore
Haario H, Saksman E, Tamminen J (1999) Adaptive proposal distribution for random walk Metropolis algorithm. Comput Stat 14:375–395
Härdle WG, Werwatz A, Müller M, Sperlich S (2004) Nonparametric and semiparametric models. Springer, New York
Hickernell F (1998) A generalized discrepancy and quadrature error bound. Math Comput 67:299–322
Joseph VR, Dasgupta T, Tuo R, Wu CFJ (2015) Sequential exploration of complex surfaces using minimum energy designs. Technometrics 57:64–74
Joseph VR, Wang DP, Gu L, Lv SJ, Tuo R (2019) Deterministic sampling of expensive posteriors using minimum energy designs. Technometrics 61:297–308
Jourdan A, Franco J (2010) Optimal Latin hypercube designs for the Kullback–Leibler criterion. AStA Adv Stat Anal 94:341–351
Kennedy MC, O’Hagan A (2001) Bayesian calibration of computer models (with discussion). J R Stat Soc B 63:425–464
Kullback S, Leibler R (1951) On information and sufficiency. Ann Math Stat 22:79–86
Lin CD, Tang BX (2015) Latin hypercubes and space-filling designs. In: Handbook of design and analysis of experiments. Chapman and Hall/CRC, Boca Raton, pp 593–626
Mak S, Joseph VR (2017) Projected support points: a new method for high-dimensional data reduction. arXiv:1708.06897
Mak S, Joseph VR (2018) Support points. Ann Stat 46:2562–2592
McKay MD, Beckman RJ, Conover WJ (1979) A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21:239–245
Miettinen K (2012) Nonlinear multiobjective optimization. Springer, New York
Morris MD, Mitchell TJ (1995) Exploratory designs for computational experiments. J Stat Plan Inference 43:381–402
Sacks J, Schiller SB, Welch WJ (1989) Designs for computer experiments. Technometrics 31:41–47
Santner TJ, Williams BJ, Notz WI (2019) The design and analysis of computer experiments. Springer, New York
Shi CL, Tang BX (2020) Construction results for strong orthogonal arrays of strength three. Bernoulli 26:418–431
Sobol’ IM (1967) On the distribution of points in a cube and the approximate evaluation of integrals. Zh Vychisl Mat Mat Fiz 7:784–802
Tsybakov AB (2009) Introduction to nonparametric estimation. Springer, New York
Wang Q, Kulkarni SR, Verdú S (2006) A nearest-neighbor approach to estimating divergence between continuous random vectors. In: IEEE international symposium on information theory
Worley BA (1987) Deterministic uncertainty analysis. Technical Report ORNL-6428. Oak Ridge National Laboratories
Wu Y, Ghosal S (2008) Kullback Leibler property of kernel mixture priors in Bayesian density estimation. Electron J Stat 2:298–331
Acknowledgements
This study was supported by the National Natural Science Foundation of China (Grant Nos. 11971098, 11471069 and 12131001) and National Key Research and Development Program of China (Grant Nos. 2020YFA0714102 and 2022YFA1003701).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendix A: Proofs
This appendix provides proofs of Theorems 1–3. Define \(f_n^*\) as the kernel density estimator based on \(\{\textbf{x}_j^*\}_{j=1}^{n}\), where \(\{\textbf{x}_j^*\}_{j=1}^{n} \overset{i.i.d.}{\sim }\ f(\textbf{x})\). The proof of Theorem 1 relies on Lemma 1 below, which shows that the expectation of \(D(f_n^*\Vert f)\) converges to 0 as \(n \rightarrow \infty \).
Lemma 1
Suppose the probability density function f satisfies conditions (A1)–(A2), that \(\{\textbf{x}_j^*\}_{j=1}^{n} \overset{i.i.d.}{\sim }\ f\), that \(f_n^*\) is the kernel density estimator based on \(\{\textbf{x}_j^*\}_{j=1}^{n}\), and that the kernel function K and the bandwidth \(h_n\) satisfy (K1)–(K5). Then \(\lim _{n\rightarrow \infty } E[D(f_n^*\Vert f)]=0\).
Proof
For simplicity, we take \(d=1\); the proof for \(d>1\) is similar. Let \(\{x_j^*\}_{j=1}^{n} \overset{i.i.d.}{\sim }\ f\), and let \(f_n^*(x)=\frac{1}{nh_{n}}\sum _{i=1}^n K(\frac{x-x_i^*}{h_{n}})\) be the kernel density estimator based on \(\{x_j^*\}_{j=1}^{n}\), where the kernel function K and the bandwidth \(h_n\) satisfy (K1)–(K5).
Since f satisfies (A1)–(A2), f is continuous on \(\mathcal {X}\). Moreover, because f is a density function, there exists a constant \(f_{\max }\) such that \(f\le f_{\max } <\infty \). The variance of \(f_n^*(x)\) satisfies:
where \(C_1=f_{\max } \Vert K\Vert _{2}^{2}\), \(\Vert K\Vert _{2}^{2}=\int _{\mathcal {X}}K^{2}(t)dt\).
The bias of \(f_n^*(x)\) satisfies the following bound.
Since f is Lipschitz, i.e., \(|f(x)-f(y)|\le L|x- y|\) for all \(x, y \in \mathcal {X}\), we obtain
where \(C_2=L\left\{ \int _{\mathcal {X}}t^2K(t)dt\right\} ^{1/2}\).
The Kullback–Leibler divergence between f and \(f_n^*\) is
By the inequality \(\log x\le (x-1)\), the expectation of \(D(f_n^*\Vert f)\) satisfies:
The penultimate equality follows from Fubini's theorem.
According to (A1) and (A3), we can obtain
Obviously, \(\frac{G_n(x)}{f(x)}\) is monotonically decreasing in n and \(\lim _{n\rightarrow \infty }\frac{G_n(x)}{f(x)}=0\), so \(\frac{G_n(x)}{f(x)} \le \frac{G_1(x)}{f(x)}\) for all n. Since \(\int _{\mathcal {X}}\frac{G_1(x)}{f(x)}dx < \infty \), Lebesgue's dominated convergence theorem (Billingsley 2008) yields
Note that
and we have
Then the conclusion \(\lim _{n\rightarrow \infty } E[D(f_n^*\Vert f)]=0\) is established. \(\square \)
To prove Theorems 1 and 2, we also need the following lemma.
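Before turning to Lemma 2, the conclusion of Lemma 1 can be checked numerically (this is illustration only, not part of the proof). Assuming a standard normal f, a Silverman-type bandwidth rule, and a fixed evaluation grid, the Monte Carlo average of \(D(f_n^*\Vert f)\) shrinks as n grows:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
grid = np.linspace(-5, 5, 501)
dx = grid[1] - grid[0]
f = norm.pdf(grid)                     # target density f = N(0, 1)

def mean_kl(n, reps=30):
    """Monte Carlo estimate of E[D(f_n^* || f)] over `reps` samples."""
    vals = []
    for _ in range(reps):
        x = rng.standard_normal(n)     # x_1*, ..., x_n* iid from f
        h = 1.06 * n ** (-0.2)         # Silverman-type bandwidth (assumption)
        fn = norm.pdf((grid[:, None] - x) / h).mean(axis=1) / h
        vals.append(np.sum(fn * np.log((fn + 1e-12) / (f + 1e-12))) * dx)
    return float(np.mean(vals))

small_n, large_n = mean_kl(50), mean_kl(800)
print(small_n, large_n)                # the average divergence decreases in n
```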
Lemma 2
Let f and g be two density functions supported on \(\mathcal {X}\subseteq R^d\), then
-
(a)
$$\begin{aligned} V(f, g)\overset{\textrm{def}}{=}\sup _{A\in \mathcal {B}}\left|\int _{A} (f(\textbf{x})-g(\textbf{x}))d\textbf{x}\right|=\frac{1}{2}\int _{\mathcal {X}}|f(\textbf{x})-g(\textbf{x})|d\textbf{x}, \end{aligned}$$
where \(\mathcal {B}\) is the Borel \(\sigma \)-algebra of \(\mathcal {X}\).
-
(b)
$$\begin{aligned} 2V^{2}(f, g) \le D(g \Vert f). \end{aligned}$$
Parts (a) and (b) of this lemma follow from Scheffé's theorem (Tsybakov 2009, p. 84) and Pinsker's inequality (Tsybakov 2009, p. 88), respectively.
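Both parts of Lemma 2 are easy to verify numerically; the following snippet does so for \(f = N(0,1)\) and \(g = N(1,1)\), for which \(D(g\Vert f) = 1/2\) in closed form (the grid range and resolution are arbitrary choices):

```python
import numpy as np
from scipy.stats import norm

# Grid approximation of V(f, g) and D(g || f) for f = N(0,1), g = N(1,1).
grid = np.linspace(-8, 8, 4001)
dx = grid[1] - grid[0]
f, g = norm.pdf(grid), norm.pdf(grid, loc=1.0)

V = 0.5 * np.sum(np.abs(f - g)) * dx       # total variation, Lemma 2(a)
D = np.sum(g * np.log(g / f)) * dx         # KL divergence, here exactly 1/2
print(round(V, 4), round(D, 4))
assert 2 * V**2 <= D                       # Pinsker's inequality, Lemma 2(b)
```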
Proof of Theorem 1
Define the sequence of random variables \(\{\textbf{x}_j^*\}_{j=1}^{\infty } \overset{i.i.d.}{\sim }\ f\), and let \(f_n^*\) denote the kernel density estimator based on \(\{\textbf{x}_j^*\}_{j=1}^{n}\). By Lemma 1, \(\lim _{n\rightarrow \infty } E[D(f_n^*\Vert f)]=0\).
Consider now the kernel density estimator \(f_{n}^{KL}\) based on the KL points \(\{\varvec{\xi }_{i}\}_{i=1}^n\). By the definition of KL points,
so \(\lim _{n\rightarrow \infty }D(f_{n}^{KL}\Vert f)=0\).
Using Pinsker’s inequality in Lemma 2 (b),
and the conclusion \(\lim _{n\rightarrow \infty } \int _{\mathcal {X}} |f_{n}^{KL} (\textbf{x})-f(\textbf{x})|d\textbf{x}=0\) follows. \(\square \)
Proof of Theorem 2
Since \(f_{n}^{KL}\) is the kernel density estimator based on the KL points \(\{\varvec{\xi }_{i}\}_{i=1}^n\) and satisfies (K1)–(K5), we have
by Lemma 2 (a). Combining Theorem 1 and (A8), we have
Setting \(A=(-\infty , \textbf{x}]\in \mathcal {B}\), we have
where \(F_n\) is the cumulative distribution function of density function \(f_{n}^{KL}\), and F is the cumulative distribution function of f. \(\square \)
Two lemmas will be needed for the proof of Theorem 3.
Lemma 3
Suppose the kernel K satisfies (K1)–(K5). Then,
-
(a)
For all \(\epsilon >0\), there exists \(M>0\) such that \(\int _{[-M, M]^d}K(\textbf{t})d\textbf{t}\ge 1-\epsilon \).
-
(b)
For the above \(M>0\), there exists \(\textbf{y}_0\) such that \(\varphi (\textbf{x}, \textbf{y}_0)\ge 1/3\), where
$$\begin{aligned} \varphi (\textbf{x}, \textbf{y})=\int _{\prod _{i=1}^d[x_i-y_i, x_i+y_i]}K(\textbf{t})d\textbf{t}, \end{aligned}$$
for \(\textbf{x}=(x_1, \ldots , x_d)\in [-M, M]^d\) and \(\textbf{y}=(y_1, \ldots , y_d)\).
Proof
(a) follows from the fact that K is a density function.
(b) Note that \(\lim _{\textbf{y}\rightarrow +\infty } \varphi (\textbf{x}, \textbf{y})\ge 1/2\) for any \(\textbf{x}=(x_1, \ldots , x_d)\in [-M, M]^d\); the conclusion follows. \(\square \)
Lemma 4
Let \(\{\varvec{\xi }_{i}\}_{i=1}^n\) be the KL points of f. Then
where
\(h_n\) is the bandwidth used to generate KL points and M is defined in Lemma 3.
Proof
For simplicity, we take \(d = 1\). Since \(\lim _{n\rightarrow \infty }\frac{\delta }{h_n}=+\infty \) for all \(\delta >0 \), there exists \(n_0\) such that \(\frac{\delta }{h_{n}}\ge y_0\) for all \(n> n_0\), where \(y_0\) satisfies \(I(|x|\le M)\varphi (x, y_0)\ge 1/3\) as defined in Lemma 3.
We prove the result by contradiction. If Lemma 4 does not hold, then there exists \(x^{*}\) such that for every \( N>n_0\) there exists \(n_{k}>N\) with \( \frac{N_{x^*}}{n_k}\ge c_0\), which means
where \(c_0\) is a positive constant.
Then, we have
where \(F_{n_k}\) is the cumulative distribution function corresponding to the kernel density estimator \(f_{n_k}\) used to generate the KL points. The penultimate “\(\ge \)” holds by Lemma 3. This contradicts the continuity of the distribution \(F_{n_k}\), which completes the proof. \(\square \)
Proof of Theorem 3
Let \(F_n^{^{KL}}\) denote the standard empirical distribution of the KL points \(\{\varvec{\xi }_{i}\}_{i=1}^n\), and let \(F_{n}\) denote the cumulative distribution function of \(f_{n}^{KL}\). We first prove that
For simplicity, we take \(d=1\). For any \(x \in \mathcal {X}\),
If \(\xi _{i} \le x\), then \(\frac{x-\xi _{i}}{h_n} \ge 0\), and
where M and \( \epsilon \) are defined as in Lemma 3, i.e., for all \( \epsilon >0\) there exists \(M>0\) such that \(\int _{-M}^{M}K(t)dt\ge 1-\epsilon \) and \(\int _{-\infty }^{-M}K(t)dt=\int _{M}^{+\infty }K(t)dt<\frac{\epsilon }{2}\).
If \(\xi _{i} > x\), then \(\frac{x-\xi _{i}}{h_n} <0\), and
In summary, for all \(x\in \mathcal {X}\), the absolute error between \(F_{n}(x)\) and \(F_n^{^{KL}}(x)\) is
By Lemma 4,
Hence, for all \(\epsilon > 0\), there exists N such that for all \(n\ge N\) and all \(x\in \mathcal {X}\),
Consequently, we obtain \(\lim _{n \rightarrow \infty }\sup _{x} |F_{n}(x)-F_n^{^{KL}}(x)|=0\). The proof for \(d > 1\) is similar; hence
Since
and combining this with Theorem 2, we have
\(\square \)
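The key step above, that the kernel-smoothed distribution \(F_n\) and the empirical distribution \(F_n^{KL}\) of the same point set are uniformly close when \(h_n\) is small, can be illustrated numerically. The snippet below uses i.i.d. standard normal draws as a stand-in for KL points (an assumption made purely for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
xi = np.sort(rng.standard_normal(200))   # stand-in for KL points xi_1..xi_n
h = 0.1                                  # small bandwidth h_n
grid = np.linspace(-4, 4, 801)

# Empirical CDF F_n^KL and Gaussian-kernel-smoothed CDF F_n on the grid.
F_emp = np.searchsorted(xi, grid, side="right") / xi.size
F_kde = norm.cdf((grid[:, None] - xi) / h).mean(axis=1)

gap = float(np.max(np.abs(F_emp - F_kde)))
print(round(gap, 3))                     # the sup-norm gap is small
assert gap < 0.05
```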
Cite this article
Wang, S., Sun, F. Deterministic sampling based on Kullback–Leibler divergence and its applications. Stat Papers (2023). https://doi.org/10.1007/s00362-023-01449-6
Keywords
- Bayesian computation
- Computer experiments
- Gaussian process model
- Representative points
- Space-filling design