Deterministic sampling based on Kullback–Leibler divergence and its applications

  • Regular Article
  • Statistical Papers

Abstract

This paper introduces a new way to extract a set of representative points from a continuous distribution. The selection of points is essentially deterministic, with an emphasis on achieving accurate approximation when the number of points is small. The points are generated by minimizing the Kullback–Leibler divergence, an information-based measure of the disparity between two probability distributions, and we refer to them as Kullback–Leibler points. Based on the link between the total variation distance and the Kullback–Leibler divergence, we prove that the empirical distribution of Kullback–Leibler points converges to the target distribution. We also illustrate through simulations that Kullback–Leibler points compare favorably with representative points generated by Monte Carlo or other representative-point methods. In addition, to avoid the frequent evaluation of complex functions, a sequential version of Kullback–Leibler points is proposed, which adaptively updates the representative points by learning about the complex or unknown function sequentially. Two potential applications of Kullback–Leibler points, the simulation of complex probability densities and the optimization of complex response surfaces, are discussed and demonstrated with examples.
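For readers who want to experiment with the idea, the following is a minimal one-dimensional sketch of the construction described above: choose n points whose kernel density estimate minimizes the Kullback–Leibler divergence to the target. The Gaussian kernel, rule-of-thumb bandwidth, grid-discretized divergence, Nelder–Mead optimizer, and the helper names kl_to_target and kl_points are illustrative assumptions, not the authors' implementation.

```python
# Minimal 1-d sketch of "Kullback-Leibler points": choose n points whose Gaussian
# kernel density estimate minimizes a grid-discretized D(f_n || f) to a target f.
# Kernel, bandwidth rule, optimizer, and helper names are illustrative choices.
import numpy as np
from scipy.integrate import trapezoid
from scipy.optimize import minimize
from scipy.stats import norm

def kl_to_target(points, target_pdf, h, grid):
    """Numerical D(f_n || f), where f_n is the Gaussian-kernel KDE on `points`."""
    points = np.asarray(points)
    # f_n(x) = (1/(n h)) sum_i K((x - xi)/h), evaluated on the grid
    fn = norm.pdf((grid[:, None] - points[None, :]) / h).mean(axis=1) / h
    f = target_pdf(grid)
    eps = 1e-300  # guard against log(0)
    return trapezoid(fn * np.log((fn + eps) / (f + eps)), grid)

def kl_points(n, target_pdf, lo, hi, h=None, n_grid=801):
    """Point set minimizing the discretized KL divergence (deterministic given the start)."""
    grid = np.linspace(lo, hi, n_grid)
    h = h if h is not None else 1.06 * n ** (-1 / 5)   # rule-of-thumb bandwidth
    x0 = np.linspace(lo + 0.1 * (hi - lo), hi - 0.1 * (hi - lo), n)  # crude starting design
    res = minimize(kl_to_target, x0, args=(target_pdf, h, grid), method="Nelder-Mead",
                   options={"maxiter": 20000, "xatol": 1e-6, "fatol": 1e-10})
    return np.sort(res.x)

if __name__ == "__main__":
    xi = kl_points(n=10, target_pdf=norm(0.0, 1.0).pdf, lo=-4.0, hi=4.0)
    print("KL points for N(0,1):", np.round(xi, 3))
```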


References

  • Billingsley P (2008) Probability and measure. Wiley, New York

  • Brooks S, Gelman A, Jones G, Meng XL (2011) Handbook of Markov chain Monte Carlo. Chapman and Hall/CRC, Boca Raton

  • Chen WY, Mackey L, Gorham J, Briol FX, Oates CJ (2018) Stein points. In: Proceedings of the 35th international conference on machine learning, vol 80, pp 844–853

  • Cover TM, Thomas JA (2006) Elements of information theory. Wiley, New York

  • Dick J, Kuo FY, Sloan IH (2013) High-dimensional integration: the quasi-Monte Carlo way. Acta Numer 22:133–573

  • Dudewicz EJ, van der Meulen EC (1981) Entropy-based tests of uniformity. J Am Stat Assoc 76:967–974

  • Fang KT, Li RZ, Sudjianto A (2006) Design and modeling for computer experiments. Chapman and Hall/CRC, Boca Raton

  • Fasshauer G (2007) Meshfree approximation methods with MATLAB. World Scientific, Singapore

  • Haario H, Saksman E, Tamminen J (1999) Adaptive proposal distribution for random walk Metropolis algorithm. Comput Stat 14:375–395

  • Härdle WG, Werwatz A, Müller M, Sperlich S (2004) Nonparametric and semiparametric models. Springer, New York

  • Hickernell F (1998) A generalized discrepancy and quadrature error bound. Math Comput 67:299–322

  • Joseph VR, Dasgupta T, Tuo R, Wu CFJ (2015) Sequential exploration of complex surfaces using minimum energy designs. Technometrics 57:64–74

  • Joseph VR, Wang DP, Gu L, Lv SJ, Tuo R (2019) Deterministic sampling of expensive posteriors using minimum energy designs. Technometrics 61:297–308

  • Jourdan A, Franco J (2010) Optimal Latin hypercube designs for the Kullback–Leibler criterion. AStA Adv Stat Anal 94:341–351

  • Kennedy MC, O’Hagan A (2001) Bayesian calibration of computer models (with discussion). J R Stat Soc B 63:425–464

  • Kullback S, Leibler R (1951) On information and sufficiency. Ann Math Stat 22:79–86

  • Lin CD, Tang BX (2015) Latin hypercubes and space-filling designs. In: Handbook of design and analysis of experiments, chap 17. Chapman and Hall/CRC, Boca Raton, pp 593–626

  • Mak S, Joseph VR (2017) Projected support points: a new method for high-dimensional data reduction. arXiv:1708.06897

  • Mak S, Joseph VR (2018) Support points. Ann Stat 46:2562–2592

  • McKay MD, Beckman RJ, Conover WJ (1979) A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21:239–245

  • Miettinen K (2012) Nonlinear multiobjective optimization. Springer, New York

  • Morris MD, Mitchell TJ (2018) Exploratory designs for computer experiments. J Stat Plan Inference 43:381–402

  • Sacks J, Schiller SB, Welch WJ (1989) Designs for computer experiments. Technometrics 31:41–47

  • Santner TJ, Williams BJ, Notz WI (2019) The design and analysis of computer experiments. Springer, New York

  • Shi CL, Tang BX (2020) Construction results for strong orthogonal arrays of strength three. Bernoulli 26:418–431

  • Sobol’ IM (1967) On the distribution of points in a cube and the approximate evaluation of integrals. Zh Vychisl Mat Mat Fiz 7:784–802

  • Tsybakov AB (2009) Introduction to nonparametric estimation. Springer, New York

  • Wang Q, Kulkarni SR, Verdú S (2006) A nearest-neighbor approach to estimating divergence between continuous random vectors. In: IEEE international symposium on information theory

  • Worley BA (1987) Deterministic uncertainty analysis. Technical Report ORNL-6428, Oak Ridge National Laboratory

  • Wu Y, Ghosal S (2008) Kullback–Leibler property of kernel mixture priors in Bayesian density estimation. Electron J Stat 2:298–331


Acknowledgements

This study was supported by the National Natural Science Foundation of China (Grant Nos. 11971098, 11471069 and 12131001) and National Key Research and Development Program of China (Grant Nos. 2020YFA0714102 and 2022YFA1003701).

Author information

Corresponding author

Correspondence to Fasheng Sun.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Proofs

This appendix provides proofs of Theorems 1–3. Define \(f_n^*\) as the kernel density estimator based on \(\{\textbf{x}_j^*\}_{j=1}^{n}\), where \(\{\textbf{x}_j^*\}_{j=1}^{n} \overset{i.i.d.}{\sim }\ f(\textbf{x})\). The proof of Theorem 1 relies on Lemma 1 below, which shows that the expectation of \(D(f_n^*\Vert f)\) converges to 0 as \(n \rightarrow \infty \).

Lemma 1

Suppose the probability density function f satisfies conditions (A1)–(A2), \(\{\textbf{x}_j^*\}_{j=1}^{n} \overset{i.i.d.}{\sim }\ f\), and \(f_n^*\) is the kernel density estimator based on \(\{\textbf{x}_j^*\}_{j=1}^{n}\), where the kernel function K and the bandwidth \(h_n\) satisfy (K1)–(K5). Then,

$$\begin{aligned} \lim _{n\rightarrow \infty } E[D(f_n^*\Vert f)]=0. \end{aligned}$$

Proof

For simplicity, we take \(d=1\); the proof for \(d>1\) is similar. Let \(\{x_j^*\}_{j=1}^{n} \overset{i.i.d.}{\sim }\ f\) and let \(f_n^*(x)=\frac{1}{nh_{n}}\sum _{i=1}^n K(\frac{x-x_i^*}{h_{n}})\) be the kernel density estimator based on \(\{x_j^*\}_{j=1}^{n}\), where the kernel function K and the bandwidth \(h_n\) satisfy (K1)–(K5).

Since f satisfies (A1)–(A2), f is continuous on \(\mathcal {X}\); hence there exists a constant \(f_{\max }\) such that \(f\le f_{\max } <\infty \). The variance of \(f_n^*(x)\) satisfies:

$$\begin{aligned} { \text{ var } }(f_n^{*}(x))= & {} { \text{ var } }\left( \frac{1}{nh_{n}}\sum _{i=1}^n K\left( \frac{x-x_i^*}{h_{n}}\right) \right) \nonumber \\\le & {} \frac{1}{nh_{n}^2}E\left[ K^{2}\left( \frac{x-x_1^*}{h_n}\right) \right] \nonumber \\= & {} \frac{1}{nh_{n}^2}\int _{\mathcal {X}}K^{2}\left( \frac{z-x}{h_n}\right) f(z)dz\nonumber \\= & {} \frac{1}{nh_{n}}\int _{\mathcal {X}}K^{2}(t)f(th_{n}+x)dt\nonumber \\\le & {} \frac{f_{\max }}{nh_{n}}\int _{\mathcal {X}}K^2(t)dt=\frac{C_1}{nh_{n}}, \end{aligned}$$
(A1)

where \(C_1=f_{\max } \Vert K\Vert _{2}^{2}\), \(\Vert K\Vert _{2}^{2}=\int _{\mathcal {X}}K^{2}(t)dt\).

The bias of \(f_n^*(x)\) satisfies the following:

$$\begin{aligned} E[f_n^{*}(x)]-f(x)= & {} \frac{1}{nh_{n}}\sum _{i=1}^{n}E\left[ K\left( \frac{x-x_i^*}{h_{n}}\right) \right] -f(x)\nonumber \\= & {} \frac{1}{h_{n}}\int _{\mathcal {X}}K\left( \frac{z-x}{h_{n}}\right) f(z)dz-f(x)\nonumber \\= & {} \int _{\mathcal {X}}K(t)f(th_{n}+x)dt-f(x)\nonumber \\= & {} \int _{\mathcal {X}}K(t)[f(th_{n}+x)-f(x)]dt. \end{aligned}$$
(A2)

Since f satisfies the Lipschitz condition \(|f(x)-f(y)|\le L|x- y|\) for all \(x, y \in \mathcal {X}\), we obtain

$$\begin{aligned} |E[f_n^{*}(x)]-f(x)|\le & {} \int _{\mathcal {X}}K(t)|Lth_{n}|dt\nonumber \\= & {} Lh_{n}\int _{\mathcal {X}}|t|K(t) dt\nonumber \\\le & {} Lh_{n}\left\{ \int _{\mathcal {X}}t^2K(t)dt\right\} ^{1/2}=C_2h_{n}, \end{aligned}$$
(A3)

where \(C_2=L\left\{ \int _{\mathcal {X}}t^2K(t)dt\right\} ^{1/2}\).
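As a quick numerical aside (not part of the proof), the bounds (A1) and (A3) can be checked by Monte Carlo for a concrete choice of f and K. The example below uses a Beta(2,2) target (so \(f_{\max }=1.5\) and Lipschitz constant \(L=6\)) and a standard Gaussian kernel (\(\Vert K\Vert _2^2=1/(2\sqrt{\pi })\), \(\int t^2K(t)dt=1\)); these choices are ours, purely for illustration.

```python
# Monte Carlo check of the variance bound (A1) and bias bound (A3) for a concrete
# example: f = Beta(2,2) on [0,1] (f_max = 1.5, Lipschitz constant L = 6) and a
# standard Gaussian kernel (||K||_2^2 = 1/(2*sqrt(pi)), int t^2 K(t) dt = 1).
# The target, kernel, and constants below are illustrative choices.
import numpy as np
from scipy.stats import beta, norm

rng = np.random.default_rng(1)
n, h, x0, reps = 500, 0.05, 0.5, 2000
C1 = 1.5 / (2 * np.sqrt(np.pi))   # C1 = f_max * ||K||_2^2
C2 = 6.0 * 1.0                    # C2 = L * (int t^2 K(t) dt)^{1/2}

kde_at_x0 = np.empty(reps)
for r in range(reps):
    sample = beta(2, 2).rvs(n, random_state=rng)
    kde_at_x0[r] = norm.pdf((x0 - sample) / h).mean() / h   # f_n^*(x0)

print(f"empirical var  {kde_at_x0.var():.5f}  <=  C1/(n h) = {C1 / (n * h):.5f}")
print(f"empirical bias {abs(kde_at_x0.mean() - beta(2, 2).pdf(x0)):.5f}  <=  C2*h = {C2 * h:.5f}")
```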

The Kullback–Leibler divergence between f and \(f_n^*\) is

$$\begin{aligned} D(f_n^*\Vert f)=\int _{\mathcal {X}} f_n^*(x)\log \frac{f_n^*(x)}{f(x)}dx. \end{aligned}$$

By the inequality \(\log x\le (x-1)\), the expectation of \(D(f_n^*\Vert f)\) satisfies:

$$\begin{aligned} E[D(f_n^*\Vert f)]= & {} E\left[ \int _{\mathcal {X}} f_n^*(x)\log \frac{f_n^*(x)}{f(x)}dx\right] \nonumber \\\le & {} E\left[ \int _{\mathcal {X}}f_n^*(x)\left( \frac{f_n^*(x)}{f(x)}-1\right) dx\right] \nonumber \\= & {} E\left[ \int _{\mathcal {X}}f_n^*(x)\frac{f_n^*(x)}{f(x)}dx\right] -1\nonumber \\= & {} \int _{\mathcal {X}}E\left[ f_n^*(x)\frac{f_n^*(x)}{f(x)}\right] dx-1\nonumber \\= & {} \int _{\mathcal {X}}\frac{E[f_n^*(x)^2]-f^{2}(x)}{f(x)}dx \nonumber \\\le & {} \int _{\mathcal {X}}\frac{|E[f_n^*(x)^2]-f^{2}(x)|}{f(x)}dx. \end{aligned}$$
(A4)

The penultimate equality follows from Fubini's theorem.

According to (A1) and (A3), we can obtain

$$\begin{aligned} |E[f_n^*(x)^2]-f^{2}(x)|{} & {} =\left|E[(f_n^*(x)-f(x)+f(x))^2]-f^{2}(x)\right|\nonumber \\{} & {} =\left|E[(f_n^*(x)-f(x))^2]+2f(x)E[f_n^*(x)-f(x)] \right|\nonumber \\{} & {} \le E[(f_n^*(x)-f(x))^2]+2f(x)\left|E[f_n^*(x)]-f(x)\right|\nonumber \\{} & {} = E\left\{ f_n^*(x)-E[f_n^*(x)]+E[f_n^*(x)]- f(x)\right\} ^2\nonumber \\{} & {} \quad +2f(x)\left|E[f_n^*(x)]-f(x)\right|\nonumber \\{} & {} = { \text{ var } }(f_n^{*}(x)) +\left|E[f_n^{*}(x)]-f(x)\right|^2+2f(x)\left|E[f_n^*(x)]-f(x)\right|\nonumber \\{} & {} \le { \text{ var } }(f_n^{*}(x))+C_{2}^{2}h_{n}^{2}+2f(x)\left|E[f_n^*(x)]-f(x)\right|\nonumber \\{} & {} \le \frac{C_{1}}{nh_n}+C_{2}^{2}h_{n}^{2}+2C_{2}h_{n}f(x) \overset{\Delta }{=}G_n(x). \end{aligned}$$
(A5)

Obviously, \(\frac{G_n(x)}{f(x)}\) is monotonically decreasing in n and \(\lim _{n\rightarrow \infty }\frac{G_n(x)}{f(x)}=0\); hence \(\frac{G_n(x)}{f(x)} \le \frac{G_1(x)}{f(x)}\) for all n. Since \(\int _{\mathcal {X}}\frac{G_1(x)}{f(x)}dx < \infty \), Lebesgue's dominated convergence theorem (Billingsley 2008) gives

$$\begin{aligned} \lim _{n\rightarrow \infty } \int _{\mathcal {X}}\frac{G_n(x)}{f(x)}dx=\int _{\mathcal {X}}\lim _{n\rightarrow \infty }\frac{G_n(x)}{f(x)}dx=0. \end{aligned}$$
(A6)

Note that

$$\begin{aligned} \int _{\mathcal {X}}\frac{|E[f_n^*(x)^2]-f^{2}(x)|}{f(x)}dx \le \int _{\mathcal {X}}\frac{G_n(x)}{f(x)}dx \end{aligned}$$

Combining this with (A6), we have

$$\begin{aligned} \lim _{n\rightarrow \infty } \int _{\mathcal {X}}\frac{|E[f_n^*(x)^2]-f^{2}(x) |}{f(x)}dx=0. \end{aligned}$$
(A7)

Together with (A4) and the nonnegativity of the Kullback–Leibler divergence, this establishes \(\lim _{n\rightarrow \infty } E[D(f_n^*\Vert f)]=0\). \(\square \)
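The statement of Lemma 1 can also be illustrated numerically: averaging a grid-based estimate of \(D(f_n^*\Vert f)\) over repeated i.i.d. samples shows the expected divergence shrinking as n grows. The standard normal target, bandwidth \(h_n=n^{-1/5}\), grid, and replication count below are illustrative choices, not prescribed by the paper.

```python
# Numerical illustration of Lemma 1: the average of a grid-based estimate of
# D(f_n^* || f) over repeated i.i.d. samples from f = N(0,1) shrinks as n grows.
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import norm

rng = np.random.default_rng(0)
grid = np.linspace(-6, 6, 2001)
f = norm.pdf(grid)

def kl_of_kde(sample, h):
    """D(f_n^* || f) with a Gaussian-kernel KDE f_n^* evaluated on the grid."""
    fn = norm.pdf((grid[:, None] - sample[None, :]) / h).mean(axis=1) / h
    return trapezoid(fn * np.log((fn + 1e-300) / f), grid)

for n in (50, 200, 800, 3200):
    h = n ** (-1 / 5)                                    # illustrative bandwidth h_n
    kls = [kl_of_kde(rng.standard_normal(n), h) for _ in range(50)]
    print(f"n = {n:4d}:  estimated E[D(f_n^*||f)] = {np.mean(kls):.4f}")
```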

To prove Theorems 1 and 2, we also need the following lemma.

Lemma 2

Let f and g be two density functions supported on \(\mathcal {X}\subseteq R^d\). Then

  1. (a)
    $$\begin{aligned} V(f, g)\overset{\textrm{def}}{=}\sup _{A\in \mathcal {B}}\left|\int _{A} (f(\textbf{x})-g(\textbf{x}))d\textbf{x}\right|=\frac{1}{2}\int _{\mathcal {X}}|f(\textbf{x})-g(\textbf{x})|d\textbf{x}, \end{aligned}$$

    where \(\mathcal {B}\) is the Borel \(\sigma \)-algebra of \(\mathcal {X}\).

  2. (b)
    $$\begin{aligned} 2V^{2}(f, g) \le D(g \Vert f). \end{aligned}$$

This lemma follows from Scheffé's theorem (Tsybakov 2009, p. 84) and Pinsker's inequality (Tsybakov 2009, p. 88), respectively.
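Both parts of Lemma 2 are easy to verify numerically for specific densities. The snippet below does so for f = N(0, 1) and g = N(0.5, 1) by grid integration; the densities and grid are chosen here only for illustration (for these Gaussians, \(D(g\Vert f)=0.125\) in closed form).

```python
# Numerical check of Lemma 2 for f = N(0,1) and g = N(0.5,1): Scheffe's identity
# V(f,g) = (1/2) int |f - g| and Pinsker's inequality 2 V(f,g)^2 <= D(g||f).
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import norm

x = np.linspace(-10, 10, 20001)
f = norm(0.0, 1.0).pdf(x)
g = norm(0.5, 1.0).pdf(x)

V = 0.5 * trapezoid(np.abs(f - g), x)     # total variation distance
KL = trapezoid(g * np.log(g / f), x)      # D(g || f); closed form here is 0.125

print(f"V = {V:.4f},  D(g||f) = {KL:.4f},  2V^2 = {2 * V**2:.4f}")
assert 2 * V**2 <= KL                     # Pinsker's inequality
```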

Proof of Theorem 1

Define the sequence of random variables \(\{\textbf{x}_j^*\}_{j=1}^{\infty } \overset{i.i.d.}{\sim }\ f\), and let \(f_n^*\) denote the kernel density estimator based on \(\{\textbf{x}_j^*\}_{j=1}^{n}\). By Lemma 1, \(\lim _{n\rightarrow \infty } E[D(f_n^*\Vert f)]=0\).

Consider now the kernel density estimator \(f_{n}^{KL}\) based on the KL points \(\{\varvec{\xi }_{i}\}_{i=1}^n\). By the definition of KL points,

$$\begin{aligned} D(f_{n}^{KL}\Vert f) \le E[D(f_n^*\Vert f)], \end{aligned}$$

and, since the Kullback–Leibler divergence is nonnegative, it follows that \(\lim _{n\rightarrow \infty }D(f_{n}^{KL}\Vert f)=0\).

Using Pinsker’s inequality in Lemma 2 (b),

$$\begin{aligned} \frac{1}{2}\left( \int _{\mathcal {X}} |f_{n}^{KL}(\textbf{x})-f(\textbf{x})|d\textbf{x}\right) ^2\le D(f_{n}^{KL}\Vert f), \end{aligned}$$

and the conclusion \(\lim _{n\rightarrow \infty } \int _{\mathcal {X}} |f_{n}^{KL} (\textbf{x})-f(\textbf{x})|d\textbf{x}=0\) follows. \(\square \)

Proof of Theorem 2

Since \(f_{n}^{KL}\) is the kernel density estimator based on the KL points \(\{\varvec{\xi }_{i}\}_{i=1}^n\) and the kernel and bandwidth satisfy (K1)–(K5), we have

$$\begin{aligned} \sup _{A \in \mathcal {B}} \left|\int _{A} [f_{n}^{KL}(\textbf{x})-f(\textbf{x})]d\textbf{x}\right|=\frac{1}{2} \int _{\mathcal {X}} |f_{n}^{KL}(\textbf{x})-f(\textbf{x})|d\textbf{x} \end{aligned}$$
(A8)

by Lemma 2 (a). Combining Theorem 1 and (A8), we have

$$\begin{aligned} \lim _{n\rightarrow \infty } \sup _{A \in \mathcal {B}} \left|\int _{A} [f_{n}^{KL}(\textbf{x})-f(\textbf{x})]d\textbf{x}\right|=0. \end{aligned}$$
(A9)

Setting \(A=(-\infty , \textbf{x}]\in \mathcal {B}\), we have

$$\begin{aligned} \lim _{n\rightarrow \infty } \sup _{\textbf{x}}|F_{n}(\textbf{x})-F(\textbf{x})|=0. \end{aligned}$$

where \(F_n\) is the cumulative distribution function of \(f_{n}^{KL}\) and F is the cumulative distribution function of f. \(\square \)

Two lemmas will be needed for the proof of Theorem 3.

Lemma 3

Suppose the kernel K satisfies (K1)–(K5). Then,

  1. (a)

For every \(\epsilon >0\), there exists \(M>0\) such that \(\int _{[-M, M]^d}K(\textbf{t})d\textbf{t}\ge 1-\epsilon \).

  2. (b)

For the above \(M>0\), there exists \(\textbf{y}_0\) such that \(\varphi (\textbf{x}, \textbf{y}_0)\ge 1/3\), where

    $$\begin{aligned} \varphi (\textbf{x}, \textbf{y})=\int _{\prod _{i=1}^d[x_i-y_i, x_i+y_i]}K(\textbf{t})d\textbf{t}, \end{aligned}$$

    \(\textbf{x}=(x_1, \ldots , x_d)\in [-M, M]^d\) and \(\textbf{y}=(y_1, \ldots , y_d)\).

Proof

(a) follows from the fact that K is a density function.

(b) Note that \(\lim _{\textbf{y}\rightarrow +\infty } \varphi (\textbf{x}, \textbf{y})\ge 1/2\) for any \(\textbf{x}=(x_1, \ldots , x_d)\in [-M, M]^d\); the conclusion then follows. \(\square \)

Lemma 4

Let \(\{\varvec{\xi }_{i}\}_{i=1}^n\) be the KL points of f. Then

$$\begin{aligned} \lim _{n\rightarrow \infty } \sup \limits _{\textbf{x}\in \mathcal {X}}\frac{N_{\textbf{x}}}{n}=0, \end{aligned}$$

where

$$\begin{aligned} N_{\textbf{x}}=\sum _{i=1}^{n}I\left( \frac{\textbf{x}-\varvec{\xi }_{i}}{h_n}\in [-M, M]^d\right) , \end{aligned}$$

\(h_n\) is the bandwidth used to generate KL points and M is defined in Lemma 3.

Proof

For simplicity, we take \(d = 1\). Since for any \(\delta >0 \), \(\lim _{n\rightarrow \infty }\frac{\delta }{h_n}=+\infty \), there exists \(n_0\) such that \(\frac{\delta }{h_{n}}\ge y_0\) for all \(n> n_0\), where \(y_0\) is defined in Lemma 3 and satisfies \(I(|x|\le M)\varphi (x, y_0)\ge 1/3\).

We prove the result by contradiction. If Lemma 4 does not hold, then there exists \(x^{*}\) such that for every \( N>n_0\) there is an \(n_{k}>N\) with \( \frac{N_{x^*}}{n_k}\ge c_0\), which means

$$\begin{aligned} \frac{\sum _{i=1}^{n_k}I(|\frac{x^*-\xi _i}{h_{n_k}}|\le M)}{n_k}\ge c_0, \end{aligned}$$

where \(c_0\) is a positive constant.

Then, we have

$$\begin{aligned} F_{n_k}(x^*+\delta )-F_{n_k}(x^*-\delta )= & {} \frac{1}{n_k}\sum _{i=1}^{n_k}\int _{x^*-\delta }^{x^*+\delta }\frac{1}{h_{n_k}}K\left( \frac{t-\xi _i}{h_{n_k}}\right) dt\nonumber \\= & {} \frac{1}{n_k}\sum _{i=1}^{n_k}\int _{\frac{x^*-\xi _i}{h_{n_k}}-\frac{\delta }{h_{n_k}}}^{\frac{x^*-\xi _i}{h_{n_k}}+\frac{\delta }{h_{n_k}}}K(z)dz\nonumber \\\ge & {} \frac{1}{n_k}\sum _{i=1}^{n_k}I\left( \left|\frac{x^*-\xi _i}{h_{n_k}}\right|\le M\right) \int _{\frac{x^*-\xi _i}{h_{n_k}}-y_0}^{\frac{x^*-\xi _i}{h_{n_k}}+y_0}K(z)dz\nonumber \\\ge & {} \frac{\sum _{i=1}^{n_k}I(|\frac{x^{*}-\xi _i}{h_{n_k}}|\le M)}{3n_k}\nonumber \\\ge & {} \frac{c_0}{3}, \end{aligned}$$

where \(F_{n_k}\) is the cumulative distribution function corresponding to the kernel density estimator \(f_{n_k}\) used to generate the KL points. The penultimate “\(\ge \)” holds by Lemma 3. Since \(\delta >0\) is arbitrary and, by Theorem 2, \(F_{n_k}\) converges uniformly to the continuous distribution function F, this is a contradiction. We conclude the proof. \(\square \)

Proof of Theorem 3

Let \(F_n^{^{KL}}\) denote the standard empirical distribution of the KL points \(\{\varvec{\xi }_{i}\}_{i=1}^n\), and let \(F_{n}\) denote the cumulative distribution function of \(f_{n}^{KL}\). We first prove that

$$\begin{aligned} \lim _{n \rightarrow \infty }\sup _\textbf{x}|F_n(\textbf{x})-F_n^{^{KL}}(\textbf{x})|=0. \end{aligned}$$

For simplicity, we take \(d=1\). For every \(x \in \mathcal {X}\),

$$\begin{aligned} |F_{n}(x)-F_n^{^{KL}}(x)|= & {} \frac{1}{n}\sum _{i=1}^{n}\left|\int _{-\infty }^{x}\frac{1}{h_{n}}K\left( \frac{t-\xi _{i}}{h_n}\right) dt-I(\xi _i \le x)\right|\\= & {} \frac{1}{n}\sum _{i=1}^{n}\left|\int _{-\infty }^{ \frac{x-\xi _{i}}{h_n}}K(z)dz-I(\xi _i \le x) \right|. \end{aligned}$$

If \(\xi _{i} \le x\), then \(\frac{x-\xi _{i}}{h_n} \ge 0\), and

$$\begin{aligned} \left|\int _{-\infty }^{ \frac{x-\xi _{i}}{h_n}}K(z)dz-I(\xi _i \le x)\right|= & {} \left|\int _{-\infty }^{ \frac{x-\xi _{i}}{h_n}}K(z)dz-1\right|\\= & {} \int _{\frac{x-\xi _{i}}{h_n}}^{ +\infty }K(z)dz\\\le & {} I\left( 0\le \frac{x-\xi _{i}}{h_n}\le M\right) \int _{ \frac{x-\xi _{i}}{h_n}}^ {M}K(z)dz+\frac{ \epsilon }{2}, \end{aligned}$$

where M and \( \epsilon \) are as in Lemma 3, i.e., for every \( \epsilon >0\) there exists \(M>0\) such that \(\int _{-M}^{M}K(t)dt\ge 1-\epsilon \) and \(\int _{-\infty }^{-M}K(t)dt=\int _{M}^{+\infty }K(t)dt<\frac{\epsilon }{2}\).

If \(\xi _{i} > x\), then \(\frac{x-\xi _{i}}{h_n} <0\), and

$$\begin{aligned} \left|\int _{-\infty }^{ \frac{x-\xi _{i}}{h_n}}K(z)dz-I(\xi _i \le x)\right|= & {} \left|\int _{-\infty }^{ \frac{x-\xi _{i}}{h_n}}K(z)dz-0\right|\\\le & {} I\left( -M\le \frac{x-\xi _{i}}{h_n}< 0\right) \int _{-M}^{ \frac{x-\xi _{i}}{h_n}}K(z)dz+\frac{ \epsilon }{2}. \end{aligned}$$

In summary, for every \(x\in \mathcal {X}\), the absolute error between \(F_{n}(x)\) and \(F_n^{^{KL}}(x)\) satisfies

$$\begin{aligned} 0\le |F_{n}(x)-F_n^{^{KL}}(x)|\le & {} \frac{1}{n}\sum _{i=1}^{n}I\left( \left|\frac{x-\xi _{i}}{h_n}\right|\le M\right) \int _{ |\frac{x-\xi _{i}}{h_n}|}^{ M}K(z)dz+\frac{ \epsilon }{2}\nonumber \\\le & {} \frac{1}{2n}\sum _{i=1}^{n}I\left( \left|\frac{x-\xi _{i}}{h_n}\right|\le M\right) +\frac{ \epsilon }{2}. \end{aligned}$$
(A10)

By Lemma 4,

$$\begin{aligned} \lim _{n \rightarrow \infty } \sup _x \frac{1}{n}\sum _{i=1}^{n}I\left( \left|\frac{x-\xi _{i}}{h_n}\right|\le M\right) =0. \end{aligned}$$

Hence, for every \(\epsilon > 0\), there exists N such that for all \(n\ge N\) and all \(x\in \mathcal {X}\), we have

$$\begin{aligned} 0\le |F_{n}(x)-F_n^{^{KL}}(x)|\le \frac{\epsilon }{2}+\frac{\epsilon }{2}=\epsilon . \end{aligned}$$

Consequently, \(\lim _{n \rightarrow \infty }\sup _{x} |F_{n}(x)-F_n^{^{KL}}(x)|=0\). The proof for \(d > 1\) is similar; hence

$$\begin{aligned} \lim _{n \rightarrow \infty } \sup _{\textbf{x}} |F_{n}(\textbf{x})-F_n^{^{KL}}(\textbf{x})|=0. \end{aligned}$$
(A11)

By the triangle inequality,

$$\begin{aligned} \sup _{\textbf{x}}|F_n^{^{KL}}(\textbf{x})-F(\textbf{x})|\le \sup _{\textbf{x}}|F_n^{^{KL}}(\textbf{x})-F_{n}(\textbf{x})|+\sup _{\textbf{x}}|F_{n}(\textbf{x})-F(\textbf{x})|, \end{aligned}$$

Combining (A11) with Theorem 2, we obtain

$$\begin{aligned} \lim _{n \rightarrow \infty } \sup _{\textbf{x}} |F_n^{^{KL}}(\textbf{x})-F(\textbf{x})|=0. \end{aligned}$$

\(\square \)
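The quantity controlled in (A10)–(A11), the sup-distance between the kernel-based distribution function \(F_n\) and the empirical distribution function \(F_n^{^{KL}}\) of the same point set, can also be examined numerically. The sketch below uses quantile-spaced points as a crude stand-in for KL points and a Gaussian kernel with \(h_n=n^{-1/5}\); both are illustrative assumptions, not the paper's construction.

```python
# Illustration of the quantity bounded in (A10)-(A11): the sup-distance between the
# KDE-based CDF F_n and the empirical CDF F_n^KL of the same point set.
import numpy as np
from scipy.stats import norm

def sup_cdf_gap(points, h, n_grid=4000):
    points = np.sort(points)
    grid = np.linspace(points[0] - 4 * h, points[-1] + 4 * h, n_grid)
    # F_n(x) = (1/n) sum_i Phi((x - xi)/h)   (CDF of the Gaussian-kernel KDE)
    F_kde = norm.cdf((grid[:, None] - points[None, :]) / h).mean(axis=1)
    # empirical CDF of the same points
    F_emp = np.searchsorted(points, grid, side="right") / len(points)
    return np.max(np.abs(F_kde - F_emp))

for n in (10, 40, 160, 640):
    xi = norm.ppf((np.arange(n) + 0.5) / n)   # quantile-spaced stand-in points
    print(f"n = {n:4d}:  sup_x |F_n - F_n^KL| ~ {sup_cdf_gap(xi, n ** (-1 / 5)):.4f}")
```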

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, S., Sun, F. Deterministic sampling based on Kullback–Leibler divergence and its applications. Stat Papers (2023). https://doi.org/10.1007/s00362-023-01449-6

  • DOI: https://doi.org/10.1007/s00362-023-01449-6
