Abstract
We consider the problem of approximating a continuous random variable, characterized by a cumulative distribution function (cdf) F(x), by means of k points, \(x_1<x_2<\dots < x_k\), with probabilities \(p_i\), \(i=1,\dots ,k\). For a given k, a criterion for determining the \(x_i\) and \(p_i\) of the approximating k-point discrete distribution can be the minimization of some distance to the original distribution. Here we consider the weighted Cramér-von Mises distance between the original cdf F(x) and the stepwise cdf \({\hat{F}}(x)\) of the approximating discrete distribution, characterized by a non-negative weighting function w(x). This problem has already been solved analytically when w(x) corresponds to the probability density function of the continuous random variable, \(w(x)=F'(x)\), and, through a numerical iterative procedure based on a homotopy continuation approach, when w(x) is a piecewise constant function. In this paper, we propose and implement a solution to the problem for different choices of the weighting function w(x), highlighting how the results are affected by w(x) itself and by the number of approximating points k, in addition to F(x). Although an analytic solution is usually not available, the problem can be numerically solved through an iterative method, which alternately updates the two subsets of k unknowns, the \(x_i\)’s (or a transformation thereof) and the \(p_i\)’s, till convergence. The main apparent advantage of these discrete approximations is their universality, since they can be applied to most continuous distributions, whether or not they possess finite moments. In order to shed some light on the proposed approaches, applications to several well-known continuous distributions (among them, the normal and the exponential) and to a practical problem where discretization is a useful tool are also illustrated.
1 Introduction
Using a discrete approximation of a continuous random variable (rv) is a procedure that is often adopted in many problems where uncertainty is present and needs to be taken into account. Substituting a continuous probability density function (pdf) with an approximating probability mass function (pmf), supported on a (possibly) finite number of points, can heavily reduce the computational burden required for determining a numerical solution for the problem at hand, and can produce an approximate solution whose degree of accuracy is still acceptable. This is particularly true when a problem involves several quantities that should be modelled as continuous rvs and the solution depends on some complex function thereof. The exact solution in this case could be obtained by applying some multivariate numerical integration technique whose computational cost may be severe and dramatically increasing with the number of rvs involved. Approximating each continuous rv through a properly chosen discrete rv allows the researcher to avoid numerical integration and resort to enumeration, which is much easier to manage (Luceno 1999).
An interesting application of discrete approximation of continuous distributions can be found in the field of insurance. If one is required to determine the distribution of the total claim amount \(S=\sum _{i=1}^N X_i\), corresponding to a random number N of i.i.d. claims whose size \(X_i\) is assumed to follow some continuous probability distribution, rather than resorting to integral convolution to determine the exact distribution of S, one can approximate the distribution of the claim size \(X_i\) through discretization and then apply Panjer’s formula (Panjer 1981), a recursion that exactly computes the distribution of the total, provided that the claim size is arithmetic with span \(h>0\) and the distribution of the number of claims N belongs to the (a, b, 0) class.
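As a toy illustration of this workflow (our own sketch, not taken from the paper: the rounding-to-a-grid discretization, the Poisson frequency and all numerical values are assumptions), one might discretize an Exp(1) claim size on an arithmetic grid of span h and run Panjer's recursion for N distributed as Poisson(2):

```python
import math

def discretize_by_rounding(cdf, h, m):
    """Arithmetic discretization: the mass at j*h collects the probability of
    the interval (j*h - h/2, j*h + h/2]."""
    f = [cdf(h / 2.0)]
    f += [cdf(j * h + h / 2.0) - cdf(j * h - h / 2.0) for j in range(1, m)]
    return f

def panjer_poisson(lam, f, smax):
    """Panjer's recursion for N ~ Poisson(lam), i.e. a = 0, b = lam in the
    (a, b, 0) class: g_s = (lam/s) * sum_j j * f_j * g_{s-j}."""
    g = [math.exp(-lam * (1.0 - f[0]))]          # g_0 = P(S = 0)
    for s in range(1, smax + 1):
        tot = sum(j * f[j] * g[s - j] for j in range(1, min(s, len(f) - 1) + 1))
        g.append(lam * tot / s)
    return g

h = 0.1
exp_cdf = lambda x: 1.0 - math.exp(-x) if x > 0 else 0.0
sev = discretize_by_rounding(exp_cdf, h, 400)    # Exp(1) claim size on {0, h, 2h, ...}
g = panjer_poisson(2.0, sev, 2000)               # distribution of S on the same grid
mean_S = h * sum(s * gs for s, gs in enumerate(g))   # close to E[N]E[X] = 2
```

Here g[s] approximates \(P(S=sh)\); as \(h\rightarrow 0\) the discretized total approaches the exact compound distribution.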
In the field of quantitative finance, a similar application is the following. Let \(\pmb {L} = (L_1, \dots , L_d)\) denote a vector of possibly dependent rvs, each one representing a loss on a particular trading desk, portfolio or operating unit within a firm, over a fixed time period. Sometimes we need to aggregate these losses into a single rv, typically the sum \(L^+=\sum _{i=1}^d L_i\), on which one can calculate a measure of the aggregate risk, for example, the Value-at-Risk (VaR), which is simply the quantile at a prespecified level \(0<\alpha <1\) of the distribution of \(L^+\). However, determining this measure of risk, when the joint distribution of \(\pmb {L}\) is fully specified, requires the computation of the distribution of \(L^+\), which is not straightforward to derive even if all the rvs are mutually independent. A possible answer is represented by the discretization of the \(L_i\) and the construction of a joint pmf approximating the joint distribution of \(\pmb {L}\), on which the evaluation of the VaR of the aggregating function is much more straightforward (see, for example, Jamshidian and Zhu (1996), where the authors consider the multivariate Gaussian distribution).
In reliability engineering, the stress-strength model describes a system with random strength which is subject to a random stress during its functioning, so that the system works only when the strength is greater than the stress. The probability that the system works correctly is termed reliability (Johnson 1988). Evaluating the reliability of a system thus requires the knowledge of the probability distributions of both stress and strength; when the latter depend on several stochastic factors, the probability distributions (and hence the reliability) are often not analytically tractable, and some form of approximation is needed. Given a known functional relationship between stress (or strength) and its random sub-factors, and assuming the sub-factors of the stress (or strength) are independent, one feasible approach to approximating the probability distribution of stress (or strength) is through discretization of the sub-factors (see, e.g., English et al. 1996).
For an assigned continuous probability distribution and a fixed number of approximating points k, the matter is how to build an “optimal” discrete approximation. Several criteria have been used so far; here is a rough classification into four main categories:

moment equalization or moment matching: the discrete approximation is the one preserving as many moments as possible of the original distribution. Moment matching is carried out through a procedure known as Gaussian quadrature (Golub and Welsch 1969); in its more authentic form, the support points of the discrete approximation and their probabilities are derived simultaneously. This is by far the oldest and most popular discretization technique; as stated by Miller and Rice (1983), “Few people would accept an approximation that did not have roughly the same mean, variance, and skew as the original distribution”. However, its application is limited by the finiteness and existence of a closed-form expression for the first integer moments of the continuous rv: many random distributions commonly used in quantitative finance, for example, do not possess even lower-order moments. Several variants of Gaussian quadrature have been proposed; for example, Tanaka and Toda (2013) suggested that the k support points of the approximating distribution be chosen “a priori” and the probabilities be derived by maximizing the relative entropy to an assigned “reference distribution”, under the constraint that it matches as many moments as possible. Convergence properties of this maximum entropy method and applications to stochastic processes were later presented in Tanaka and Toda (2015), Farmer and Toda (2017);

preservation of the distribution function: the discrete approximation preserves, at each support point, the value of the cdf or, alternatively, of the survival function (Roy and Dasgupta 2001). Actually, this technique has been employed for constructing discrete counterparts of continuous distributions, by defining the pmf as \(p(i)=F(i)-F(i-1)\), with i integer; such a construction automatically supplies a valid pmf, which preserves the cdf of the continuous distribution at any integer value i (Chakraborty 2015). Straightforward modifications have been proposed in order to handle finite supports consisting of possibly non-integer points;

minimizing the mean squared error between the assigned continuous rv \(X\sim F\) and its approximation \({\hat{X}}\), i.e., minimizing \({\mathbb {E}}_F(X-{\hat{X}})^2\), where the expected value is computed with respect to the distribution of X; this yields the so-called optimal quantization, which is a well-known technique in signal theory (Gray and Neuhoff 1998). The minimization problem can be solved iteratively, by alternately solving two sets of conditions, one expressing each support point (also named “quantum”) as the center of mass of an interval of a partition of the support of the continuous rv, and the other expressing the endpoints of these intervals as the midpoints of the segments between successive quanta (Lloyd 1982);

minimizing a distance between the two distribution functions: the discrete approximation is obtained as the distribution minimizing some statistical distance between the two (continuous and stepwise) cdfs. In Kennan (2006), an analytical solution is found for the optimal k-point discrete approximation when employing a particular class of distances.
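Of the four categories above, the quantization conditions in the third one lend themselves to a compact sketch. The following is our own minimal Python illustration of Lloyd's iteration for a standard normal rv (initial quanta and iteration budget are arbitrary choices of ours), using the fact that for N(0,1) the center of mass of a cell is \(E[X\mid a<X\le b]=(\phi (a)-\phi (b))/(\varPhi (b)-\varPhi (a))\):

```python
import math

def phi(x):   # standard normal pdf
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):   # standard normal cdf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def lloyd_normal(k, iters=300):
    """Lloyd's iteration: cell endpoints are midpoints of successive quanta,
    quanta are the conditional means (centers of mass) of their cells."""
    x = [-2.0 + 4.0 * i / (k - 1) for i in range(k)]       # arbitrary initial quanta
    for _ in range(iters):
        e = [-math.inf] + [(x[i] + x[i + 1]) / 2.0 for i in range(k - 1)] + [math.inf]
        x = [(phi(e[i]) - phi(e[i + 1])) / (Phi(e[i + 1]) - Phi(e[i])) for i in range(k)]
    e = [-math.inf] + [(x[i] + x[i + 1]) / 2.0 for i in range(k - 1)] + [math.inf]
    p = [Phi(e[i + 1]) - Phi(e[i]) for i in range(k)]      # cell probabilities
    return x, p

x5, p5 = lloyd_normal(5)   # symmetric quanta around zero, probabilities summing to 1
```

Note that, by Jensen's inequality, the variance of the quantizer is always below the variance of the original rv, a feature shared by the distance-based approximations studied below.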
In this work, we will concentrate on the last class of discrete approximations. More precisely, we propose the use of three distances: the Cramér-von Mises, the Anderson-Darling, and the Cramér distance between the cdf of the assigned continuous rv and the stepwise cdf of its k-point discrete approximation. They can all be regarded as weighted Cramér-von Mises distances with a proper choice of the weighting function. We will derive for these three cases the optimal solution (i.e., the k-point discrete approximation leading to the minimum distance) either analytically or computationally, providing in the latter case some details about the numerical procedure to be implemented. To the best of our knowledge, these distances have not been used regularly in the existing literature for the discrete approximation of a continuous rv. The aim of this paper is to shed some light on their use and explain their pros and cons.
The rest of the paper is structured as follows. In the next section, we will discuss distances between cdfs in general and then present and solve the main research question, that is, finding an optimal k-point discrete approximation to a continuous random distribution through minimization of a statistical distance. The three statistical distances mentioned above will be considered and the corresponding solutions to the problem will be described and compared, also with practical reference to some known parametric families of continuous distributions. Section 3 illustrates two applications of discretization to the (approximate) determination of the distribution or of some parameter of an assigned function of independent rvs, which is carried out by using each of the discrete approximations previously described and also other existing techniques. A software implementation in the R programming environment is presented in Sect. 4. Some comments and remarks are provided in the last section.
2 Optimal k-point approximation based on minimization of a distance between distribution functions
2.1 Statement of the problem
Let us consider two cdfs F(x) and G(x), and assume that F(x) possesses a density, say f(x), so that we can write \(F(x)=\int _{-\infty }^x f(u)\text {d}u\) for any real x. Several statistical distances between two such cdfs have been proposed. The most popular is the Kolmogorov-Smirnov (KS) distance, defined as \(\sup _{x\in {\mathbb {R}}} |F(x)-G(x)|\), i.e., the maximum (or, better, supremum) absolute difference between the two cdfs. The KS distance is commonly employed when one has to test whether an i.i.d. sample \((x_1,x_2,\dots ,x_n)\) is consistent with some known cdf F; in this case, the distance to be computed is between the assigned F and the empirical cdf \({\tilde{F}}(x)\), defined as \({\tilde{F}}(x)=\frac{1}{n}\sum _{i=1}^n \mathbbm {1}(x_i\le x)\).
Another wide family of distances between cdfs is the following
$$\begin{aligned} d(F,G)=\int _{{\mathbb {R}}} \left[ F(x)-G(x)\right] ^2 w(x)\,\text {d}x, \end{aligned}$$
(1)
where w(x) is some nonnegative function on \({\mathbb {R}}\).
If we set \(w(x)\equiv 1\), then we obtain the second-order squared Cramér distance (Cramér 1928), which is closely related to the so-called energy distance (Rizzo and Székely 2016). If we set \(w(x)\equiv f(x)\), then we obtain the so-called Cramér-von Mises distance, which can also be conveniently written as \(\int _0^1 [u-G(F^{-1}(u))]^2\text {d}u\), provided that F(x) is invertible. If we set \(w(x)=f(x)[F(x)(1-F(x))]^{-1}\), then we obtain the Anderson-Darling distance; compared with the Cramér-von Mises distance, it puts more weight on the tails of the distribution, i.e., where F(x) is close to either zero or one.
Hanebeck and Klumpp (2008), also referring to the less explored multivariate case, remark “how statistical distances are used for both analysis and synthesis purposes. Analysis is concerned with assessing whether a given sample stems from a given continuous distribution. Synthesis is concerned with both density estimation, i.e., calculating a suitable continuous approximation of a given sample, and density discretization, i.e., approximation of a given continuous random vector by a discrete one”. In this work, we focus on the latter aspect of research.
If one is interested in building an optimal k-point discrete approximation of a given continuous cdf F, i.e., a discrete probability distribution somehow resembling the original one, then he/she can find it as the discrete distribution minimizing, over all the k-point discrete distributions, the distance (1), computed between the cdf F and the cdf \({\hat{F}}\) of the discrete approximation, for a given choice of the weighting function w(x).
The problem can be more formally stated as follows. Suppose we want to approximate the continuous probability distribution F by a discrete probability distribution \({\hat{F}}\), consisting of \(k>1\) points \(x_1<x_2<\dots<x_{k-1}<x_k\), with probabilities \(p_i\), \(i=1,\dots ,k\) (obviously, \(p_i\ge 0\) for \(i=1,\dots ,k\) and \(\sum _{i=1}^k p_i=1\)). Let us gather the support points and the probabilities of the discrete distribution in a vector \(\pmb {\eta }=(x_1,\dots ,x_k,p_1,\dots ,p_k)\). Then, the optimal discrete distribution can be defined as the one (uniquely identified by \(\hat{\pmb {\eta }}\)) minimizing \(d(F(x),{\hat{F}}(x;\pmb {\eta }))\): \(\hat{\pmb {\eta }} = \arg \min _{\pmb {\eta }} d(F,{\hat{F}};\pmb {\eta })\). For the case \(w(x)\equiv f(x)\), corresponding to the Cramér-von Mises distance, an analytical solution is available (Kennan 2006). When w(x) is a piecewise constant function, the problem was solved numerically by Schremp et al. (2006), through an iterative procedure based on a homotopy continuation approach. For any other possible choice of the weighting function w(x), we can solve the minimization problem (at least) numerically.
We start our analysis from the Cramér-von Mises distance.
2.2 Cramér-von Mises
The Cramér-von Mises distance between continuous cdfs is one of the distinguished measures of deviation between distributions (Cramér 1928; von Mises 1931). For a probabilistic interpretation, see Baringhaus and Henze (2017). It is obtained by setting \(w(x)\equiv f(x)\) in (1), so that the distance between F and \({\hat{F}}\) becomes
$$\begin{aligned} d(F,{\hat{F}})=\int _{{\mathbb {R}}} [F(x)-{\hat{F}}(x)]^2 f(x)\,\text {d}x=\int _0^1 [u-{\hat{F}}(F^{-1}(u))]^2\,\text {d}u, \end{aligned}$$
(2)
where \(F^{-1}\) denotes the mathematical inverse of the cdf F of X, which we assume to be strictly increasing over the support of X; the distance in (2) is a particular case of
$$\begin{aligned} d_r(F,{\hat{F}})=\int _0^1 \left| u-{\hat{F}}(F^{-1}(u))\right| ^r\,\text {d}u,\quad r>0. \end{aligned}$$
(3)
The best k-point discrete approximation (that is, the one yielding the minimum value of the distance \(d_r\)) has been proved (Kennan 2006) to have, for any \(r>0\), equally-weighted support points \(x_i\) (i.e., with probability 1/k each) such that \(F(x_i)=\frac{2i-1}{2k}\) or, equivalently,
$$\begin{aligned} x_i=F^{-1}\left( \frac{2i-1}{2k}\right) ,\quad i=1,\dots ,k. \end{aligned}$$
(4)
Here we report the proof, fixing and adding some points left out in Kennan (2006).
Let us introduce the quantities \(q_i\) and \(Q_i\), defined in the following way: \(q_i=F(x_i)\), \(i=1,\dots ,k\), setting \(q_0=0\) and \(q_{k+1}=1\); \(Q_i={\hat{F}}(x_i)\), \(i=1,\dots ,k\), setting \(Q_0=0\); it follows that \(Q_k=1\). Therefore, the \(q_i\) represent the values of the cdf of the continuous rv at the support points \(x_i\) of its discrete approximation; the \(Q_i\) represent the values of the cdf of the discrete approximating rv at its support points \(x_i=F^{-1}(q_i)\), so that \(p_i=Q_i-Q_{i-1}\), \(i=1,\dots ,k\) (see Fig. 1).
The distance (3) can be thus rewritten as
$$\begin{aligned} d_r=\sum _{i=0}^{k}\int _{q_i}^{q_{i+1}} |t-Q_i|^r\,\text {d}t, \end{aligned}$$
and the first-order condition on the \(Q_i\) gives
$$\begin{aligned} (Q_i-q_i)^r=(q_{i+1}-Q_i)^r, \end{aligned}$$
from which \(Q_i=(q_i+q_{i+1})/2\); the first-order condition on the \(q_i\) provides
$$\begin{aligned} (q_i-Q_{i-1})^r=(Q_i-q_i)^r, \end{aligned}$$
from which \(q_i=(Q_{i-1}+Q_i)/2\). Combining these two last results together, we obtain
$$\begin{aligned} Q_i=\frac{Q_{i-1}+Q_{i+1}}{2}, \end{aligned}$$
that is, \(Q_i-Q_{i-1}=Q_{i+1}-Q_i\), and being \(Q_0=0\) and \(Q_k=1\), we derive \(Q_i=i/k\), for \(i=1,\dots ,k-1\), and \(p_i=1/k\), for \(i=1,\dots ,k\). By substituting this expression in that for the \(q_i\), we obtain
$$\begin{aligned} q_i=\frac{(i-1)/k+i/k}{2}=\frac{2i-1}{2k} \end{aligned}$$
and then \(x_i=F^{-1}((2i-1)/(2k))\) for \(i=1,\dots ,k\). Note the particular form of the optimal solution: the specific continuous distribution enters the equation of the support points through the inverse of its cdf F, which is applied to the \(q_i\), which are “distribution-free” quantities; the probabilities \(p_i\) are independent of F as well, being all constant.
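In code, the rule reduces to evaluating the quantile function at the equally-spaced orders \((2i-1)/(2k)\). A minimal sketch follows (our own illustration; the bisection-based normal quantile is just a stand-in to keep the snippet dependency-free):

```python
import math

def norm_ppf(u, lo=-10.0, hi=10.0):
    """Standard normal quantile by bisection on the cdf (illustration only)."""
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    for _ in range(200):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if Phi(mid) < u else (lo, mid)
    return (lo + hi) / 2.0

def cvm_discretization(ppf, k):
    """Optimal k-point approximation under the Cramér-von Mises distance:
    x_i = F^{-1}((2i-1)/(2k)) with equal probabilities 1/k."""
    return [ppf((2 * i - 1) / (2.0 * k)) for i in range(1, k + 1)], [1.0 / k] * k

x, p = cvm_discretization(norm_ppf, 5)
mu = sum(pi * xi for pi, xi in zip(p, x))
var = sum(pi * (xi - mu) ** 2 for pi, xi in zip(p, x))
# mu is 0 by symmetry, while var understates the unit variance of N(0,1)
```

Passing any other quantile function (e.g., that of the exponential) to cvm_discretization yields the corresponding approximation without further changes.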
Another interesting property of this solution can be pointed out. Letting \({\mathcal {F}}_k\) be the set of discrete distributions with k support points, we have just proved that for each \(r>0\) the k-point discrete approximation \({\hat{F}}\in {\mathcal {F}}_k\) that satisfies
$$\begin{aligned} {\hat{F}}=\arg \min _{G\in {\mathcal {F}}_k} d_r(F,G) \end{aligned}$$
is the discrete uniform distribution on the set of points \(x_i=F^{-1}((2i-1)/(2k))\), \(i=1,\dots ,k\). So, being for each \(G\in {\mathcal {F}}_k\)
$$\begin{aligned} d_r(F,{\hat{F}})\le d_r(F,G)\quad \text {for every } r>0, \end{aligned}$$
recalling the relationship between the \(L^r\) norm and the supremum norm (see e.g. Stein and Shakarchi 2011), one obtains
$$\begin{aligned} d_{KS}(F,{\hat{F}})=\lim _{r\rightarrow \infty } d_r(F,{\hat{F}})^{1/r}\le \lim _{r\rightarrow \infty } d_r(F,G)^{1/r}=d_{KS}(F,G) \end{aligned}$$
and deduces from this that
$$\begin{aligned} {\hat{F}}=\arg \min _{G\in {\mathcal {F}}_k} d_{KS}(F,G), \end{aligned}$$
that is, for a fixed integer k, the best approximating distribution obtained by minimizing \(d_r\) is the same we would obtain by minimizing \(d_{KS}\).
Another peculiarity of the optimal solution or, better, of the statistical distance used as a criterion for finding an optimal solution, is its all-encompassing applicability, since it does not require the existence and finiteness of any integer moment of the original continuous distribution: the distance (3), since it can be rewritten as an integral over the unit interval of a quantity which is finite, can always be computed and always possesses a global minimum, corresponding to the solution derived above. Therefore, it is possible to derive such a discrete approximation for Student’s t, say, with any value of the degrees-of-freedom parameter, and for other heavy-tailed distributions, which would not be possible in general if using the moment-matching or quantization techniques (including the method by Drezner and Zerom (2016), which can be considered as a compromise between the two): the finiteness of the first \(2k-1\) moments is required by the former, the finiteness of the first two moments by the latter. It can also be shown (Barbiero and Hitaj 2022) that \(\lim _{k\rightarrow \infty } {\hat{F}}_k(x) = F(x)\) \(\forall x\in {\mathbb {R}}\), i.e., the approximating k-point discrete rv converges in distribution to the original rv.
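As a quick numerical check of this universality (our own example), the rule applies verbatim to a standard Cauchy rv, which possesses no finite moments; moreover, since the optimal \(q_i\) are spaced 1/k apart starting at 1/(2k), the Kolmogorov-Smirnov deviation between the two cdfs at the optimum equals \(1/(2k)\):

```python
import math

k = 10
# Cauchy quantile function: F^{-1}(u) = tan(pi * (u - 1/2))
x = [math.tan(math.pi * ((2 * i - 1) / (2.0 * k) - 0.5)) for i in range(1, k + 1)]

def F(t):       # standard Cauchy cdf
    return math.atan(t) / math.pi + 0.5

def F_hat(t):   # stepwise cdf of the k-point approximation (equal weights 1/k)
    return sum(1.0 for xi in x if xi <= t) / k

# sup |F - F_hat| over a dense grid: at the optimum it equals 1/(2k)
grid = [-50.0 + 0.005 * j for j in range(20001)]
dev = max(abs(F(t) - F_hat(t)) for t in grid)
```

The same snippet works for a Student's t with one degree of freedom, which coincides with the standard Cauchy.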
The distance (3) can be further rewritten as the sum of \(k+1\) integrals,
$$\begin{aligned} d_r=\sum _{i=0}^{k}\int _{q_i}^{q_{i+1}} |t-Q_i|^r\,\text {d}t, \end{aligned}$$
and its minimum value is thus equal to
$$\begin{aligned} d_r^{\min }=\frac{2k}{r+1}\left( \frac{1}{2k}\right) ^{r+1}=\frac{1}{(r+1)(2k)^{r}}, \end{aligned}$$
since for the optimal solution \(q_1=q_{i+1}-Q_i=Q_i-q_i=1-q_k=1/(2k)\), for each \(i=1,\dots ,k-1\). Furthermore, as one can expect, the minimum distance is a decreasing function of k for any fixed \(r>0\): by increasing the number of points, the optimal approximating discrete distribution gets closer (in terms of Cramér-von Mises distance) to the continuous one.
Table 1 displays, just for illustrative purposes, k-point approximations (\(k=5,6,7\)) of a standard normal and an exponential distribution (with unit rate parameter). For each k and for both distributions, values of expectation and variance (simply indicated as \(\mu \) and \(\sigma ^2\)) are reported, in order to be easily compared with the analogous values of the continuous distribution.
For the standard normal distribution, variance and kurtosis of the k-point discrete approximation monotonically converge to the corresponding values of the original continuous distribution rather slowly: kurtosis, in particular, when \(k=100\), is 2.834. This result could have been expected: although the cdf of this discrete approximation converges to the cdf of the original continuous rv, this does not mean that for a finite k the two distributions are close in every respect: in the normal case, recall that the pdf of the continuous rv displays the classical bell-shaped curve, whereas the pmf of the discrete approximation has uniform probabilities, and this overall translates into a mismatch of (even) moments. Recall that moment equalization would instead produce discrete distributions with a very large range and very small probabilities assigned to the extreme support points (Barbiero and Hitaj 2022).
For the discrete approximation of the exponential distribution, expected value, variance, skewness, and kurtosis tend monotonically and asymptotically to the values of the parent distribution (1, 1, 2, and 9, respectively), but for finite k, kurtosis is underestimated to a large extent: when \(k=7\), the value of kurtosis for the discretized exponential distribution is 2.779 (its expected value is 0.951, its variance 0.686 and its skewness 0.963); when \(k=100\), it is 6.662 (its expected value is 0.997, its variance 0.960 and its skewness 1.759).
2.3 Anderson-Darling
What happens if we consider a weighted Cramér-von Mises distance, for example the Anderson-Darling distance, characterized by the weighting function \(w(x)\equiv f(x)/[F(x)(1-F(x))]\)? We may expect that the best approximating distribution now has unequal probabilities or that the support points are no longer quantiles of equally-spaced orders of the original distribution, as occurs with the Cramér-von Mises distance.
Minimizing the Anderson-Darling distance is equivalent to minimizing the following quantity, which consists of the sum of \(k+1\) contributions:
$$\begin{aligned} \sum _{i=0}^{k}\int _{q_i}^{q_{i+1}} \frac{(t-Q_i)^2}{t(1-t)}\,\text {d}t, \end{aligned}$$
(5)
with respect to the \(q_i\) and \(Q_i\). The first-order condition for \(Q_i\) is
$$\begin{aligned} \int _{q_i}^{q_{i+1}} \frac{t-Q_i}{t(1-t)}\,\text {d}t=0 \end{aligned}$$
for \(i=1,\dots ,k\), from which
$$\begin{aligned} Q_i=\log \left( \frac{1-q_i}{1-q_{i+1}}\right) \Big / \log \left( \frac{q_{i+1}(1-q_i)}{q_i(1-q_{i+1})}\right) . \end{aligned}$$
(6)
The first-order condition for \(q_i\) implies
$$\begin{aligned} \frac{(q_i-Q_{i-1})^2}{q_i(1-q_i)}=\frac{(q_i-Q_i)^2}{q_i(1-q_i)}, \end{aligned}$$
from which we obtain again
$$\begin{aligned} q_i=\frac{Q_{i-1}+Q_i}{2}. \end{aligned}$$
(7)
The solution derived by Eqs. (6) and (7) cannot be expressed in an analytic closed form for each \(q_i\) and \(Q_i\). However, a simple iterative algorithm can be implemented with the aim of recovering their values numerically; as initial guess values for \(q_i\) and \(Q_i\), we adopt their optimal values found by minimizing the Cramér-von Mises distance. In a similar fashion as done in Pavlikov and Uryasev (2018), one can think of alternately updating the values of the probabilities \(p_i\) and the values of the discrete points \(x_i\) (or, better, their probability transforms \(q_i\)) till convergence. Note that in Pavlikov and Uryasev (2018) all the \(p_i\) are initially set equal to 1/k, as for the analytical solution based on the Cramér-von Mises distance. The algorithm works as follows:
1. Set \(t=0\); for \(i=1,\dots ,k\), set \(p_i^{(0)}=1/k\), \(Q_i^{(0)}=i/k\)
2. Set \(\epsilon ^{(0)}=1\) (or any large positive value) and \(\epsilon _{\max }=10^{-6}\) (or any arbitrarily small positive value, to be used for checking convergence of the solution)
3. While \(\epsilon ^{(t)}>\epsilon _{\max }\):
(a) Update the iteration index \(t\leftarrow t+1\)
(b) Update the \(q_i\) according to (7): \(q_i^{(t)}=\frac{Q_{i-1}^{(t-1)} + Q_i^{(t-1)}}{2}\)
(c) Update the \(Q_i\) according to (6): \(Q_i^{(t)} = \log \left( \frac{1-q_{i}^{(t)}}{1-q_{i+1}^{(t)}}\right) \Big / \log \left( \frac{q_{i+1}^{(t)}(1-q_i^{(t)})}{q_i^{(t)}(1-q_{i+1}^{(t)})} \right) \), for \(i=1,\dots ,k-1\) (with \(q_{k+1}^{(t)}=1\), the update leaves \(Q_k^{(t)}=1\))
(d) Derive the updated probabilities for the discrete rv: \(p_i^{(t)}=Q_i^{(t)}-Q_{i-1}^{(t)}\)
(e) Calculate the maximum absolute deviation between two consecutive iterations in terms of the \(p_i\): \(\epsilon ^{(t)}=\max _{i=1,\dots ,k} |p_i^{(t)}-p^{(t-1)}_i|\). Alternatively, other distances can be used to compare the probability vectors obtained in two consecutive iterations, such as the Euclidean distance \(\sqrt{\sum _{i=1}^k(p_i^{(t)}-p^{(t-1)}_i)^2}\)
4. Return \(q_i^{(t)}\), \(Q_i^{(t)}\) and \(p_i^{(t)}\)
The number of iterations required clearly depends on the threshold \(\epsilon _{\max }\) and on the number of points k: diminishing \(\epsilon _{\max }\) or increasing k leads to a larger number of iterations; for plausible values of \(\epsilon _{\max }\) (say \(10^{-8}\)) and k (say smaller than one hundred), the computation times are in the order of fractions of a second.
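The steps above can be rendered as follows (our own sketch; as noted, the update (6) is applied for \(i=1,\dots ,k-1\), with \(Q_0=0\) and \(Q_k=1\) held fixed):

```python
import math

def ad_discretization(k, eps_max=1e-10, max_iter=10000):
    """Alternating scheme of Sect. 2.3 for the Anderson-Darling distance."""
    Q = [i / k for i in range(k + 1)]              # Q_0 = 0, ..., Q_k = 1 (CvM start)
    p = [1.0 / k] * k
    for _ in range(max_iter):
        # update (7): q_i = (Q_{i-1} + Q_i)/2, plus the convention q_{k+1} = 1
        q = [(Q[i - 1] + Q[i]) / 2.0 for i in range(1, k + 1)] + [1.0]
        # update (6) for i = 1, ..., k-1 (Q_0 and Q_k stay fixed)
        for i in range(1, k):
            qi, qn = q[i - 1], q[i]                # q_i and q_{i+1}
            Q[i] = math.log((1 - qi) / (1 - qn)) / math.log(qn * (1 - qi) / (qi * (1 - qn)))
        p_new = [Q[i] - Q[i - 1] for i in range(1, k + 1)]
        eps = max(abs(a - b) for a, b in zip(p_new, p))
        p = p_new
        if eps < eps_max:
            break
    return q[:-1], p

q7, p7 = ad_discretization(7)   # inverted U-shaped, symmetric probabilities
```

The stopping rule is the maximum absolute deviation of step (e); swapping in the Euclidean distance only changes the eps line.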
By numerical inspection, the Anderson-Darling weighting function leads to an inverted U-shaped trend for the \(p_i\), which turn out to be symmetrical around the central value or values, according to whether k is odd or even, i.e., \(p_j=p_{k-j+1}\), \(j=1,\dots ,k\); see Table 2. As for the optimal \(q_i\) values, they present a form of symmetry around the central value(s), such that a continuous symmetrical distribution remains symmetrical after discretization (see Tables 2 and 3a, referring to the discretization of a standard normal rv).
If we consider the exponential distribution with unit rate parameter and we derive the k-point discrete approximation by minimizing the Anderson-Darling distance with \(k=7\) (see Table 3b), its expected value turns out to be equal to 0.961, its variance 0.743, its skewness 1.147, and its kurtosis 3.424. Although not so close to the values of the underlying continuous distribution, the discrepancies are smaller if compared to those resulting from the Cramér-von Mises approximation (see the previous subsection).
We note that using a distance \(d(F,{\hat{F}})=\int _{{\mathbb {R}}}(F(x)-{\hat{F}}(x))^2w(F(x))\,\text {d}F(x)\), with any other possible positive weighting function w, even an asymmetrical one, supplies (although in general only numerically) values of \(q_i\) and \(Q_i\) that do not depend on the specific cdf F(x): the original distance can in fact be rewritten as \(\sum _{i=0}^k \int _{q_i}^{q_{i+1}}(t-Q_i)^2w(t)\,\text {d}t\), the first-order condition on the \(Q_i\) becomes \(\int _{q_i}^{q_{i+1}}(t-Q_i)w(t)\,\text {d}t=0\), and thus the \(Q_i\) that solve this condition can be expressed as a function, dependent on w, of \(q_i\) and \(q_{i+1}\) only. The simplest and most apparent case is provided by the “unweighted” Cramér-von Mises distance discussed in Sect. 2.2, where such values are available through simple analytical formulas involving just the index i and the number of support points k.
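This observation suggests a generic numerical scheme for an arbitrary weight w on (0, 1) (our own sketch, using a crude midpoint-rule quadrature): update each \(Q_i\) as the w-weighted mean of t over \((q_i,q_{i+1})\) and each \(q_i\) as the midpoint of \((Q_{i-1},Q_i)\). With \(w\equiv 1\) the scheme reproduces the distribution-free Cramér-von Mises solution.

```python
def discretize_weighted(w, k, n_sub=200, iters=500):
    """Alternating minimization of sum_i int_{q_i}^{q_{i+1}} (t - Q_i)^2 w(t) dt."""
    def wmean(a, b):
        # midpoint-rule estimate of  int t w(t) dt / int w(t) dt  over (a, b)
        h = (b - a) / n_sub
        ts = [a + (j + 0.5) * h for j in range(n_sub)]
        return sum(t * w(t) for t in ts) / sum(w(t) for t in ts)

    Q = [i / k for i in range(k + 1)]              # Q_0 = 0 and Q_k = 1 stay fixed
    q = [(2 * i - 1) / (2.0 * k) for i in range(1, k + 1)]
    for _ in range(iters):
        q = [(Q[i - 1] + Q[i]) / 2.0 for i in range(1, k + 1)]
        for i in range(1, k):
            # weighted center-of-mass condition on the interval (q_i, q_{i+1}),
            # which is (q[i-1], q[i]) with this 0-based indexing
            Q[i] = wmean(q[i - 1], q[i])
    p = [Q[i] - Q[i - 1] for i in range(1, k + 1)]
    return q, p

q_u, p_u = discretize_weighted(lambda t: 1.0, 4)   # recovers q_i = (2i-1)/8, p_i = 1/4
```

Plugging in \(w(t)=1/(t(1-t))\) recovers (up to quadrature error) the Anderson-Darling solution of Sect. 2.3.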
2.4 Cramér distance
We now consider the Cramér distance between the cdf of a continuous rv and that of its discrete approximation:
$$\begin{aligned} d(F,{\hat{F}})=\int _{{\mathbb {R}}} [F(x)-{\hat{F}}(x)]^2\,\text {d}x, \end{aligned}$$
which can be also rewritten as
$$\begin{aligned} d(F,{\hat{F}})=\sum _{i=0}^{k}\int _{q_i}^{q_{i+1}} \frac{(t-Q_i)^2}{f(F^{-1}(t))}\,\text {d}t. \end{aligned}$$
(8)
This last expression highlights how, differently from the two distances previously examined, the Cramér distance may not always be computable: for some cdf F, the first and/or the last term of the sum may not be finite. Although the two corresponding integrals are computed over a bounded interval, the integrand function may be unbounded: when t tends to 0 or t tends to 1, the denominator \(F'(F^{-1}(t))\) may tend to zero; in the former case, for example, it may tend to zero as fast as \(t^3\) or faster, and then the improper integral would not converge. We will see an example later.
Assuming that the Cramér distance can be computed, the first-order condition on the \(q_i\) leads again to
$$\begin{aligned} q_i=\frac{Q_{i-1}+Q_i}{2}, \end{aligned}$$
while the first-order condition on the \(Q_i\) leads to
$$\begin{aligned} \int _{q_i}^{q_{i+1}} \frac{t-Q_i}{f(F^{-1}(t))}\,\text {d}t=0, \end{aligned}$$
which becomes
$$\begin{aligned} Q_i=\frac{\int _{q_i}^{q_{i+1}} t/f(F^{-1}(t))\,\text {d}t}{\int _{q_i}^{q_{i+1}} 1/f(F^{-1}(t))\,\text {d}t}=\frac{\int _{x_i}^{x_{i+1}} F(x)\,\text {d}x}{x_{i+1}-x_i}, \end{aligned}$$
(9)
for \(i=1,\dots ,k\). This condition therefore depends on the specific distribution F of the continuous rv X.
We will now analytically develop condition (9) for several choices of F, by examining some well-known parametric families. We will see that in general Eq. (9), combined with the condition on the \(q_i\), does not lead to a closed-form solution for the discrete approximation, which must instead be computed numerically, for instance by resorting to the iterative procedure sketched out in Sect. 2.3.
2.4.1 Normal
In the case of a standard normal rv, with cdf \(F(x)=\varPhi (x)=\int _{-\infty }^x \frac{1}{\sqrt{2\pi }}e^{-t^2/2}\text {d}t\), for which the following equality holds (see, for example, Owen (1980), formula 1000):
$$\begin{aligned} \int \varPhi (x)\,\text {d}x=x\varPhi (x)+\phi (x), \end{aligned}$$
where \(\phi (x)=\varPhi '(x)\), the first-order condition on the \(Q_i\) becomes
$$\begin{aligned} Q_i=\frac{x_{i+1}\varPhi (x_{i+1})+\phi (x_{i+1})-x_i\varPhi (x_i)-\phi (x_i)}{x_{i+1}-x_i},\quad x_i=\varPhi ^{-1}(q_i), \end{aligned}$$
(10)
which, along with the firstorder condition on the \(q_i\), allows us to determine the optimal solution numerically, following analogous steps to those sketched out in Sect. 2.3.
If instead of considering a standard normal rv, we focus on a generic normal rv with parameters \(\mu \) and \(\sigma ^2\), with cdf \(F(x)=\varPhi (\frac{x-\mu }{\sigma })\), the first-order condition on the \(Q_i\) would not change; both the left and right members of (9), in fact, are simply multiplied by \(\sigma \).
Table 4 displays the values \((x_i,p_i)\) of the best k-point approximation of a standard normal rv, for \(k=5,6,7\). We note that, as for the discrete approximation based on the minimization of the Anderson-Darling distance, the support points are symmetrical around zero and the probabilities satisfy \(p_j=p_{k-j+1}\), \(j=1,\dots ,k\).
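The first-order conditions for the normal case can be solved with the same alternating scheme as before. The sketch below is our own (the bisection-based normal quantile and the iteration budget are arbitrary choices) and reproduces the symmetry just described:

```python
import math

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))   # N(0,1) cdf
phi = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_ppf(u, lo=-10.0, hi=10.0):
    for _ in range(100):                      # bisection on Phi (illustration only)
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if Phi(mid) < u else (lo, mid)
    return (lo + hi) / 2.0

def cramer_normal(k, iters=1000):
    """Cramér-distance discretization of N(0,1): Q_i is the average of Phi over
    (x_i, x_{i+1}), computed via the antiderivative  int Phi = x Phi(x) + phi(x)."""
    A = lambda t: t * Phi(t) + phi(t)
    Q = [i / k for i in range(k + 1)]         # Q_0 = 0 and Q_k = 1 stay fixed
    for _ in range(iters):
        q = [(Q[i - 1] + Q[i]) / 2.0 for i in range(1, k + 1)]
        x = [norm_ppf(qi) for qi in q]
        for i in range(1, k):                 # interval (x_i, x_{i+1}) is (x[i-1], x[i])
            Q[i] = (A(x[i]) - A(x[i - 1])) / (x[i] - x[i - 1])
    p = [Q[i] - Q[i - 1] for i in range(1, k + 1)]
    return x, p

xc, pc = cramer_normal(5)   # symmetric support around zero, symmetric probabilities
```

For a generic \(N(\mu ,\sigma ^2)\) it suffices to return \(\mu +\sigma x_i\), since the condition on the \(Q_i\) is unchanged.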
2.4.2 Exponential
In the case of an exponential rv, with cdf \(F(x)=1-e^{-\lambda x}\), \(x>0\), and quantile function \(F^{-1}(u)=-\frac{\log (1-u)}{\lambda }\), \(0<u<1\), we have that the first-order condition on the \(Q_i\) can be written as
$$\begin{aligned} \int _{q_i}^{q_{i+1}} \frac{t-Q_i}{\lambda (1-t)}\,\text {d}t=0, \end{aligned}$$
from which
$$\begin{aligned} Q_i=1-\frac{q_{i+1}-q_i}{\log \frac{1-q_i}{1-q_{i+1}}}. \end{aligned}$$
(11)
Table 5 displays the values \((x_i,p_i)\) of the best k-point approximation of an exponential rv with unit parameter for some values of k. It is very important to notice that empirical inspection shows that this type of approximation is characterized by a decreasing pmf, which thus resembles the decreasing trend of the pdf; on the contrary, the discrete approximation derived from the minimization of the Anderson-Darling distance possesses an increasing-decreasing pmf, while the discrete approximation based on the minimization of the Cramér-von Mises distance possesses a constant pmf. If we consider the exponential distribution with unit rate parameter and its optimal 7-point discrete approximation, the latter has an expected value equal to 0.972, a variance equal to 0.804, a skewness of 1.313, and a kurtosis of 4.089. Such values are closer to the analogous ones for the exponential distribution if compared to the discrete approximations obtained from the minimization of the Cramér-von Mises and Anderson-Darling distances.
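For instance (our own sketch, with an arbitrary iteration budget), alternating the closed-form update for the \(Q_i\) with \(q_i=(Q_{i-1}+Q_i)/2\) recovers the decreasing pmf just described:

```python
import math

def cramer_exponential(k, lam=1.0, iters=2000):
    """Cramér-distance discretization of Exp(lam), using the closed-form update
    Q_i = 1 - (q_{i+1} - q_i) / log((1 - q_i)/(1 - q_{i+1}))."""
    Q = [i / k for i in range(k + 1)]          # Q_0 = 0 and Q_k = 1 stay fixed
    for _ in range(iters):
        q = [(Q[i - 1] + Q[i]) / 2.0 for i in range(1, k + 1)] + [1.0]  # q_{k+1} = 1
        for i in range(1, k):
            qi, qn = q[i - 1], q[i]            # q_i and q_{i+1}
            Q[i] = 1.0 - (qn - qi) / math.log((1.0 - qi) / (1.0 - qn))
    p = [Q[i] - Q[i - 1] for i in range(1, k + 1)]
    x = [-math.log(1.0 - qi) / lam for qi in q[:-1]]   # x_i = F^{-1}(q_i)
    return x, p

x_ce, p_ce = cramer_exponential(7)             # decreasing probabilities, cf. Table 5
mean_ce = sum(pi * xi for pi, xi in zip(p_ce, x_ce))   # the text reports 0.972 for k = 7
```

Scaling by \(1/\lambda \) is the only place where the rate parameter enters, consistently with the distribution-free nature of the conditions on the \(q_i\) and \(Q_i\).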
2.4.3 Cauchy
In the case of a standard Cauchy rv, with cdf \(F(x)=\frac{1}{\pi }\arctan x+\frac{1}{2}\), \(x\in {\mathbb {R}}\), we have that the first-order condition on the \(Q_i\) provides
$$\begin{aligned} \int _{x_i}^{x_{i+1}} \left( \frac{1}{\pi }\arctan x+\frac{1}{2}-Q_i\right) \text {d}x=0 \end{aligned}$$
and then
$$\begin{aligned} Q_i(x_{i+1}-x_i)=\left[ \frac{x\arctan x}{\pi }-\frac{\log (1+x^2)}{2\pi }+\frac{x}{2}\right] _{x_i}^{x_{i+1}}, \end{aligned}$$
from which
$$\begin{aligned} Q_i=\frac{x_{i+1}F(x_{i+1})-x_iF(x_i)-\frac{1}{2\pi }\log \frac{1+x_{i+1}^2}{1+x_i^2}}{x_{i+1}-x_i},\quad x_i=\tan \left( \pi \left( q_i-\frac{1}{2}\right) \right) . \end{aligned}$$
We highlight that it is possible to find a k-point discrete approximation for the Cauchy distribution by minimizing the Cramér distance (as well as the Cramér-von Mises or Anderson-Darling distances), although such a distribution does not possess any positive integer moment and hence any discretization technique based on some form of moment equalization – among others, moment matching, which requires equalization of the first \(2k-1\) moments, but even the technique presented by Drezner and Zerom (2016) – would not be applicable at all.
2.4.4 Logistic
If we consider the standard logistic distribution with cdf \(F(x)=(1+e^{-x})^{-1}\), \(x\in {\mathbb {R}}\), then its inverse cdf is \(F^{-1}(u)=\log \frac{u}{1-u}\), \(0<u<1\), and, being \(f(F^{-1}(t))=t(1-t)\), the first-order condition on the \(Q_i\) can be written as:
$$\begin{aligned} \int _{q_i}^{q_{i+1}} \frac{t-Q_i}{t(1-t)}\,\text {d}t=0, \end{aligned}$$
from which
$$\begin{aligned} -Q_i\log \frac{q_{i+1}}{q_i}+(1-Q_i)\log \frac{1-q_i}{1-q_{i+1}}=0 \end{aligned}$$
and then
$$\begin{aligned} Q_i=\log \left( \frac{1-q_i}{1-q_{i+1}}\right) \Big / \log \left( \frac{q_{i+1}(1-q_i)}{q_i(1-q_{i+1})}\right) . \end{aligned}$$
We note that the latter condition is the same as condition (6), obtained for the optimal k-point approximation based on the minimization of the Anderson-Darling distance. This occurs since for the logistic distribution the equality \(f(x)[F(x)(1-F(x))]^{-1}=1\) holds for any \(x\in {\mathbb {R}}\), and hence the Cramér distance coincides with the Anderson-Darling distance.
Similarly to what happens with the normal distribution, if we consider a non-standard logistic distribution with location parameter \(\mu \) and scale parameter \(\sigma \), with cdf \(F(x;\mu ,\sigma )=\frac{1}{1+e^{-\left( \frac{x-\mu }{\sigma }\right) }}\) and quantile function \(F^{-1}(u;\mu ,\sigma )=\mu +\sigma \log \frac{u}{1-u}\), it is straightforward to see that the first-order condition on the \(Q_i\) would remain the same.
2.4.5 Lomax or Pareto
For the Lomax distribution with scale parameter \(\lambda >0\) and shape parameter \(\alpha >0\), the expression of the cdf is \(F(x) = 1-\left( 1+\frac{x}{\lambda } \right) ^{-\alpha }\), \(x>0\); the quantile function is \(F^{-1}(u)=\lambda [(1-u)^{-1/\alpha }-1]\). The first-order condition on the \(Q_i\), if \(\alpha \ne 1\), is
from which
In particular, if \(\alpha =2\), formula (12) reduces to the following expression
If \(\alpha =1\), the first-order condition on the \(Q_i\) can be written as
from which
If \(\alpha \le 1/2\), it can be proved that the Cramér distance cannot be computed for any feasible value of \(\lambda \) and k: the integrand of the last integral in (8), \(\int _{q_k}^1 (1-t)^2/f(F^{-1}(t))\,\text {d}t\), turns out to be proportional to \((1-t)^{1-1/\alpha }\), and, being \(\alpha \le 1/2\), the (improper) integral is not finite for any feasible value of \(q_k\). We highlight that the Lomax distribution does not possess any integer moment for \(\alpha \le 1\), so the proposed procedure is still able to produce a k-point discrete approximation even if the distribution does not possess a finite expectation, namely for \(1/2<\alpha \le 1\).
We would obtain the same results if we considered the Pareto rv with cdf \(F(x)=1-\left( \frac{\lambda }{x}\right) ^\alpha \), \(x>\lambda \), by simply substituting x with \(\lambda +x\).
In general, it is not possible to obtain a closed-form optimal solution, except for small values of k. For example, if \(\lambda =1\) and \(\alpha =2\), for \(k=2\), the conditions on the \(q_i\) are \(q_1=Q_1/2\) and \(q_2=(1+Q_1)/2\); the condition on \(Q_1\) is \(Q_1=1-\sqrt{(1-q_1)(1-q_2)}\). Substituting the first two expressions for \(q_1\) and \(q_2\) into the last condition, we obtain, after a few simple passages, that \(1-Q_1=\sqrt{\frac{1}{2}+\frac{Q_1^2}{4}-\frac{3Q_1}{4}}\) and then a second-order equation in \(Q_1\) whose unique feasible solution is \(Q_1=\frac{2}{3}\). Consequently, \(q_1=\frac{1}{3}\) and \(q_2=\frac{5}{6}\); the best discrete approximation consists of the points \(x_1=\sqrt{\frac{3}{2}}-1\) and \(x_2=\sqrt{6}-1\) with probabilities \(p_1=2/3\) and \(p_2=1/3\).
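The closed-form \(k=2\) Lomax solution can be verified numerically; a minimal Python check (quantile function and conditions as in the text, variable names ours):

```python
from math import isclose, sqrt

# Numerical check of the closed-form 2-point solution for Lomax(lambda=1, alpha=2)
Q1 = 2/3
q1, q2 = Q1/2, (1 + Q1)/2                    # q1 = 1/3, q2 = 5/6
# stationarity condition on Q1: Q1 = 1 - sqrt((1-q1)(1-q2))
assert isclose(Q1, 1 - sqrt((1 - q1)*(1 - q2)))
# support points via the quantile function F^{-1}(u) = (1-u)^{-1/2} - 1
x1 = (1 - q1)**-0.5 - 1
x2 = (1 - q2)**-0.5 - 1
assert isclose(x1, sqrt(3/2) - 1) and isclose(x2, sqrt(6) - 1)
```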
2.4.6 Power function
The cdf of a power function rv is \(F(x)=(x/b)^c\), \(0<x<b\), \(b>0\), \(c>0\). The corresponding quantile function is \(F^{1}(u)=bu^{1/c}\), \(0<u<1\).
The first-order condition on the \(Q_i\) turns into
from which
If \(c=1\) (corresponding to the case of a uniform distribution) then \(Q_i=(q_i+q_{i+1})/2\) and we obviously obtain the same solution as in Sect. 2.2. If \(c=2\), then we have
thus the cumulative probability \(Q_i\) of the optimal solution corresponds to the arithmetic mean of three quantities: the two consecutive probabilities \(q_i\) and \(q_{i+1}\) and their geometric mean \(\sqrt{q_iq_{i+1}}\).
It can be numerically shown that for values of c larger than 1 (when the pdf is increasing), the k-point discrete approximation has an increasing pmf; on the contrary, for \(0<c<1\) (when the pdf is decreasing), the k-point discrete approximation has a decreasing pmf.
In general, it is not possible to derive the optimal solution in closed form, except for small values of k. For example, if \(b=1\) and \(c=2\), for \(k=2\), the conditions on the \(q_i\) are \(q_1=Q_1/2\) and \(q_2=(1+Q_1)/2\); the condition on \(Q_1\) is \(Q_1=\frac{1}{3}(q_1+q_2+\sqrt{q_1q_2})\). Substituting the first two expressions for \(q_1\) and \(q_2\) into the last equation, we obtain, after a few simple passages, that \(4Q_1-1=\sqrt{Q_1(1+Q_1)}\) and then a second-order equation in \(Q_1\) whose unique feasible solution is \(Q_1=\frac{9+\sqrt{21}}{30}\). Consequently, \(q_1=\frac{9+\sqrt{21}}{60}\) and \(q_2=\frac{39+\sqrt{21}}{60}\); the best discrete approximation consists of the points \(x_1=\sqrt{\frac{9+\sqrt{21}}{60}}\) and \(x_2=\sqrt{\frac{39+\sqrt{21}}{60}}\) with probabilities \(p_1=\frac{9+\sqrt{21}}{30}\) and \(p_2=\frac{21-\sqrt{21}}{30}\).
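This closed-form solution, too, can be checked numerically; a minimal Python sketch (conditions as in the text, variable names ours):

```python
from math import isclose, sqrt

# Numerical check of the closed-form 2-point solution for the power
# function distribution with b = 1, c = 2, i.e. F(x) = x^2 on (0, 1)
Q1 = (9 + sqrt(21)) / 30
q1, q2 = Q1 / 2, (1 + Q1) / 2
# stationarity conditions: 4*Q1 - 1 = sqrt(Q1*(1+Q1)) and
# Q1 = (q1 + q2 + sqrt(q1*q2))/3
assert isclose(4*Q1 - 1, sqrt(Q1*(1 + Q1)))
assert isclose(Q1, (q1 + q2 + sqrt(q1*q2)) / 3)
x1, x2 = sqrt(q1), sqrt(q2)                  # F^{-1}(u) = sqrt(u)
p1, p2 = Q1, 1 - Q1
assert isclose(p2, (21 - sqrt(21)) / 30)
```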
2.5 Remarks
2.5.1 The case \(k=1\)
We have always implicitly assumed so far that the number of points k by which one approximates the assigned continuous random distribution is greater than 1. It is immediate to realize that if one wants to approximate the distribution of the rv X through one value only, then, except for the cases where the selected statistical distance cannot be computed (see Sect. 2.4.5), the “optimal” approximating value \(x_1\) is the median of F, \(F^{-1}(0.5)\), since the only effective condition to be satisfied (for all the three distances examined) would be \(q_1=(Q_0+Q_1)/2=1/2\), with \(Q_0=0\) and \(Q_1=1\). We note that optimal quantization (Lloyd 1982) would instead return the expected value \({\mathbb {E}}_F(X)\), provided it exists and is finite, as the optimal approximating value of F.
2.5.2 Location-scale transformation
Let us consider a continuous rv \(X\sim F_X\) and its optimal k-point approximation derived from the minimization of the Cramér–von Mises or Anderson–Darling distance, \(x_i=F_X^{-1}(q_i^*)\), \(p_i^*=Q_i^*-Q_{i-1}^*\). We know that the optimal \(q_i^*\) and \(Q_i^*\) are only functions of k and of the selected distance, but do not depend on the specific \(F_X\). Then consider the location-scale transformation \(Y=a+bX\), \(a\in {\mathbb {R}}\), \(b>0\). Since for such a transformation we have \(F_Y(y)=F_X(\frac{y-a}{b})\) for any real y and \(F^{-1}_Y(u)=a+bF_X^{-1}(u)\), \(0<u<1\), the optimal k-point approximation of Y (derived by using the same distance as for X) is represented by \(y_i=F_Y^{-1}(q_i^*)=a+bF_X^{-1}(q_i^*)=a+bx_i\) and the same \(p_i^*\) as before, which means that the location-scale transformation applies also to the discretized rv.
This property does not hold in general for the discrete approximation based on the minimization of the Cramér distance: the optimal values \(q_i^*\) and \(Q_i^*\) now depend on the specific form of the cdf, and \(F_Y\) may not belong to the same family as \(F_X\), unless the cdf \(F_X\) belongs to a location-scale family of distributions itself: in this case, the property still holds, as condition (9) does not change if we consider a location-scale transformation of the cdf (recall the examples with the normal and logistic distributions in Sect. 2.4).
3 Example of application
Often a researcher in the statistical field is required to determine the distribution (or parameters) of some complex function of several (independent and continuous) rvs \(T=t(X_1,X_2,\dots ,X_d)\). Multidimensional integration techniques should be employed, but often, due to the complexity of t and to the high dimensionality d, they are either cumbersome to apply or not applicable at all. One can then resort to Monte Carlo simulation, i.e., simulating (independently) a huge number N of pseudo-random values from \(X_1,X_2,\dots ,X_d\) and then calculating the corresponding N values of the transformation t, which can be regarded as a random sample drawn from T. An alternative is to find an approximate solution via approximation-by-discretization and enumeration. This consists of substituting each \(X_i\) with a discrete approximation \({\hat{X}}_i\) and then determining the corresponding pmf of \({\hat{T}}=t({\hat{X}}_1,{\hat{X}}_2,\dots ,{\hat{X}}_d)\) by “enumeration”, based on the joint pmf of \(({\hat{X}}_1,{\hat{X}}_2,\dots ,{\hat{X}}_d)\) over the Cartesian product of the single supports: \({\mathcal {S}}({\hat{X}}_1)\times \dots \times {\mathcal {S}}({\hat{X}}_d)\).
In order to illustrate and compare the discretization techniques proposed in Sect. 2, in the following subsection we will consider a case where it is possible to derive the exact distribution of T analytically, which allows us to evaluate the statistical performance (in terms of degree of approximation, to be measured through some index) of each technique; in Sect. 3.2, we will analyze a more complicated case where the exact distribution of T is not analytically computable, but a parameter of interest can be recovered numerically.
3.1 Sum of exponential random variables
Let \(X_1\sim \text {Gamma}(\alpha _1,\lambda )\), \(X_2\sim \text {Gamma}(\alpha _2,\lambda )\), ..., \(X_d\sim \text {Gamma}(\alpha _d,\lambda )\), independent of each other; it is well-known that the sum \(S=\sum _{i=1}^d X_i\) is \(\text {Gamma}(\sum _{i=1}^d \alpha _i,\lambda )\). However, let us approximate each \(X_i\) through one of the discrete approximations we illustrated, and then reconstruct the approximated distribution of the sum, \({\hat{F}}_S\). We can then evaluate the degree of approximation by using a measure of discrepancy between \(F_S\) and \({\hat{F}}_S\); here we use the Kolmogorov–Smirnov (KS) distance, rather than one of the statistical distances employed for the univariate approximation. We consider the sum of \(d=3\) exponential rvs, with \(\lambda =1/2\), \(\alpha _i=1\), \(i=1,2,3\), for which we know that \(S\sim \text {Gamma}(3,1/2)\). Table 6, for different values of k (from 5 to 11), reports the KS distance between the exact and the approximated cdf of S, obtained according to the three univariate discrete approximations for the \(X_i\). It is easy to see how the KS distance decreases with k for all the methods: by increasing the number of approximating points for the \(X_i\), it is legitimate to expect that the discrepancy between the true and the approximate cdf decreases, independently of the measure used. Moreover, we notice that the discrete approximation based on the Cramér distance outperforms the other two approximations, and its relative level of accuracy quickly improves with k: for \(k=11\), the KS distance provided by the approximation based on the minimization of the Cramér distance is much less than one half of the KS distance provided by the approximation based on the minimization of the Cramér–von Mises distance.
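A hedged Python sketch of this experiment, restricted to the Cramér–von Mises approximation (assuming its distribution-free form \(p_i=1/k\), \(q_i=(2i-1)/(2k)\), consistent with the constant pmf noted in the exponential example above); the KS values it produces are ours and need not coincide with Table 6:

```python
import numpy as np
from itertools import product
from scipy import stats

# Illustrative sketch of approximation-by-discretization and enumeration:
# approximate each Exp(rate 1/2) rv by a k-point CvM discretization (assumed
# distribution-free form q_i = (2i-1)/(2k), p_i = 1/k), sum d = 3 independent
# copies by enumeration, and measure the KS distance to the exact
# Gamma(3, scale 2) cdf.
def ks_of_sum(k, d=3, scale=2.0):
    q = (2 * np.arange(1, k + 1) - 1) / (2 * k)
    x = stats.expon.ppf(q, scale=scale)           # support points
    atoms = {}
    for combo in product(range(k), repeat=d):     # enumeration over k^d tuples
        s = round(sum(x[i] for i in combo), 12)
        atoms[s] = atoms.get(s, 0.0) + (1 / k) ** d
    keys = sorted(atoms)
    s = np.array(keys)
    c = np.cumsum([atoms[v] for v in keys])       # right-continuous step cdf
    F = stats.gamma.cdf(s, a=d, scale=scale)
    c_left = np.concatenate(([0.0], c[:-1]))      # left limits at each atom
    return max(np.abs(F - c).max(), np.abs(F - c_left).max())

# KS distance decreases as the number of approximating points k grows
ks5, ks11 = ks_of_sum(5), ks_of_sum(11)
```

The KS supremum is attained at the jump points of the step cdf, so checking both the left and right limits at each atom suffices.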
As we already remarked, the discrete approximation of the exponential distribution based on the minimization of the Cramér distance seems to resemble the shape of the continuous pdf better than the discrete approximations based on the Cramér–von Mises and Anderson–Darling distances, and this is also reflected positively in the calculation of the approximate distribution of the sum of three i.i.d. exponential rvs.
The graph in Fig. 2 displays the curve of the true cdf of the sum and the stepwise cdfs of the three discrete approximations. It is quite apparent that the latter struggle to reproduce the continuous curve in its right tail: it is visible to the naked eye that, for relatively high values of x (say, between 10 and 15), the three approximations attain their maximum absolute error (which corresponds to the value of the KS distance).
3.2 Reliability parameter for a hollow rectangular tube
In Sect. 1, we mentioned the problem of recovering the so-called reliability parameter \(R=P(X>Y)\) for a stress–strength model, where X and Y are the strength and stress rvs, respectively, possibly depending on several stochastic sub-factors. Let us consider the following example. The functional form of the shear stress of a hollow rectangular tube is \(Y=\frac{M}{2t(W-t)(H-t)}\), where M is the applied torque, t is the wall thickness, W is the width, and H the height of the tube. Let X, M, t, W, and H be mutually independent rvs. Consider the following parametric setup: \(M\sim {\mathcal {N}}(\mu _M=1500,\sigma _M=150)\), \(t\sim {\mathcal {N}}(\mu _t=0.2,\sigma _t=0.005)\), \(W\sim {\mathcal {N}}(\mu _W=2,\sigma _W=0.02)\), \(H\sim {\mathcal {N}}(\mu _H=3,\sigma _H=0.03)\) (with \({\mathcal {N}}\) denoting the normal distribution), which accords with the setup in Roy and Dasgupta (2001). Let the standard deviation of the normal strength X be 60, and assume that the mean of X can vary from 520 to 970 units by steps of 10, thus generating an array of \(n=46\) scenarios.
Although the exact evaluation of R is unfeasible here, as suggested by a referee, its value can be obtained numerically as follows. Denoting by \(\phi _{\mu ,\sigma }\) and \(\varPhi _{\mu ,\sigma }\) the pdf and cdf, respectively, of a normal rv with mean \(\mu \) and standard deviation \(\sigma \), putting
for \((t,m,w,h)\in {\mathbb {R}}^4\), the reliability parameter R can be expressed as
and can be approximated by
where Q is the hyperrectangle defined as \(Q=I_t\times I_M\times I_W\times I_H\), with \(I_t=[\mu _t-\gamma _t\sigma _t,\mu _t+\gamma _t\sigma _t]\), \(I_M=[\mu _M-\gamma _M\sigma _M,\mu _M+\gamma _M\sigma _M]\), \(I_W=[\mu _W-\gamma _W\sigma _W,\mu _W+\gamma _W\sigma _W]\), \(I_H=[\mu _H-\gamma _H\sigma _H,\mu _H+\gamma _H\sigma _H]\), and \(\gamma _t\), \(\gamma _M\), \(\gamma _W\), \(\gamma _H\) are positive scale factors to be chosen suitably large. The computation of \(R_Q\) is easily done by using the function cuhre that is included in the package cubature (Narasimhan et al. 2022) of the statistical software environment R (R Core Team 2022).
Alternatively, one can resort either to Monte Carlo simulation or to the approximationbydiscretization approach, by using one of the methods described in the previous sections.
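For instance, the Monte Carlo route can be sketched as follows (illustrative Python, not the authors' code; the sample size and seed are ours):

```python
import numpy as np

# Monte Carlo sketch for the reliability parameter R = P(X > Y), with
# Y = M / (2 t (W - t) (H - t)), under the parametric setup of Sect. 3.2.
def reliability_mc(mu_X, n=200_000, seed=1):
    rng = np.random.default_rng(seed)
    M = rng.normal(1500, 150, n)      # applied torque
    t = rng.normal(0.2, 0.005, n)     # wall thickness
    W = rng.normal(2.0, 0.02, n)      # width
    H = rng.normal(3.0, 0.03, n)      # height
    X = rng.normal(mu_X, 60, n)       # strength
    Y = M / (2 * t * (W - t) * (H - t))
    return float(np.mean(X > Y))

# R increases with the mean strength mu_X across the 46 scenarios
r_low, r_high = reliability_mc(520), reliability_mc(970)
```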
We will regard the value of reliability recovered by numerical evaluation through the cuhre function (with all the \(\gamma \) scaling factors set equal to 9) as the actual (true) value and thus indicate it simply with R; the value obtained through Monte Carlo simulation is denoted by \({\hat{R}}^{MC}\); the values obtained by the approximation-by-discretization approaches are denoted by \({\hat{R}}\), with a superscript identifying the specific discretization: GQ (Gaussian Quadrature), DZ (Drezner and Zerom 2016), CvM (Cramér–von Mises), C (Cramér), AD (Anderson–Darling). In order to measure the degree of accuracy of each technique, one can consider the following synthetic measures: Mean Deviation (MD), Mean Absolute Deviation (MAD), and Root Mean Squared Deviation (RMSD), defined as follows,
where the subscript i refers to the ith scenario.
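The displayed formulas are not reproduced here; assuming the standard definitions of the three measures, they can be sketched as:

```python
import numpy as np

# Assumed standard definitions of the three accuracy measures over the
# n = 46 scenarios: MD = mean signed deviation, MAD = mean absolute
# deviation, RMSD = root mean squared deviation of R_hat_i from R_i.
def accuracy_measures(R_hat, R):
    d = np.asarray(R_hat, dtype=float) - np.asarray(R, dtype=float)
    return d.mean(), np.abs(d).mean(), np.sqrt((d ** 2).mean())
```

Unlike MAD and RMSD, MD keeps the sign of the deviations, so it reveals a systematic over- or under-estimation of R.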
We considered \(N=10\) million pseudo-random simulations for the Monte Carlo approach, and \(k=5\) approximating points for the discretization approaches.
The results, reported in Table 7, show that Monte Carlo simulation provides overall the best results in terms of MAD and RMSD; the proposed discretization methods, based on the minimization of the Anderson–Darling and Cramér distances, perform better than the Gaussian quadrature method in terms of MD, MAD, and RMSD; they perform worse than the method proposed in Drezner and Zerom (2016), which, however, requires a considerably larger computational effort, due to its inner numerical minimization routine, and has a narrower applicability, as already remarked in Sect. 2.2. We remark that in this example we focused just on discretization techniques which are somehow comparable: according to their specific criterion, they all compute both the k support points and the corresponding k probabilities simultaneously. Other techniques mentioned in the Introduction, such as the maximum entropy method, first assign the support points a priori and then calculate the probabilities only.
4 Software implementation
Code in the R programming environment has been developed, which implements the different routines used for finding the optimal discrete approximations. For the Cramér distance, in particular, several parametric distributions are considered. Some functions are also available for plotting discretized distributions, which makes graphical comparison to the original continuous distribution more effective.
For the Cramér–von Mises and Anderson–Darling distances (and potentially, for any other distribution-free distance), the function Discr is used, whose arguments are the number of points k and the type of distance (CvM or AD). It returns a list containing the vectors of the \(p_i\) and \(q_i\) (along with the vector of the \(Q_i\)), from which one can recover the support values \(x_i\) through the quantile transformation of the assigned distribution function F.
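Discr belongs to the authors' R code; as an illustration of its output structure, here is a Python sketch of the CvM branch only (the AD branch requires the iterative scheme and is omitted), assuming the distribution-free CvM solution \(p_i=1/k\), \(Q_i=i/k\), \(q_i=(2i-1)/(2k)\):

```python
import numpy as np

# Python sketch of the CvM branch of a Discr-like function (the authors'
# implementation is in R). Assumed distribution-free CvM solution:
# p_i = 1/k, Q_i = i/k, q_i = (Q_{i-1} + Q_i)/2 = (2i - 1)/(2k).
def discr_cvm(k):
    Q = np.arange(1, k + 1) / k
    q = (2 * np.arange(1, k + 1) - 1) / (2 * k)
    p = np.full(k, 1 / k)
    return {"p": p, "q": q, "Q": Q}

res = discr_cvm(4)
# support points follow by the quantile transformation of the chosen F,
# e.g. for the unit exponential: x_i = F^{-1}(q_i) = -log(1 - q_i)
x = -np.log(1 - res["q"])
```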
For the Cramér distance, the main function is called DiscrF; its arguments are the number of points k by which we want to approximate the continuous random distribution; the type of continuous distribution (a string identifying it), along with the value of its parameters (a vector, par). At the moment, a few families of continuous distributions can be selected, namely, those discussed in Sect. 2.4, for which the firstorder condition on the \(Q_i\) is available in a closedform: normal (norm), exponential (exp), Cauchy (cauchy), logistic (logis), Lomax (lomax), power function (power). The output of the function is a list, containing the vector of probabilities \(p_i\) and the vector of support values \(x_i\) of the discrete approximation, along with the vectors of \(q_i\) and \(Q_i\).
The companion function moments receives as input a vector of support points and a vector of corresponding probabilities possibly obtained as a result from DiscrF or Discr, and computes the expectation, variance, skewness and kurtosis of the related discrete rv. This is useful if one wants to keep under control the effects of discretization over the first (normalized) moments, since we know that the techniques we introduced do not offer any guarantee, in general, that the moments of the continuous rv are preserved.
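A Python sketch of such a moments-style helper (the authors' function is in R; names ours) for a discrete rv given its support points and probabilities:

```python
import numpy as np

# Expectation, variance, skewness and kurtosis of a discrete rv with
# support points x and probabilities p (skewness and kurtosis are the
# third and fourth standardized central moments).
def moments(x, p):
    x, p = np.asarray(x, dtype=float), np.asarray(p, dtype=float)
    mu = (p * x).sum()
    var = (p * (x - mu) ** 2).sum()
    skew = (p * (x - mu) ** 3).sum() / var ** 1.5
    kurt = (p * (x - mu) ** 4).sum() / var ** 2
    return mu, var, skew, kurt
```

Applied to the output of a discretization routine, it shows how far the first four moments drift from those of the original continuous rv.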
The graphical function plotdist receives as its first argument the result of Discr or DiscrF and plots the corresponding discrete approximation (its pmf or its cdf, to be selected through the argument plot, to be set equal to “pmf” or “cdf”). This can be plotted over the unit square (by setting the argument xaxis equal to “q”: on the x axis the \(q_i\), on the y axis the \(p_i\) or the \(Q_i\), see Fig. 1); or on the usual \({\mathbb {R}}\times [0,1]\) space (by setting the argument xaxis equal to “x”: on the x axis the \(x_i\), which should be supplied if the first argument comes from Discr; on the y axis the \(p_i\) or the \(Q_i\)).
In the supplementary material, available at https://tinyurl.com/STPAD2100382, the relevant R code along with some examples is supplied.
5 Conclusion
In this work, we discussed a class of discretization techniques that calculate a k-point discrete approximation of a continuous rv by minimizing a distance between the two cdfs. For one distance in this class, the optimal discrete approximation (i.e., both the points \(x_i\) and their probabilities \(p_i\), \(i=1,\dots ,k\)) turns out to have an analytical expression for any k. For other distances, a closed-form solution is not available, but the \(x_i\) and the \(p_i\) can be iteratively computed by alternately solving two sets of equations till convergence, in a similar fashion to optimal quantization. It may happen that the solution is “distribution-free”, meaning that the probability transforms \(F(x_i)\) and the \(p_i\) do not depend on the particular distribution function F selected; or, on the contrary, that the solution directly depends on F: in the latter case, we derived the sets of equations to be satisfied by the solution for a wide array of continuous random distributions.
We underline that this class of discretization techniques represents a valid alternative to the other extant procedures, among them the consolidated moment-matching technique, whose applicability is however hindered by the often unattainable hypothesis of finiteness of the first integer moments. This class is also competitive if compared to optimal quantization and modifications thereof, since for the latter the algorithm leading to the best discrete approximation is very similar, being based on the minimization of the mean squared error between the two rvs. What we cannot expect from the suggested class of discrete approximations is the preservation of the first (and second) moment; this can be seen as a shortcoming, but only if one is used to dealing with or synthesizing random distributions in terms of moments, or if the transformation of rvs one needs to approximate is a smooth function. In the financial or insurance fields, one is more familiar with quantiles (the most popular risk measure for market risk is the Value at Risk (VaR), which is nothing other than a quantile of a loss distribution over a fixed time horizon; see, e.g., McNeil, Frey and Embrechts 2005), and then adopting a discrete approximation which minimizes a distance between cdfs may be intuitively more appropriate and convenient. Moreover, many continuous distributions do not even possess the second or the first moment (just think of the Cauchy distribution), and then moment-matching and quantization techniques would fail in providing a discrete approximation, which is instead guaranteed by the proposed class (with rare exceptions arising if the Cramér distance is considered). The practical problem presented in the third section, concerning a complex function of several random variables, illustrates how the proposed procedures can outperform the standard moment-matching approach.
Future research will examine other statistical distances between cdfs, in particular asymmetrical ones, and derive the corresponding optimal k-point discrete approximation from their minimization. We will also examine possible extensions of this class of discretization techniques to the bivariate and, more generally, d-variate case. Although it can be naïvely judged to be straightforward, high dimensionality leads to non-negligible theoretical and computational issues: apart from the choice of the statistical distance to be minimized (an analogous quadratic distance between joint cdfs or the energy distance, see Rizzo and Székely 2016, can be considered), the most challenging matter is represented by the selection of all feasible d-variate supports for the discrete approximation (they are not necessarily a d-variate Cartesian product of univariate supports, and this complicates the evaluation of the statistical distance) and by the possible statistical dependence between the univariate components, which unavoidably calls for the use of copulas. Similar kinds of problems arise in the computation of principal points for multivariate random distributions (Flury 1990).
Another aspect that can be investigated is related to the construction of discrete approximations with “a priori” assigned support points \(x_i\). This can be useful when one wants to construct a discrete counterpart or analog to a continuous probability distribution, and not just a discrete approximation. Typically, the discrete counterpart to a continuous distribution supported on \({\mathbb {R}}\) (or \({\mathbb {R}}^+\)) is constructed as the discrete distribution supported on \({\mathbb {Z}}\) (or \({\mathbb {N}}\)) preserving the expression of the continuous cdf at the integer support points (Chakraborty 2015). Alternatively, a discrete counterpart can be constructed by considering the same integer support as above and minimizing one of the statistical distances examined in this work with respect to the probabilities \(p_i\), which now constitute a countable set. By considering any parametric random distribution with infinite support, we can thus generate several new count distributions that can be actually regarded as its discrete counterparts and can be employed for modeling count data.
References
Barbiero A, Hitaj A (2022) Approximation of continuous rvs for the evaluation of the reliability parameter of complex stress–strength models. Ann Oper Res 315(2):1–26
Baringhaus L, Henze N (2017) Cramér–von Mises distance: probabilistic interpretation, confidence intervals, and neighbourhood-of-model validation. J Nonparametr Stat 29(2):167–188
Chakraborty S (2015) Generating discrete analogues of continuous probability distributions: a survey of methods and constructions. J Stat Distrib Appl 2(1):1–30
Cramér H (1928) On the composition of elementary errors: first paper: mathematical deductions. Scand Actuar J 1928(1):13–74
Drezner Z, Zerom D (2016) A simple and effective discretization of a continuous random variable. Commun Stat Simul Comput 45(10):3798–3810
English JR, Sargent T, Landers TL (1996) A discretizing approach for stress/strength analysis. IEEE Trans Reliability 45(1):84–89
Farmer LE, Toda AA (2017) Discretizing nonlinear, non-Gaussian Markov processes with exact conditional moments. Quant Econom 8(2):651–683
Flury BA (1990) Principal points. Biometrika 77(1):33–41
Golub GH, Welsch JH (1969) Calculation of Gauss quadrature rules. Math Comput 23(106):221–230
Gray RM, Neuhoff DL (1998) Quantization. IEEE Trans Inform Theory 44(6):2325–2383
Hanebeck UD, Klumpp V (2008) Localized cumulative distributions and a multivariate generalization of the Cramér–von Mises distance. In: 2008 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, pp 33–39
Jamshidian F, Zhu Y (1996) Scenario simulation: theory and methodology. Finance Stoch 1(1):43–67
Johnson RA (1988) Stressstrength models for reliability. In: Krishnaiah PR, Rao CR (eds) Handbook of statistics, vol 7. Elsevier, Amsterdam, pp 27–54
Kennan J (2006) A note on discrete approximations of continuous distributions. University of Wisconsin, Madison
Lloyd S (1982) Least squares quantization in PCM. IEEE Trans Inform Theory 28(2):129–137
Luceno A (1999) Discrete approximations to continuous univariate distributions: an alternative to simulation. J R Stat Soc Ser B 61(2):345–352
McNeil AJ, Frey R, Embrechts P (2005) Quantitative risk management: concepts, techniques and tools, 2nd edn. Princeton University Press, Princeton
Miller AC, Rice TR (1983) Discrete approximations of probability distributions. Manag Sci 29(3):352–362
Narasimhan B, Johnson SG, Hahn T, Bouvier A, Kiêu K (2022) cubature: Adaptive Multivariate Integration over Hypercubes. R package version 2.0.4.4. https://CRAN.R-project.org/package=cubature
Owen DB (1980) A table of normal integrals. Commun Stat Simul Comput 9(4):389–419
Panjer HH (1981) Recursive evaluation of a family of compound distributions. ASTIN Bull 12(1):22–26
Pavlikov K, Uryasev S (2018) CVaR distance between univariate probability distributions and approximation problems. Ann Oper Res 262(1):67–88
R Core Team (2022) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. https://www.R-project.org/
Rizzo ML, Székely GJ (2016) Energy distance. Wiley Interdiscip Rev 8(1):27–38
Roy D, Dasgupta T (2001) A discretizing approach for evaluating reliability of complex systems under stress–strength model. IEEE Trans Reliab 50(2):145–150
Schrempf OC, Brunn D, Hanebeck UD (2006) Dirac mixture density approximation based on minimization of the weighted Cramér–von Mises distance. In: 2006 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, pp 512–517
Stein EM, Shakarchi R (2011) Functional analysis: introduction to further topics in analysis. Princeton University Press, Princeton
Tanaka KI, Toda AA (2013) Discrete approximations of continuous distributions by maximum entropy. Econom Lett 118(3):445–450
Tanaka KI, Toda AA (2015) Discretizing distributions with exact moments: error estimate and convergence analysis. SIAM J Numer Anal 53(5):2158–2177
von Mises R (1931) Wahrscheinlichkeitsrechnung und ihre Anwendung in der Statistik und theoretischen Physik. Mary S. Rosenberg
Acknowledgements
We wish to thank the Editor and two anonymous reviewers for their careful reading of our manuscript and their insightful comments and suggestions.
Funding
Open access funding provided by Università degli Studi di Milano within the CRUICARE Agreement
Barbiero, A., Hitaj, A. Discrete approximations of continuous probability distributions obtained by minimizing Cramér–von Mises-type distances. Stat Papers 64, 1669–1697 (2023). https://doi.org/10.1007/s00362-022-01356-2