Abstract
The paper introduces new distances between univariate probability distributions, based on the concept of the CVaR norm. We consider the problem of approximating a discrete distribution by another discrete distribution with a smaller number of atoms; such problems must be solved, for instance, when generating scenarios in stochastic programming. The quality of the approximation is evaluated with the new distances suggested in this paper. We impose CVaR constraints to ensure that the approximating distribution has tail characteristics similar to those of the target distribution. The numerical algorithm consists of two main steps: (i) optimal placement of the atoms of the approximating distribution with fixed probabilities; (ii) optimization of the probabilities with fixed atom positions. These two steps are iterated to find both the optimal atom positions and the optimal probabilities. Numerical experiments show the high efficiency of the proposed algorithms, which are implemented with convex and linear programming.
References
AORDA. (2016). Portfolio Safeguard Version 2.3. http://www.aorda.com/index.php/portfolio-safeguard/.
Artzner, P., Delbaen, F., Eber, J.-M., & Heath, D. (1999). Coherent measures of risk. Mathematical Finance, 9(3), 203–228.
Boos, D. D. (1981). Minimum distance estimators for location and goodness of fit. Journal of the American Statistical Association, 76(375), 663–670.
Case Study. (2017). Minimization of Kantorovich-Rubinstein distance between two distributions. http://www.ise.ufl.edu/uryasev/research/testproblems/advanced-statistics/minimize_kantorovich_distance/.
Darling, D. A. (1957). The Kolmogorov–Smirnov, Cramer–von Mises tests. The Annals of Mathematical Statistics, 28, 823–838.
Feller, W. (1948). On the Kolmogorov–Smirnov limit theorems for empirical distributions. The Annals of Mathematical Statistics, 19(2), 177–189.
Gibbons, J. D., & Chakraborti, S. (2011). Nonparametric statistical inference. In M. Lovric (Ed.), International encyclopedia of statistical science (pp. 977–979). Berlin: Springer. https://doi.org/10.1007/978-3-642-04898-2_420.
Grigoriu, M. (2009). Reduced order models for random functions. Application to stochastic problems. Applied Mathematical Modelling, 33(1), 161–175.
Hammond, R. K., & Bickel, J. E. (2013). Reexamining discrete approximations to continuous distributions. Decision Analysis, 10(1), 6–25.
Keefer, D. L. (1994). Certainty equivalents for three-point discrete-distribution approximations. Management Science, 40(6), 760–773.
Keefer, D. L., & Bodily, S. E. (1983). Three-point approximations for continuous random variables. Management Science, 29(5), 595–609.
Kennan, J. (2006). A note on discrete approximations of continuous distributions. Madison: University of Wisconsin.
Mafusalov, A., & Uryasev, S. (2016). CVaR (Superquantile) norm: Stochastic case. European Journal of Operational Research, 249(1), 200–208.
Mason, D. M., & Schuenemeyer, J. H. (1983). A modified Kolmogorov–Smirnov test sensitive to tail alternatives. The Annals of Statistics, 11(3), 933–946.
McDonald, J. N., & Weiss, N. A. (1999). A course in real analysis. Academic Press. https://books.google.dk/books?id=T-PUyB9YpqcC.
Miller, A. C., III, & Rice, T. R. (1983). Discrete approximations of probability distributions. Management Science, 29(3), 352–362. https://doi.org/10.1287/mnsc.29.3.352.
Ogryczak, W. (2010). On robust solutions to multi-objective linear programs. Multiple Criteria Decision Making, 9, 197–212.
Pavlikov, K., & Uryasev, S. (2014). CVaR norm and applications in optimization. Optimization Letters, 8(7), 1999–2020.
Pflug, G. C. (2000). Some remarks on the Value-at-Risk and the Conditional Value-at-Risk. In S. P. Uryasev (Ed.), Probabilistic constrained optimization: Methodology and applications (pp. 272–281). Boston: Springer. https://doi.org/10.1007/978-1-4757-3150-7_15.
Rachev, S. T., Stoyanov, S. V., & Fabozzi, F. J. (2008). Advanced stochastic models, risk assessment, and portfolio optimization: The ideal risk, uncertainty, and performance measures (Vol. 149). Hoboken: Wiley.
Rockafellar, R. T. (1970). Convex analysis (Vol. 28). Princeton: Princeton University Press.
Rockafellar, R. T., & Royset, J. O. (2014). Random variables, monotone relations, and convex analysis. Mathematical Programming, 148(1–2), 297–331.
Rockafellar, R. T., & Uryasev, S. (2000). Optimization of Conditional Value-at-Risk. Journal of Risk, 2(3), 21–41.
Rockafellar, R. T., & Uryasev, S. (2002). Conditional Value-at-Risk for general loss distributions. Journal of Banking and Finance, 26(7), 1443–1471.
Rockafellar, R. T., & Uryasev, S. (2013). The fundamental risk quadrangle in risk management, optimization and statistical estimation. Surveys in Operations Research and Management Science, 18(1), 33–53.
Rosenblueth, E., & Hong, H. P. (1987). Maximum entropy and discretization of probability distributions. Probabilistic Engineering Mechanics, 2(2), 58–63.
Rudin, W. (1964). Principles of mathematical analysis (Vol. 3). New York: McGraw-Hill.
Smith, J. E. (1993). Moment methods for decision analysis. Management Science, 39(3), 340–358.
Vallander, S. S. (1973). Calculation of the Wasserstein distance between probability distributions on the line. Teoriya Veroyatnostei i ee Primeneniya, 18(4), 784–786.
Villani, C. (2009). Optimal transport: Old and new (Vol. 338). Berlin: Springer.
Xpress. (2014). FICO™ Xpress Optimization Suite 7.8. http://www.fico.com.
This research has been supported by the AFOSR Grant FA9550-12-1-0427, “Design and Redesign of Engineering Systems”.
Appendices
Appendix A: Risk-measure-based distance between maximal monotone relations
This section formally defines the notion of a risk-measure-based distance between two probability distributions. It is convenient to define such distances in terms of maximal monotone relations (Rockafellar and Royset 2014).
A general cumulative distribution function can have points of discontinuity; however, if the jumps of such a function are filled with vertical segments, we obtain an example of a maximal monotone relation. In a similar fashion, the quantile function of a probability distribution generates a maximal monotone relation. Since we consider distances based on both cumulative distribution functions and quantile functions, it is convenient to define the notion of distance in this more general setting. First, we define the notion of a monotone relation on a set \(\mathcal A \subseteq \mathbb R\).
Definition 9
(Rockafellar and Royset 2014). Let \(\mathcal A = [a, b] \subseteq \mathbb {R}\) be a possibly unbounded closed interval. A set \(\Gamma \subseteq \mathcal A \times \mathbb R \) is called a monotone relation on \(\mathcal {A}\) if \(\, \forall \; (x_1,\, p_1),\, (x_2,\, p_2) \in \Gamma \):

\((x_2 - x_1)(p_2 - p_1) \ge 0.\)

A set \(\Gamma \) is called a maximal monotone relation on \(\mathcal A\) if there exists no monotone relation \(\Gamma ' \ne \Gamma \) on \(\mathcal A\) such that \(\Gamma \subset \Gamma '\).
Associated with a maximal monotone relation \(\Gamma \) on \(\mathcal A\) is the function \(\Gamma (x),\, x \in \mathcal A\). A monotone relation can contain vertical segments, therefore let \(\Gamma (x)\) be defined as

\(\Gamma (x) = \max \{p :\, (x,\, p) \in \Gamma \}\) for \(x \in [a,\, b)\), and \(\Gamma (b) = \min \{p :\, (b,\, p) \in \Gamma \}\),

where b is the right endpoint of the closed interval \(\mathcal {A}\). Clearly, \(\Gamma (x)\) is a nondecreasing function on \(\mathcal {A}\).
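As a concrete illustration (ours, not from the paper), the CDF of a discrete distribution, completed with vertical segments at its atoms, is a maximal monotone relation, and the associated function \(\Gamma (x)\) recovers the usual right-continuous CDF. A minimal Python sketch:

```python
import bisect
import itertools

def make_cdf(atoms, probs):
    """CDF of a discrete distribution viewed as a maximal monotone
    relation: the jump at each atom is filled with a vertical segment,
    and Gamma(x) takes the top of that segment, i.e. P(X <= x)."""
    pairs = sorted(zip(atoms, probs))
    xs = [x for x, _ in pairs]
    cum = list(itertools.accumulate(p for _, p in pairs))

    def gamma(x):
        i = bisect.bisect_right(xs, x)  # number of atoms <= x
        return cum[i - 1] if i > 0 else 0.0

    return gamma

F = make_cdf([1.0, 2.0, 4.0], [0.25, 0.5, 0.25])
print(F(0.5), F(1.0), F(3.0))  # 0.0 0.25 0.75
```

At an atom, `gamma` returns the upper endpoint of the filled jump, which keeps the function nondecreasing and right-continuous, matching the construction above.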
Suppose F and G are two maximal monotone relations on \(\mathcal A\) and we randomly pick a point \(\xi \in \mathcal {A}\), so that the absolute difference between F and G becomes a random variable taking the value \(|F(\xi ) - G(\xi )|\). Specifically, we suppose there is an underlying probability space \((\Omega , \mathcal {F}, \mathbb {P})\) and that \(\xi \) is an \(\mathcal {F}\)-measurable function from \(\Omega \) to \(\mathcal A\), \(\xi : \Omega \rightarrow \mathcal {A}\), where \(\mathcal A\) is equipped with the Borel \(\sigma \)-algebra \(\mathcal B\). Moreover, the auxiliary random variable \(\xi \) is supposed to have a probability distribution H such that (i) it has a density function \(h(x), \, x \in \mathcal A\), and (ii) \(h(x) > 0\) for every \(x \in int(\mathcal A)\). The distance (discrepancy metric) between F and G will be defined by applying a risk measure to the random variable \(|F(\xi ) - G(\xi )|\).
A risk measure \(\mathcal R\) is a map from a space of random variables to \(\mathbb R\). In our study \(\mathcal R\) belongs to the special class of coherent risk measures, defined in Artzner et al. (1999). To be coherent, a risk measure has to satisfy the following axioms (stated in a slightly different form than in Artzner et al. (1999), following Rockafellar and Uryasev (2013)):
- A1. \(\mathcal R(\xi ) = C\) for constant random variables \(\xi = C\) a.s.;
- A2. \(\mathcal R(\xi _1) \le \mathcal R(\xi _2)\) for \(\xi _1 \le \xi _2\) a.s.;
- A3. \(\mathcal R(\xi _1 + \xi _2) \le \mathcal R(\xi _1) + \mathcal R(\xi _2)\);
- A4. \(\mathcal R(\lambda \xi _1) = \lambda \mathcal R(\xi _1)\) for any \(\lambda \in (0, +\infty )\).
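For instance, CVaR is a coherent risk measure (Rockafellar and Uryasev 2000). The following sketch (with illustrative data of our choosing) numerically checks axioms A3 and A4 for the sample-based CVaR of equally likely scenarios:

```python
def cvar(sample, alpha):
    """CVaR of an equally likely finite sample: the average of the
    worst (1 - alpha) fraction of outcomes.  Assumes (1 - alpha) * n
    is a positive integer, so no atom has to be split."""
    k = round((1 - alpha) * len(sample))
    return sum(sorted(sample)[-k:]) / k

x1 = [1.0, 2.0, 3.0, 4.0]        # two random variables given jointly
x2 = [0.0, 5.0, 1.0, 2.0]        # on four equally likely scenarios
a = 0.5

# A3 (subadditivity): CVaR(x1 + x2) <= CVaR(x1) + CVaR(x2)
print(cvar([u + v for u, v in zip(x1, x2)], a))   # 6.5
print(cvar(x1, a) + cvar(x2, a))                  # 7.0
# A4 (positive homogeneity): CVaR(2 * x1) == 2 * CVaR(x1)
print(cvar([2 * u for u in x1], a))               # 7.0
```

Note that the scenario-wise sum `zip(x1, x2)` is essential for A3: subadditivity is a statement about random variables on a common probability space, not about independent samples.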
Definition 10
The risk-measure-based distance between maximal monotone relations F and G on \(\mathcal {A}\) is defined through the corresponding functions \(F(\cdot )\) and \(G(\cdot )\) as follows:

\(d^H(F,\, G) = \mathcal R\big (|F(\xi ) - G(\xi )|\big ),\)

where the random variable \(\xi \) has the probability distribution H.
The function \(d^H(F, G)\) satisfies the usual properties of a probability metric, discussed in, for instance, Rachev et al. (2008), Chapter 3.
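To make Definition 10 concrete, the following sketch (ours; the distributions and H are illustrative) estimates \(d^H(F, G) = \mathcal R\big (|F(\xi ) - G(\xi )|\big )\) by Monte Carlo with \(\mathcal R = \text {CVaR}_{0.5}\) and H uniform on \([0, 2]\), where F is the CDF of a unit atom at 0 and G is the CDF of atoms at 0 and 1 with probability 0.5 each:

```python
import random

def cvar(sample, alpha):
    """Sample CVaR: average of the worst (1 - alpha) fraction."""
    k = max(1, round((1 - alpha) * len(sample)))
    return sum(sorted(sample)[-k:]) / k

def F(x):                      # CDF of a unit atom at 0
    return 1.0 if x >= 0 else 0.0

def G(x):                      # CDF of atoms at 0 and 1, prob 0.5 each
    return 0.0 if x < 0 else (0.5 if x < 1 else 1.0)

random.seed(0)
xi = [random.uniform(0.0, 2.0) for _ in range(100_000)]   # xi ~ H
d = cvar([abs(F(x) - G(x)) for x in xi], alpha=0.5)
# |F - G| equals 0.5 on [0, 1) and 0 on [1, 2], so the CVaR_0.5 of
# the difference (average over the worst half under H) is close to 0.5
print(round(d, 3))
```

With \(\alpha = 0\) the same construction would reduce to an expectation of \(|F(\xi ) - G(\xi )|\), i.e. a weighted \(L_1\)-type discrepancy.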
Proposition A.1
Let F, Z, G be maximal monotone relations on \(\mathcal A = [a,b]\). If H is a distribution with a density function \(h(x) > 0\) for all \(x \in int(\mathcal A)\), then the following properties hold:
1. \(d^H(F, G) \ge 0\);
2. \(d^H(F, G) = 0 \iff \mu \big (\{x : F(x) \ne G(x)\}\big ) = 0\), where \(\mu \) denotes the Lebesgue measure;
3. \(d^H(F, G) = d^H(G, F)\);
4. \(d^H(F, Z) \le d^H(F, G) + d^H(G, Z)\).
Proof
First, the correctness of the definition needs to be shown, i.e., that \(|F(\xi ) - G(\xi )|\) is an \(\mathcal F\)-measurable function. It is sufficient to show that \(F(\xi )\) is measurable, i.e., that the preimage of every open set is in \(\mathcal {F}\). To see this sufficiency, note first that the sum or difference of two measurable functions is measurable, see for instance McDonald and Weiss (1999), Chapter 3. Further, the preimage (with respect to a continuous function) of an open set is open, see McDonald and Weiss (1999), Chapter 2; therefore composition with the absolute value function also preserves measurability. Thus, consider the measurability of the function \(F(\cdot )\). The function \(F(\cdot )\) is associated with a maximal monotone relation F, therefore it is nondecreasing and has at most a countable number of points of discontinuity, cf. Rudin (1964). Therefore, \(F(\cdot )\) can be approximated by a sequence of continuous nondecreasing functions \(F_n(\cdot )\) converging pointwise to \(F(\cdot )\): \(\displaystyle \forall x \in \mathcal A,\, \lim _{n\rightarrow \infty } F_n(x) = F(x)\). By Theorem 4.5 in McDonald and Weiss (1999), the function \(F(\cdot )\) is measurable.
Properties 1, 3 and 4 are direct consequences of the axioms of coherent risk measures. The proof of Property 2 follows.
1. We start with the \(\implies \) implication. Suppose \(d^H(F,G) = 0\), in other words, \(\mathcal R(|F(\xi ) - G(\xi )|) = 0\), which implies by property A2 of coherent risk measures that \(\mathbb P\big (\omega : |F(\xi (\omega ) ) - G(\xi (\omega ))| \le 0\big ) = \mathbb P\big (\omega : F(\xi (\omega ) ) = G(\xi (\omega ))\big ) = 1\). Thus, we obtain the following:

\(\mathbb P\big (\xi \in \{x \in \mathcal A :\, F(x) \ne G(x)\}\big ) = 0.\)
Let \(\mathcal {A}' \subseteq \mathcal A\) denote the image of \(\xi \). Clearly, \(\mu \left( \mathcal A'\right) = \mu (\mathcal A)\) because of the absolute continuity of the distribution of \(\xi \). Then,

\(\mu \big (\{x \in \mathcal A :\, F(x) \ne G(x)\}\big ) = \mu \big (\{x \in \mathcal A' :\, F(x) \ne G(x)\}\big ).\)
Let \(E = \left\{ x \in \mathcal A': F(x) \ne G(x)\right\} \). Consider a sequence \(\{\epsilon _k > 0\}\) with \(\epsilon _k \rightarrow 0\) as \(k \rightarrow +\infty \), and let \(E_k = \left\{ x \in \mathcal A': F(x) \ne G(x),\, h(x) \ge \epsilon _k\right\} \). Clearly, \(\displaystyle \bigcup ^{\infty }_{k=1} E_k = E\). Also,

\(0 = \mathbb P(\xi \in E_k) = \int _{E_k} h(x)\, dx \ge \epsilon _k\, \mu (E_k), \quad \text {hence} \quad \mu (E_k) = 0.\)

Thus,

\(\mu (E) = \mu \Big (\bigcup ^{\infty }_{k=1} E_k\Big ) \le \sum ^{\infty }_{k=1} \mu (E_k) = 0.\)
2. The \(\impliedby \) implication is more straightforward. Let \(E = \big \{x \in \mathcal A: F(x) \ne G(x)\big \}\) and suppose \(\mu (E) = 0\). Then, since \(\xi \) has the density function h,

\(\mathbb P\big (|F(\xi ) - G(\xi )| > 0\big ) = \mathbb P(\xi \in E) = \int _{E} h(x)\, dx = 0\)

as an integral of a nonnegative function over a set of measure 0; hence \(|F(\xi ) - G(\xi )| = 0\) a.s., and \(d^H(F, G) = 0\) by axiom A1. \(\square \)
Appendix B
Formulation 1
The problem (18)–(20) for \(0< \alpha < 1\) can be reformulated as the following linear problem:
Formulation 2
The problem (18)–(20) with \(\alpha = 1\) can be reformulated as the following linear problem:
Formulation 3
The problem (18)–(20) with \(\alpha = 0\) can be reformulated as the following linear problem:
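The LP bodies themselves are omitted here. As a rough, self-contained illustration of the second algorithmic step (optimizing probabilities with fixed atoms), the sketch below solves a Kolmogorov-type analogue of the \(\alpha = 1\) case as an LP with `scipy`. It is not the paper's Formulation 2; the data and all names are ours:

```python
import numpy as np
from scipy.optimize import linprog

# Target: uniform distribution on six atoms x; approximation has
# three fixed atoms y, with probabilities q to be optimized.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
p = np.full(6, 1.0 / 6.0)
y = np.array([0.5, 2.5, 4.5])

# Both CDFs are right-continuous step functions, so the sup of
# |F - G| is attained on the union of the atoms.
z = np.union1d(x, y)
Fz = np.array([p[x <= zk].sum() for zk in z])    # target CDF at z
S = (y[None, :] <= z[:, None]).astype(float)     # G(z_k) = S @ q

# Variables (q, t): min t  s.t.  -t <= S q - Fz <= t, sum q = 1, q >= 0
m = len(y)
c = np.r_[np.zeros(m), 1.0]
A_ub = np.block([[S, -np.ones((len(z), 1))],
                 [-S, -np.ones((len(z), 1))]])
b_ub = np.r_[Fz, -Fz]
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              A_eq=np.r_[np.ones(m), 0.0][None, :], b_eq=[1.0],
              bounds=[(0, None)] * m + [(None, None)])
q, t = res.x[:m], res.fun
print(round(t, 4))   # optimal max CDF deviation, here 1/6
```

The deviation 1/6 is unavoidable in this example because the approximation places no mass at or below the first target atom, so the CDF gap at \(x = 0\) is fixed.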
Formulation 4
The problem (29)–(33) with \(\alpha \in (0,1)\) can be reformulated as the following linear problem:
Corresponding reformulations of problem (29)–(33) with \(\alpha = 1\) and \(\alpha = 0\) can be obtained similarly to Formulations 2 and 3.
Formulation 5
The problem (61)–(62) with \(\alpha \in (0,1)\) for the minimization of CVaR distance between quantile functions can be reformulated as the following linear problem:
Corresponding reformulations of problem (61)–(62) with \(\alpha = 1\) and \(\alpha = 0\) can be obtained similarly to Formulations 2 and 3.
Pavlikov, K., Uryasev, S. CVaR distance between univariate probability distributions and approximation problems. Ann Oper Res 262, 67–88 (2018). https://doi.org/10.1007/s10479-017-2732-8