Abstract
We consider a class of partial mass problems in which a fraction of the mass of a probability measure may be modified (trimmed) to maximize its fit to a given pattern. This class includes optimal partial transportation of mass, where part of the mass need not be transported, as well as the trimming procedures often used in statistical data analysis to discard outliers in a sample (the data in lowest agreement with a certain pattern). The result is a modified, trimmed version of the original probability that is closer to the pattern. We focus on the empirical measure and analyze, in terms of rates of convergence, to what extent its optimally trimmed version is closer to the true random generator. We deal with probabilities on \(\mathbb{R}^k\) and measure agreement through probability metrics. Our choices include transportation cost metrics, associated with optimal partial transportation, and the Kolmogorov distance. We show that partial transportation (as opposed to classical, complete transportation) results in a sharp decrease of costs only in low dimension. In contrast, for the Kolmogorov metric this decrease is seen in any dimension.
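The abstract does not spell out what an optimally trimmed version of a probability is. A formalization that is common in the trimming literature is sketched here under assumed notation: the symbols \(\alpha\), \(\mathcal{R}_\alpha\), \(P_n\), \(w_i\) and \(d\) are illustrative and do not appear in the abstract itself.

```latex
% Assumed notation (not taken from the abstract):
%   alpha  : trimming level, the fraction of mass that may be discarded
%   R_alpha: set of alpha-trimmed versions of a probability P
%   d      : a probability metric (transportation cost or Kolmogorov)
\[
  \mathcal{R}_\alpha(P)
  = \Bigl\{ Q \ll P \;:\; \frac{dQ}{dP} \le \frac{1}{1-\alpha}
    \quad P\text{-a.s.} \Bigr\},
  \qquad 0 \le \alpha < 1 .
\]
% For the empirical measure P_n = (1/n) \sum_i \delta_{X_i}, a trimmed
% version reweights the sample points without moving them:
\[
  Q = \sum_{i=1}^{n} w_i \, \delta_{X_i},
  \qquad 0 \le w_i \le \frac{1}{n(1-\alpha)},
  \qquad \sum_{i=1}^{n} w_i = 1 .
\]
% The quantity whose rate of convergence is of interest is then the
% distance from the best trimmed version to the true generator P:
\[
  d\bigl(\mathcal{R}_\alpha(P_n), P\bigr)
  = \inf_{Q \in \mathcal{R}_\alpha(P_n)} d(Q, P) .
\]
```

With \(\alpha = 0\) the weight constraints force \(w_i = 1/n\) for every \(i\), so the only admissible \(Q\) is \(P_n\) and the classical untrimmed problem is recovered.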
Additional information
Research partially supported by the Spanish Ministerio de Educación y Ciencia and FEDER, Grant MTM2011-28657-C02-01.
About this article
Cite this article
del Barrio, E., Matrán, C. Rates of convergence for partial mass problems. Probab. Theory Relat. Fields 155, 521–542 (2013). https://doi.org/10.1007/s00440-011-0406-z
DOI: https://doi.org/10.1007/s00440-011-0406-z
Keywords
- Partial mass transportation problem
- Random quantization
- Optimal transportation plan
- Similarity
- Trimming
- Trimmed probability
- Kolmogorov distance
- Wasserstein distance
- Rate of convergence
- Concentration of measure
Mathematics Subject Classification (2000)
- Primary 60B10
- Secondary 05C70, 60C05