Abstract
Empirical optimal transport (OT) plans and distances provide effective tools to compare and statistically match probability measures defined on a given ground space. Distributional limit laws are fundamental to such inference, and we derive a central limit theorem for the empirical OT distance of circular data. Our limit results require only mild assumptions in general and cover prominent examples such as the von Mises and wrapped Cauchy families. Most notably, no assumptions are required when the data are sampled from the probability measure they are compared with, which is in strict contrast to the real line. A bootstrap principle follows immediately, as our proof relies on Hadamard differentiability of the OT functional. This paves the way for a variety of statistical inference tasks, which we exemplify with asymptotic OT-based goodness-of-fit testing for circular distributions. We discuss the numerical implementation and consistency of this test and investigate its statistical power. For testing uniformity, the approach performs particularly well against unimodal alternatives and is almost as powerful as Rayleigh’s test, the most powerful invariant test for von Mises alternatives. For regimes with many modes, the circular OT test is less powerful, which is explained by the shape of the corresponding transport plan.
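To make the empirical circular OT distance concrete, the following is a minimal sketch of one standard way to compute it between two samples. It is an illustration under stated assumptions, not the chapter's implementation: the circle is parametrized as \([0,1)\) with unit circumference, the cost is arc length (order 1), and the distance is obtained from the known representation \(\min_{\alpha} \int_0^1 |F(t)-G(t)-\alpha|\,dt\), where \(F\) and \(G\) are the empirical distribution functions and the minimizing \(\alpha\) is a weighted median of \(F-G\). The function name `circular_w1` is hypothetical.

```python
import numpy as np


def circular_w1(x, y):
    """Order-1 OT distance between two empirical measures on the circle [0, 1).

    Assumes arc-length cost and unit circumference (illustrative choices).
    """
    x = np.sort(np.mod(np.asarray(x, dtype=float), 1.0))
    y = np.sort(np.mod(np.asarray(y, dtype=float), 1.0))

    # F - G is piecewise constant between consecutive sample points,
    # so it suffices to evaluate it on the merged grid of breakpoints.
    grid = np.concatenate(([0.0], np.sort(np.concatenate((x, y))), [1.0]))
    left, widths = grid[:-1], np.diff(grid)

    # Right-continuous empirical CDFs at the left end of each interval.
    F = np.searchsorted(x, left, side="right") / x.size
    G = np.searchsorted(y, left, side="right") / y.size
    diff = F - G

    # The optimal shift alpha is a median of diff, weighted by interval length.
    order = np.argsort(diff)
    cum = np.cumsum(widths[order])
    alpha = diff[order][np.searchsorted(cum, 0.5 * cum[-1])]

    return float(np.sum(widths * np.abs(diff - alpha)))


if __name__ == "__main__":
    # Two point masses at 0.1 and 0.9 are 0.2 apart along the shorter arc,
    # whereas the distance on the unit interval would be 0.8.
    print(circular_w1([0.1], [0.9]))
```

The shift \(\alpha\) is what distinguishes the circle from the real line: it accounts for the freedom to choose where the circle is cut open before mass is transported monotonically.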
Notes
1. A function \(g:\mathbb{R}\rightarrow \mathbb{R}\) is called coercive if \(g(x)\rightarrow \infty\) as \(|x| \rightarrow \infty\).
Acknowledgements
The authors gratefully acknowledge support from the DFG Research Training Group 2088 Discovering Structure in Complex Data: Statistics Meets Optimization and Inverse Problems and the DFG Cluster of Excellence 2067 Multiscale Bioimaging: From Molecular Machines to Networks of Excitable Cells. The authors would also like to thank the editor and the anonymous referees for their comments, which improved the quality of this paper. In particular, credit is given to one referee for the suggestion to consider rotationally invariant distributions for the spherical OT distance.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Hundrieser, S., Klatt, M., Munk, A. (2022). The Statistics of Circular Optimal Transport. In: SenGupta, A., Arnold, B.C. (eds) Directional Statistics for Innovative Applications. Forum for Interdisciplinary Mathematics. Springer, Singapore. https://doi.org/10.1007/978-981-19-1044-9_4
DOI: https://doi.org/10.1007/978-981-19-1044-9_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-1043-2
Online ISBN: 978-981-19-1044-9
eBook Packages: Mathematics and Statistics (R0)