Optimal Projections in the Distance-Based Statistical Methods

  • Chuanping Yu
  • Xiaoming Huo
Chapter
Part of the Emerging Topics in Statistics and Biostatistics book series (ETSB)

Abstract

This paper introduces a new way to calculate distance-based statistics, particularly when the data are multivariate. The main idea is to pre-calculate the optimal projection directions given the variable dimension, and to project the multidimensional variables onto these pre-specified directions. By subsequently applying the fast algorithm developed in Huo and Székely (Technometrics, 58(4):435–447, 2016) for univariate variables, the computational complexity can be improved from \(O(m^2)\) to \(O(nm \log (m))\), where n is the number of projection directions and m is the sample size. When \(n \ll m/\log (m)\), computational savings can be achieved. The key challenge is how to find the optimal pre-specified projection directions. They can be obtained by minimizing the worst-case difference between the true distance and the approximated distance, which can be formulated as a nonconvex optimization problem in a general setting. In this paper, we show that the exact solution of the nonconvex optimization problem can be derived in two special cases: when the dimension of the data is equal to either 2 or the number of projection directions. In the generic setting, we propose an algorithm to find approximate solutions. Simulations confirm the advantage of our method over the pure Monte Carlo approach, in which the directions are randomly selected rather than pre-calculated.
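
The projection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it makes several assumptions that should be read as such: it approximates the sum of pairwise Euclidean distances (a simpler stand-in for a full distance-based statistic), it replaces the fast univariate algorithm of Huo and Székely with a plain sort-based formula for the univariate sum of absolute differences, and it uses equally spaced directions on the half circle in dimension 2 purely as a convenient deterministic choice rather than as the optimal directions derived in the chapter. All function names are hypothetical. The sketch relies on the standard fact that, for a direction u uniformly distributed on the unit sphere in dimension p, the expected value of |uᵀv| equals C_p‖v‖ with C_p = Γ(p/2) / (√π Γ((p+1)/2)).

```python
import math
import numpy as np


def sum_pairwise_abs_1d(x):
    """Sum of |x_i - x_j| over all pairs i < j, in O(m log m) via sorting."""
    xs = np.sort(x)
    m = xs.size
    # The k-th smallest value (0-based) is the larger element in k pairs
    # and the smaller element in m - 1 - k pairs.
    weights = 2 * np.arange(m) - m + 1
    return float(np.dot(weights, xs))


def projection_constant(p):
    """C_p = E|u^T e| for u uniform on the unit sphere in R^p, e a unit vector."""
    return math.gamma(p / 2) / (math.sqrt(math.pi) * math.gamma((p + 1) / 2))


def approx_sum_pairwise_dist(X, directions):
    """Approximate sum_{i<j} ||X_i - X_j|| from n one-dimensional projections.

    X has shape (m, p); directions has shape (n, p) with unit-norm rows.
    Cost is O(n m log m) instead of O(m^2).
    """
    n, p = directions.shape
    proj = X @ directions.T  # shape (m, n): one column per direction
    total = sum(sum_pairwise_abs_1d(proj[:, k]) for k in range(n))
    return total / (n * projection_constant(p))


def exact_sum_pairwise_dist(X):
    """Brute-force O(m^2) reference value, for comparison only."""
    diffs = X[:, None, :] - X[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(axis=-1)).sum() / 2)


# Demo in dimension 2 with equally spaced directions (an assumed choice).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
angles = np.pi * np.arange(64) / 64
directions = np.column_stack([np.cos(angles), np.sin(angles)])

approx = approx_sum_pairwise_dist(X, directions)
exact = exact_sum_pairwise_dist(X)
print(f"approx = {approx:.3f}, exact = {exact:.3f}")
```

Replacing `directions` with rows drawn uniformly at random from the unit sphere would give the pure Monte Carlo variant that the abstract uses as its comparison baseline.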

Keywords

  • Distance-based statistical methods
  • Projection-based methods
  • Quasi-Monte Carlo
  • Statistical computing
  • Random projections

Notes

Acknowledgements

This project is partially supported by the Transdisciplinary Research Institute for Advancing Data Science (TRIAD), http://triad.gatech.edu, which is part of the TRIPODS program at NSF, is located at Georgia Tech, and is enabled by the NSF grant CCF-1740776. Both authors are also partially supported by the NSF grant DMS-1613152.

References

  1. Asmussen, S., & Glynn, P. W. (2007). Stochastic simulation: Algorithms and analysis (Vol. 57). Berlin/Heidelberg: Springer Science & Business Media.
  2. Bradley, S., Hax, A., & Magnanti, T. (1977). Applied mathematical programming. Boston: Addison-Wesley.
  3. Brauchart, J., Saff, E., Sloan, I., & Womersley, R. (2014). QMC designs: Optimal order quasi Monte Carlo integration schemes on the sphere. Mathematics of Computation, 83(290), 2821–2851.
  4. Chaudhuri, A., & Hu, W. (2018). A fast algorithm for computing distance correlation. Preprint. arXiv:1810.11332.
  5. Hesse, K., Sloan, I. H., & Womersley, R. S. (2010). Numerical integration on the sphere. In Handbook of geomathematics (pp. 1185–1219). New York: Springer.
  6. Hoeffding, W. (1992). A class of statistics with asymptotically normal distribution. In Breakthroughs in statistics (pp. 308–334). New York: Springer.
  7. Huang, C., & Huo, X. (2017). An efficient and distribution-free two-sample test based on energy statistics and random projections. Preprint. arXiv:1707.04602.
  8. Huo, X., & Székely, G. J. (2016). Fast computing for distance covariance. Technometrics, 58(4), 435–447.
  9. Korolyuk, V. S., & Borovskich, Y. V. (2013). Theory of U-statistics (Vol. 273). Berlin/Heidelberg: Springer Science & Business Media.
  10. Lyons, R., et al. (2013). Distance covariance in metric spaces. The Annals of Probability, 41(5), 3284–3305.
  11. Mises, R. V. (1947). On the asymptotic distribution of differentiable statistical functions. The Annals of Mathematical Statistics, 18(3), 309–348.
  12. Morokoff, W. J., & Caflisch, R. E. (1995). Quasi-Monte Carlo integration. Journal of Computational Physics, 122(2), 218–230.
  13. Nesterov, Y. (2012). Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 22(2), 341–362.
  14. Niederreiter, H. (1992). Random number generation and quasi-Monte Carlo methods (Vol. 63). Philadelphia: SIAM.
  15. Sloan, I. H., & Womersley, R. S. (2004). Extremal systems of points and numerical integration on the sphere. Advances in Computational Mathematics, 21(1–2), 107–125.
  16. Székely, G. J., & Rizzo, M. L. (2004). Testing for equal distributions in high dimension. InterStat, 5, 1–6.
  17. Székely, G. J., & Rizzo, M. L. (2009). Brownian distance covariance. The Annals of Applied Statistics, 3, 1236–1265.
  18. Székely, G. J., Rizzo, M. L., Bakirov, N. K., et al. (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6), 2769–2794.
  19. Wright, S. J. (2015). Coordinate descent algorithms. Mathematical Programming, 151(1), 3–34.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, USA
