Correlations between random projections and the bivariate normal

Kang, Keegan

doi:10.1007/s10618-021-00764-6

Published: 18 May 2021

Volume 35, pages 1622–1653, (2021)
Data Mining and Knowledge Discovery

Keegan Kang (ORCID: orcid.org/0000-0001-8689-2764)

Abstract

Random projection is a technique used primarily for dimension reduction: it maps high dimensional data to a low dimensional space while preserving pairwise distances in expectation, such as the Euclidean distance, inner product, angular distance, and \(l_p\) distance for even values of p. These estimated pairwise distances between observations in the low dimensional space can be computed rapidly and used for nearest neighbor searches, clustering, or even classification. This paper highlights how random projections and the bivariate normal, two seemingly disparate topics, share a common thread, and expands upon two computational statistical techniques from the recent random projection literature to further improve the accuracy of the estimate of the inner product between vectors under random projection by making use of properties of the respective dataset. It also discusses the limitations of these methods.
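
As a rough illustration of what "preserving the inner product in expectation" means, the following is a minimal sketch, not the paper's method. It assumes a projection matrix with i.i.d. standard normal entries and the plain estimator that averages the products of projected coordinates. The function name `estimate_inner_product` and all parameter values are hypothetical, chosen only for this example. Under this Gaussian assumption, each pair of projected coordinates is bivariate normal with covariance determined by the two squared norms and the inner product, which appears to be the connection the title refers to.

```python
import numpy as np


def estimate_inner_product(x, y, k, rng):
    """Estimate <x, y> from a k-dimensional Gaussian random projection.

    Assumption for this sketch: the projection matrix R has i.i.d. N(0, 1)
    entries, so each pair (v[j], w[j]) is bivariate normal with mean zero,
    variances ||x||^2 and ||y||^2, and covariance <x, y>.
    """
    d = x.shape[0]
    R = rng.standard_normal((d, k))
    v = x @ R  # projected x, shape (k,)
    w = y @ R  # projected y, shape (k,)
    # Averaging the k coordinate products gives an unbiased estimate of <x, y>.
    return float(v @ w) / k


rng = np.random.default_rng(0)
d, k, reps = 500, 100, 100

# Two correlated vectors in d dimensions (synthetic data for illustration).
x = rng.standard_normal(d)
y = 0.5 * x + rng.standard_normal(d)

exact = float(x @ y)
estimates = np.array(
    [estimate_inner_product(x, y, k, rng) for _ in range(reps)]
)

# Variance of a single estimate under the Gaussian assumption above.
theoretical_std = np.sqrt(((x @ x) * (y @ y) + exact**2) / k)

print("exact inner product:     ", exact)
print("mean of estimates:       ", estimates.mean())
print("std of estimates:        ", estimates.std())
print("theoretical std (single):", theoretical_std)
```

Each individual estimate is noisy, with spread governed by the norms of the two vectors and the projection dimension `k`; the average over repeated projections settles near the exact value. Reducing that spread by exploiting known quantities such as the vector norms is the kind of improvement the abstract describes.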

Acknowledgements

We would like to thank the reviewers for their comments and suggestions for improvement, which have helped to enhance the quality of the paper. We also thank Wong Wei Pin and Sergey Kushnarev for fruitful and productive discussions, and Omar Ortiz for his technical assistance.

Author information

Authors and Affiliations

Singapore University of Technology and Design, Singapore, Singapore
Keegan Kang

Corresponding author

Correspondence to Keegan Kang.

Additional information

Responsible editor: Fei Wang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is funded by the SUTD Faculty Fellow Grant RGFECA17003 as well as the Singapore Ministry of Education Academic Research Fund Tier 2 Grant MOE2018-T2-2-013.

About this article

Cite this article

Kang, K. Correlations between random projections and the bivariate normal. Data Min Knowl Disc 35, 1622–1653 (2021). https://doi.org/10.1007/s10618-021-00764-6

Received: 04 February 2020
Accepted: 04 May 2021
Published: 18 May 2021
Issue Date: July 2021
DOI: https://doi.org/10.1007/s10618-021-00764-6
