Abstract
Much attention has been given in the research literature to the study of distance-preserving random projections of discrete data sets, the limitations of which are established by the classical Johnson-Lindenstrauss existence lemma. In this theoretical paper, we analyze the effect of random projection on a natural measure of the local intrinsic dimensionality (LID) of smooth distance distributions in the Euclidean setting. The main contribution of the paper consists of upper and lower bounds on the LID in the vicinity of a reference point after random projection. The bounds depend only on the LID in the original data domain and the target dimension of the projection; as the difference between the target and intrinsic dimensionalities grows, these bounds converge to the LID of the original domain. The paper concludes with a brief discussion of the implications for applications in databases, machine learning and data mining.
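As an informal illustration of the quantities discussed above, the following minimal sketch estimates the LID at a reference point before and after a Gaussian random projection. It is not the paper's analysis and does not reproduce its bounds. Everything in it is an assumption made for illustration: the Hill-type maximum-likelihood estimator, the synthetic data (a uniform sample from the unit ball of a random 5-dimensional linear subspace of a 50-dimensional space, with the origin as reference point), the target dimension of 20, and the hypothetical names `lid_mle` and `demo`.

```python
import numpy as np


def lid_mle(dists, k):
    """Hill-type maximum-likelihood LID estimate from the k smallest distances."""
    r = np.sort(np.asarray(dists, dtype=float))[:k]
    # Log-ratios to the k-th smallest distance are <= 0; their mean estimates -1/LID.
    return -1.0 / np.mean(np.log(r / r[-1]))


def demo(seed=0, n=50_000, ambient_dim=50, intrinsic_dim=5, target_dim=20, k=250):
    """Return (LID estimate before projection, LID estimate after projection)."""
    rng = np.random.default_rng(seed)

    # Uniform sample from the unit ball in intrinsic_dim dimensions:
    # a uniformly random direction scaled by U^(1/intrinsic_dim).
    directions = rng.standard_normal((n, intrinsic_dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = rng.random(n) ** (1.0 / intrinsic_dim)

    # Embed the ball in a random linear subspace of the ambient space
    # (orthonormal basis obtained from a QR factorization).
    basis, _ = np.linalg.qr(rng.standard_normal((ambient_dim, intrinsic_dim)))
    x = (directions * radii[:, None]) @ basis.T

    # Gaussian random projection to target_dim dimensions.
    proj = rng.standard_normal((ambient_dim, target_dim)) / np.sqrt(target_dim)
    y = x @ proj

    # The reference point is the origin, which a linear map sends to the origin,
    # so distances to the reference point are simply vector norms.
    before = lid_mle(np.linalg.norm(x, axis=1), k)
    after = lid_mle(np.linalg.norm(y, axis=1), k)
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print(f"LID estimate before projection: {before:.2f}")
    print(f"LID estimate after projection:  {after:.2f}")
```

In this synthetic setting both estimates should lie near the intrinsic dimension of 5, up to sampling error. Varying `target_dim` relative to `intrinsic_dim` gives a rough empirical feel for how the gap between the two dimensionalities matters, although real data sets and other estimators may behave differently.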
Acknowledgments
Michael E. Houle acknowledges the financial support of JSPS Kakenhi Kiban (B) Research Grant 18H03296. Ken-ichi Kawarabayashi is supported by JSPS Kakenhi Research Grant JP18H05291.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Houle, M.E., Kawarabayashi, Ki. (2021). The Effect of Random Projection on Local Intrinsic Dimensionality. In: Reyes, N., et al. Similarity Search and Applications. SISAP 2021. Lecture Notes in Computer Science, vol 13058. Springer, Cham. https://doi.org/10.1007/978-3-030-89657-7_16
DOI: https://doi.org/10.1007/978-3-030-89657-7_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89656-0
Online ISBN: 978-3-030-89657-7
eBook Packages: Computer Science, Computer Science (R0)