
Local Intrinsic Dimensionality II: Multivariate Analysis and Distributional Support

  • Conference paper
Similarity Search and Applications (SISAP 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10609)

Abstract

Distance-based expansion models of intrinsic dimensionality have had recent application in the analysis of complexity of similarity applications, and in the design of efficient heuristics. This theory paper extends one such model, the local intrinsic dimension (LID), to a multivariate form that can account for the contributions of different distributional components towards the intrinsic dimensionality of the entire feature set, or equivalently towards the discriminability of distance measures defined in terms of these feature combinations. Formulas are established for the effect on LID under summation, product, composition, and convolution operations on smooth functions in general, and cumulative distribution functions in particular. For some of these operations, the dimensional or discriminability characteristics of the result are also shown to depend on a form of distributional support. As an example, an analysis is provided that quantifies the impact of introduced random Gaussian noise on the intrinsic dimension of data. Finally, a theoretical relationship is established between the LID model and the classical correlation dimension.
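As a rough illustration of the kind of formulas the abstract refers to, the sketch below numerically checks three identities that follow directly from the definition of LID commonly used in this literature, ID_F(r) = r·F'(r)/F(r), for smooth positive functions: additivity under products, a chain-rule form under composition, and a weighted average under summation. The helper `lid`, the test functions `F` and `G`, and the radius `r` are assumptions chosen for illustration only; they are not taken from the paper, and the paper may state its results in a different form.

```python
import math

def lid(F, r, h=1e-5):
    """Approximate ID_F(r) = r * F'(r) / F(r).

    Uses a central difference of log F with respect to log r,
    which equals r * F'(r) / F(r) for smooth, positive F.
    """
    lo, hi = r * math.exp(-h), r * math.exp(h)
    return (math.log(F(hi)) - math.log(F(lo))) / (2.0 * h)

# Hypothetical smooth functions, positive for r > 0 (illustrative only).
F = lambda x: x ** 3               # pure power law: ID_F(r) = 3 for all r
G = lambda x: x ** 2 * (1.0 + x)   # ID_G(r) = (2 + 3r) / (1 + r)

r = 0.1
id_f = lid(F, r)
id_g = lid(G, r)

# Product: ID_{FG}(r) = ID_F(r) + ID_G(r)
id_product = lid(lambda x: F(x) * G(x), r)

# Composition: ID_{F o G}(r) = ID_F(G(r)) * ID_G(r)
id_composition = lid(lambda x: F(G(x)), r)
id_f_at_g = lid(F, G(r))

# Sum: ID_{F+G}(r) is the average of ID_F and ID_G weighted by F(r) and G(r)
id_sum = lid(lambda x: F(x) + G(x), r)
id_weighted = (F(r) * id_f + G(r) * id_g) / (F(r) + G(r))

print(id_f, id_g, id_product, id_composition, id_sum)
```

The log-log finite difference is used here only because it keeps the check short; any numerical or symbolic derivative would serve equally well.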

Acknowledgments

The author gratefully acknowledges the financial support of JSPS Kakenhi Kiban (A) Research Grant 25240036 and JSPS Kakenhi Kiban (B) Research Grant 15H02753.

Author information

Corresponding author

Correspondence to Michael E. Houle.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Houle, M.E. (2017). Local Intrinsic Dimensionality II: Multivariate Analysis and Distributional Support. In: Beecks, C., Borutta, F., Kröger, P., Seidl, T. (eds) Similarity Search and Applications. SISAP 2017. Lecture Notes in Computer Science, vol 10609. Springer, Cham. https://doi.org/10.1007/978-3-319-68474-1_6

  • DOI: https://doi.org/10.1007/978-3-319-68474-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68473-4

  • Online ISBN: 978-3-319-68474-1

  • eBook Packages: Computer Science, Computer Science (R0)
