An Empirical Evaluation of Intrinsic Dimension Estimators

  • Cristian Bustos
  • Gonzalo Navarro
  • Nora Reyes
  • Rodrigo Paredes (email author)
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9371)

Abstract

We study the practical behavior of different algorithms that aim to estimate the intrinsic dimension (ID) in metric spaces. Some of these algorithms were specifically developed to evaluate the complexity of searching in metric spaces, based on different theories related to the distribution of distances between objects in such spaces. Others were originally designed for vector spaces only and have been extended to general metric spaces. To empirically evaluate how well the various ID estimates match the actual difficulty of searching in metric spaces, we compare the estimates against one representative of each of the broadest families of metric indices: those based on pivots and those based on compact partitions. Our preliminary conclusions are that Fastmap and the measure called Intrinsic Dimensionality best fit their purpose.
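
As an illustration of a distance-distribution-based estimate, the measure called Intrinsic Dimensionality is commonly defined in the metric-space literature as ρ = μ² / (2σ²), where μ and σ² are the mean and variance of the distances between objects. The following is a minimal sketch of that formula only, not the experimental setup of the paper: the function names, the random sampling of pairs, and the synthetic uniform data are all assumptions made for illustration.

```python
import math
import random


def intrinsic_dimensionality(points, dist, n_pairs=2000, seed=0):
    """Estimate rho = mu^2 / (2 * sigma^2) from sampled pairwise distances.

    `points` is a list of objects and `dist` any metric over them.
    Sampling random pairs (instead of all pairs) is an assumption
    made here to keep the sketch cheap.
    """
    rng = random.Random(seed)
    dists = []
    for _ in range(n_pairs):
        a, b = rng.sample(points, 2)  # two distinct objects
        dists.append(dist(a, b))
    mu = sum(dists) / len(dists)
    var = sum((d - mu) ** 2 for d in dists) / len(dists)
    if var == 0.0:
        raise ValueError("all sampled distances are equal; rho is undefined")
    return mu * mu / (2.0 * var)


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def uniform_cube(n, dim, seed=0):
    """Hypothetical test data: n points uniform in the unit cube."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(dim)) for _ in range(n)]


if __name__ == "__main__":
    for dim in (2, 4, 8, 16):
        pts = uniform_cube(500, dim, seed=dim)
        print(dim, round(intrinsic_dimensionality(pts, euclidean), 2))
```

Because only a distance function is required, the same sketch applies to non-vector data such as strings under edit distance. On uniform vectors the estimate should grow with the number of coordinates, reflecting a more concentrated distance histogram.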

Keywords

Intrinsic Dimensionality, Target Space, Empirical Evaluation, Range Query, Search Cost
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Cristian Bustos (1)
  • Gonzalo Navarro (2)
  • Nora Reyes (1)
  • Rodrigo Paredes (3, email author)

  1. Departamento de Informática, Universidad Nacional de San Luis, San Luis, Argentina
  2. Department of Computer Science, Center of Biotechnology and Bioengineering, University of Chile, Santiago, Chile
  3. Departamento de Ciencias de la Computación, Universidad de Talca, Curicó, Chile
