Parametric Approximation Algorithms for High-Dimensional Euclidean Similarity

  • Ömer Eğecioğlu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2168)

Abstract

We introduce a spectrum of algorithms for measuring the similarity of high-dimensional vectors in Euclidean space. Each proposed algorithm is a convex combination of two measures: one contains summary data about the shape of a vector, and the other about the relative magnitudes of its coordinates. The former is based on a concept called bin-score permutations and a metric that quantifies the similarity of permutations; the latter is based on another novel approximation for inner-product computations that uses power symmetric functions and generalizes the Cauchy-Schwarz inequality. We present experiments on time-series data of unemployment figures from labor statistics that show the effectiveness of the algorithm as a function of the parameter that combines the two parts.
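
To make the idea of a convex combination of a shape measure and a magnitude measure concrete, here is a minimal Python sketch. It is not the paper's method: the abstract does not define bin-score permutations, the permutation metric, or the power-symmetric-function approximation, so the sketch substitutes simple stand-ins. The plain rank permutation, the normalized Spearman footrule, the norm-based magnitude term, and all function names and the parameter name `lam` are assumptions chosen for illustration only.

```python
import math
from typing import List, Sequence


def rank_permutation(v: Sequence[float]) -> List[int]:
    # Stand-in for a shape summary (assumption): indices of v ordered by
    # increasing value, with ties broken by index.
    return sorted(range(len(v)), key=lambda i: v[i])


def footrule_distance(p: Sequence[int], q: Sequence[int]) -> float:
    # Normalized Spearman footrule between two permutations, in [0, 1].
    # Chosen as one possible permutation metric (assumption).
    n = len(p)
    pos_p = [0] * n
    pos_q = [0] * n
    for rank, idx in enumerate(p):
        pos_p[idx] = rank
    for rank, idx in enumerate(q):
        pos_q[idx] = rank
    total = sum(abs(a - b) for a, b in zip(pos_p, pos_q))
    max_total = (n * n) // 2  # largest possible footrule value for length n
    return total / max_total if max_total else 0.0


def magnitude_distance(x: Sequence[float], y: Sequence[float]) -> float:
    # Stand-in for a magnitude summary (assumption): relative difference of
    # Euclidean norms, in [0, 1]. The paper's inner-product approximation
    # is NOT reproduced here.
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    denom = nx + ny
    return abs(nx - ny) / denom if denom else 0.0


def combined_distance(x: Sequence[float], y: Sequence[float], lam: float) -> float:
    # Convex combination: lam weights the shape term, (1 - lam) the
    # magnitude term.
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    shape = footrule_distance(rank_permutation(x), rank_permutation(y))
    return lam * shape + (1.0 - lam) * magnitude_distance(x, y)
```

With `lam = 1.0` only the ordering of the coordinates matters, so `[1, 2, 3]` and `[2, 4, 6]` are at distance zero; with `lam = 0.0` only the overall magnitude matters, so `[1, 2, 3]` and `[3, 2, 1]` are at distance zero. Intermediate values of `lam` trade the two views off against each other, which is the role the abstract assigns to the combining parameter.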

Keywords

Unemployment Rate; Large Data Base; Parametric Approximation Algorithm; Query Vector; Dimensionality Curse

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Ömer Eğecioğlu, Department of Computer Science, University of California, Santa Barbara, USA
