# Parametric Approximation Algorithms for High-Dimensional Euclidean Similarity

## Abstract

We introduce a spectrum of algorithms for measuring the similarity of high-dimensional vectors in Euclidean space. Each proposed algorithm is a convex combination of two measures: one that summarizes the *shape* of a vector, and one that summarizes the relative *magnitudes* of its coordinates. The former is based on a concept called *bin-score permutations*, together with a metric that quantifies the similarity of permutations. The latter is based on a novel approximation for inner-product computations that uses power symmetric functions and generalizes the Cauchy-Schwarz inequality. We present experiments on time-series data of unemployment figures from labor statistics that show the effectiveness of the algorithm as a function of the parameter that combines the two parts.
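
As a rough illustration of the structure described above, the combined measure for vectors $x, y \in \mathbb{R}^n$ can be written as a convex combination, and the classical Cauchy-Schwarz inequality that the magnitude part generalizes can be stated through the second power sum. The symbols $\lambda$, $S_{\mathrm{shape}}$, $S_{\mathrm{mag}}$, and $\psi_2$ are notation assumed here for the sketch, not taken from the paper, and which endpoint of $\lambda$ corresponds to which measure is likewise an assumption.

$$
S_\lambda(x, y) = \lambda \, S_{\mathrm{shape}}(x, y) + (1 - \lambda) \, S_{\mathrm{mag}}(x, y), \qquad 0 \le \lambda \le 1
$$

$$
|\langle x, y \rangle| \le \sqrt{\psi_2(x) \, \psi_2(y)}, \qquad \psi_2(x) = \sum_{i=1}^{n} x_i^2
$$

Under this notation, $\lambda = 1$ would use only the shape measure and $\lambda = 0$ only the magnitude measure. The Cauchy-Schwarz bound depends on each vector only through the single number $\psi_2$, which can be computed once per vector rather than once per pair.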

## Keywords

Unemployment Rate; Large Data Base; Parametric Approximation Algorithm; Query Vector; Dimensionality Curse