Principal Component Hashing: An Accelerated Approximate Nearest Neighbor Search

  • Yusuke Matsushita
  • Toshikazu Wada
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5414)


Nearest neighbor (NN) search is a basic operation in data mining and machine learning applications, but accelerating it in high-dimensional spaces is difficult. Approximate NN search algorithms have been investigated to address this problem. In particular, Locality Sensitive Hashing (LSH) has recently attracted attention because it has a clear relationship between the relative error ratio and the computational complexity. However, p-stable LSH computes hash values independently of the data distribution, and hence the search sometimes fails or takes a considerably long time. To solve this problem, we propose Principal Component Hashing (PCH), which exploits the distribution of the stored data. Through experiments, we confirmed that PCH is faster than ANN and LSH at the same accuracy.
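
As a rough illustration of the contrast drawn above, the sketch below places a p-stable LSH hash in its commonly cited form, h(x) = floor((a·x + b) / w), next to a hypothetical data-dependent hash that projects onto the first principal direction of the stored data and cuts the projections at equal-count quantiles. The second hash is an assumption made for illustration only and is not the authors' PCH algorithm, whose details are not given here. All function names, the bucket width `w`, and the bucket count `n_buckets` are likewise invented for this example.

```python
import bisect
import math
import random


def pstable_hash(x, a, b, w):
    """Data-independent p-stable LSH hash: floor((a . x + b) / w)."""
    return math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / w)


def principal_direction(data, iters=100):
    """Estimate the mean and first principal direction by power iteration."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    v = [1.0 / math.sqrt(d)] * d  # arbitrary unit-length starting vector
    for _ in range(iters):
        # Apply the unnormalised covariance X^T X to v, then renormalise.
        proj = [sum(r[j] * v[j] for j in range(d)) for r in centered]
        u = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in u))
        v = [c / norm for c in u]
    return mean, v


def project(x, mean, v):
    """Signed coordinate of x along direction v, relative to the data mean."""
    return sum((xi - mi) * vi for xi, mi, vi in zip(x, mean, v))


def quantile_boundaries(values, n_buckets):
    """Boundaries that split the stored projections into equal-count buckets."""
    s = sorted(values)
    return [s[(k * len(s)) // n_buckets] for k in range(1, n_buckets)]


def data_dependent_hash(x, mean, v, boundaries):
    """Hypothetical hash: bucket index of x along the principal direction."""
    return bisect.bisect_right(boundaries, project(x, mean, v))


# Toy data stretched along the first axis (assumed, not from the paper).
rng = random.Random(0)
data = [[rng.gauss(0, 10), rng.gauss(0, 1)] for _ in range(1000)]

# Data-independent hash: random Gaussian (2-stable) direction and offset.
w = 4.0
a = [rng.gauss(0, 1) for _ in range(2)]
b = rng.uniform(0, w)
pstable_keys = [pstable_hash(x, a, b, w) for x in data]

# Data-dependent hash: boundaries follow the stored distribution.
n_buckets = 8
mean, v = principal_direction(data)
boundaries = quantile_boundaries([project(x, mean, v) for x in data], n_buckets)
counts = [0] * n_buckets
for x in data:
    counts[data_dependent_hash(x, mean, v, boundaries)] += 1

print("distinct p-stable buckets:", len(set(pstable_keys)))
print("points per data-dependent bucket:", counts)
```

Because the quantile boundaries are derived from the stored points, the data-dependent buckets hold roughly equal numbers of points, whereas the fixed-width p-stable buckets can be very unevenly filled when the data are far from isotropic. The power iteration is a minimal stand-in for a proper eigen-decomposition and assumes non-degenerate data.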


Keywords: Approximate Nearest Neighbor Search, High dimensional space, p-stable Locality Sensitive Hashing



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Yusuke Matsushita (1)
  • Toshikazu Wada (1)
  1. Graduate School of Systems Engineering, Wakayama University, Wakayama, Japan
