Accelerated Large Scale Optimization by Concomitant Hashing

  • Yadong Mu
  • John Wright
  • Shih-Fu Chang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7572)


Traditional locality-sensitive hashing (LSH) techniques tackle the curse of explosive data scale by guaranteeing that similar samples are projected onto proximal hash buckets. Despite the success of LSH on numerous vision tasks such as image retrieval and object matching, its potential in large-scale optimization has only recently been realized. In this paper we further advance this nascent area. We first identify two common operations, min and max inner product, that constitute the computational bottleneck of numerous optimization algorithms in the large-scale setting. We then propose a hashing scheme for accelerating min/max inner product that exploits properties of the order statistics of statistically correlated random vectors. Compared with alternative schemes, our algorithm exhibits improved recall at lower computational cost. The effectiveness and efficiency of the proposed method are corroborated by theoretical analysis and several important applications. In particular, we use the proposed hashing scheme to perform approximate ℓ1-regularized least squares with dictionaries containing millions of elements, a scale beyond the capability of currently known exact solvers. We emphasize that the focus of this paper is not a new hashing scheme for the approximate nearest neighbor problem; rather, it explores a new application of hashing techniques and proposes a general framework for accelerating a large variety of optimization procedures in computer vision.
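The max-inner-product idea sketched above can be illustrated with a concomitant rank-order hash in the spirit of Eshghi and Rajaram: each vector is hashed to the index of the shared random Gaussian direction with which it attains the largest inner product, and a query is answered by an exact scan restricted to its colliding bucket. The sketch below is a minimal NumPy illustration under assumed parameters (dimension, dictionary size, and number of random directions are arbitrary), not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def crosh_hash(X, G):
    # Concomitant rank-order hash: for each row of X, return the index
    # of the random direction in G with the maximum inner product.
    return np.argmax(X @ G.T, axis=1)

d, n, K = 32, 5000, 64                           # illustrative sizes
D = rng.standard_normal((n, d))
D /= np.linalg.norm(D, axis=1, keepdims=True)    # unit-norm dictionary atoms
G = rng.standard_normal((K, d))                  # shared random directions

codes = crosh_hash(D, G)                         # one pass over the dictionary

q = rng.standard_normal(d)                       # query vector
bucket = np.flatnonzero(codes == crosh_hash(q[None, :], G)[0])

# Exact max-inner-product search restricted to the colliding bucket,
# instead of a full scan over all n atoms.
best_in_bucket = bucket[np.argmax(D[bucket] @ q)] if bucket.size else None
```

Because correlated vectors tend to share their argmax direction, the bucket concentrates atoms with large inner product against the query, so the restricted scan touches roughly n/K atoms on average rather than all n.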


Keywords: sparse coding, mean average precision, random projection, orthogonal matching pursuit, Gaussian process regression



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

Yadong Mu, John Wright, Shih-Fu Chang
Electrical Engineering Department, Columbia University, New York, USA
