Learning Binary Codes with Bagging PCA

  • Cong Leng
  • Jian Cheng
  • Ting Yuan
  • Xiao Bai
  • Hanqing Lu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8725)

Abstract

For eigendecomposition-based hashing approaches, the information captured by different dimensions is unbalanced, and most of it is typically contained in the top eigenvectors. This often leads to a counterintuitive phenomenon: longer codes do not necessarily yield better performance. To learn effective binary codes, this paper leverages the idea of bootstrap sampling and integrates it with PCA, resulting in a new projection method called Bagging PCA. Specifically, each time a small fraction of the training data is randomly sampled to learn the PCA directions, and only the top eigenvectors are kept to generate one short code. This process is repeated several times, and the resulting short codes are concatenated into one long code. If each short code is regarded as a “super-bit”, the whole process is closely connected to the core idea of locality-sensitive hashing (LSH). Both theoretical and experimental analyses demonstrate the effectiveness of the proposed method.
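The abstract describes Bagging PCA only at a high level. Below is a minimal pure-Python sketch of the idea, not the authors' implementation. It rests on assumptions the abstract does not settle: bootstrap samples are drawn with replacement, PCA is computed by power iteration with deflation, each bag is centered on its own sample mean, and bits come from thresholding projections at zero. The helper names (`bagging_pca`, `encode`, `hamming`) and parameters (`n_bags`, `bits_per_bag`, `sample_frac`) are hypothetical and were chosen for illustration.

```python
import math
import random
from typing import List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean_center(rows: Matrix) -> Tuple[Vector, Matrix]:
    """Return the column means and the mean-centered rows."""
    d = len(rows[0])
    mu = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    return mu, [[r[j] - mu[j] for j in range(d)] for r in rows]


def covariance(centered: Matrix) -> Matrix:
    """Sample covariance (divided by n) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
            for i in range(d)]


def top_eigenvectors(cov: Matrix, k: int, iters: int = 200,
                     rng: Optional[random.Random] = None) -> Matrix:
    """Leading k eigenvectors of a symmetric PSD matrix via power iteration + deflation."""
    rng = rng or random.Random(0)
    d = len(cov)
    a = [row[:] for row in cov]
    vecs: Matrix = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue along this direction.
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))
        vecs.append(v)
        # Deflate: A <- A - lam * v v^T so the next pass finds the next direction.
        for i in range(d):
            for j in range(d):
                a[i][j] -= lam * v[i] * v[j]
    return vecs


def bagging_pca(data: Matrix, n_bags: int, bits_per_bag: int,
                sample_frac: float, seed: int = 0) -> List[Tuple[Vector, Matrix]]:
    """Hypothetical helper: learn one short PCA projection per bootstrap sample."""
    rng = random.Random(seed)
    n = len(data)
    m = max(2, int(sample_frac * n))
    models = []
    for _ in range(n_bags):
        # Assumption: sampling with replacement (bootstrap); the abstract only
        # says a small fraction of the training data is randomly sampled.
        sample = [data[rng.randrange(n)] for _ in range(m)]
        mu, centered = mean_center(sample)
        # Keep only the top eigenvectors of this bag: one short code per bag.
        w = top_eigenvectors(covariance(centered), bits_per_bag, rng=rng)
        models.append((mu, w))
    return models


def encode(x: Vector, models: List[Tuple[Vector, Matrix]]) -> List[int]:
    """Concatenate the short codes ("super-bits") from every bag into one long code."""
    bits: List[int] = []
    for mu, w in models:
        c = [xi - mi for xi, mi in zip(x, mu)]
        # Assumption: a bit is 1 when the centered projection is non-negative.
        bits.extend(1 if sum(vi * ci for vi, ci in zip(v, c)) >= 0 else 0 for v in w)
    return bits


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions where two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(42)
    db = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
    models = bagging_pca(db, n_bags=4, bits_per_bag=4, sample_frac=0.1)
    codes = [encode(x, models) for x in db]
    query = codes[0]
    # Hamming ranking: sort database items by code distance to the query.
    ranking = sorted(range(len(db)), key=lambda i: hamming(query, codes[i]))
    print(len(query), ranking[:5])
```

Sorting database items by Hamming distance to the query code, as the demo does, corresponds to the Hamming ranking named in the keywords. Keeping `bits_per_bag` small is the point of the design: each bag contributes only its top, high-variance directions, rather than the low-variance tail that a single long PCA code would have to use.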

Keywords

Bootstrap, random bagging, PCA, binary codes, Hamming ranking

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Cong Leng (1)
  • Jian Cheng (1)
  • Ting Yuan (1)
  • Xiao Bai (2)
  • Hanqing Lu (1)

  1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
  2. School of Computer Science and Engineering, Beihang University, Beijing, China
