Abstract
Image hash codes are produced by binarizing the embeddings of convolutional neural networks (CNNs) trained for either classification or retrieval. While proxy embeddings achieve good performance on both tasks, they are non-trivial to binarize, due to a rotational ambiguity that encourages non-binary embeddings. The use of a fixed set of proxies (weights of the CNN classification layer) is proposed to eliminate this ambiguity, and a procedure to design proxy sets that are nearly optimal for both classification and hashing is introduced. The resulting hash-consistent large margin (HCLM) proxies are shown to encourage saturation of hashing units, thus guaranteeing a small binarization error, while producing highly discriminative hash codes. A semantic extension (sHCLM), aimed at improving hashing performance in a transfer scenario, is also proposed. Extensive experiments show that sHCLM embeddings achieve significant improvements over state-of-the-art hashing procedures on several small and large datasets, both within and beyond the set of training classes.
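The rotational ambiguity mentioned in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's implementation: it assumes a toy 2-D embedding, dot-product logits, and sign binarization, and all names (`rotate`, `logits`, `binarization_error`) and values are hypothetical. Rotating the embedding and the proxies by the same angle leaves every logit unchanged, so a classification loss with free proxies cannot distinguish the two solutions, yet the distance between the embedding and its binary code changes.

```python
import math

def rotate(v, theta):
    """Rotate a 2-D vector counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

def logits(embedding, proxies):
    """Dot products between one embedding and each proxy (softmax inputs)."""
    return [sum(e * w for e, w in zip(embedding, p)) for p in proxies]

def sign_code(embedding):
    """Binarize an embedding to a {-1, +1} hash code."""
    return [1.0 if e >= 0 else -1.0 for e in embedding]

def binarization_error(embedding):
    """Squared distance between an embedding and its sign code."""
    return sum((e - b) ** 2 for e, b in zip(embedding, sign_code(embedding)))

# Toy setup (assumed values): a saturated embedding and binary proxies.
embedding = [1.0, 1.0]
proxies = [[1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]

# Apply the same rotation to the embedding and to every proxy.
theta = math.pi / 6
rot_embedding = rotate(embedding, theta)
rot_proxies = [rotate(p, theta) for p in proxies]

original_logits = logits(embedding, proxies)
rotated_logits = logits(rot_embedding, rot_proxies)

error_before = binarization_error(embedding)      # embedding already binary
error_after = binarization_error(rot_embedding)   # rotated embedding is not

print("logits before:", original_logits)
print("logits after: ", rotated_logits)
print("binarization error before/after:", error_before, error_after)
```

In this toy case the logits agree up to floating-point error while the binarization error goes from zero to a strictly positive value, which is the effect that fixing the proxies is meant to rule out.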
References
Akata, Z., Perronnin, F., Harchaoui, Z., & Schmid, C. (2013). Label-embedding for attribute-based classification. In IEEE conference on computer vision and pattern recognition (CVPR).
Andoni, A., & Indyk, P. (2006). Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In IEEE symposium on foundations of computer science (FOCS).
Babenko, A., Slesarev, A., Chigorin, A., & Lempitsky, V. (2014). Neural codes for image retrieval. In European conference on computer vision (ECCV).
Bach, J. R., Fuller, C., Gupta, A., Hampapur, A., Horowitz, B., Humphrey, R., Jain, R. C., & Shu, C. F. (1996). Virage image search engine: an open framework for image management. In Storage and retrieval for still image and video databases IV (Vol. 2670).
Banerjee, A., Merugu, S., Dhillon, I. S., & Ghosh, J. (2005). Clustering with bregman divergences. Journal of Machine Learning Research, 6, 1705–1749.
Barndorff-Nielsen, O. (2014). Information and exponential families: In statistical theory. New York: Wiley.
Bell, S., & Bala, K. (2015). Learning visual similarity for product design with convolutional neural networks. ACM Transactions on Graphics (TOG), 34(4), 1–10.
Bregman, L. M. (1967). The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7(3), 200–217.
Cakir, F., He, K., Adel Bargal, S., & Sclaroff, S. (2017). Mihash: Online hashing with mutual information. In International conference on computer vision (ICCV).
Cakir, F., He, K., & Sclaroff, S. (2018). Hashing with binary matrix pursuit. In Proceedings of the European conference on computer vision (ECCV) (pp. 332–348).
Cao, Y., Long, M., Wang, J., Zhu, H., & Wen, Q. (2016). Deep quantization network for efficient image retrieval. In AAAI conference on artificial intelligence.
Chopra, S., Hadsell, R., & LeCun, Y. (2005). Learning a similarity metric discriminatively, with application to face verification. In IEEE conference on computer vision and pattern recognition (CVPR).
Chua, T. S., Tang, J., Hong, R., Li, H., Luo, Z., & Zheng, Y. T. (2009). NUS-WIDE: A real-world web image database from National University of Singapore. In ACM conference on image and video retrieval (CIVR).
Datar, M., Immorlica, N., Indyk, P., & Mirrokni, V. S. (2004). Locality-sensitive hashing scheme based on p-stable distributions. In Symposium on computational geometry (SOCG).
Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In IEEE conference on computer vision and pattern recognition (CVPR).
Flickner, M., Sawhney, H., Niblack, W., Ashley, J., Huang, Q., Dom, B., et al. (1995). Query by image and video content: The QBIC system. Computer, 28(9), 23–32.
Frome, A., Corrado, G. S., Shlens, J., Bengio, S., Dean, J., Mikolov, T., et al. (2013). Devise: A deep visual-semantic embedding model. In Advances in neural information processing systems (NIPS).
Goldberger, J., Hinton, G. E., Roweis, S. T., & Salakhutdinov, R. R. (2005). Neighbourhood components analysis. In Advances in neural information processing systems (NIPS).
Gong, Y., Lazebnik, S., Gordo, A., & Perronnin, F. (2013). Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 35(12), 2916–2929.
Gordo, A., Almazán, J., Revaud, J., & Larlus, D. (2016). Deep image retrieval: Learning global representations for image search. In European conference on computer vision (ECCV).
Hadsell, R., Chopra, S., & LeCun, Y. (2006). Dimensionality reduction by learning an invariant mapping. In IEEE conference on computer vision and pattern recognition (CVPR).
He, K., Cakir, F., Bargal, S. A., & Sclaroff, S. (2018). Hashing as tie-aware learning to rank. In IEEE conference on computer vision and pattern recognition (CVPR).
Huang, S., Xiong, Y., Zhang, Y., & Wang, J. (2017). Unsupervised triplet hashing for fast image retrieval. In Thematic workshops of ACM multimedia.
Jain, H., Zepeda, J., Pérez, P., & Gribonval, R. (2017). Subic: A supervised, structured binary code for image search. In International conference on computer vision (ICCV).
Jegou, H., Douze, M., & Schmid, C. (2011). Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 33(1), 117–128.
Jiang, Q. Y., & Li, W. J. (2018). Asymmetric deep supervised hashing. In Thirty-second AAAI conference on artificial intelligence.
Krizhevsky, A. (2009). Learning multiple layers of features from tiny images. University of Toronto.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (NIPS).
Kulis, B., & Darrell, T. (2009). Learning to hash with binary reconstructive embeddings. In Advances in neural information processing systems (NIPS).
Lai, H., Pan, Y., Liu, Y., & Yan, S. (2015). Simultaneous feature learning and hash coding with deep neural networks. In IEEE conference on computer vision and pattern recognition (CVPR).
Lampert, C. H., Nickisch, H., & Harmeling, S. (2009). Learning to detect unseen object classes by between-class attribute transfer. In IEEE conference on computer vision and pattern recognition (CVPR).
Li, L., Su, H., Xing, E., & Fei-Fei, L. (2010). Object bank: A high-level image representation for scene classification and semantic feature sparsification. In Advances in neural information processing systems (NIPS).
Li, Q., Sun, Z., He, R., & Tan, T. (2017). Deep supervised discrete hashing. In Advances in neural information processing systems (NIPS).
Li, W. J., Wang, S., & Kang, W. C. (2016). Feature learning based deep supervised hashing with pairwise labels. In AAAI conference on artificial intelligence.
Lin, K., Lu, J., Chen, C. S., & Zhou, J. (2016). Learning compact binary descriptors with unsupervised deep neural networks. In IEEE conference on computer vision and pattern recognition (CVPR).
Lin, K., Yang, H. F., Hsiao, J. H., & Chen, C. S. (2015). Deep learning of binary hash codes for fast image retrieval. In IEEE conference on computer vision and pattern recognition (workshops).
Liong, E., Lu, J., Wang, G., Moulin, P., & Zhou, J. (2015). Deep hashing for compact binary codes learning. In IEEE conference on computer vision and pattern recognition (CVPR).
Liu, H., Wang, R., Shan, S., & Chen, X. (2016). Deep supervised hashing for fast image retrieval. In IEEE conference on computer vision and pattern recognition (CVPR).
Liu, W., Wang, J., Ji, R., Jiang, Y. G., & Chang, S. F. (2012). Supervised hashing with kernels. In IEEE conference on computer vision and pattern recognition (CVPR).
Lu, J., Liong, V. E., & Zhou, J. (2017). Deep hashing for scalable image search. IEEE Transactions on Image Processing (TIP), 26(5), 2352–2367.
Morgado, P., & Vasconcelos, N. (2017). Semantically consistent regularization for zero-shot recognition. In IEEE conference on computer vision and pattern recognition (CVPR).
Movshovitz-Attias, Y., Toshev, A., Leung, T. K., Ioffe, S., & Singh, S. (2017). No fuss distance metric learning using proxies. In International conference on computer vision (ICCV).
Mu, Y., & Yan, S. (2010). Non-metric locality-sensitive hashing. In AAAI conference on artificial intelligence.
Nelder, J. A., & Wedderburn, R. W. (1972). Generalized linear models. Journal of the Royal Statistical Society: Series A (General), 135(3), 370–384.
Norouzi, M., & Blei, D. M. (2011). Minimal loss hashing for compact binary codes. In International conference on machine learning (ICML).
Oh Song, H., Xiang, Y., Jegelka, S., & Savarese, S. (2016). Deep metric learning via lifted structured feature embedding. In IEEE conference on computer vision and pattern recognition (CVPR).
Pereira, J. C., & Vasconcelos, N. (2014). Cross-modal domain adaptation for text-based regularization of image semantics in image retrieval systems. In Computer Vision and Image Understanding (Vol. 124).
Rasiwasia, N., Moreno, P., & Vasconcelos, N. (2007). Bridging the gap: Query by semantic example. IEEE Transactions on Multimedia, 9(5), 923–938.
Rohrbach, M., Stark, M., & Schiele, B. (2011). Evaluating knowledge transfer and zero-shot learning in a large-scale setting. In IEEE conference on computer vision and pattern recognition (CVPR).
Sablayrolles, A., Douze, M., Usunier, N., & Jégou, H. (2017). How should we evaluate supervised hashing? In IEEE International conference on acoustics, speech and signal processing (ICASSP).
Schroff, F., Kalenichenko, D., & Philbin, J. (2015). Facenet: A unified embedding for face recognition and clustering. In IEEE conference on computer vision and pattern recognition (CVPR).
Shen, F., Shen, C., Liu, W., & Tao Shen, H. (2015). Supervised discrete hashing. In IEEE conference on computer vision and pattern recognition (CVPR).
Smeulders, A., Worring, M., Santini, S., Gupta, A., & Jain, R. (2000). Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 22(12), 1349–1380.
Smith, J. R., & Chang, S. F. (1997). Visualseek: A fully automated content-based image query system. In ACM international conference on multimedia.
Sohn, K. (2016). Improved deep metric learning with multi-class n-pair loss objective. In Advances in neural information processing systems (NIPS).
Song, H. O., Jegelka, S., Rathod, V., & Murphy, K. (2016). Learnable structured clustering framework for deep metric learning. arXiv preprint arXiv:1612.01213.
Sun, Y., Chen, Y., Wang, X., & Tang, X. (2014). Deep learning face representation by joint identification-verification. In Advances in neural information processing systems (NIPS).
Tammes, P. (1930). On the origin of number and arrangement of the places of exit on the surface of pollen-grains. Ph.D. thesis, University of Groningen.
Torresani, L., Szummer, M., & Fitzgibbon, A. (2010). Efficient object category recognition using classemes. In European conference on computer vision (ECCV).
Wang, J., Kumar, S., & Chang, S. F. (2010). Semi-supervised hashing for scalable image retrieval. In IEEE conference on computer vision and pattern recognition (CVPR).
Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., & Wu, Y. (2014). Learning fine-grained image similarity with deep ranking. In IEEE conference on computer vision and pattern recognition (CVPR).
Wang, X., Shi, Y., & Kitani, K. M. (2016). Deep supervised hashing with triplet labels. In Asian conference on computer vision (ACCV).
Weinberger, K. Q., & Saul, L. K. (2009). Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10(9), 207–244.
Weiss, Y., Torralba, A., & Fergus, R. (2009). Spectral hashing. In Advances in neural information processing systems (NIPS).
Wen, Y., Zhang, K., Li, Z., & Qiao, Y. (2016). A discriminative feature learning approach for deep face recognition. In European conference on computer vision (ECCV) (pp. 499–515). Berlin: Springer.
Wright, S., & Nocedal, J. (1999). Numerical optimization. Science, 35, 566.
Xia, R., Pan, Y., Lai, H., Liu, C., & Yan, S. (2014). Supervised hashing for image retrieval via image representation learning. In AAAI conference on artificial intelligence.
Xie, S., & Tu, Z. (2015). Holistically-nested edge detection. In Proceedings of the IEEE international conference on computer vision (pp. 1395–1403).
Yang, H. F., Lin, K., & Chen, C. S. (2017). Supervised learning of semantics-preserving hash via deep convolutional neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(2), 437–451.
Zhang, D., Wu, W., Cheng, H., Zhang, R., Dong, Z., & Cai, Z. (2017). Image-to-video person re-identification with temporally memorized similarity learning. IEEE Transactions on Circuits and Systems for Video Technology, 28(10), 2622–2632.
Zhang, J., Peng, Y., & Zhang, J. (2016). Query-adaptive image retrieval by deep weighted hashing. IEEE Transactions on Multimedia.
Zhang, R., Li, J., Sun, H., Ge, Y., Luo, P., Wang, X., et al. (2019). Scan: Self-and-collaborative attention network for video person re-identification. IEEE Transactions on Image Processing, 28(10), 4870–4882.
Zhang, R., Lin, L., Zhang, R., Zuo, W., & Zhang, L. (2015). Bit-scalable deep hashing with regularized similarity learning for image retrieval and person re-identification. IEEE Transactions on Image Processing, 24(12), 4766–4779.
Zhao, F., Huang, Y., Wang, L., & Tan, T. (2015). Deep semantic ranking based hashing for multi-label image retrieval. In IEEE conference on computer vision and pattern recognition (CVPR).
Zhong, G., Xu, H., Yang, P., Wang, S., & Dong, J. (2016). Deep hashing learning networks. In International joint conference on neural networks (IJCNN).
Zhu, H., Long, M., Wang, J., & Cao, Y. (2016). Deep hashing network for efficient similarity retrieval. In AAAI conference on artificial intelligence.
Acknowledgements
This work was funded by graduate fellowship 109135/2015 from the Portuguese Ministry of Sciences and Education, NSF Grants IIS-1546305, IIS-1637941, IIS-1924937, and NVIDIA GPU donations.
Additional information
Communicated by Li Liu, Matti Pietikäinen, Jie Qin, Jie Chen, Wanli Ouyang, Luc Van Gool.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Relations Between Classification and Metric Learning
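Although seemingly different, metric learning and classification are closely related. To see this, consider the Bayes rule
$$P_{Y|\mathbf{X}}(y|\mathbf{x}) = \frac{P_{\mathbf{X}|Y}(\mathbf{x}|y)\, P_Y(y)}{\sum_k P_{\mathbf{X}|Y}(\mathbf{x}|k)\, P_Y(k)}. \tag{27}$$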
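It follows from (2) that
$$P_{\mathbf{X}|Y}(\mathbf{x}|y)\, P_Y(y) \propto_\mathbf{x} e^{\mathbf{w}_y^T \nu(\mathbf{x}) + b_y}, \tag{28}$$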
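where \(\propto _\mathbf {x}\) denotes a proportional relation for each value of \(\mathbf{x}\). This holds when
$$P_{\mathbf{X}|Y}(\mathbf{x}|y) = q(\mathbf{x})\, e^{\mathbf{w}_y^T \nu(\mathbf{x}) - \psi(\mathbf{w}_y)}, \tag{29}$$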
where \(q(\mathbf {x})\) is any non-negative function and \(\psi (\mathbf {w}_y)\) a constant such that (29) integrates to one. In this case, \(P_{\mathbf {X}|Y}(\mathbf {x}|y)\) is an exponential family distribution of canonical parameter \(\mathbf {w}_y\), sufficient statistic \(\nu (\mathbf {x})\) and cumulant function \(\psi (\mathbf {w}_y)\) (Barndorff-Nielsen 2014). Further assuming, for simplicity, that the classes are balanced, i.e., \(P_Y(y) =\frac{1}{C}\ \forall y\), leads to
$$b_y = -\psi(\mathbf{w}_y) + K, \tag{30}$$
where K is a constant.
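As a sanity check of the argument above, the sketch below takes one concrete exponential family, the Gaussian with identity covariance. In this case the canonical parameter is the class mean and the cumulant is \(\psi(\mathbf{w}) = \frac{1}{2}\Vert \mathbf{w}\Vert^2\). The sketch compares the posterior obtained from the Bayes rule under balanced classes with a softmax whose per-class biases are set to \(-\psi(\mathbf{w}_y)\). It assumes that the density is defined directly on the embedding (sufficient statistic equal to the input vector) and that the softmax has a bias term; the helper names and toy values are hypothetical.

```python
import math

def gaussian_density(v, mean):
    """Density of a Gaussian with identity covariance, evaluated at v."""
    d = len(v)
    sq = sum((a - m) ** 2 for a, m in zip(v, mean))
    return math.exp(-0.5 * sq) / (2.0 * math.pi) ** (d / 2.0)

def bayes_posterior(v, means):
    """Posterior P(y|v) from the Bayes rule with balanced class priors."""
    prior = 1.0 / len(means)
    joint = [gaussian_density(v, m) * prior for m in means]
    total = sum(joint)
    return [j / total for j in joint]

def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_posterior(v, weights):
    """Softmax over w_y . v + b_y with b_y = -psi(w_y) = -0.5 * ||w_y||^2."""
    scores = []
    for w in weights:
        dot = sum(a * b for a, b in zip(w, v))
        bias = -0.5 * sum(b * b for b in w)
        scores.append(dot + bias)
    return softmax(scores)

# Toy values (assumed): three class means, reused as softmax weights.
class_means = [[1.0, 1.0], [1.0, -1.0], [-1.0, 0.5]]
v = [0.3, -0.7]

p_bayes = bayes_posterior(v, class_means)
p_softmax = softmax_posterior(v, class_means)
max_gap = max(abs(a - b) for a, b in zip(p_bayes, p_softmax))

print("Bayes posterior:  ", p_bayes)
print("softmax posterior:", p_softmax)
print("largest absolute difference:", max_gap)
```

The two posteriors agree up to floating-point error, since any constant added to all biases cancels in the softmax normalization.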
The cumulant \(\psi (\mathbf {w}_y)\) has several important properties (Barndorff-Nielsen 2014; Nelder and Wedderburn 1972; Banerjee et al. 2005). First, \(\psi (\cdot )\) is a convex function of \(\mathbf {w}_y\). Second, its first and second order derivatives are the mean \(\nabla \psi (\mathbf {w}_y) = {{\mu }}^\nu _y\) and covariance \(\nabla ^2 \psi (\mathbf {w}_y) = {\varvec{\Sigma }}^\nu _y\) of \(\nu (\mathbf {x})\) under class y. Third, \(\psi (\cdot )\) has a conjugate function, convex on \({{\mu }}^\nu _y\), given by
$$\phi(\mu^\nu_y) = \mathbf{w}_y^T \mu^\nu_y - \psi(\mathbf{w}_y).$$
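It follows that the exponent of (29) can be re-written as
$$\mathbf{w}_y^T \nu(\mathbf{x}) - \psi(\mathbf{w}_y) = \phi(\nu(\mathbf{x})) - d_\phi(\nu(\mathbf{x}), \mu^\nu_y),$$
where
$$d_\phi(\mathbf{a}, \mathbf{b}) = \phi(\mathbf{a}) - \phi(\mathbf{b}) - \nabla\phi(\mathbf{b})^T (\mathbf{a} - \mathbf{b})$$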
is the Bregman divergence between \(\mathbf{a}\) and \(\mathbf{b}\) associated with \(\phi \). Thus, (29) can be written as
$$P_{\mathbf{X}|Y}(\mathbf{x}|y) = u(\mathbf{x})\, e^{-d_\phi(\nu(\mathbf{x}), \mu^\nu_y)},$$
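where \(u(\mathbf {x}) = q(\mathbf {x}) {e}^{\phi (\nu (\mathbf {x}))}\) and, using (31), (30) and (27),
$$P_{Y|\mathbf{X}}(y|\mathbf{x}) = \frac{e^{-d_\phi(\nu(\mathbf{x}), \mu^\nu_y)}}{\sum_k e^{-d_\phi(\nu(\mathbf{x}), \mu^\nu_k)}}.$$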
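Hence, learning the embedding \(\nu (\mathbf {x})\) with the softmax classifier of (2) endows \(\mathcal V\) with the Bregman divergence \(d_{\phi }(\nu (\mathbf {x}),{{\mu }}^\nu _y)\). From (32), it follows that
$$\mathbf{w}_y = \nabla\phi(\mu^\nu_y).$$
Hence,
$$\mu^\nu_y = \mathbf{w}_y$$
if and only if
$$\nabla\phi(\mu^\nu_y) = \mu^\nu_y,$$
which holds when
$$\phi(\mathbf{a}) = \frac{1}{2}\Vert \mathbf{a}\Vert^2.$$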
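It can be shown that the corresponding exponential family model is the Gaussian of identity covariance and the corresponding Bregman divergence is the squared Euclidean distance. Hence, \({{\mu }}_y^\nu = \mathbf {w}_y\) if and only if \(d_\phi \) is the \(L_2\) distance. In this case, (36) reduces to
$$P_{Y|\mathbf{X}}(y|\mathbf{x}) = \frac{e^{-\frac{1}{2}\Vert \nu(\mathbf{x}) - \mathbf{w}_y\Vert^2}}{\sum_k e^{-\frac{1}{2}\Vert \nu(\mathbf{x}) - \mathbf{w}_k\Vert^2}}.$$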
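The Bregman divergence used in this appendix can be checked numerically in the Gaussian case. The sketch below is illustrative only: `bregman`, `half_sq_norm` and `identity_grad` are hypothetical helper names, and the vectors are toy values. It takes \(\phi(\mathbf{a}) = \frac{1}{2}\Vert \mathbf{a}\Vert^2\), confirms that the resulting divergence is half the squared Euclidean distance, and confirms that the exponent \(\mathbf{w}_y^T \nu(\mathbf{x}) - \psi(\mathbf{w}_y)\) with \(\psi(\mathbf{w}) = \frac{1}{2}\Vert \mathbf{w}\Vert^2\) equals \(\phi(\nu(\mathbf{x})) - d_\phi(\nu(\mathbf{x}), \mathbf{w}_y)\).

```python
def bregman(phi, grad_phi, a, b):
    """Bregman divergence d_phi(a, b) = phi(a) - phi(b) - <grad phi(b), a - b>."""
    inner = sum(g * (x - y) for g, x, y in zip(grad_phi(b), a, b))
    return phi(a) - phi(b) - inner

def half_sq_norm(a):
    """Generator phi(a) = 0.5 * ||a||^2 (Gaussian, identity covariance)."""
    return 0.5 * sum(x * x for x in a)

def identity_grad(a):
    """Gradient of half_sq_norm, which is the point itself."""
    return list(a)

# Toy values (assumed): an embedding and a class parameter vector.
nu = [0.3, -0.7]
w = [1.0, -1.0]

# 1) The divergence generated by 0.5 * ||.||^2 is half the squared L2 distance.
d_euclid = bregman(half_sq_norm, identity_grad, nu, w)
half_sq_dist = 0.5 * sum((x - y) ** 2 for x, y in zip(nu, w))

# 2) Exponent of the exponential family vs. its Bregman form.
#    Here psi(w) = 0.5 * ||w||^2 and the class mean equals w.
exponent = sum(x * y for x, y in zip(w, nu)) - half_sq_norm(w)
rewritten = half_sq_norm(nu) - bregman(half_sq_norm, identity_grad, nu, w)

print("d_phi vs 0.5 * squared distance:", d_euclid, half_sq_dist)
print("exponent vs Bregman form:", exponent, rewritten)
```

Other convex generators can be passed to `bregman` in the same way, as long as the gradient function matches the generator.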
Cite this article
Morgado, P., Li, Y., Costa Pereira, J. et al. Deep Hashing with Hash-Consistent Large Margin Proxy Embeddings. Int J Comput Vis 129, 419–438 (2021). https://doi.org/10.1007/s11263-020-01362-7
DOI: https://doi.org/10.1007/s11263-020-01362-7