
Deep Hashing with Hash-Consistent Large Margin Proxy Embeddings

International Journal of Computer Vision

Abstract

Image hash codes are produced by binarizing the embeddings of convolutional neural networks (CNNs) trained for either classification or retrieval. While proxy embeddings achieve good performance on both tasks, they are non-trivial to binarize, due to a rotational ambiguity that encourages non-binary embeddings. The use of a fixed set of proxies (weights of the CNN classification layer) is proposed to eliminate this ambiguity, and a procedure to design proxy sets that are nearly optimal for both classification and hashing is introduced. The resulting hash-consistent large margin (HCLM) proxies are shown to encourage saturation of hashing units, thus guaranteeing a small binarization error, while producing highly discriminative hash codes. A semantic extension (sHCLM), aimed at improving hashing performance in a transfer scenario, is also proposed. Extensive experiments show that sHCLM embeddings achieve significant improvements over state-of-the-art hashing procedures on several small and large datasets, both within and beyond the set of training classes.
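
To make the hashing pipeline concrete, the following is a minimal sketch, added here for illustration and not the authors' implementation: a real-valued embedding is thresholded into a binary hash code, and codes are compared by Hamming distance. The embeddings are random stand-ins for CNN outputs.

import numpy as np

def binarize(embedding):
    # Threshold each embedding dimension at zero to obtain a {0, 1} hash code.
    return (embedding > 0).astype(np.uint8)

def hamming_distance(code_a, code_b):
    # Number of bits on which two hash codes disagree.
    return int(np.count_nonzero(code_a != code_b))

# Random stand-ins for the embeddings of a query and a database image.
rng = np.random.default_rng(0)
query_code = binarize(rng.normal(size=64))
database_code = binarize(rng.normal(size=64))
print(hamming_distance(query_code, database_code))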


References

  • Akata, Z., Perronnin, F., Harchaoui, Z., & Schmid, C. (2013). Label-embedding for attribute-based classification. In IEEE conference on computer vision and pattern recognition (CVPR).

  • Andoni, A., & Indyk, P. (2006). Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In IEEE symposium on foundations of computer science (FOCS).

  • Babenko, A., Slesarev, A., Chigorin, A., & Lempitsky, V. (2014). Neural codes for image retrieval. In European conference on computer vision (ECCV).

  • Bach, J. R., Fuller, C., Gupta, A., Hampapur, A., Horowitz, B., Humphrey, R., Jain, R. C., & Shu, C. F. (1996). Virage image search engine: an open framework for image management. In Storage and retrieval for still image and video databases IV (Vol. 2670).

  • Banerjee, A., Merugu, S., Dhillon, I. S., & Ghosh, J. (2005). Clustering with Bregman divergences. Journal of Machine Learning Research, 6, 1705–1749.

  • Barndorff-Nielsen, O. (2014). Information and exponential families: In statistical theory. New York: Wiley.

  • Bell, S., & Bala, K. (2015). Learning visual similarity for product design with convolutional neural networks. ACM Transactions on Graphics (TOG), 34(4), 1–10.

  • Bregman, L. M. (1967). The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7(3), 200–217.

  • Cakir, F., He, K., Adel Bargal, S., & Sclaroff, S. (2017). MIHash: Online hashing with mutual information. In International conference on computer vision (ICCV).

  • Cakir, F., He, K., & Sclaroff, S. (2018). Hashing with binary matrix pursuit. In Proceedings of the European conference on computer vision (ECCV) (pp. 332–348).

  • Cao, Y., Long, M., Wang, J., Zhu, H., & Wen, Q. (2016). Deep quantization network for efficient image retrieval. In AAAI conference on artificial intelligence.

  • Chopra, S., Hadsell, R., & LeCun, Y. (2005). Learning a similarity metric discriminatively, with application to face verification. In IEEE conference on computer vision and pattern recognition (CVPR).

  • Chua, T. S., Tang, J., Hong, R., Li, H., Luo, Z., & Zheng, Y. T. (2009). NUS-WIDE: A real-world web image database from National University of Singapore. In ACM conference on image and video retrieval (CIVR).

  • Datar, M., Immorlica, N., Indyk, P., & Mirrokni, V. S. (2004). Locality-sensitive hashing scheme based on p-stable distributions. In Symposium on computational geometry (SOCG).

  • Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In IEEE conference on computer vision and pattern recognition (CVPR).

  • Flickner, M., Sawhney, H., Niblack, W., Ashley, J., Huang, Q., Dom, B., et al. (1995). Query by image and video content: The QBIC system. Computer, 28(9), 23–32.

  • Frome, A., Corrado, G. S., Shlens, J., Bengio, S., Dean, J., Mikolov, T., et al. (2013). DeViSE: A deep visual-semantic embedding model. In Advances in neural information processing systems (NIPS).

  • Goldberger, J., Hinton, G. E., Roweis, S. T., & Salakhutdinov, R. R. (2005). Neighbourhood components analysis. In Advances in neural information processing systems (NIPS).

  • Gong, Y., Lazebnik, S., Gordo, A., & Perronnin, F. (2013). Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 35(12), 2916–2929.

  • Gordo, A., Almazán, J., Revaud, J., & Larlus, D. (2016). Deep image retrieval: Learning global representations for image search. In European conference on computer vision (ECCV).

  • Hadsell, R., Chopra, S., & LeCun, Y. (2006). Dimensionality reduction by learning an invariant mapping. In IEEE conference on computer vision and pattern recognition (CVPR).

  • He, K., Cakir, F., Bargal, S. A., & Sclaroff, S. (2018). Hashing as tie-aware learning to rank. In IEEE conference on computer vision and pattern recognition (CVPR).

  • Huang, S., Xiong, Y., Zhang, Y., & Wang, J. (2017). Unsupervised triplet hashing for fast image retrieval. In Thematic workshops of ACM multimedia.

  • Jain, H., Zepeda, J., Pérez, P., & Gribonval, R. (2017). SUBIC: A supervised, structured binary code for image search. In International conference on computer vision (ICCV).

  • Jegou, H., Douze, M., & Schmid, C. (2011). Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 33(1), 117–128.

  • Jiang, Q. Y., & Li, W. J. (2018). Asymmetric deep supervised hashing. In Thirty-second AAAI conference on artificial intelligence.

  • Krizhevsky, A. (2009). Learning multiple layers of features from tiny images. University of Toronto.

  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS).

  • Kulis, B., & Darrell, T. (2009). Learning to hash with binary reconstructive embeddings. In Advances in neural information processing systems (NIPS).

  • Lai, H., Pan, Y., Liu, Y., & Yan, S. (2015). Simultaneous feature learning and hash coding with deep neural networks. In IEEE conference on computer vision and pattern recognition (CVPR).

  • Lampert, C. H., Nickisch, H., & Harmeling, S. (2009). Learning to detect unseen object classes by between-class attribute transfer. In IEEE conference on computer vision and pattern recognition (CVPR).

  • Li, L., Su, H., Xing, E., & Fei-Fei, L. (2010). Object bank: A high-level image representation for scene classification and semantic feature sparsification. In Advances in neural information processing systems (NIPS).

  • Li, Q., Sun, Z., He, R., & Tan, T. (2017). Deep supervised discrete hashing. In Advances in neural information processing systems (NIPS).

  • Li, W. J., Wang, S., & Kang, W. C. (2016). Feature learning based deep supervised hashing with pairwise labels. In AAAI conference on artificial intelligence.

  • Lin, K., Lu, J., Chen, C. S., & Zhou, J. (2016). Learning compact binary descriptors with unsupervised deep neural networks. In IEEE conference on computer vision and pattern recognition (CVPR).

  • Lin, K., Yang, H. F., Hsiao, J. H., & Chen, C. S. (2015). Deep learning of binary hash codes for fast image retrieval. In IEEE conference on computer vision and pattern recognition (workshops).

  • Liong, E., Lu, J., Wang, G., Moulin, P., & Zhou, J. (2015). Deep hashing for compact binary codes learning. In IEEE conference on computer vision and pattern recognition (CVPR).

  • Liu, H., Wang, R., Shan, S., & Chen, X. (2016). Deep supervised hashing for fast image retrieval. In IEEE conference on computer vision and pattern recognition (CVPR).

  • Liu, W., Wang, J., Ji, R., Jiang, Y. G., & Chang, S. F. (2012). Supervised hashing with kernels. In IEEE conference on computer vision and pattern recognition (CVPR).

  • Lu, J., Liong, V. E., & Zhou, J. (2017). Deep hashing for scalable image search. IEEE Transactions on Image Processing (TIP), 26(5), 2352–2367.

  • Morgado, P., & Vasconcelos, N. (2017). Semantically consistent regularization for zero-shot recognition. In IEEE conference on computer vision and pattern recognition (CVPR).

  • Movshovitz-Attias, Y., Toshev, A., Leung, T. K., Ioffe, S., & Singh, S. (2017). No fuss distance metric learning using proxies. In International conference on computer vision (ICCV).

  • Mu, Y., & Yan, S. (2010). Non-metric locality-sensitive hashing. In AAAI conference on artificial intelligence.

  • Nelder, J. A., & Wedderburn, R. W. (1972). Generalized linear models. Journal of the Royal Statistical Society: Series A (General), 135(3), 370–384.

  • Norouzi, M., & Blei, D. M. (2011). Minimal loss hashing for compact binary codes. In International conference on machine learning (ICML).

  • Oh Song, H., Xiang, Y., Jegelka, S., & Savarese, S. (2016). Deep metric learning via lifted structured feature embedding. In IEEE conference on computer vision and pattern recognition (CVPR).

  • Pereira, J. C., & Vasconcelos, N. (2014). Cross-modal domain adaptation for text-based regularization of image semantics in image retrieval systems. Computer Vision and Image Understanding, 124.

  • Rasiwasia, N., Moreno, P., & Vasconcelos, N. (2007). Bridging the gap: Query by semantic example. IEEE Transactions on Multimedia, 9(5), 923–938.

  • Rohrbach, M., Stark, M., & Schiele, B. (2011). Evaluating knowledge transfer and zero-shot learning in a large-scale setting. In IEEE conference on computer vision and pattern recognition (CVPR).

  • Sablayrolles, A., Douze, M., Usunier, N., & Jégou, H. (2017). How should we evaluate supervised hashing? In IEEE International conference on acoustics, speech and signal processing (ICASSP).

  • Schroff, F., Kalenichenko, D., & Philbin, J. (2015). FaceNet: A unified embedding for face recognition and clustering. In IEEE conference on computer vision and pattern recognition (CVPR).

  • Shen, F., Shen, C., Liu, W., & Tao Shen, H. (2015). Supervised discrete hashing. In IEEE conference on computer vision and pattern recognition (CVPR).

  • Smeulders, A., Worring, M., Santini, S., Gupta, A., & Jain, R. (2000). Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 22(12), 1349–1380.

  • Smith, J. R., & Chang, S. F. (1997). VisualSEEk: A fully automated content-based image query system. In ACM international conference on multimedia.

  • Sohn, K. (2016). Improved deep metric learning with multi-class n-pair loss objective. In Advances in neural information processing systems (NIPS).

  • Song, H. O., Jegelka, S., Rathod, V., & Murphy, K. (2016). Learnable structured clustering framework for deep metric learning. arXiv preprint arXiv:1612.01213.

  • Sun, Y., Chen, Y., Wang, X., & Tang, X. (2014). Deep learning face representation by joint identification-verification. In Advances in neural information processing systems (NIPS).

  • Tammes, P. (1930). On the origin of number and arrangement of the places of exit on the surface of pollen-grains. Ph.D. thesis, University of Groningen.

  • Torresani, L., Szummer, M., & Fitzgibbon, A. (2010). Efficient object category recognition using classemes. In European conference on computer vision (ECCV).

  • Wang, J., Kumar, S., & Chang, S. F. (2010). Semi-supervised hashing for scalable image retrieval. In IEEE conference on computer vision and pattern recognition (CVPR).

  • Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., & Wu, Y. (2014). Learning fine-grained image similarity with deep ranking. In IEEE conference on computer vision and pattern recognition (CVPR).

  • Wang, X., Shi, Y., & Kitani, K. M. (2016). Deep supervised hashing with triplet labels. In Asian conference on computer vision (ACCV).

  • Weinberger, K. Q., & Saul, L. K. (2009). Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10(9), 207–244.

  • Weiss, Y., Torralba, A., & Fergus, R. (2009). Spectral hashing. In Advances in Neural Information Processing Systems (NIPS).

  • Wen, Y., Zhang, K., Li, Z., & Qiao, Y. (2016). A discriminative feature learning approach for deep face recognition. In European conference on computer vision (ECCV) (pp. 499–515). Berlin: Springer.

  • Wright, S., & Nocedal, J. (1999). Numerical optimization. New York: Springer.

  • Xia, R., Pan, Y., Lai, H., Liu, C., & Yan, S. (2014). Supervised hashing for image retrieval via image representation learning. In AAAI conference on artificial intelligence.

  • Xie, S., & Tu, Z. (2015). Holistically-nested edge detection. In Proceedings of the IEEE international conference on computer vision (pp. 1395–1403).

  • Yang, H. F., Lin, K., & Chen, C. S. (2017). Supervised learning of semantics-preserving hash via deep convolutional neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(2), 437–451.

  • Zhang, D., Wu, W., Cheng, H., Zhang, R., Dong, Z., & Cai, Z. (2017). Image-to-video person re-identification with temporally memorized similarity learning. IEEE Transactions on Circuits and Systems for Video Technology, 28(10), 2622–2632.

  • Zhang, J., Peng, Y., & Zhang, J. (2016). Query-adaptive image retrieval by deep weighted hashing. IEEE Transactions on Multimedia.

  • Zhang, R., Li, J., Sun, H., Ge, Y., Luo, P., Wang, X., et al. (2019). SCAN: Self-and-collaborative attention network for video person re-identification. IEEE Transactions on Image Processing, 28(10), 4870–4882.

  • Zhang, R., Lin, L., Zhang, R., Zuo, W., & Zhang, L. (2015). Bit-scalable deep hashing with regularized similarity learning for image retrieval and person re-identification. IEEE Transactions on Image Processing, 24(12), 4766–4779.

  • Zhao, F., Huang, Y., Wang, L., & Tan, T. (2015). Deep semantic ranking based hashing for multi-label image retrieval. In IEEE conference on computer vision and pattern recognition (CVPR).

  • Zhong, G., Xu, H., Yang, P., Wang, S., & Dong, J. (2016). Deep hashing learning networks. In International joint conference on neural networks (IJCNN).

  • Zhu, H., Long, M., Wang, J., & Cao, Y. (2016). Deep hashing network for efficient similarity retrieval. In AAAI conference on artificial intelligence.

Acknowledgements

This work was funded by graduate fellowship 109135/2015 from the Portuguese Ministry of Sciences and Education, NSF Grants IIS-1546305, IIS-1637941, IIS-1924937, and NVIDIA GPU donations.

Author information

Correspondence to Pedro Morgado.

Additional information

Communicated by Li Liu, Matti Pietikäinen, Jie Qin, Jie Chen, Wanli Ouyang, Luc Van Gool.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Relations Between Classification and Metric Learning

Although seemingly different, metric learning and classification are closely related. To see this, consider Bayes' rule

$$\begin{aligned} P_{Y|\mathbf {X}}(y|\mathbf {x})= & {} \frac{P_{\mathbf {X}|Y}(\mathbf {x}|y) P_Y(y)}{\sum _k P_{\mathbf {X}|Y}(\mathbf {x}|k) P_Y(k)}. \end{aligned}$$
(27)

It follows from (2) that

$$\begin{aligned} P_{\mathbf {X}|Y}(\mathbf {x}|y) P_Y(y) \propto _\mathbf {x}{e}^{\mathbf {w}^{T}_y \nu (\mathbf {x}) + b_y} \end{aligned}$$
(28)

where \(\propto _\mathbf {x}\) denotes a proportional relation for each value of \(\mathbf{x}\). This holds when

$$\begin{aligned} P_{\mathbf {X}|Y}(\mathbf {x}|y)= & {} q(\mathbf {x}) {e}^{\mathbf {w}^{T}_y \nu (\mathbf {x}) - \psi (\mathbf {w}_y)} \end{aligned}$$
(29)
$$\begin{aligned} P_Y(y)= & {} \frac{{e}^{b_y + \psi (\mathbf {w}_y)}}{\sum _k {e}^{b_k + \psi (\mathbf {w}_k)}}, \end{aligned}$$
(30)

where \(q(\mathbf {x})\) is any non-negative function and \(\psi (\mathbf {w}_y)\) a constant such that (29) integrates to one. In this case, \(P_{\mathbf {X}|Y}(\mathbf {x}|y)\) is an exponential family distribution with canonical parameter \(\mathbf {w}_y\), sufficient statistic \(\nu (\mathbf {x})\), and cumulant function \(\psi (\mathbf {w}_y)\) (Barndorff-Nielsen 2014). Further assuming, for simplicity, that the classes are balanced, i.e., \(P_Y(y) = \frac{1}{C} \; \forall y\), leads to

$$\begin{aligned} b_y = -\psi (\mathbf {w}_y) + \log K \end{aligned}$$
(31)

where K is a constant.
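
As a concrete instance of (29)–(31), added here for illustration, consider a Gaussian of identity covariance over \(\nu (\mathbf {x})\) with mean \(\mathbf {w}_y\), where d denotes the dimension of \(\nu (\mathbf {x})\):

$$\begin{aligned} P_{\mathbf {X}|Y}(\mathbf {x}|y) = (2\pi )^{-d/2} {e}^{-\frac{1}{2}||\nu (\mathbf {x}) - \mathbf {w}_y||^2} = \underbrace{(2\pi )^{-d/2} {e}^{-\frac{1}{2}||\nu (\mathbf {x})||^2}}_{q(\mathbf {x})} \, {e}^{\mathbf {w}^{T}_y \nu (\mathbf {x}) - \frac{1}{2}||\mathbf {w}_y||^2}, \end{aligned}$$

which has the form of (29) with \(\psi (\mathbf {w}_y) = \frac{1}{2}||\mathbf {w}_y||^2\), the case revisited in (40) below.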

The cumulant \(\psi (\mathbf {w}_y)\) has several important properties (Barndorff-Nielsen 2014; Nelder and Wedderburn 1972; Banerjee et al. 2005). First, \(\psi (\cdot )\) is a convex function of \(\mathbf {w}_y\). Second, its first- and second-order derivatives are the mean \(\nabla \psi (\mathbf {w}_y) = {{\mu }}^\nu _y\) and covariance \(\nabla ^2 \psi (\mathbf {w}_y) = {\varvec{\Sigma }}^\nu _y\) of \(\nu (\mathbf {x})\) under class y. Third, \(\psi (\cdot )\) has a conjugate function, convex in \({{\mu }}^\nu _y\), given by

$$\begin{aligned} \phi ({{\mu }}^\nu _y) = \mathbf {w}^{T}_y {{\mu }}^\nu _y - \psi (\mathbf {w}_y). \end{aligned}$$
(32)

It follows that the exponent of (29) can be re-written as

$$\begin{aligned} \mathbf {w}^{T}_y \nu (\mathbf {x}) - \psi (\mathbf {w}_y)= & {} \mathbf {w}^{T}_y {{\mu }}^\nu _y - \psi (\mathbf {w}_y) + \mathbf {w}^{T}_y (\nu (\mathbf {x}) - {{\mu }}^\nu _y) \nonumber \\= & {} \phi ({{\mu }}^\nu _y) + \mathbf {w}^{T}_y (\nu (\mathbf {x}) - {{\mu }}^\nu _y) \nonumber \\= & {} \phi ({{\mu }}^\nu _y) + \nabla \phi ({{\mu }}^\nu _y)^{T} (\nu (\mathbf {x}) - {{\mu }}^\nu _y) \nonumber \\= & {} -d_{\phi }(\nu (\mathbf {x}),{{\mu }}^\nu _y) + \phi (\nu (\mathbf {x})) \end{aligned}$$
(33)

where

$$\begin{aligned} d_{\phi }(\mathbf{a},\mathbf{b}) = \phi (\mathbf{a}) - \phi (\mathbf{b}) - \langle \nabla \phi (\mathbf{b}), \mathbf{a} - \mathbf{b} \rangle \end{aligned}$$
(34)

is the Bregman divergence between \(\mathbf{a}\) and \(\mathbf{b}\) associated with \(\phi \). Thus, (29) can be written as

$$\begin{aligned} P_{\mathbf {X}|Y}(\mathbf {x}|y) = u(\mathbf {x}) {e}^{-d_{\phi }(\nu (\mathbf {x}),{{\mu }}^\nu _y)} \end{aligned}$$
(35)

where \(u(\mathbf {x}) = q(\mathbf {x}) {e}^{\phi (\nu (\mathbf {x}))}\) and using (31), (30) and (27),

$$\begin{aligned} P_{Y|\mathbf {X}}(y|\mathbf {x})= & {} \frac{{e}^{-d_{\phi }(\nu (\mathbf {x}),{{\mu }}^\nu _y)}}{\sum _{k} {e}^{-d_{\phi }(\nu (\mathbf {x}),{{\mu }}^\nu _k)}}. \end{aligned}$$
(36)

Hence, learning the embedding \(\nu (\mathbf {x})\) with the softmax classifier of (2) endows \(\mathcal V\) with the Bregman divergence \(d_{\phi }(\nu (\mathbf {x}),{{\mu }}^\nu _y)\). From (32), it follows that

$$\begin{aligned} \nabla \psi (\mathbf {w}_y) = {{\mu }}_y^\nu \quad \quad \quad \nabla \phi ({{\mu }}_y^\nu ) = \mathbf {w}_y. \end{aligned}$$
(37)

Hence,

$$\begin{aligned} {{\mu }}_y^\nu = \mathbf {w}_y \end{aligned}$$
(38)

if and only if

$$\begin{aligned} \nabla \psi (\mathbf {w}_y) = \mathbf {w}_y \quad \quad \quad \nabla \phi ({{\mu }}_y^\nu ) = {{\mu }}_y^\nu , \end{aligned}$$
(39)

which holds when

$$\begin{aligned} \psi (\mathbf{a}) = \phi (\mathbf{a}) = \frac{1}{2}||\mathbf{a}||^2. \end{aligned}$$
(40)
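
As an added verification of this choice, substituting \(\phi (\mathbf{a}) = \frac{1}{2}||\mathbf{a}||^2\) into (34) gives

$$\begin{aligned} d_{\phi }(\mathbf{a},\mathbf{b}) = \frac{1}{2}||\mathbf{a}||^2 - \frac{1}{2}||\mathbf{b}||^2 - \langle \mathbf{b}, \mathbf{a} - \mathbf{b} \rangle = \frac{1}{2}||\mathbf{a} - \mathbf{b}||^2, \end{aligned}$$

i.e., half the squared Euclidean distance.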

The corresponding exponential family model is therefore the Gaussian of identity covariance and, as the computation above shows, the corresponding Bregman divergence is the squared Euclidean distance (up to the factor \(\frac{1}{2}\)). Hence, \({{\mu }}_y^\nu = \mathbf {w}_y\) if and only if \(d_\phi \) is the squared Euclidean distance. In this case, (36) reduces to

$$\begin{aligned} P_{Y|\mathbf {X}}(y|\mathbf {x})= & {} \frac{{e}^{-d(\nu (\mathbf {x}),\mathbf {w}_y)}}{\sum _{k} {e}^{-d(\nu (\mathbf {x}),\mathbf {w}_k)}}. \end{aligned}$$
(41)
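
The equivalence between the softmax posterior of (2), with biases set according to (31) and (40), and the distance-based posterior of (41) can be checked numerically. The snippet below is an illustrative sketch under these assumptions (random proxies and a random embedding as stand-ins for a trained network), not code from the paper.

import numpy as np

# With psi(a) = phi(a) = 0.5 * ||a||^2, the softmax logits w_y^T v + b_y,
# where b_y = -0.5 * ||w_y||^2 (equation (31) with log K = 0), yield the same
# posterior as the Bregman-divergence form d_phi(v, w_y) = 0.5 * ||v - w_y||^2 of (41).
rng = np.random.default_rng(0)
C, d = 10, 64                                # number of classes, embedding dimension
W = rng.normal(size=(C, d))                  # proxies w_y (rows of the classifier weight matrix)
v = rng.normal(size=d)                       # embedding nu(x) of a single input

logits = W @ v - 0.5 * np.sum(W ** 2, axis=1)
p_softmax = np.exp(logits - logits.max())
p_softmax /= p_softmax.sum()

dists = 0.5 * np.sum((W - v) ** 2, axis=1)   # Bregman divergences d_phi(v, w_y)
p_distance = np.exp(-(dists - dists.min()))
p_distance /= p_distance.sum()

assert np.allclose(p_softmax, p_distance)    # the two posteriors coincide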

Cite this article

Morgado, P., Li, Y., Costa Pereira, J. et al. Deep Hashing with Hash-Consistent Large Margin Proxy Embeddings. Int J Comput Vis 129, 419–438 (2021). https://doi.org/10.1007/s11263-020-01362-7
