Abstract
Cross-modal hashing has drawn increasing research interest in cross-modal retrieval due to the explosive growth of multimedia big data. However, most existing models are trained and tested in a closed-set setting, and may easily fail on newly emerged concepts that were never seen during training. In this paper, we propose a novel cross-modal hashing model, named Cross-Modal Attribute Hashing (CMAH), which can handle cross-modal retrieval of unseen categories. Inspired by zero-shot learning, an attribute space is employed to transfer knowledge from seen to unseen categories. Specifically, cross-modal hash function learning and knowledge transfer are carried out jointly by modeling the relationships among features, attributes, and classes as a dual multi-layer network. In addition, graph regularization and binary constraints are imposed to preserve the local structure information in each modality and to reduce quantization loss, respectively. Extensive experiments are carried out on three datasets, and the results demonstrate the effectiveness of CMAH in handling cross-modal retrieval for both seen and unseen concepts.
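The pipeline the abstract describes can be illustrated with a minimal sketch: each modality's features are embedded into a shared attribute space, then quantized to binary hash codes, and retrieval is performed by Hamming distance. This is not the authors' CMAH model; the projection matrices below are random stand-ins for the learned parameters, and all dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: image/text feature dims, shared attribute
# dim, and number of hash bits (all assumptions, not from the paper).
d_img, d_txt, d_attr, k = 512, 300, 64, 32

# Stand-ins for learned projections: modality features -> attribute
# space, and a shared map from attribute space to k hash bits.
W_img = rng.standard_normal((d_img, d_attr))
W_txt = rng.standard_normal((d_txt, d_attr))
W_hash = rng.standard_normal((d_attr, k))

def hash_codes(features, W_modality):
    """Map features to {-1, +1}^k via the attribute space, then sign
    quantization (the step the binary constraints aim to keep tight)."""
    attrs = features @ W_modality      # embed into shared attribute space
    return np.sign(attrs @ W_hash)     # quantize to binary codes

def hamming_rank(query_code, database_codes):
    """Rank database items by Hamming distance to the query code.
    For codes in {-1, +1}, distance = (k - inner product) / 2."""
    dists = (k - database_codes @ query_code) / 2
    return np.argsort(dists)

# Cross-modal retrieval: a text query against an image database.
imgs = rng.standard_normal((100, d_img))
txt_query = rng.standard_normal(d_txt)

db_codes = hash_codes(imgs, W_img)
q_code = hash_codes(txt_query, W_txt)
ranking = hamming_rank(q_code, db_codes)
```

Because both modalities share the attribute-to-hash mapping, text and image codes live in the same Hamming space, which is what makes querying across modalities (and, with attribute supervision, generalizing to unseen classes) possible.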
Change history
05 June 2019
In the original version of the chapter titled “An Exploration of Cross-Modal Retrieval for Unseen Concepts”, the acknowledgement was missing. It has been added.
Acknowledgement
This work was supported in part by the National Key Research and Development Program of China (2018YFC0831305) and in part by the National Natural Science Foundation of China (61672123).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhong, F., Chen, Z., Min, G. (2019). An Exploration of Cross-Modal Retrieval for Unseen Concepts. In: Li, G., Yang, J., Gama, J., Natwichai, J., Tong, Y. (eds) Database Systems for Advanced Applications. DASFAA 2019. Lecture Notes in Computer Science(), vol 11447. Springer, Cham. https://doi.org/10.1007/978-3-030-18579-4_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18578-7
Online ISBN: 978-3-030-18579-4
eBook Packages: Computer Science, Computer Science (R0)