Abstract
A large number of multimedia documents containing text and images have appeared on the internet; hence cross-modal retrieval, in which the modality of a query differs from that of the retrieved results, has become an interesting search paradigm. In this paper, a multimodal multiclass boosting framework (MMB) is proposed to capture intra-modal semantic information and inter-modal semantic correlation. Unlike traditional boosting methods, which are confined to two classes or a single modality, MMB handles multiple classes and multiple modalities simultaneously. The empirical risk, which takes both intra-modal and inter-modal losses into account, is designed and then minimized by gradient descent in multidimensional functional spaces. More specifically, the optimization problem is solved for each modality in turn. A semantic space is naturally obtained by applying the sigmoid function to the quasi-margins. Extensive experiments on the Wiki and NUS-WIDE datasets show that our method significantly outperforms existing approaches for cross-modal retrieval.
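As an illustration of the last step only, the following minimal sketch shows how per-class quasi-margins might be mapped into a shared semantic space with the sigmoid function and then used to rank items of another modality. It is not the authors' implementation: the quasi-margin values, the function names (`semantic_vector`, `rank_by_semantic_distance`), and the use of Euclidean distance for ranking are all assumptions made for the example, and the boosting procedure that would produce the quasi-margins is not shown.

```python
import math


def sigmoid(z):
    """Logistic function, mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def semantic_vector(quasi_margins):
    """Map one item's per-class quasi-margins to a vector in (0, 1)^K.

    Assumption: there is one quasi-margin per semantic class, produced by
    a modality-specific boosted predictor that is not shown here.
    """
    return [sigmoid(m) for m in quasi_margins]


def rank_by_semantic_distance(query_vec, candidate_vecs):
    """Return candidate indices sorted by Euclidean distance to the query.

    Assumption: Euclidean distance is used purely for illustration; the
    source abstract does not state which similarity measure is used.
    """
    def dist(v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(query_vec, v)))

    return sorted(range(len(candidate_vecs)), key=lambda i: dist(candidate_vecs[i]))


# Hypothetical quasi-margins for one text query and three images (3 classes).
text_query = semantic_vector([2.0, -1.0, -1.5])
images = [
    semantic_vector(m)
    for m in ([-1.0, 2.5, -0.5], [1.8, -0.7, -1.2], [0.1, 0.0, -0.2])
]

# Images ordered from most to least semantically similar to the text query.
ranking = rank_by_semantic_distance(text_query, images)
print(ranking)
```

Because text and images are mapped into the same K-dimensional space, a query from one modality can be compared directly against items from the other, which is what makes cross-modal ranking possible in this sketch.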
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, S., Pan, P., Lu, Y., Jiang, S. (2015). Multiclass Boosting Framework for Multimodal Data Analysis. In: He, X., Luo, S., Tao, D., Xu, C., Yang, J., Hasan, M.A. (eds) MultiMedia Modeling. MMM 2015. Lecture Notes in Computer Science, vol 8936. Springer, Cham. https://doi.org/10.1007/978-3-319-14442-9_60
DOI: https://doi.org/10.1007/978-3-319-14442-9_60
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14441-2
Online ISBN: 978-3-319-14442-9
eBook Packages: Computer Science (R0)