Multiclass Boosting Framework for Multimodal Data Analysis

  • Shixun Wang
  • Peng Pan
  • Yansheng Lu
  • Sheng Jiang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8936)


A large number of multimedia documents containing both text and images have appeared on the internet, so cross-modal retrieval, in which the modality of the query differs from that of the retrieved results, has become an interesting search paradigm. In this paper, a multimodal multiclass boosting framework (MMB) is proposed to capture intra-modal semantic information and inter-modal semantic correlation. Unlike traditional boosting methods, which are confined to two classes or a single modality, MMB handles multiclass, multimodal data simultaneously. The empirical risk, which takes both intra-modal and inter-modal losses into account, is designed and then minimized by gradient descent in multidimensional functional spaces. More specifically, the optimization problem is solved for each modality in turn. A semantic space is then obtained naturally by applying the sigmoid function to the quasi-margins. Extensive experiments on the Wiki and NUS-WIDE datasets show that our method significantly outperforms existing approaches to cross-modal retrieval.
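
The last step of the abstract, mapping quasi-margins to a semantic space with a sigmoid and retrieving across modalities there, can be illustrated with a small sketch. This is not the authors' implementation: it assumes that each item already has one quasi-margin per class (produced by some trained per-modality predictor, which is not shown), and the cosine distance, the function names, and the toy numbers are all hypothetical choices made only for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def to_semantic(quasi_margins):
    """Map one item's per-class quasi-margins to a semantic vector in (0, 1)."""
    return [sigmoid(m) for m in quasi_margins]


def cosine_distance(u, v):
    """Cosine distance; an assumed choice, the abstract names no metric."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rank_cross_modal(query_margins, candidate_margins):
    """Return candidate indices sorted from closest to farthest in semantic space."""
    q = to_semantic(query_margins)
    cands = [to_semantic(m) for m in candidate_margins]
    return sorted(range(len(cands)), key=lambda i: cosine_distance(q, cands[i]))


# Hypothetical quasi-margins for 3 classes: one text query, three candidate images.
text_query = [2.0, -1.5, -0.5]
image_candidates = [
    [-1.0, 2.5, -0.8],  # leans toward class 1
    [1.8, -1.0, -1.2],  # leans toward class 0, like the query
    [-0.5, -0.7, 1.9],  # leans toward class 2
]
ranking = rank_cross_modal(text_query, image_candidates)
print(ranking)
```

Because both modalities are mapped into the same class-indexed space, a text query and an image candidate become directly comparable even though their raw features are not; in this toy example the image that leans toward the same class as the query is ranked first.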


multiclass boosting, loss function, intra-modal and inter-modal, cross-modal retrieval, semantic space





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Shixun Wang (1)
  • Peng Pan (1)
  • Yansheng Lu (1)
  • Sheng Jiang (1)
  1. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
