Joint learning of cross-modal classifier and factor analysis for multimedia data classification

  • Original Article
Neural Computing and Applications

Abstract

In this paper, we study the problem of learning from multi-modal data for the purpose of document classification. In this problem, each document is composed of data from two different modalities, i.e., an image and a text. We propose to represent the data of the two modalities by projecting them to a shared data space using cross-modal factor analysis, and to classify them in the shared space using a linear class label predictor, named the cross-modal classifier. The parameters of both the cross-modal classifier and the cross-modal factor analysis are learned jointly, so that they regularize the learning of each other. We construct a unified objective function for this learning problem. With this objective function, we minimize both the distance between the projections of the image and the text of the same document and the classification error of the projections, measured by the hinge loss function. The objective function is optimized by an alternating optimization strategy in an iterative algorithm. Experiments on two different multi-modal document data sets show the advantage of the proposed algorithm over state-of-the-art multimedia data classification methods.
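The kind of unified objective described in the abstract can be sketched roughly as follows. This is an illustrative guess at its general shape, not the formula from the full article: the function name `joint_objective`, the trade-off weight `C`, the use of one shared classifier `w` for both modalities, and binary labels in {-1, +1} are all assumptions made for the sake of the example.

```python
import numpy as np

def joint_objective(X, T, y, U, V, w, C=1.0):
    """Illustrative joint objective (assumed form, not the paper's exact one).

    X: (n, d_img) image features, T: (n, d_txt) text features,
    y: (n,) labels in {-1, +1},
    U: (d_img, k) and V: (d_txt, k) projections to a shared k-dim space,
    w: (k,) linear classifier, C: assumed trade-off weight.
    """
    P_img = X @ U  # image projections in the shared space
    P_txt = T @ V  # text projections in the shared space

    # Factor-analysis-style term: squared distance between the two
    # projections of the same document.
    alignment = np.sum((P_img - P_txt) ** 2)

    # Classification term: hinge loss of the linear predictor on both
    # projections.
    hinge_img = np.maximum(0.0, 1.0 - y * (P_img @ w))
    hinge_txt = np.maximum(0.0, 1.0 - y * (P_txt @ w))

    return alignment + C * (hinge_img.sum() + hinge_txt.sum())
```

An alternating scheme in the spirit of the abstract would hold `U` and `V` fixed while updating `w`, then hold `w` fixed while updating `U` and `V`, repeating until the objective stops decreasing. The specific update rules are not given in the abstract and are not sketched here.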



Acknowledgments

This work was supported by an open research program of the Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, China.

Author information

Corresponding author

Correspondence to Hongxin Zhang.

About this article

Cite this article

Duan, K., Zhang, H. & Wang, J.JY. Joint learning of cross-modal classifier and factor analysis for multimedia data classification. Neural Comput & Applic 27, 459–468 (2016). https://doi.org/10.1007/s00521-015-1866-3

  • DOI: https://doi.org/10.1007/s00521-015-1866-3
