Abstract
In this paper, we study the problem of learning from multi-modal data for the purpose of document classification. In this problem, each document is composed of data from two different modalities, i.e., an image and a text. We propose to represent the data of the two modalities by projecting them into a shared data space using cross-modal factor analysis, and to classify them in the shared space using a linear class label predictor, named the cross-modal classifier. The parameters of both the cross-modal classifier and the cross-modal factor analysis are learned jointly, so that they can regularize the learning of each other. We construct a unified objective function for this learning problem. With this objective function, we minimize both the distance between the projections of the image and the text of the same document and the classification error of the projections, measured by the hinge loss function. The objective function is optimized by an alternating optimization strategy in an iterative algorithm. Experiments on two different multi-modal document data sets show the advantage of the proposed algorithm over state-of-the-art multimedia data classification methods.
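The kind of unified objective described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the function name `joint_objective`, the projection matrices `U` and `V`, the classifier vector `w`, the trade-off weight `C`, the use of binary labels in {-1, +1}, and the choice to apply the hinge loss to both projections. Constraints and regularization terms that the full method may impose are omitted.

```python
import numpy as np


def joint_objective(X, T, y, U, V, w, C=1.0):
    """Hypothetical joint objective: projection mismatch plus hinge loss.

    X: (n, dx) image features, T: (n, dt) text features,
    y: (n,) labels in {-1, +1},
    U: (dx, k) and V: (dt, k) project each modality into a shared space,
    w: (k,) linear classifier in the shared space,
    C: assumed trade-off weight between the two terms.
    """
    PX = X @ U  # image projections in the shared space
    PT = T @ V  # text projections in the shared space

    # Squared distance between the two projections of the same document.
    mismatch = np.sum((PX - PT) ** 2)

    # Hinge loss of the linear classifier on each projection.
    hinge_img = np.maximum(0.0, 1.0 - y * (PX @ w))
    hinge_txt = np.maximum(0.0, 1.0 - y * (PT @ w))

    return mismatch + C * (hinge_img.sum() + hinge_txt.sum())


# Toy data: two documents whose image and text features coincide.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
T = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, -1.0])
U = np.eye(2)
V = np.eye(2)

perfect = joint_objective(X, T, y, U, V, w=np.array([2.0, -2.0]))
untrained = joint_objective(X, T, y, U, V, w=np.zeros(2))
```

An alternating strategy in the spirit of the abstract would hold `w` fixed while updating `U` and `V`, then hold the projections fixed while updating `w`, repeating until the objective stops decreasing. The update rules themselves are not stated in the abstract and are not sketched here.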
Acknowledgments
This work was supported by an open research program of the Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, China.
Cite this article
Duan, K., Zhang, H. & Wang, J.JY. Joint learning of cross-modal classifier and factor analysis for multimedia data classification. Neural Comput & Applic 27, 459–468 (2016). https://doi.org/10.1007/s00521-015-1866-3