Joint learning of cross-modal classifier and factor analysis for multimedia data classification

Duan, Kanghong; Zhang, Hongxin; Wang, Jim Jing-Yan

doi:10.1007/s00521-015-1866-3

Original Article
Published: 05 March 2015

Volume 27, pages 459–468, (2016)

Neural Computing and Applications

Kanghong Duan¹,
Hongxin Zhang¹ &
Jim Jing-Yan Wang²,³

Abstract

In this paper, we study the problem of learning from multi-modal data for the purpose of document classification. In this problem, each document is composed of data from two different modalities, i.e., an image and a text. We propose to represent the data of the two modalities by projecting them into a shared data space using cross-modal factor analysis, and to classify them in the shared space using a linear class label predictor, named the cross-modal classifier. The parameters of both the cross-modal classifier and the cross-modal factor analysis are learned jointly, so that they regularize the learning of each other. We construct a unified objective function for this learning problem. With this objective function, we minimize both the distance between the projections of the image and the text of the same document and the classification error of the projections, measured by the hinge loss function. The objective function is optimized by an alternating optimization strategy in an iterative algorithm. Experiments on two different multi-modal document data sets show the advantage of the proposed algorithm over state-of-the-art multimedia data classification methods.
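The abstract does not give the objective in closed form, so the following is only a minimal sketch of how such a joint objective and one round of alternating updates might look. It assumes binary labels in {-1, +1}, a squared Euclidean distance between the two projections, an L2 penalty on the classifier, and plain subgradient steps. The names `joint_objective` and `alternate_step`, the trade-off parameters `C` and `lam`, and the step size `lr` are hypothetical, and any constraints the paper places on the projection matrices are omitted.

```python
import numpy as np


def joint_objective(X, T, y, U, V, w, C=1.0, lam=0.1):
    """Assumed form of the unified objective.

    X: (n, dx) image features, T: (n, dt) text features,
    y: (n,) labels in {-1, +1}, U: (dx, k) and V: (dt, k) projections,
    w: (k,) linear classifier applied in the shared space.
    """
    PX, PT = X @ U, T @ V
    # Distance between the image and text projections of each document.
    match = np.sum((PX - PT) ** 2)
    # Hinge loss of the linear predictor on both projections.
    hinge = np.sum(np.maximum(0.0, 1.0 - y * (PX @ w)))
    hinge += np.sum(np.maximum(0.0, 1.0 - y * (PT @ w)))
    return match + C * hinge + lam * (w @ w)


def alternate_step(X, T, y, U, V, w, C=1.0, lam=0.1, lr=0.01):
    """One round of alternating subgradient updates (illustrative only)."""
    # Step 1: fix U and V, update the classifier w.
    PX, PT = X @ U, T @ V
    ax = y * (PX @ w) < 1.0  # image projections violating the margin
    at = y * (PT @ w) < 1.0  # text projections violating the margin
    gw = -C * ((y * ax) @ PX + (y * at) @ PT) + 2.0 * lam * w
    w = w - lr * gw

    # Step 2: fix w, update the projections U and V.
    ax = y * (PX @ w) < 1.0
    at = y * (PT @ w) < 1.0
    D = PX - PT
    gU = 2.0 * X.T @ D - C * np.outer(X.T @ (y * ax), w)
    gV = -2.0 * T.T @ D - C * np.outer(T.T @ (y * at), w)
    return U - lr * gU, V - lr * gV, w


# Toy data: two documents with 2-dimensional image and text features.
X = np.eye(2)
T = np.eye(2)
y = np.array([1.0, -1.0])
U0, V0, w0 = np.eye(2), np.eye(2), np.zeros(2)

before = joint_objective(X, T, y, U0, V0, w0)
U1, V1, w1 = alternate_step(X, T, y, U0, V0, w0)
after = joint_objective(X, T, y, U1, V1, w1)
print(before, after)
```

Because the classifier loss enters the update of `U` and `V`, and the projections enter the update of `w`, each block of parameters shapes the other, which is the mutual regularization the abstract refers to.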

Acknowledgments

This work was supported by an open research program of the Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, China.

Author information

Authors and Affiliations

North China Sea Marine Technical Support Center, State Oceanic Administration, Qingdao, 266033, China
Kanghong Duan & Hongxin Zhang
University at Buffalo, The State University of New York, Buffalo, NY, 14203, USA
Jim Jing-Yan Wang
Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin, 300072, China
Jim Jing-Yan Wang

Corresponding author

Correspondence to Hongxin Zhang.

About this article

Cite this article

Duan, K., Zhang, H. & Wang, J.JY. Joint learning of cross-modal classifier and factor analysis for multimedia data classification. Neural Comput & Applic 27, 459–468 (2016). https://doi.org/10.1007/s00521-015-1866-3

Received: 22 December 2014
Accepted: 18 February 2015
Published: 05 March 2015
Issue Date: February 2016
DOI: https://doi.org/10.1007/s00521-015-1866-3
