Two-class 3D-CNN classifiers combination for video copy detection

Multimedia Tools and Applications

Abstract

3D-CNN is the latest CNN model used for video classification. However, the amount of computation and training data required to train a 3D-CNN, especially for complex classification tasks on large video datasets, hinders its wide application. In this paper, inspired by the exclusion method in human judgment, a parallel 3D-CNN architecture is proposed that decomposes the multi-class classification task handled by a single 3D-CNN into a combination of multiple two-class classification tasks. A 3D-CNN is used as the classifier for each two-class task, so the difficulty of training each network, and the amount of data it requires, are greatly reduced compared with a single multi-class 3D-CNN. In addition, combining two-class classifiers gives the proposed model the ability to recognize unknown classes. The feasibility of the proposed model is verified by applying it to video copy detection on the CC_WEB_VIDEO dataset. The experimental results show the potential of the proposed parallel two-class 3D-CNN model for video classification.
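To make the decision rule concrete, the following is a minimal sketch of the combination step, assuming each two-class 3D-CNN reduces a clip to a single acceptance probability for its own class; the function name, the fixed acceptance threshold, and the stubbed scores are illustrative assumptions, not the authors' implementation.

    import numpy as np

    # Hypothetical sketch of the parallel two-class combination described in
    # the abstract: one binary 3D-CNN per known class, each scoring how
    # likely a clip is to belong to its own class.

    UNKNOWN = -1  # returned when every binary classifier rejects the clip

    def combine_two_class_scores(scores, accept_threshold=0.5):
        """Fuse per-class acceptance scores into one multi-class decision.

        scores: scores[i] is the i-th two-class 3D-CNN's probability that
        the clip belongs to class i. Returns the accepting class index,
        or UNKNOWN if no classifier accepts the clip.
        """
        scores = np.asarray(scores, dtype=float)
        best = int(np.argmax(scores))
        return best if scores[best] >= accept_threshold else UNKNOWN

    # Example with three binary classifiers, stubbed here as fixed scores:
    print(combine_two_class_scores([0.12, 0.91, 0.30]))  # -> 1
    print(combine_two_class_scores([0.20, 0.10, 0.30]))  # -> -1 (unknown)

The rejection branch is what gives the parallel model its unknown-class behavior: a clip that no two-class network accepts is labeled unknown instead of being forced into the nearest known class.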


References

  1. CC_WEB_VIDEO: near-duplicate web video dataset. http://vireo.cs.cityu.edu.hk/webvideo

  2. Chang X, Ma Z, Yang Y, Zeng Z, Hauptmann AG (2017) Bi-level semantic representation analysis for multimedia event detection. IEEE Trans Cybern 47(5):1180–1197

  3. Chang X, Yu YL, Yang Y, Xing EP (2017) Semantic pooling for complex event analysis in untrimmed videos. IEEE Trans Pattern Anal Mach Intell 39(8):1617–1632

  4. Deng J, Dong W, Socher R, Li LJ, Li K, Li FF (2009) ImageNet: a large-scale hierarchical image database. In: IEEE conference on computer vision and pattern recognition, pp 248–255

  5. Griffin G, Holub A, Perona P (2007) Caltech-256 object category dataset. Technical report, California Institute of Technology

  6. Ji S, Xu W, Yang M, Yu K (2012) 3D convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell 35(1):221–231

  7. Jiang YG, Ye G, Chang SF, Ellis D, Loui AC (2011) Consumer video understanding: a benchmark database and an evaluation of human and machine performance. In: ACM international conference on multimedia retrieval, pp 1–8

  8. Jiang YG, Wu Z, Wang J, Xue X, Chang SF (2018) Exploiting feature and class relationships in video categorization with regularized deep neural networks. IEEE Trans Pattern Anal Mach Intell 40(2):352–364

  9. Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Li FF (2014) Large-scale video classification with convolutional neural networks. In: IEEE conference on computer vision and pattern recognition, pp 1725–1732

  10. Krizhevsky A (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto

  11. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: International conference on neural information processing systems, pp 1097–1105

  12. Le CP, Viard-Gaudin C, Barba D (2006) A convolutional neural network approach for objective video quality assessment. IEEE Trans Neural Netw 17(5):1316–1327

  13. Li FF, Fergus R, Perona P (2007) Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. Comput Vis Image Underst 106(1):59–70

  14. Li Z, Nie F, Chang X, Yang Y (2017) Beyond trace ratio: weighted harmonic mean of trace ratios for multiclass discriminant analysis. IEEE Trans Knowl Data Eng PP(99):1–1

  15. Liu H, Sun Y, Li Y (2012) Modeling and path generation approaches for crowd simulation based on computational intelligence. Chin J Electron 21(4):636–641

  16. Luo M, Chang X, Nie L, Yi Y, Hauptmann AG, Zheng Q (2017) An adaptive semisupervised feature analysis for video semantic recognition. IEEE Trans Cybern 48(2):648–660

  17. Maturana D, Scherer S (2015) VoxNet: a 3D convolutional neural network for real-time object recognition. In: IEEE/RSJ international conference on intelligent robots and systems, pp 922–928

  18. Mei S, Ji J, Hou J, Li X, Du Q (2017) Learning sensor-specific spatial-spectral features of hyperspectral images via convolutional neural networks. IEEE Trans Geosci Remote Sens 55(8):4520–4533

  19. Shamma DA, Friedland G, Elizalde B, Ni K, Poland D et al (2016) YFCC100M: the new data in multimedia research. Commun ACM 59(2):64–73

  20. Soomro K, Zamir AR, Shah M (2012) UCF101: a dataset of 101 human action classes from videos in the wild. Technical report CRCV-TR-12-01, University of Central Florida

  21. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D et al (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9

  22. Tu J, Wu Z, Dai Q, Jiang YG, Xue X (2014) Challenge Huawei challenge: fusing multimodal features with deep neural networks for mobile video annotation. In: Proceedings of the IEEE international conference on multimedia and expo workshops, pp 1–6

  23. Wang Y, Zhang H, Yang F (2017) A weighted sparse neighbourhood-preserving projections for face recognition. IETE J Res 63(3):358–367

  24. Wei Y, Wei X, Lin M, Huang J, Ni B, Dong J et al (2016) HCP: a flexible CNN framework for multi-label image classification. IEEE Trans Pattern Anal Mach Intell 38(9):1901–1907

  25. Yu Q, Zhang H, Cheng L, Xiao D (2018) KATZMDA: prediction of miRNA-disease associations based on KATZ model. IEEE Access 6:3943–3950

  26. Zeng Z, Li Z, Cheng D, Zhang H, Zhan K, Yang Y (2017) Two-stream multi-rate recurrent neural network for video-based pedestrian re-identification. IEEE Trans Ind Inf PP(99):1–1

  27. Zhang H, Lu J (2010) Creating ensembles of classifiers via fuzzy clustering and deflection. Fuzzy Sets Syst 161(13):1790–1802

  28. Zhang H, Cao L, Gao S (2014) A locality correlation preserving support vector machine. Pattern Recogn 47(9):3168–3178

Download references

Acknowledgements

This work is supported by the Natural Science Foundation of China (61772322, 61601268, 61572298) and the Shandong Provincial Key Research and Development Plan (2017CXGC1504). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the TITAN X GPU used for this research. The corresponding authors are Huaxiang Zhang (huaxzhang@163.com) and Wenbo Wan (wanwenbo@sdnu.edu.cn).

Author information

Corresponding author

Correspondence to Jing Li.


About this article


Cite this article

Li, J., Zhang, H., Wan, W. et al. Two-class 3D-CNN classifiers combination for video copy detection. Multimed Tools Appl 79, 4749–4761 (2020). https://doi.org/10.1007/s11042-018-6047-9

