Two-class 3D-CNN classifiers combination for video copy detection

Multimedia Tools and Applications

Abstract

3D-CNN is the latest CNN model used for video classification. However, the amount of computation and training data required to train a 3D-CNN, especially for complex classification tasks on large video datasets, hinders its wide application. In this paper, inspired by the exclusion method in human judgment, a parallel 3D-CNN architecture is proposed that decomposes the multi-class classification task handled by a single 3D-CNN into a combination of multiple two-class classification tasks. A 3D-CNN is used as the classifier for each two-class task, so the difficulty of training each network, and the amount of data it requires, are greatly reduced compared with a single multi-class 3D-CNN. In addition, combining two-class classifiers gives the proposed model the ability to recognize unknown classes. The feasibility of the proposed model is verified by applying it to video copy detection on the CC_WEB_VIDEO dataset. The experimental results show the potential of the proposed parallel two-class 3D-CNN model for video classification.
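To make the decision rule concrete, the following is a minimal sketch of the combination step, assuming each two-class 3D-CNN reduces a clip to a single acceptance probability for its own class; the function name, the fixed acceptance threshold, and the stubbed scores are illustrative assumptions, not the authors' implementation.

    import numpy as np

    # Hypothetical sketch of the parallel two-class combination described in
    # the abstract: one binary 3D-CNN per known class, each scoring how
    # likely a clip is to belong to its own class.

    UNKNOWN = -1  # returned when every binary classifier rejects the clip

    def combine_two_class_scores(scores, accept_threshold=0.5):
        """Fuse per-class acceptance scores into one multi-class decision.

        scores: scores[i] is the i-th two-class 3D-CNN's probability that
        the clip belongs to class i. Returns the accepting class index,
        or UNKNOWN if no classifier accepts the clip.
        """
        scores = np.asarray(scores, dtype=float)
        best = int(np.argmax(scores))
        return best if scores[best] >= accept_threshold else UNKNOWN

    # Example with three binary classifiers, stubbed here as fixed scores:
    print(combine_two_class_scores([0.12, 0.91, 0.30]))  # -> 1
    print(combine_two_class_scores([0.20, 0.10, 0.30]))  # -> -1 (unknown)

The rejection branch is what gives the parallel model its unknown-class behavior: a clip that no two-class network accepts is labeled unknown instead of being forced into the nearest known class.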


References

  1. CC_WEB_VIDEO: near-duplicate web video dataset. http://vireo.cs.cityu.edu.hk/webvideo

  2. Chang X, Ma Z, Yang Y, Zeng Z, Hauptmann AG (2017) Bi-level semantic representation analysis for multimedia event detection. IEEE Trans Cybern 47(5):1180–1197

  3. Chang X, Yu YL, Yang Y, Xing EP (2017) Semantic pooling for complex event analysis in untrimmed videos. IEEE Trans Pattern Anal Mach Intell 39(8):1617–1632

  4. Deng J, Dong W, Socher R, Li LJ, Li K, Li FF (2009) ImageNet: a large-scale hierarchical image database. In: IEEE conference on computer vision and pattern recognition, pp 248–255

  5. Griffin G, Holub A, Perona P (2007) Caltech-256 object category dataset. Technical report, California Institute of Technology

  6. Ji S, Xu W, Yang M, Yu K (2012) 3D convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell 35(1):221–231

  7. Jiang YG, Ye G, Chang SF, Ellis D, Loui AC (2011) Consumer video understanding: a benchmark database and an evaluation of human and machine performance. In: ACM international conference on multimedia retrieval, pp 1–8

  8. Jiang YG, Wu Z, Wang J, Xue X, Chang SF (2018) Exploiting feature and class relationships in video categorization with regularized deep neural networks. IEEE Trans Pattern Anal Mach Intell 40(2):352–364

  9. Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Li FF (2014) Large-scale video classification with convolutional neural networks. In: IEEE conference on computer vision and pattern recognition, pp 1725–1732

  10. Krizhevsky A (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto

  11. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: International conference on neural information processing systems, pp 1097–1105

  12. Le CP, Viard-Gaudin C, Barba D (2006) A convolutional neural network approach for objective video quality assessment. IEEE Trans Neural Netw 17(5):1316–1327

  13. Li FF, Fergus R, Perona P (2007) Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. Comput Vis Image Underst 106(1):59–70

  14. Li Z, Nie F, Chang X, Yang Y (2017) Beyond trace ratio: weighted harmonic mean of trace ratios for multiclass discriminant analysis. IEEE Trans Knowl Data Eng PP(99):1–1

  15. Liu H, Sun Y, Li Y (2012) Modeling and path generation approaches for crowd simulation based on computational intelligence. Chin J Electron 21(4):636–641

  16. Luo M, Chang X, Nie L, Yi Y, Hauptmann AG, Zheng Q (2017) An adaptive semisupervised feature analysis for video semantic recognition. IEEE Trans Cybern 48(2):648–660

  17. Maturana D, Scherer S (2015) VoxNet: a 3D convolutional neural network for real-time object recognition. In: IEEE/RSJ international conference on intelligent robots and systems, pp 922–928

  18. Mei S, Ji J, Hou J, Li X, Du Q (2017) Learning sensor-specific spatial-spectral features of hyperspectral images via convolutional neural networks. IEEE Trans Geosci Remote Sens 55(8):4520–4533

  19. Shamma DA, Friedland G, Elizalde B, Ni K, Poland D et al (2016) YFCC100M: the new data in multimedia research. Commun ACM 59(2):64–73

  20. Soomro K, Zamir AR, Shah M (2012) UCF101: a dataset of 101 human action classes from videos in the wild. Technical report CRCV-TR-12-01, University of Central Florida

  21. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D et al (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9

  22. Tu J, Wu Z, Dai Q, Jiang YG, Xue X (2014) Challenge Huawei challenge: fusing multimodal features with deep neural networks for mobile video annotation. In: Proceedings of the IEEE international conference on multimedia and expo workshops, pp 1–6

  23. Wang Y, Zhang H, Yang F (2017) A weighted sparse neighbourhood-preserving projections for face recognition. IETE J Res 63(3):358–367

  24. Wei Y, Wei X, Lin M, Huang J, Ni B, Dong J et al (2016) HCP: a flexible CNN framework for multi-label image classification. IEEE Trans Pattern Anal Mach Intell 38(9):1901–1907

  25. Yu Q, Zhang H, Cheng L, Xiao D (2018) KATZMDA: prediction of miRNA-disease associations based on KATZ model. IEEE Access 6:3943–3950

  26. Zeng Z, Li Z, Cheng D, Zhang H, Zhan K, Yang Y (2017) Two-stream multi-rate recurrent neural network for video-based pedestrian re-identification. IEEE Trans Ind Inf PP(99):1–1

  27. Zhang H, Lu J (2010) Creating ensembles of classifiers via fuzzy clustering and deflection. Fuzzy Sets Syst 161(13):1790–1802

  28. Zhang H, Cao L, Gao S (2014) A locality correlation preserving support vector machine. Pattern Recogn 47(9):3168–3178

Download references

Acknowledgements

This work is supported by the Natural Science Foundation of China (61772322, 61601268, 61572298) and the Shandong Provincial Key Research and Development Plan (2017CXGC1504). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the TITAN X GPU used for this research. The corresponding authors are Huaxiang Zhang (huaxzhang@163.com) and Wenbo Wan (wanwenbo@sdnu.edu.cn).

Author information

Corresponding author

Correspondence to Jing Li.


About this article


Cite this article

Li, J., Zhang, H., Wan, W. et al. Two-class 3D-CNN classifiers combination for video copy detection. Multimed Tools Appl 79, 4749–4761 (2020). https://doi.org/10.1007/s11042-018-6047-9

