
High-dimensional multimedia classification using deep CNN and extended residual units


Processing multimedia data has emerged as a key application area for machine learning methods. Building a robust classification model for high-dimensional spaces requires combining a deep feature extractor with a powerful classifier. We present a new classification pipeline for multimedia data analysis based on a convolutional neural network (CNN) and a modified residual network, which can be integrated with other feedforward network architectures and trained end to end. The proposed residual network produces attention-aware features. The unified deep CNN model exploits the advantages of the residual network to achieve promising performance in classifying high-dimensional multimedia data. Each residual module implements an up-down (and vice versa) feedforward structure that unfolds the feedforward and backward processes into a single process. The proposed hybrid model is evaluated on four datasets and shows promising results that outperform the previous best results. Finally, the proposed model achieves detection speeds that are much faster than those of other approaches.
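The abstract does not give layer-level details, so the following is only a minimal sketch of one common way to build a residual unit that yields attention-aware features. It assumes a soft mask computed by a coarse-to-fine pass (downsample, then upsample) that reweights a trunk branch before the identity shortcut is added. All function names, the scalar weight `w`, and the 1-D list representation are hypothetical stand-ins for real convolutional layers and are not taken from the paper.

```python
import math


def relu(values):
    """Element-wise rectifier."""
    return [max(0.0, v) for v in values]


def sigmoid(v):
    """Squash a value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def trunk(x, w):
    """Stand-in for the convolutional trunk branch (assumption: scale + ReLU)."""
    return relu([w * v for v in x])


def downsample(x):
    """Average-pool neighbouring pairs (stride 2); assumes len(x) >= 2."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]


def upsample(x, n):
    """Nearest-neighbour upsampling back to length n."""
    return [x[min(i // 2, len(x) - 1)] for i in range(n)]


def attention_mask(x):
    """Soft mask from a downsample-then-upsample pass over the input."""
    coarse = downsample(x)
    restored = upsample(coarse, len(x))
    return [sigmoid(v) for v in restored]


def residual_attention_unit(x, w=0.5):
    """Identity shortcut plus mask-weighted trunk: x + (1 + M(x)) * T(x)."""
    t = trunk(x, w)
    m = attention_mask(x)
    return [xi + (1.0 + mi) * ti for xi, ti, mi in zip(x, t, m)]


if __name__ == "__main__":
    features = [1.0, 3.0, -2.0, 0.0]
    print(residual_attention_unit(features))
```

In this sketch the `(1 + M(x))` factor means the mask can only amplify trunk features, never zero them out, and the shortcut `x` keeps the unit close to an identity mapping when the trunk output is small. Whether the paper's extended residual unit uses the same combination rule is not stated in the abstract.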






This research is partly supported by NSFC, China (No: 61572315) and Committee of Science and Technology, Shanghai, China (No: 17JC1403000).

Author information



Corresponding author

Correspondence to Pourya Shamsolmoali.


Cite this article

Shamsolmoali, P., Kumar Jain, D., Zareapoor, M. et al. High-dimensional multimedia classification using deep CNN and extended residual units. Multimed Tools Appl 78, 23867–23882 (2019).



  • High dimensional
  • Multimedia data classification
  • Deep learning
  • Feature extraction
  • Residual network