Multimedia Tools and Applications

Volume 78, Issue 17, pp 23867–23882

High-dimensional multimedia classification using deep CNN and extended residual units

  • Pourya Shamsolmoali
  • Deepak Kumar Jain
  • Masoumeh Zareapoor
  • Jie Yang
  • M. Afshar Alam


Processing multimedia data has emerged as a key application area for machine learning methods. Building a robust classification model for high-dimensional spaces requires combining a deep feature extractor with a powerful classifier. We present a new classification pipeline for multimedia data analysis based on a convolutional neural network (CNN) and a modified residual network, which can be integrated with other feedforward network architectures and trained end to end. The proposed residual network produces attention-aware features. By exploiting the advantages of the residual network, the unified deep CNN model achieves promising performance in classifying high-dimensional multimedia data. In every residual module, an up-down and down-up feedforward structure is implemented to unfold the feedforward and backward processes into a single process. The proposed hybrid model was evaluated on four datasets and showed promising results that outperform the previous best results. Finally, the proposed model achieves detection speeds much faster than those of other approaches.
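The abstract does not give the internals of the residual module, so the following is only an illustrative sketch of the general idea of a residual unit whose features are weighted by an attention mask computed through a down-sampling then up-sampling path. It is not the authors' architecture: the function names, the averaging and nearest-neighbour resampling, the scalar weight `w`, and the combination rule `x + mask * trunk(x)` are all assumptions chosen for brevity, and it operates on a plain 1-D list of at least two values instead of image tensors.

```python
import math


def sigmoid(v):
    """Squash a value into (0, 1) so it can act as a soft mask."""
    return 1.0 / (1.0 + math.exp(-v))


def downsample(x):
    """Down path (assumed): average adjacent pairs, roughly halving the length."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]


def upsample(x, length):
    """Up path (assumed): nearest-neighbour repeat back to `length` values."""
    return [x[min(i // 2, len(x) - 1)] for i in range(length)]


def trunk(x, w):
    """Hypothetical trunk branch: scale each value by `w`, then apply ReLU."""
    return [max(0.0, w * v) for v in x]


def attention_residual_unit(x, w=1.0):
    """Identity shortcut plus mask-weighted trunk features.

    Assumes len(x) >= 2 so the down path produces at least one value.
    """
    mask = [sigmoid(v) for v in upsample(downsample(x), len(x))]
    features = trunk(x, w)
    return [xi + mi * fi for xi, mi, fi in zip(x, mask, features)]


if __name__ == "__main__":
    print(attention_residual_unit([0.0, 0.0, 2.0, 2.0]))
```

Because of the identity shortcut, the input passes through unchanged wherever the trunk output is zero, while the mask only scales how much of the trunk's features is added on top.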


Keywords: High dimensional; Multimedia data classification; Deep learning; Feature extraction; Residual network



This research is partly supported by NSFC, China (No: 61572315) and Committee of Science and Technology, Shanghai, China (No: 17JC1403000).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Institute of Image Processing & Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China
  2. Institute of Automation, Chinese Academy of Sciences, Beijing, China
  3. Department of Computer Science & Engineering, Jamia Hamdard University, New Delhi, India
