A method for video categorization by analyzing text, audio, and frames

Amin, Hossain Md Al; Arefin, Mohammad Shamsul; Dhar, Pranab Kumar

doi:10.1007/s41870-019-00338-2

Original Research
Published: 14 August 2019

Volume 12, pages 889–898, (2020)

International Journal of Information Technology

Hossain Md Al Amin¹,
Mohammad Shamsul Arefin¹ &
Pranab Kumar Dhar¹

Abstract

A video file naturally contains audio, text metadata, and visual content in the form of frames, since it is a series of images displayed with adequate motion. Efficient video categorization therefore requires analyzing all of these available resources. In this paper, we introduce a video categorization method that examines the essential elements of a video: text, audio, and frames. The proposed method consists of three modules, which analyze the text, audio, and visual content respectively; their analysis results are then combined to produce the final output. A set of fundamental properties is analyzed and compared with standard values acquired from a training data set to determine the genre of each video, which is then tagged with the most probable category. We have also conducted several tests using the proposed method, and the simulation results show that it effectively categorizes video sequences.
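The abstract does not state how the three module outputs are combined, so the following is only a minimal sketch of one common option, weighted late fusion. It assumes each module returns a non-negative score per category; the function names, category labels, scores, and equal weights are all hypothetical and are not taken from the paper.

```python
from typing import Dict

Scores = Dict[str, float]


def normalize(scores: Scores) -> Scores:
    """Rescale non-negative scores so they sum to 1 (uniform if all are zero)."""
    if not scores:
        return {}
    total = sum(scores.values())
    if total <= 0:
        return {category: 1.0 / len(scores) for category in scores}
    return {category: value / total for category, value in scores.items()}


def fuse(module_scores: Dict[str, Scores], weights: Dict[str, float]) -> Scores:
    """Weighted average of per-module category scores (assumed fusion rule)."""
    categories = set()
    for scores in module_scores.values():
        categories.update(scores)
    weight_sum = sum(weights[module] for module in module_scores)
    fused = {category: 0.0 for category in categories}
    for module, scores in module_scores.items():
        normalized = normalize(scores)
        for category in categories:
            # A category a module did not score contributes zero for that module.
            fused[category] += weights[module] * normalized.get(category, 0.0) / weight_sum
    return fused


def most_probable_category(fused: Scores) -> str:
    """Return the category with the highest fused score."""
    return max(fused, key=fused.get)


# Hypothetical module outputs for a single video (illustrative values only).
module_scores = {
    "text": {"sports": 0.7, "news": 0.2, "music": 0.1},
    "audio": {"sports": 0.3, "news": 0.2, "music": 0.5},
    "frames": {"sports": 0.6, "news": 0.3, "music": 0.1},
}
weights = {"text": 1.0, "audio": 1.0, "frames": 1.0}  # assumed equal weights

fused = fuse(module_scores, weights)
label = most_probable_category(fused)
print(label)
```

In this sketch the weights could be tuned on training data so that a more reliable module counts for more. Whether the authors use a weighted scheme, a voting rule, or something else is not stated in the abstract.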

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Chittagong University of Engineering and Technology (CUET), Chittagong-4349, Bangladesh
Hossain Md Al Amin, Mohammad Shamsul Arefin & Pranab Kumar Dhar

Corresponding author

Correspondence to Mohammad Shamsul Arefin.

About this article

Cite this article

Amin, H.M.A., Arefin, M.S. & Dhar, P.K. A method for video categorization by analyzing text, audio, and frames. Int. j. inf. tecnol. 12, 889–898 (2020). https://doi.org/10.1007/s41870-019-00338-2

Received: 28 November 2018
Accepted: 29 July 2019
Published: 14 August 2019
Issue Date: September 2020
DOI: https://doi.org/10.1007/s41870-019-00338-2
