An object detection-based few-shot learning approach for multimedia quality assessment

Chatterjee, Rajdeep; Chatterjee, Ankita; Islam, SK Hafizul; Khan, Muhammad Khurram

doi:10.1007/s00530-021-00881-8

Special Issue Paper
Published: 29 January 2022

Multimedia Systems, volume 29, pages 2899–2912 (2023)

Rajdeep Chatterjee¹,
Ankita Chatterjee²,
SK Hafizul Islam³ (ORCID: orcid.org/0000-0002-2703-0213) &
Muhammad Khurram Khan⁴

Abstract

A large portion of the global population generates multimedia data such as text, images, and videos. Visual content is among the categories that most strongly influence the public. Through social media platforms (e.g., WhatsApp, Twitter, Facebook, Instagram, and YouTube), this material circulates without censorship and across national boundaries. Multimedia data containing violent or vulgar objects could trigger public unrest and thus poses a serious threat to law and order. Children and teenagers use social media more than any previous generation did and create large volumes of multimedia data. It is therefore important to assess the quality of multimedia content without bias or prejudice. Although mainstream social media platforms apply various filters and rely on human moderators, manually verifying the terabytes of uploaded images and videos is impossible. Content assessment must therefore be automated, and without increasing upload time. This study aims to block the upload of, or to tag, an image or video in which a gun makes up a reasonable percentage of the content. In this paper, the object detection architectures Faster R-CNN, EfficientDet, and YOLOv5 are used to demonstrate how efficiently such techniques can detect human faces and different types of guns in multimedia data (images and videos). The models are tested on various test images and video clips and compared on the mean average precision (mAP) and frames per second (FPS) metrics. YOLOv5 gives the best results, reaching 80.39% and 35.22% at \(\text{mAP}_{0.5}\) and \(\text{mAP}_{[0.50:0.95]}\), respectively. Face recognition usually requires thousands of samples because conventional deep learning models are data-driven. In contrast, a few-shot learning approach is implemented here to recognize the detected faces and categorize the content as real or reel.
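The two mAP figures quoted in the abstract differ only in the intersection-over-union (IoU) threshold at which a predicted box counts as a correct detection: \(\text{mAP}_{0.5}\) uses a single threshold of 0.5, while \(\text{mAP}_{[0.50:0.95]}\) averages over thresholds from 0.50 to 0.95 in steps of 0.05. The following is a minimal sketch of that relationship, not the authors' evaluation code. It assumes a single class and a single image, greedy matching of each prediction to its best unmatched ground-truth box, and all-point interpolation of the precision-recall curve; the function names are illustrative.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Tuple[float, Box]],
                      truths: List[Box], thr: float) -> float:
    """AP for one class at one IoU threshold (simplified, single image)."""
    if not truths:
        return 0.0
    matched = set()
    hits = []
    # Visit predictions from most to least confident.
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best, best_j = 0.0, -1
        for j, gt in enumerate(truths):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best:
                best, best_j = overlap, j
        if best_j >= 0 and best >= thr:
            matched.add(best_j)
            hits.append(1)  # true positive
        else:
            hits.append(0)  # false positive
    # Running precision and recall after each prediction.
    tp = 0
    precisions, recalls = [], []
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / len(truths))
    # Make precision non-increasing from right to left (envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def ap_50_95(preds: List[Tuple[float, Box]], truths: List[Box]) -> float:
    """Mean of AP over IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [t / 100 for t in range(50, 100, 5)]
    return sum(average_precision(preds, truths, t) for t in thresholds) / len(thresholds)
```

Under these assumptions, a single prediction that overlaps its ground-truth box with an IoU of 0.72 is counted as correct at thresholds up to 0.70 but not at 0.75 and above, which is why the averaged metric is typically much lower than the value at 0.5. A full mAP additionally averages the per-class AP over all classes and accumulates detections over the whole test set.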

Notes

YOLO: Real-Time Object Detection: https://pjreddie.com/darknet/yolo/.
VoTT: https://github.com/microsoft/vott.
mAP: https://github.com/Cartucho/mAP.
FRCNN: https://github.com/kbardool/keras-frcnn.
EfficientDet: https://github.com/xuannianz/EfficientDet.
YOLOv5s: https://github.com/ultralytics/yolov5.
Collateral (2004): https://www.youtube.com/watch?v=EMS4lYA-hEo.

Acknowledgements

The work of Dr. Muhammad Khurram Khan is supported by Researchers Supporting Project number (RSP-2021/12), King Saud University, Riyadh, Saudi Arabia.

Author information

Authors and Affiliations

School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, Odisha, 751024, India
Rajdeep Chatterjee
School of Electrical Sciences, Indian Institute of Technology Bhubaneswar, Khordha, Odisha, 752050, India
Ankita Chatterjee
Department of Computer Science and Engineering, Indian Institute of Information Technology Kalyani, Kalyani, West Bengal, 741235, India
SK Hafizul Islam
Center of Excellence in Information Assurance, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
Muhammad Khurram Khan

Corresponding author

Correspondence to SK Hafizul Islam.

About this article

Cite this article

Chatterjee, R., Chatterjee, A., Islam, S. et al. An object detection-based few-shot learning approach for multimedia quality assessment. Multimedia Systems 29, 2899–2912 (2023). https://doi.org/10.1007/s00530-021-00881-8

Received: 26 August 2021
Accepted: 06 December 2021
Published: 29 January 2022
Issue Date: October 2023
DOI: https://doi.org/10.1007/s00530-021-00881-8
