An object detection-based few-shot learning approach for multimedia quality assessment

  • Special Issue Paper
  • Multimedia Systems

Abstract

A large portion of the global population generates multimedia data such as text, images, and videos. Visual content is among the most common categories and strongly influences the public at large. Through social media platforms (e.g., WhatsApp, Twitter, Facebook, Instagram, and YouTube), this material circulates without censorship and across national boundaries. Multimedia data containing violent or vulgar objects could trigger public unrest and thus poses a serious threat to law and order. Children and teenagers use social media more than any previous generation and create large amounts of multimedia data. It is therefore important to assess the quality of multimedia content without bias or prejudice. Although mainstream social media platforms apply filters and rely on moderation by human experts, it is impossible to manually verify the terabytes of uploaded images and videos. The content assessment phase must therefore be automated without increasing upload time. This study aims to prevent the upload of, or to tag, any image or video in which a gun makes up a reasonable percentage of the content. In this paper, the object detection architectures Faster RCNN, EfficientDet, and YOLOv5 are used to demonstrate how efficiently these techniques can detect human faces and different types of guns in multimedia data (images and videos). The models are tested on various test images and video clips, and a comparative analysis is presented based on the mean average precision (mAP) and frames per second (FPS) metrics. YOLOv5 performs best, reaching 80.39% and 35.22% at \(\text{mAP}_{0.5}\) and \(\text{mAP}_{[0.50:0.95]}\), respectively. Conventional face recognition requires thousands of samples, because the usual deep learning models are data-driven. In contrast, a few-shot learning approach is implemented here to recognize the detected faces and categorize the content as real or reel.
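
The two mAP figures quoted in the abstract differ only in how strictly a predicted box must overlap the ground truth. The sketch below is not the authors' evaluation code. It assumes the common COCO-style convention, in which \(\text{mAP}_{0.5}\) counts a detection as correct at an intersection over union (IoU) of at least 0.5, and \(\text{mAP}_{[0.50:0.95]}\) averages the score over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The function names and the `(x1, y1, x2, y2)` box format are illustrative assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95 (ten values).
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, gt_box):
    """IoU thresholds at which this prediction would count as a true positive."""
    overlap = iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if overlap >= t]


def mean_over_thresholds(score_by_threshold):
    """Average a per-threshold score (e.g. mAP at each IoU) over all thresholds."""
    return sum(score_by_threshold[t] for t in IOU_THRESHOLDS) / len(IOU_THRESHOLDS)
```

A box that overlaps its ground truth with an IoU of 0.77, for example, is a hit at the 0.50 threshold but a miss at 0.80 and above. This is why the averaged metric is much lower than \(\text{mAP}_{0.5}\) for the same model.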

Notes

  1. YOLO: Real-Time Object Detection: https://pjreddie.com/darknet/yolo/.

  2. VoTT: https://github.com/microsoft/vott.

  3. mAP: https://github.com/Cartucho/mAP.

  4. FRCNN: https://github.com/kbardool/keras-frcnn.

  5. EfficientDet: https://github.com/xuannianz/EfficientDet.

  6. YOLOv5s: https://github.com/ultralytics/yolov5.

  7. Collateral (2004): https://www.youtube.com/watch?v=EMS4lYA-hEo.

Acknowledgements

The work of Dr. Muhammad Khurram Khan is supported by Researchers Supporting Project number (RSP-2021/12), King Saud University, Riyadh, Saudi Arabia.

Author information

Corresponding author

Correspondence to SK Hafizul Islam.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chatterjee, R., Chatterjee, A., Islam, S. et al. An object detection-based few-shot learning approach for multimedia quality assessment. Multimedia Systems 29, 2899–2912 (2023). https://doi.org/10.1007/s00530-021-00881-8

  • DOI: https://doi.org/10.1007/s00530-021-00881-8
