
Porn streamer audio recognition based on deep learning and random forest

Applied Intelligence

Abstract

Existing porn streamer audio recognition algorithms perform poorly in increasingly complex network environments. To address this problem, a porn streamer audio recognition algorithm based on deep learning and random forest is proposed. First, a more stable complementary feature is constructed from the Log Mel Spectrum (LMS), Mel Frequency Cepstrum Coefficients (MFCC) and Gammatone Frequency Cepstrum Coefficients (GFCC). Next, the Dual-Path Fused Transformer Net (DPFTNet) is proposed for sound classification; it runs the two main modules of the Swin Transformer in parallel so that more feature details are retained. Finally, a random forest is used to identify porn streamers. The experimental results show that this algorithm achieves higher recognition accuracy than the comparison algorithm.
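As a rough illustration of the feature stage described in the abstract, the sketch below computes two of the three named features (LMS and MFCC) for a single audio frame and concatenates them. It is not the authors' implementation: the sample rate, frame length, filter counts, the HTK-style mel formula, the unnormalized DCT-II, and all function names are assumptions chosen for the example. GFCC is omitted; it would presumably follow the same pattern with a gammatone filterbank in place of the mel filterbank. Windowing and pre-emphasis are also left out for brevity.

```python
import math

def hz_to_mel(f_hz):
    # HTK-style mel mapping (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    # Naive O(n^2) DFT to stay dependency-free; returns n // 2 + 1 bins.
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = sum(x * math.sin(2.0 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_bins, sample_rate):
    # Triangular filters whose edges are evenly spaced on the mel scale.
    nyquist = sample_rate / 2.0
    top_mel = hz_to_mel(nyquist)
    edges_hz = [mel_to_hz(top_mel * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [nyquist * k / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[j - 1], edges_hz[j], edges_hz[j + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def log_mel(spec, bank, eps=1e-10):
    # Log of the filterbank energies; eps avoids log(0) in empty bands.
    return [math.log(sum(w * p for w, p in zip(row, spec)) + eps) for row in bank]

def dct2(values, n_coeffs):
    # Unnormalized DCT-II; keeps the first n_coeffs coefficients.
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_coeffs)]

# Hypothetical settings for the demo, not taken from the paper.
SAMPLE_RATE = 8000
FRAME_LEN = 256
N_MELS = 20
N_MFCC = 13

# Synthetic test frame: a 1 kHz sine tone.
frame = [math.sin(2.0 * math.pi * 1000.0 * t / SAMPLE_RATE) for t in range(FRAME_LEN)]

spec = power_spectrum(frame)
bank = mel_filterbank(N_MELS, len(spec), SAMPLE_RATE)
lms = log_mel(spec, bank)      # Log Mel Spectrum for this frame
mfcc = dct2(lms, N_MFCC)       # MFCC derived from the log mel energies
feature = lms + mfcc           # simple concatenation as the "complementary" vector
print(len(feature))
```

In a full pipeline, such per-frame vectors would be stacked over time into a 2-D input for the classification network, and the network's outputs would then be passed to a random forest. How the paper actually fuses the three features and feeds the forest is not stated in the abstract.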



Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61971016).

Author information

Corresponding author

Correspondence to Ruwei Li.

Ethics declarations

Funding and conflicts of interests

We, the authors of this manuscript entitled “Porn Streamer Audio Recognition Based on Deep Learning and Random Forest”, declare that we have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, S., Li, R., Li, Q. et al. Porn streamer audio recognition based on deep learning and random Forest. Appl Intell 53, 18857–18867 (2023). https://doi.org/10.1007/s10489-023-04491-x

  • DOI: https://doi.org/10.1007/s10489-023-04491-x
