Porn streamer audio recognition based on deep learning and random Forest

Liu, Shangfeng; Li, Ruwei; Li, Qiuyan; Zhao, Jingyu

doi:10.1007/s10489-023-04491-x

Published: 13 February 2023

Volume 53, pages 18857–18867, (2023)

Applied Intelligence

Abstract

Existing porn streamer audio recognition algorithms perform poorly in increasingly complex network environments. To address this problem, a porn streamer audio recognition algorithm based on deep learning and random forest is proposed. First, a more stable complementary feature is constructed from the Log Mel Spectrum (LMS), Mel Frequency Cepstrum Coefficient (MFCC), and Gammatone Frequency Cepstrum Coefficient (GFCC). Next, the Dual-Path Fused Transformer Net (DPFTNet) network structure is proposed for sound classification; it runs the two main modules of the Swin Transformer in parallel so that more feature detail is retained. Finally, a random forest is used to identify porn streamers. The experimental results show that the proposed algorithm achieves higher recognition accuracy than the comparison algorithm.
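As an illustration of the kind of spectral features the abstract names, the following minimal sketch computes a log Mel spectrum and MFCCs from a waveform using only NumPy. It is not the authors' implementation: the sample rate, frame length, hop size, filter counts, and the helper names (`mel_filterbank`, `lms_and_mfcc`, and so on) are all assumptions made for the example, and the abstract does not say how the features are combined, so the frame-wise concatenation at the end is hypothetical. GFCC is omitted because it requires a gammatone filterbank.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def power_spectrogram(signal, n_fft=512, hop=256):
    """Hann-windowed frames to power spectra, shape (n_frames, n_fft // 2 + 1)."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def dct2_matrix(n_out, n_in):
    """Unnormalised DCT-II basis, shape (n_out, n_in)."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))

def lms_and_mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=40, n_mfcc=13):
    """Return (log Mel spectrum, MFCC), one row per frame. Parameters are assumed."""
    power = power_spectrogram(signal, n_fft, hop)
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sr).T
    lms = np.log(mel_energy + 1e-10)           # small offset avoids log(0)
    mfcc = lms @ dct2_matrix(n_mfcc, n_mels).T  # DCT-II of the log Mel energies
    return lms, mfcc

# Synthetic one-second signal standing in for real audio.
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)
lms, mfcc = lms_and_mfcc(signal)
# Hypothetical fusion: concatenate the two feature sets frame by frame.
features = np.concatenate([lms, mfcc], axis=1)
print(features.shape)
```

In a full pipeline, a matrix like `features` would be the input to the classifier; here it only shows that the two representations share a frame axis and can be stacked.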



Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61971016).

Author information

Authors and Affiliations

Faculty of Information Technology, Beijing University of Technology, Beijing, China
Shangfeng Liu, Ruwei Li, Qiuyan Li & Jingyu Zhao

Corresponding author

Correspondence to Ruwei Li.

Ethics declarations

Funding and conflicts of interests

We, the authors of this manuscript entitled “Porn Streamer Audio Recognition Based on Deep Learning and Random Forest”, declare that we have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, S., Li, R., Li, Q. et al. Porn streamer audio recognition based on deep learning and random Forest. Appl Intell 53, 18857–18867 (2023). https://doi.org/10.1007/s10489-023-04491-x

Accepted: 27 January 2023
Published: 13 February 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s10489-023-04491-x
