Abstract
Existing porn streamer audio recognition algorithms perform poorly in increasingly complex network environments. To address this problem, a porn streamer audio recognition algorithm based on deep learning and random forest is proposed. The algorithm first constructs a more stable complementary feature that combines the Log Mel Spectrum (LMS), Mel Frequency Cepstrum Coefficients (MFCC), and Gammatone Frequency Cepstrum Coefficients (GFCC). It then introduces the Dual-Path Fused Transformer Net (DPFTNet) for sound classification, which runs the two main modules of the Swin Transformer in parallel so that more feature detail is retained. Finally, a random forest is used to identify porn streamers. Experimental results show that the algorithm achieves higher recognition accuracy than the compared algorithms.
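A minimal NumPy sketch of how the three features could be stacked into one complementary input follows. The abstract does not give the frame length, hop size, band counts, filter definitions, or fusion rule, so all of these are assumptions here: 64 bands per feature, all DCT coefficients kept so the three maps share one shape, per-channel z-score normalization, and stacking as three channels in the way an RGB image feeds a Swin-style backbone. The GFCC branch uses a frequency-domain approximation of a 4th-order gammatone filterbank with ERB-spaced centre frequencies and cube-root compression, which is one common GFCC definition and not necessarily the authors'. All function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    """HTK-style mel scale."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def power_spectrogram(x, n_fft=512, hop=256):
    """Hann-windowed |STFT|^2, shape (n_fft // 2 + 1, n_frames); needs len(x) >= n_fft."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)], axis=1)
    return np.abs(np.fft.rfft(frames, axis=0)) ** 2


def mel_filterbank(sr, n_fft, n_bands):
    """Triangular filters equally spaced on the mel scale, shape (n_bands, n_fft // 2 + 1)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_bands + 2))
    fb = np.zeros((n_bands, freqs.size))
    for b in range(n_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[b] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def gammatone_filterbank(sr, n_fft, n_bands, order=4, f_low=50.0):
    """Power response of an ERB-spaced gammatone bank, approximated in the frequency domain."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    # Centre frequencies equally spaced on the Glasberg-Moore ERB-rate scale.
    erb_rate = np.linspace(21.4 * np.log10(1 + 0.00437 * f_low),
                           21.4 * np.log10(1 + 0.00437 * 0.9 * sr / 2), n_bands)
    fc = (10.0 ** (erb_rate / 21.4) - 1.0) / 0.00437
    bw = 1.019 * 24.7 * (4.37 * fc / 1000.0 + 1.0)  # 1.019 * ERB(fc)
    # |H(f)| is about (1 + ((f - fc) / b)^2)^(-order / 2), so |H(f)|^2 uses exponent -order.
    return (1.0 + ((freqs[None, :] - fc[:, None]) / bw[:, None]) ** 2) ** (-order)


def dct_ii(x):
    """Orthonormal DCT-II along axis 0."""
    n = x.shape[0]
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)
    return basis @ x


def zscore(m, eps=1e-10):
    return (m - m.mean()) / (m.std() + eps)


def complementary_feature(x, sr, n_fft=512, hop=256, n_bands=64, eps=1e-10):
    """Hypothetical fusion: stack LMS, MFCC and GFCC as a (3, n_bands, n_frames) array."""
    power = power_spectrogram(x, n_fft, hop)
    lms = np.log(mel_filterbank(sr, n_fft, n_bands) @ power + eps)  # Log Mel Spectrum
    mfcc = dct_ii(lms)                                                # DCT of log-mel energies
    gt_energy = gammatone_filterbank(sr, n_fft, n_bands) @ power
    gfcc = dct_ii(np.cbrt(gt_energy))                                 # DCT of cube-root gammatone energies
    return np.stack([zscore(lms), zscore(mfcc), zscore(gfcc)], axis=0)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    rng = np.random.default_rng(0)
    x = 0.5 * np.sin(2 * np.pi * 440.0 * t) + 0.01 * rng.standard_normal(sr)
    print(complementary_feature(x, sr).shape)  # (3, 64, 61)
```

Keeping every DCT coefficient is unusual for MFCC and GFCC, which are often truncated to the first few dozen coefficients; it is done here only so the three channels align, and a real system might instead truncate and resize.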
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 61971016).
Ethics declarations
Funding and conflicts of interest
We, the authors of this manuscript entitled “Porn Streamer Audio Recognition Based on Deep Learning and Random Forest”, declare that we have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, S., Li, R., Li, Q. et al. Porn streamer audio recognition based on deep learning and random forest. Appl Intell 53, 18857–18867 (2023). https://doi.org/10.1007/s10489-023-04491-x
DOI: https://doi.org/10.1007/s10489-023-04491-x