Real-time Korean voice phishing detection based on machine learning approaches

Lee, Minyoung; Park, Eunil

doi:10.1007/s12652-021-03587-x

Original Research
Published: 12 November 2021

Volume 14, pages 8173–8184 (2023)
Journal of Ambient Intelligence and Humanized Computing

Minyoung Lee¹ & Eunil Park¹


Abstract

Voice phishing, or vishing, is a phishing phone call in which an attacker lures the receiver into providing personal information. Vishing causes serious damage worldwide, and its frequency is increasing. This study therefore aims to detect vishing in real time. Because little research has addressed spam detection in low-resource languages, we detect vishing in Korean using basic machine learning models. We collected data from actual vishing damage cases and converted the voice files into text so that spam could be detected with natural language processing techniques. The focus is on determining whether vishing can be detected rapidly, rather than on developing new models. Our results suggest that, with machine learning models, vishing can be detected in real time and that training takes only a short time.
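The pipeline summarized in the abstract (speech-to-text conversion followed by a basic machine learning classifier over the transcript) can be illustrated with a minimal sketch. The classifier below is a multinomial Naive Bayes model with add-one (Laplace) smoothing, chosen here only as a simple example of a basic model; the abstract does not say which models the authors used. The whitespace tokenizer, the toy English transcripts, and the names `NaiveBayesVishingClassifier` and `stream_predictions` are hypothetical. To mimic real-time use, the growing transcript is re-classified each time a new speech-to-text chunk arrives.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization. Korean transcripts
    # would normally require a morphological analyzer instead.
    return text.lower().split()


class NaiveBayesVishingClassifier:
    """Hypothetical multinomial Naive Bayes classifier with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        label_counts = Counter(labels)
        self.log_prior = {
            c: math.log(label_counts[c] / len(labels)) for c in self.classes
        }
        return self

    def log_scores(self, text):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # words never seen in training are ignored
                # Laplace-smoothed log P(word | class)
                count = self.word_counts[c][word]
                score += math.log((count + 1) / (self.totals[c] + v))
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def stream_predictions(model, chunks):
    """Re-classify the growing transcript after each speech-to-text chunk."""
    transcript = ""
    for chunk in chunks:
        transcript = f"{transcript} {chunk}".strip()
        yield transcript, model.predict(transcript)


# Toy, made-up transcripts for illustration; not the authors' dataset.
TRAIN_TEXTS = [
    "this is the prosecutor office your bank account is involved in a crime",
    "you must transfer money to a safe account for the investigation",
    "we can offer you a low interest loan if you repay your existing loan first",
    "are we still meeting for the movie tonight",
    "let me know when you get home",
    "people at work liked the lunch place you picked",
]
TRAIN_LABELS = ["spam", "spam", "spam", "nonspam", "nonspam", "nonspam"]

model = NaiveBayesVishingClassifier().fit(TRAIN_TEXTS, TRAIN_LABELS)
call_chunks = [
    "hi are we still meeting",
    "no this is the prosecutor office",
    "your bank account is involved in a crime",
]
for transcript, label in stream_predictions(model, call_chunks):
    print(f"{label:>8}: {transcript}")
```

In this toy run the label stays `nonspam` during the small talk and switches to `spam` once words such as "bank", "account", and "crime" accumulate. A practical system would likely also need a decision threshold on the score gap and proper Korean preprocessing.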


Data availability Statement

The datasets collected for the current study are available at https://anonymous.4open.science/r/vishing-1AF8. Additional information used in this study can be obtained from the corresponding author upon reasonable request.


Acknowledgements

This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. IITP-2021-0-00358, AI big data-based cyber security orchestration and automated response technology development). This research was also supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2021R1A4A3022102).

Author information

Authors and Affiliations

Department of Applied Artificial Intelligence, Sungkyunkwan University, Seoul, Republic of Korea
Minyoung Lee & Eunil Park


Contributions

ML and EP designed the study. ML collected and analyzed the data. EP presented the results. ML and EP wrote and revised the manuscript. All authors reviewed the manuscript.

Corresponding author

Correspondence to Eunil Park.

Ethics declarations

Conflict of interest

The authors have no conflicts or competing interests to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A. Data analysis

Table 6 lists the 100 most frequent words in the spam and nonspam transcripts, respectively. The nonspam cases mostly contained everyday words such as us, movies, people, and me, whereas the spam cases contained words such as loan, investigation, bank accounts, bank, illegality, and victims (shown in bold in Table 6). This suggests that the meaning of such words is important for vishing detection.

Table 6 Data analysis
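A per-class word-frequency analysis like the one in Table 6 can be sketched with `collections.Counter`. The function name `top_words_by_class`, the lowercase whitespace tokenizer, the optional stopword set, and the toy English transcripts below are assumptions for illustration; this appendix does not describe how the Korean transcripts were tokenized.

```python
from collections import Counter


def top_words_by_class(texts, labels, k=100, stopwords=frozenset()):
    """Return the k most frequent tokens for each class label.

    Assumptions: lowercase whitespace tokenization and an optional
    stopword set; Korean text would normally be split with a
    morphological analyzer first.
    """
    counters = {}
    for text, label in zip(texts, labels):
        tokens = [t for t in text.lower().split() if t not in stopwords]
        counters.setdefault(label, Counter()).update(tokens)
    # most_common orders by count; ties keep first-encountered order.
    return {label: counter.most_common(k) for label, counter in counters.items()}


# Toy, made-up transcripts for illustration only.
texts = [
    "your bank account needs an investigation",
    "the bank loan is ready",
    "see the movie with me",
    "people like the movie",
]
labels = ["spam", "spam", "nonspam", "nonspam"]

top = top_words_by_class(texts, labels, k=2, stopwords={"the"})
print(top)
```

With `k=100` and real transcripts, the two lists would correspond to the columns of a table like Table 6; filtering stopwords keeps function words from crowding out content words.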


Appendix B. Speech-to-text tool examples

We converted the collected .mp3 files of voice phishing calls into text. Table 7 compares the actual voice scripts with the transcripts produced by the Google Speech-to-Text API and the Naver Clova Speech speech-to-text tool.

Table 7 Comparison of speech-to-text tools
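Table 7 shows example outputs. One common way to quantify such a comparison, which this appendix does not state the authors used, is the character error rate (CER): the Levenshtein edit distance between the reference script and the speech-to-text output, divided by the reference length. CER is commonly used for Korean because word spacing in recognizer output can be inconsistent; removing spaces before scoring is an assumption made here. The function names and the example sentence below are hypothetical.

```python
def levenshtein(ref, hyp):
    """Minimum number of character insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis, ignore_spaces=True):
    """CER = edit distance / reference length (spaces optionally removed)."""
    if ignore_spaces:
        reference = reference.replace(" ", "")
        hypothesis = hypothesis.replace(" ", "")
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: one syllable misrecognized (습 -> 읍).
reference = "계좌가 범죄에 연루되었습니다"
hypothesis = "계좌가 범죄에 연루되었읍니다"
print(f"CER = {character_error_rate(reference, hypothesis):.3f}")
```

Scoring at the character (syllable) level means a single misrecognized syllable costs one edit regardless of how the recognizer spaced the words.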



About this article

Cite this article

Lee, M., Park, E. Real-time Korean voice phishing detection based on machine learning approaches. J Ambient Intell Human Comput 14, 8173–8184 (2023). https://doi.org/10.1007/s12652-021-03587-x


Received: 09 August 2021
Accepted: 26 October 2021
Published: 12 November 2021
Issue Date: July 2023
DOI: https://doi.org/10.1007/s12652-021-03587-x
