Detecting Smishing Attacks Using Feature Extraction and Classification Techniques

Ulfath, Rubaiath E.; Sarker, Iqbal H.; Chowdhury, Mohammad Jabed Morshed; Hammoudeh, Mohammad

doi:10.1007/978-981-16-6636-0_51

Conference paper
First Online: 04 December 2021

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 95)

Abstract

Phishing scams via SMS (smishing) have become common due to the widespread use of smartphones and the availability of mobile Internet technologies. Identifying a phishing SMS by analyzing unstructured short texts is a challenging problem in AI-driven cybersecurity. Machine learning techniques integrated with natural language processing (NLP) have considerable potential to identify patterns that differentiate phishing from legitimate SMS. In this paper, we experiment with several state-of-the-art machine learning algorithms on a benchmark dataset, incorporating NLP-based feature extraction and feature selection steps to build an automated phishing detection strategy. The support vector machine classifier, applied after feature extraction and feature selection, outperformed the other tested algorithms, achieving a tenfold cross-validation score of 98.27%, an F1-score of 99.08% for legitimate SMS, and an accuracy of 98.39%. The performance of the tested methods is evaluated using popular evaluation metrics.
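The abstract does not say which features or which selection criterion the authors use, so the following is only a minimal sketch of the general idea, under stated assumptions: word-presence features, a chi-square score for feature selection, and a per-class F1 helper of the kind behind a figure such as "F1-score for legitimate SMS". The toy messages, the labels `smish` and `legit`, and all function names are hypothetical and not taken from the paper. The classifier itself is omitted; in practice one would train an SVM from an established library on the selected features.

```python
import re
from collections import Counter

# Toy, invented messages for illustration only (not the benchmark dataset).
MESSAGES = [
    ("Your account is locked, verify now at http://example.test/login", "smish"),
    ("URGENT: claim your prize now, reply with your PIN", "smish"),
    ("Verify your bank details now to avoid suspension", "smish"),
    ("Are we still meeting for lunch today?", "legit"),
    ("I will call you when I get home", "legit"),
    ("Thanks, see you at the meeting tomorrow", "legit"),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chi_square_scores(samples, positive="smish"):
    """Score each token with the chi-square statistic of its 2x2
    (token present/absent) x (positive/negative class) contingency table."""
    n = len(samples)
    n_pos = sum(1 for _, label in samples if label == positive)
    docs_with = Counter()  # token -> messages containing it
    pos_with = Counter()   # token -> positive messages containing it
    for text, label in samples:
        for tok in set(tokenize(text)):
            docs_with[tok] += 1
            if label == positive:
                pos_with[tok] += 1
    scores = {}
    for tok, present in docs_with.items():
        a = pos_with[tok]   # present, positive class
        b = present - a     # present, negative class
        c = n_pos - a       # absent, positive class
        d = n - n_pos - b   # absent, negative class
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[tok] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring tokens (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def f1_for_class(y_true, y_pred, cls):
    """F1-score computed with `cls` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


scores = chi_square_scores(MESSAGES)
selected = select_top_k(scores, 2)
print(selected)
```

On this toy data the tokens `now` and `your` rank highest, because each appears in every invented smishing message and in no legitimate one. Real SMS corpora are far noisier, so the selected vocabulary and the resulting scores would differ.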

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chittagong, 4349, Bangladesh
Rubaiath E. Ulfath & Iqbal H. Sarker
La Trobe University, Melbourne, Australia
Mohammad Jabed Morshed Chowdhury
Department of Computing and Math, Manchester Metropolitan University, Manchester, M1 5GD, UK
Mohammad Hammoudeh

Corresponding author

Correspondence to Iqbal H. Sarker.

Editor information

Editors and Affiliations

Chittagong University of Engineering and Technology (CUET), Chittagong, Bangladesh
Mohammad Shamsul Arefin
Jahangirnagar University, Dhaka, Bangladesh
M. Shamim Kaiser
National Institute for Materials Science, Tsukuba, Japan
Anirban Bandyopadhyay
University of Dhaka, Dhaka, Bangladesh
Md. Atiqur Rahman Ahad
Amity University, Jaipur, India
Kanad Ray

About this paper

Cite this paper

Ulfath, R.E., Sarker, I.H., Chowdhury, M.J.M., Hammoudeh, M. (2022). Detecting Smishing Attacks Using Feature Extraction and Classification Techniques. In: Arefin, M.S., Kaiser, M.S., Bandyopadhyay, A., Ahad, M.A.R., Ray, K. (eds) Proceedings of the International Conference on Big Data, IoT, and Machine Learning. Lecture Notes on Data Engineering and Communications Technologies, vol 95. Springer, Singapore. https://doi.org/10.1007/978-981-16-6636-0_51

DOI: https://doi.org/10.1007/978-981-16-6636-0_51
Published: 04 December 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-6635-3
Online ISBN: 978-981-16-6636-0
eBook Packages: Engineering, Engineering (R0)
