Abstract
Phishing scams via SMS have become common due to the widespread use of smartphones and the availability of mobile Internet technologies. Identifying a phishing SMS by analyzing unstructured short texts is a challenging problem in AI-driven cybersecurity. Machine learning techniques integrated with natural language processing (NLP) have considerable potential to identify patterns that differentiate phishing from legitimate SMS. In this paper, we experiment with several state-of-the-art machine learning algorithms on a benchmark dataset, and we incorporate NLP-based feature extraction and feature selection steps to build an automated phishing detection strategy. The support vector machine classifier, applied after feature extraction and feature selection, outperformed the other tested methods, with a tenfold cross-validation score of 98.27%, an F1-score of 99.08% for legitimate SMS, and an accuracy of 98.39%. The performance of the tested methods is evaluated using popular evaluation metrics.
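The pipeline described above (feature extraction, feature selection, then a support vector machine scored by tenfold cross-validation) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not name the specific extraction or selection methods, so TF-IDF n-grams, chi-square selection, a linear kernel, the value of `k_features`, and the use of scikit-learn are all assumptions. The toy messages are hypothetical stand-ins for the benchmark dataset, so the printed score says nothing about the reported results.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus; NOT the benchmark dataset used in the paper.
phishing = [
    "URGENT your account is locked verify now at http://bank-login.example",
    "You won a free prize claim now at http://prize.example",
    "Your card is suspended confirm your PIN at http://secure-card.example",
    "Final notice pay your tax refund fee now at http://refund.example",
    "Free gift card waiting click http://gift.example to claim",
    "Verify your account password now or lose access http://verify.example",
    "Congratulations you won cash reply with your bank details",
    "Your parcel is held pay the fee at http://parcel.example now",
    "Account alert unusual login confirm identity at http://alert.example",
    "Claim your free reward now send your card number",
]
legitimate = [
    "Are we still meeting for lunch tomorrow",
    "I will be home late tonight do not wait for dinner",
    "Can you send me the notes from class today",
    "Happy birthday hope you have a great day",
    "The meeting moved to three in the afternoon",
    "Thanks for the ride see you next week",
    "Did you watch the match last night",
    "Please call me when you get a chance",
    "I left the keys on the kitchen table",
    "Let us go for a walk after work today",
]
texts = phishing + legitimate
labels = [1] * len(phishing) + [0] * len(legitimate)  # 1 = phishing, 0 = legitimate


def build_pipeline(k_features=20):
    """Feature extraction -> feature selection -> SVM (all choices assumed)."""
    return make_pipeline(
        TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),  # assumed extractor
        SelectKBest(chi2, k=k_features),                      # assumed selector
        SVC(kernel="linear", C=1.0),                          # assumed kernel
    )


def evaluate(texts, labels, n_splits=10, k_features=20):
    """Return per-fold accuracy from stratified k-fold cross-validation."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    return cross_val_score(
        build_pipeline(k_features), texts, labels, cv=cv, scoring="accuracy"
    )


scores = evaluate(texts, labels)
print(f"mean accuracy over {len(scores)} folds: {scores.mean():.3f}")
```

Wrapping the vectorizer and selector inside the pipeline means both are refit on the training portion of every fold, so no vocabulary or chi-square statistics leak from the held-out messages into training.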
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ulfath, R.E., Sarker, I.H., Chowdhury, M.J.M., Hammoudeh, M. (2022). Detecting Smishing Attacks Using Feature Extraction and Classification Techniques. In: Arefin, M.S., Kaiser, M.S., Bandyopadhyay, A., Ahad, M.A.R., Ray, K. (eds) Proceedings of the International Conference on Big Data, IoT, and Machine Learning. Lecture Notes on Data Engineering and Communications Technologies, vol 95. Springer, Singapore. https://doi.org/10.1007/978-981-16-6636-0_51
DOI: https://doi.org/10.1007/978-981-16-6636-0_51
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-6635-3
Online ISBN: 978-981-16-6636-0
eBook Packages: Engineering, Engineering (R0)