Machine Learning Approaches for the Classification of Spammed Text in Messages

Mundra, Shikha; Mundra, Ankit; Saigal, Anshul; Gupta, Punit; Agarwal, Josh; Goyal, Mayank Kumar

doi:10.1007/978-981-16-2877-1_56

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 235)

Abstract

Spam text messages are unwanted messages sent to large numbers of mobile phone users by telemarketers and companies advertising their products and services, and they can often be traps set by scammers. If the user engages with them, these junk messages can install malware on the phone or be used to steal the user's private information. It is therefore necessary to detect and classify these messages in order to protect users from such traps and from identity theft. Spam text messages can be detected by building a corpus of the words found in text messages and identifying the words common to spam. Before a corpus can be built, the data must be cleaned. Features can then be extracted from the cleaned data using methods such as term frequency, TF-IDF, Word2Vec, and GloVe. In this paper, we classify spam text messages using a Naïve Bayes classifier, logistic regression, LSTM, and a convolutional neural network (CNN). We then present a comparative analysis of these models in terms of accuracy, precision, recall, and F1 score.
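The pipeline outlined in the abstract (clean the data, extract term-frequency features, classify, then score) can be illustrated with a minimal sketch. This is not the authors' implementation: it covers only a multinomial Naïve Bayes classifier over raw term frequencies, written with the Python standard library. The `clean` tokenizer, the `NaiveBayes` class, the `evaluate` helper, and the toy messages in `TRAIN` and `TEST` are all hypothetical names and data introduced here for illustration.

```python
import math
import re
from collections import Counter


def clean(text):
    """Lowercase the message and keep alphanumeric tokens (a simple stand-in for data cleaning)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over term frequencies with add-one smoothing."""

    def fit(self, messages, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(messages, labels):
            self.word_counts[label].update(clean(text))
        # The vocabulary is every token seen in any training message.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_counts[c] / total_docs)
            for token in clean(text):
                if token not in self.vocab:
                    continue  # ignore tokens never seen during training
                count = self.word_counts[c][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[c] = score
        return max(scores, key=scores.get)


def evaluate(y_true, y_pred, positive="spam"):
    """Accuracy, precision, recall and F1 score, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy data, not taken from the paper's dataset.
TRAIN = [
    ("Win a free prize now, claim your cash", "spam"),
    ("Free entry: text WIN to claim your reward", "spam"),
    ("Are we still meeting for lunch today", "ham"),
    ("Call me when you get home tonight", "ham"),
]
TEST = [
    ("Claim your free cash prize", "spam"),
    ("See you at lunch tonight", "ham"),
]

model = NaiveBayes().fit([m for m, _ in TRAIN], [y for _, y in TRAIN])
predictions = [model.predict(m) for m, _ in TEST]
metrics = evaluate([y for _, y in TEST], predictions)
print(predictions, metrics)
```

A real comparison along the lines of the abstract would swap the term-frequency counts for TF-IDF or pretrained embeddings and would evaluate on a held-out split of a labelled SMS corpus rather than on two hand-written messages.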

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Manipal University Jaipur, Jaipur, India
Shikha Mundra
Department of Information Technology, Manipal University Jaipur, Jaipur, India
Ankit Mundra, Anshul Saigal & Josh Agarwal
Department of Computer and Communication, Manipal University Jaipur, Jaipur, India
Punit Gupta
Department of Computer Science and Engineering, Sharda University, Greater Noida, India
Mayank Kumar Goyal

Editor information

Editors and Affiliations

Iowa State University, Ames, IA, USA
Arun K. Somani
Manipal University Jaipur, Jaipur, Rajasthan, India
Ankit Mundra
Strategic Research Centre—Centre for Cyber Security Research and Innovation (CSRI), Deakin University, Burwood, VIC, Australia
Robin Doss
Accenture Innovations, New Delhi, Delhi, India
Subhajit Bhattacharya

About this paper

Cite this paper

Mundra, S., Mundra, A., Saigal, A., Gupta, P., Agarwal, J., Goyal, M.K. (2022). Machine Learning Approaches for the Classification of Spammed Text in Messages. In: Somani, A.K., Mundra, A., Doss, R., Bhattacharya, S. (eds) Smart Systems: Innovations in Computing. Smart Innovation, Systems and Technologies, vol 235. Springer, Singapore. https://doi.org/10.1007/978-981-16-2877-1_56

DOI: https://doi.org/10.1007/978-981-16-2877-1_56
Published: 04 September 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-2876-4
Online ISBN: 978-981-16-2877-1
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
