Abstract
In recent years, as mobile phones have grown in popularity, the short message service (SMS) has become a multi-billion-dollar industry. At the same time, falling messaging costs have led to a growth in unsolicited messages, known as spam, which not only cause financial damage to organizations but also annoy their recipients. Findings: The increasing volume of such unsolicited messages has created the need to classify and block them. Although humans can readily identify a message as spam, doing so remains difficult for computers. Objectives: Machine learning offers a data-driven, statistical approach to designing algorithms that help computer systems label an SMS as a desirable message (HAM) or as junk (SPAM). However, the lack of real SMS spam databases, the limited number of features, and the informal language of message bodies are probable reasons why existing SMS filtering algorithms underperform when classifying text messages. Methods/Statistical Analysis: In this paper, a corpus of real SMS texts made available by the University of California, Irvine (UCI) Machine Learning Repository is used, and a weighting method based on the ability of individual words in the corpus to indicate a target class (HAM or SPAM) is applied to classify new SMSs as SPAM or HAM. Additionally, supervised machine learning algorithms such as support vector machine, k-nearest neighbours, and random forest are compared on their performance in classifying SMSs. Applications/Improvements: The results of this comparison are presented at the end of the paper, along with a desktop application that classifies messages as SPAM or HAM. The application is developed and run in Python.
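The abstract does not spell out the word-weighting scheme, so the following is only a minimal sketch of one plausible reading: each word receives a smoothed log-odds weight (positive if it appears more often in SPAM, negative if more often in HAM), and a new message is labeled by the sign of its summed weights. The function names `tokenize`, `word_weights`, and `classify`, the smoothing constant `alpha`, and the four toy messages are all assumptions made for illustration; none of them come from the paper or from the UCI corpus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a message and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_weights(messages, labels, alpha=1.0):
    """Return a smoothed log-odds weight per word.

    Positive weights lean towards "spam", negative towards "ham".
    This is an assumed weighting, not necessarily the paper's.
    """
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in zip(messages, labels):
        counts[label].update(tokenize(text))
    vocab = set(counts["spam"]) | set(counts["ham"])
    # Add-alpha smoothing so unseen words never yield log(0).
    totals = {c: sum(counts[c].values()) + alpha * len(vocab) for c in counts}
    return {
        w: math.log((counts["spam"][w] + alpha) / totals["spam"])
        - math.log((counts["ham"][w] + alpha) / totals["ham"])
        for w in vocab
    }


def classify(text, weights, threshold=0.0):
    """Sum the weights of known words; words not seen in training add 0."""
    score = sum(weights.get(tok, 0.0) for tok in tokenize(text))
    return "spam" if score > threshold else "ham"


# Toy data invented for illustration; not drawn from the UCI corpus.
train_messages = [
    "WIN a free prize now, claim your cash",
    "Free entry: text WIN to claim your prize",
    "Are we still meeting for lunch today",
    "I will call you when I get home",
]
train_labels = ["spam", "spam", "ham", "ham"]

weights = word_weights(train_messages, train_labels)
print(classify("claim your free prize", weights))
print(classify("call me when you get home", weights))
```

A per-word score of this kind could also serve as a feature weight for the support vector machine, k-nearest neighbours, and random forest classifiers that the paper compares, although the abstract does not say whether the authors combine them this way.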
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Merugu, S., Reddy, M.C.S., Goyal, E., Piplani, L. (2019). Text Message Classification Using Supervised Machine Learning Algorithms. In: Kumar, A., Mozar, S. (eds) ICCCE 2018. Lecture Notes in Electrical Engineering, vol 500. Springer, Singapore. https://doi.org/10.1007/978-981-13-0212-1_15
DOI: https://doi.org/10.1007/978-981-13-0212-1_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-0211-4
Online ISBN: 978-981-13-0212-1
eBook Packages: Engineering, Engineering (R0)