Text Message Classification Using Supervised Machine Learning Algorithms

Merugu, Suresh; Reddy, M. Chandra Shekhar; Goyal, Ekansh; Piplani, Lakshay

doi:10.1007/978-981-13-0212-1_15


Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 500)

Included in the following conference series:

International Conference on Communications and Cyber Physical Engineering 2018

1226 Accesses
23 Citations

Abstract

In recent years, as mobile phones have become more popular, the short message service (SMS) has grown into a multi-billion-dollar industry. At the same time, falling messaging costs have led to a growth in unsolicited messages, known as spam. Spam is a major problem: it causes financial damage to organizations and annoys the people who receive it.

Findings: The increasing volume of such unsolicited messages has created the need to classify and block them. Although humans can readily recognize a message as spam, doing so remains difficult for computers.

Objectives: Machine learning addresses this by offering a data-driven, statistical approach to designing algorithms that help computer systems identify an SMS as either a desirable message (HAM) or junk (SPAM). However, the lack of real SMS spam databases, the limited number of features, and the informal language of text messages are probable reasons why existing SMS filtering algorithms underperform when classifying text messages.

Methods/Statistical Analysis: In this paper, a corpus of real SMS texts made available by the University of California, Irvine (UCI) Machine Learning Repository is used. A weighting method based on the ability of individual words in the corpus to point towards each target class (HAM or SPAM) is applied to classify new SMSs as SPAM or HAM. In addition, supervised machine learning algorithms such as support vector machine, k-nearest neighbours, and random forest are compared on their performance in classifying SMSs.

Applications/Improvements: The results of this comparison are presented at the end of the paper, together with a desktop application, developed and run in Python, that classifies messages as SPAM or HAM.
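The abstract does not spell out the word-weighting formula. As a rough illustration only, the Python sketch below assumes one common way to measure how strongly a word points towards a class: a Laplace-smoothed log-odds of the word appearing in SPAM versus HAM messages, summed over the distinct words of a new message. This is essentially a naive Bayes log-likelihood ratio without class priors, not necessarily the authors' scheme. The function names, the tokenizer, `alpha`, `threshold` and the sample messages are all hypothetical. The sketch does not reproduce the support vector machine, k-nearest-neighbour or random-forest classifiers compared in the paper.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lowercase, then keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())

def parse_corpus(lines):
    # Each line is "label<TAB>message" with label "ham" or "spam"
    # (the tab-separated layout of the UCI SMS Spam Collection).
    corpus = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        label, _, text = line.partition("\t")
        corpus.append((label.strip().lower(), text))
    return corpus

def word_weights(corpus, alpha=1.0):
    # Hypothetical weighting: smoothed log-odds that a word occurs in SPAM vs HAM messages.
    # Positive weight -> the word points towards SPAM; negative -> towards HAM.
    doc_freq = {"spam": Counter(), "ham": Counter()}
    n_docs = Counter()
    for label, text in corpus:
        n_docs[label] += 1
        doc_freq[label].update(set(tokenize(text)))  # count each word once per message
    weights = {}
    for word in set(doc_freq["spam"]) | set(doc_freq["ham"]):
        p_spam = (doc_freq["spam"][word] + alpha) / (n_docs["spam"] + 2 * alpha)
        p_ham = (doc_freq["ham"][word] + alpha) / (n_docs["ham"] + 2 * alpha)
        weights[word] = math.log(p_spam / p_ham)
    return weights

def classify(text, weights, threshold=0.0):
    # Sum the weights of the distinct known words; unseen words contribute nothing.
    score = sum(weights.get(word, 0.0) for word in set(tokenize(text)))
    return "spam" if score > threshold else "ham"

def accuracy(corpus, weights):
    # Fraction of messages whose predicted label matches the given label.
    correct = sum(classify(text, weights) == label for label, text in corpus)
    return correct / len(corpus)

if __name__ == "__main__":
    sample = [
        "ham\tAre we still meeting for lunch today?",
        "ham\tCall me when you get home",
        "spam\tWINNER! Claim your free prize now, call 09061701461",
        "spam\tFree entry to win a cash prize, text WIN now",
    ]
    corpus = parse_corpus(sample)
    weights = word_weights(corpus)
    print(classify("Claim your free cash prize now", weights))  # spam
    print(classify("Are you home for lunch?", weights))  # ham
```

Here `accuracy` is measured on the same messages used to compute the weights, so it is optimistic. A fair comparison, like the one the abstract describes, would evaluate on a held-out portion of the corpus.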



Author information

Authors and Affiliations

Research and Development Centre, CMR College of Engineering & Technology, Hyderabad, 501401, Telangana State, India
Suresh Merugu & M. Chandra Shekhar Reddy
Department of Computer Science and Engineering, Maharaja Agrasen Institute of Technology, New Delhi, India
Ekansh Goyal & Lakshay Piplani


Corresponding author

Correspondence to Suresh Merugu.

Editor information

Editors and Affiliations

BioAxis DNA Research Centre Pvt. Ltd., Hyderabad, Andhra Pradesh, India
Amit Kumar
Dynexsys, Sydney, NSW, Australia
Stefan Mozar


About this paper

Cite this paper

Merugu, S., Reddy, M.C.S., Goyal, E., Piplani, L. (2019). Text Message Classification Using Supervised Machine Learning Algorithms. In: Kumar, A., Mozar, S. (eds) ICCCE 2018. ICCCE 2018. Lecture Notes in Electrical Engineering, vol 500. Springer, Singapore. https://doi.org/10.1007/978-981-13-0212-1_15


DOI: https://doi.org/10.1007/978-981-13-0212-1_15
Published: 01 September 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-0211-4
Online ISBN: 978-981-13-0212-1
eBook Packages: Engineering, Engineering (R0)
