
Text Message Classification Using Supervised Machine Learning Algorithms

  • Conference paper
  • In: ICCCE 2018

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 500))

Abstract

In recent years, as mobile phones have grown more popular, the short message service (SMS) has become a multi-billion dollar industry. At the same time, falling messaging costs have led to a rise in unsolicited messages, known as spam. Spam is a major problem: it causes financial damage to organizations and is a nuisance to the people who receive it.

Findings: The increasing volume of unsolicited messages has created a need to classify and block them. Although humans can readily identify a message as spam, doing so remains an uphill task for computers.

Objectives: Machine learning helps here by offering a data-driven, statistical way to design algorithms that let computer systems label an SMS as a desirable message (HAM) or as junk (SPAM). However, the lack of real SMS spam databases, the limited number of features, and the informal language of message bodies are probable reasons why existing SMS filtering algorithms have underperformed when classifying text messages.

Methods/Statistical Analysis: This paper uses a corpus of real SMS texts made available by the University of California, Irvine (UCI) Machine Learning Repository. A weighting method, based on how strongly each word in the corpus points towards one of the target classes (HAM or SPAM), is applied to classify new SMSs as SPAM or HAM. In addition, several supervised machine learning algorithms, namely support vector machine, k-nearest neighbours, and random forest, are compared on their performance in classifying SMSs.

Applications/Improvements: The results of this comparison are presented at the end of the paper, along with a desktop application, developed and run in Python, that classifies messages as SPAM or HAM.
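The abstract does not spell out the weighting formula, so the following is only a minimal sketch of one way a per-word weighting could work, assuming a smoothed log-odds score per word. The toy messages, the function names (`tokenize`, `word_weights`, `classify`) and the smoothing parameter `alpha` are all hypothetical and are not taken from the paper or from the UCI corpus.

```python
import math
import re
from collections import Counter

# Toy labelled messages, invented for illustration (not from the UCI corpus).
TRAIN = [
    ("spam", "WIN a free prize now, call now"),
    ("spam", "Free entry: claim your cash prize"),
    ("ham", "Are we still meeting for lunch today"),
    ("ham", "Call me when you get home"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_weights(examples, alpha=1.0):
    """Return a smoothed log-odds weight per word (positive leans SPAM).

    Assumption: add-alpha smoothing over the joint vocabulary; the paper's
    actual weighting scheme may differ.
    """
    counts = {"spam": Counter(), "ham": Counter()}
    for label, text in examples:
        counts[label].update(tokenize(text))
    vocab = set(counts["spam"]) | set(counts["ham"])
    totals = {c: sum(counts[c].values()) + alpha * len(vocab) for c in counts}
    return {
        w: math.log((counts["spam"][w] + alpha) / totals["spam"])
        - math.log((counts["ham"][w] + alpha) / totals["ham"])
        for w in vocab
    }


def classify(text, weights):
    """Sum the weights of known words; unknown words contribute nothing."""
    score = sum(weights.get(tok, 0.0) for tok in tokenize(text))
    return ("SPAM" if score > 0 else "HAM"), score


weights = word_weights(TRAIN)
label_spam, _ = classify("claim your free prize", weights)
label_ham, _ = classify("lunch at home today", weights)
```

In this sketch, words seen mostly in spam (such as "free") receive positive weights, words seen mostly in ham (such as "lunch") receive negative weights, and a message is labelled by the sign of the summed weights. A real system would train on the full corpus and would likely compare this baseline against the classifiers named in the abstract.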



Author information


Corresponding author

Correspondence to Suresh Merugu.


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Merugu, S., Reddy, M.C.S., Goyal, E., Piplani, L. (2019). Text Message Classification Using Supervised Machine Learning Algorithms. In: Kumar, A., Mozar, S. (eds) ICCCE 2018. ICCCE 2018. Lecture Notes in Electrical Engineering, vol 500. Springer, Singapore. https://doi.org/10.1007/978-981-13-0212-1_15


  • DOI: https://doi.org/10.1007/978-981-13-0212-1_15

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-0211-4

  • Online ISBN: 978-981-13-0212-1

  • eBook Packages: Engineering, Engineering (R0)
