Machine intelligence based hybrid classifier for spam detection and sentiment analysis of SMS messages

Multimedia Tools and Applications

Abstract

Short Message Service (SMS) on mobile phones has improved because of technological advancements, while the growth of content-based marketing means that smartphones are frequently overburdened with spam SMS. Spam messages are unwanted and can carry viruses and spyware. Several text classification methods have been suggested to address spam. However, none of these methods can guarantee a fully spam-free solution, since each filtering and modeling methodology has its own strengths and weaknesses. This paper proposes a hybrid classifier for SMS spam classification and sentiment analysis. The datasets are pre-processed, and Word2vec data augmentation is used to extract the features. The features are then fed to six different feature selection methods and equilibrium optimization (EO). The optimum components are fed into a hybrid K-Nearest Neighbors (KNN) and support vector machine (SVM) classifier to classify the SMS messages. Further, to optimize the parameters of the network and improve accuracy, the Rat Swarm Optimization (RSO) algorithm is used. AFINN and SentiWordNet are then used for sentiment analysis. The framework is evaluated on three benchmark datasets; among them, the SpamAssassin dataset yields the best spam detection accuracy, 99.82%.
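
As a rough illustration of the lexicon-based sentiment step mentioned in the abstract, the sketch below scores a message the way an AFINN-style word list is typically applied: each known word carries an integer valence, and the message score is the sum of those valences. The lexicon entries, their scores, and the function names are hypothetical and chosen only for illustration; they are not the real AFINN values, and this is not the authors' implementation.

```python
import re

# Toy lexicon: hypothetical words and valences, for illustration only.
# A real AFINN list assigns integer scores in the range -5..+5.
TOY_LEXICON = {
    "win": 4,
    "free": 1,
    "great": 3,
    "thanks": 2,
    "bad": -3,
    "hate": -3,
    "scam": -2,
}


def tokenize(message):
    """Lowercase the message and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", message.lower())


def sentiment_score(message, lexicon=TOY_LEXICON):
    """Sum the valence of every token found in the lexicon.

    Words missing from the lexicon contribute 0.
    """
    return sum(lexicon.get(token, 0) for token in tokenize(message))


def sentiment_label(message, lexicon=TOY_LEXICON):
    """Map the summed score to a coarse polarity label."""
    score = sentiment_score(message, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With this toy lexicon, a promotional message such as "WIN a FREE prize now!" scores positive even though it is likely spam, which is one reason sentiment polarity and spam classification are treated as separate steps.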

Data availability

The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

Acknowledgements

The authors would like to thank the National Institute of Technology Raipur, Chhattisgarh, India for providing infrastructure and facilities to carry out this research work.

Author information

Corresponding author

Correspondence to Ulligaddala Srinivasarao.

Ethics declarations

Conflict of interest

The authors declare no potential conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Srinivasarao, U., Sharaff, A. Machine intelligence based hybrid classifier for spam detection and sentiment analysis of SMS messages. Multimed Tools Appl 82, 31069–31099 (2023). https://doi.org/10.1007/s11042-023-14641-5

  • DOI: https://doi.org/10.1007/s11042-023-14641-5
