Abstract
Short Message Service (SMS) on mobile phones has improved with technological advancements and the growth of content-based marketing, yet smartphones are frequently overburdened with spam SMS. Spam messages are undesirable, since they may carry viruses and spyware. Several text classification methods have been suggested to address spam. However, none of these methods can guarantee a fully spam-free solution, since each filtering and modeling methodology has its own strengths and weaknesses. This paper proposes a hybrid classifier for SMS spam classification and sentiment analysis. The datasets are pre-processed, and Word2vec data augmentation is used to extract the features. The features are then fed to six different feature selection methods and equilibrium optimization (EO). The optimal components are fed into a hybrid K-Nearest Neighbors (KNN) and support vector machine (SVM) classifier to classify SMS messages. To optimize the parameters of the network and improve accuracy, the Rat Swarm Optimization (RSO) algorithm is used. AFINN and SentiWordNet are then used for sentiment analysis. The framework is evaluated on three benchmark datasets; among them, the SpamAssassin dataset gives the best spam detection accuracy, 99.82%.
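As a rough illustration of the lexicon-based sentiment step named in the abstract, the sketch below scores a message by summing per-word valences, which is the general idea behind AFINN-style lexicons (integer valences from -5 to +5). It is not the authors' implementation: the lexicon `TOY_LEXICON`, the function names, and every word score are hypothetical and chosen only for demonstration.

```python
import re

# Hypothetical mini-lexicon in the AFINN style (integer valence, -5 to +5).
# These entries are illustrative only; they are NOT copied from the AFINN list.
TOY_LEXICON = {
    "win": 4,
    "love": 3,
    "thanks": 2,
    "free": 1,
    "sorry": -1,
    "late": -1,
    "hate": -3,
    "scam": -4,
}


def tokenize(message):
    """Lowercase the message and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", message.lower())


def sentiment_score(message, lexicon=TOY_LEXICON):
    """Sum the valence of each token; words missing from the lexicon count as 0."""
    return sum(lexicon.get(token, 0) for token in tokenize(message))


def sentiment_label(message, lexicon=TOY_LEXICON):
    """Map the summed score to a coarse polarity label."""
    score = sentiment_score(message, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sms in ["Thanks, love it!", "Sorry I am late", "See you at noon"]:
        print(sms, "->", sentiment_score(sms), sentiment_label(sms))
```

A summed-valence score is the simplest way to use such a lexicon; a real pipeline would load the full lexicon and would likely also handle negation and intensifiers, which this sketch ignores.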
Data availability
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.
Acknowledgements
The authors would like to thank the National Institute of Technology Raipur, Chhattisgarh, India for providing infrastructure and facilities to carry out this research work.
Ethics declarations
Conflict of interest
The authors declare no potential conflict of interest.
About this article
Cite this article
Srinivasarao, U., Sharaff, A. Machine intelligence based hybrid classifier for spam detection and sentiment analysis of SMS messages. Multimed Tools Appl 82, 31069–31099 (2023). https://doi.org/10.1007/s11042-023-14641-5
DOI: https://doi.org/10.1007/s11042-023-14641-5