A hybrid meta-heuristic-based multi-objective feature selection with adaptive capsule network for automated email spam detection

Abstract

Spam emails are unwanted and can carry dangerous spyware and viruses. Because unsolicited email adapts continually, there is an urgent need to identify spam reliably. Many machine learning-based approaches have been proposed to detect spam emails, aiming to reduce unwanted email while predicting spam accurately. These systems address the various kinds of spam that degrade email systems. However, the performance of conventional models still needs improvement, so this paper implements an email spam detection model for both image and text datasets. The main contribution is the development of multi-objective feature selection and an adaptive capsule network for email spam detection. For text datasets, two feature extraction techniques, Term Variance (TV) and Term Frequency-Inverse Document Frequency (TF-IDF), are used, whereas Fisher Discriminant Analysis (FDA), Walsh-Hadamard Transform (WHT), and color correlogram are used as the feature extraction techniques for image datasets. Because the extracted feature vectors are long, multi-objective feature selection is performed with the hybrid meta-heuristic Grey-Sail Fish Optimization (G-SFO) algorithm to reduce training complexity. Further, a novel adaptive capsule network, improved by the proposed G-SFO algorithm, is used for email spam detection. The suggested model is compared with existing approaches and shows better spam detection accuracy.
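As a rough illustration of the two text feature extraction techniques named in the abstract, the following minimal sketch computes TF-IDF weights and per-term variance for a tiny corpus. It is not the authors' implementation: the whitespace tokenizer, the unsmoothed idf formula log(N/df), the use of raw counts for term variance, and the function names `tokenize`, `tf_idf`, and `term_variance` are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def tf_idf(docs):
    """Return (vocab, matrix) with matrix[d][t] = tf(t, d) * idf(t).

    tf is the term count divided by document length, and idf is
    log(N / df) without smoothing; other variants are also common.
    """
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        matrix.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, matrix


def term_variance(docs):
    """Return {term: variance of its raw count across documents}.

    Terms whose counts barely vary across documents score low and
    are candidates for removal before feature selection.
    """
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(tokenized)
    variances = {}
    for t in vocab:
        counts = [doc.count(t) for doc in tokenized]
        mean = sum(counts) / n_docs
        variances[t] = sum((c - mean) ** 2 for c in counts) / n_docs
    return variances


if __name__ == "__main__":
    corpus = ["win cash now", "meeting at noon", "win win prize"]
    vocab, weights = tf_idf(corpus)
    tv = term_variance(corpus)
    for term in vocab:
        print(term, [round(row[vocab.index(term)], 3) for row in weights], round(tv[term], 3))
```

In a pipeline like the one the abstract describes, scores of this kind would form the long feature vector that a feature selection step then prunes. How the paper combines the two scores is not stated in the abstract, so the sketch leaves them separate.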

Abbreviations

E-mail: Electronic mail
GA: Genetic algorithm
RF: Random forest
NN: Neural networks
NSA: Negative selection algorithm
WHT: Walsh-Hadamard transform
WOA: Whale optimization algorithm
GMDH: Group method of data handling
MLP: Multi-layer perceptron
SVM: Support vector machine
PSO: Particle swarm optimization
FDA: Fisher discriminant analysis
FNR: False negative rate
QoS: Quality of service
LOF: Local outlier factor
NB: Naïve Bayes
MSSCA: Multi-Split Spam Corpus Algorithm
FPR: False positive rate
DE: Differential evolution
DT: Decision tree
MCC: Matthews correlation coefficient
FDR: False discovery rate
NPV: Negative predictive value
MAMH: MAS as Metaheuristic
BMAMH: Binary MAMH
TF-IDF: Term frequency-inverse document frequency
VSA: Vortex search algorithm

Author information

Corresponding author

Correspondence to Kadam Vikas Samarthrao.

About this article

Cite this article

Samarthrao, K.V., Rohokale, V.M. A hybrid meta-heuristic-based multi-objective feature selection with adaptive capsule network for automated email spam detection. Int J Intell Robot Appl 6, 497–521 (2022). https://doi.org/10.1007/s41315-021-00217-9

  • DOI: https://doi.org/10.1007/s41315-021-00217-9

Keywords

  • Spam email
  • Email spam detection
  • Multi-objective feature selection
  • Adaptive capsule network
  • Grey-Sail Fish Optimization Algorithm