Extra-Tree Classifier with Metaheuristics Approach for Email Classification

  • Aakanksha Sharaff
  • Harshil Gupta
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 924)


It is common for a user to receive hundreds of emails every day. Almost 93% of them are spam messages, mainly advertisements from industries such as software, gambling, stocks, electronics, pharmaceuticals, and loans, along with phishing and malware attempts. Spam messages not only waste the user's time but also consume valuable storage space. In this paper, a nature-inspired metaheuristic technique is used for email classification, with emphasis on reducing the false-positive problem of treating spam messages as ham. It uses metaheuristics-based feature selection methods and employs an extra-tree classifier to classify emails into spam and ham. The proposed model achieves an accuracy of 95.5%, a specificity of 93.7%, and an F1-score of 96.3%, which is a major improvement over previous research in this field that used decision trees. A comparative analysis of the extra-tree classifier against other classifiers such as decision trees and random forest is also presented.
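
As a reading aid for the three figures quoted in the abstract, the sketch below computes accuracy, specificity, and F1-score from a list of true and predicted labels. It is only an illustration, not the authors' evaluation code. The function name `binary_metrics` and the toy labels are hypothetical, and treating "ham" as the positive class is an assumption inferred from the abstract's description of a false positive as spam treated as ham.

```python
def binary_metrics(y_true, y_pred, positive="ham"):
    """Return accuracy, specificity, and F1 for a two-class problem.

    Assumption: `positive` names the positive class. With positive="ham",
    a false positive is a spam message predicted as ham.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Specificity: share of true negatives (here, spam) that were recognised.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1: harmonic mean of precision and recall, written in count form.
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {"accuracy": accuracy, "specificity": specificity, "f1": f1}


# Toy labels, invented purely for illustration.
y_true = ["ham", "ham", "ham", "spam", "spam", "spam", "spam", "ham"]
y_pred = ["ham", "ham", "spam", "spam", "spam", "ham", "spam", "ham"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Under this convention, raising specificity means fewer spam messages slip through as ham, which is the error the abstract says the model aims to reduce.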


Ham and spam detection; Feature selection; Extra tree; Binary particle swarm optimization
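
The keywords name binary particle swarm optimization (BPSO) as the feature selection method. The sketch below shows the textbook form of BPSO, in which each particle is a 0/1 mask over features and a sigmoid of the velocity gives the probability that a bit is set. It is a minimal illustration under stated assumptions, not the variant used in the paper. The names `bpso_select` and `toy_fitness`, all parameter values, and the toy objective are hypothetical.

```python
import math
import random


def bpso_select(fitness, n_features, n_particles=10, n_iters=30,
                w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximise fitness(mask) over 0/1 feature masks with standard binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best mask seen by each particle
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # best mask seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-6.0, min(6.0, v))      # clamp to keep the sigmoid unsaturated
                vel[i][d] = v
                prob = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


def toy_fitness(mask, informative=(0, 2, 5)):
    """Hypothetical objective: reward informative features, penalise the rest."""
    gain = sum(mask[i] for i in informative)
    noise = sum(bit for i, bit in enumerate(mask) if i not in informative)
    return gain - 0.25 * noise


best_mask, best_fit = bpso_select(toy_fitness, n_features=8)
print(best_mask, best_fit)
```

In a setting like the one the abstract describes, the fitness function would plausibly be the cross-validated score of an extra-tree classifier trained on the selected columns only, for example scikit-learn's `ExtraTreesClassifier` evaluated with `cross_val_score`. That pairing is an assumption here, since the abstract does not state how candidate feature subsets were scored.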



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Department of Computer Science and Engineering, National Institute of Technology Raipur, Raipur, India
