
Spam E-Mail Classification Based on the IFWB Algorithm

  • Conference paper
Intelligent Information and Database Systems (ACIIDS 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7802)


Abstract

The problem of spam e-mail has been studied for some time, and most existing solutions rely on classifying and filtering spam. However, the content of spam e-mails drifts as new concepts and social events emerge. As a result, many spam classifiers perform well when their models are first built, but their performance deteriorates over time. A learning mechanism is therefore needed to adjust the classification parameters as both new and old e-mails are taken into account. Moreover, because spam is so widespread, spam e-mails outnumber legitimate e-mails, so most classifiers achieve high recall on spam but low recall on legitimate e-mail. Building on the Bayesian algorithm, we propose an incremental forgetting weighted algorithm (IFWB) with a misclassification cost mechanism, which extracts features using IGICF (Information Gain and Inverse Class Frequency), to address both concept drift and data skew in spam e-mail classification. We implemented the algorithm and tested the effectiveness of the mechanism in detail.
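
The abstract does not give IFWB's formulas, so the following is only a rough Python sketch, under our own assumptions, of two ideas it names. Gradual forgetting is modelled here as exponential decay of all previously accumulated multinomial naive Bayes counts (a generic forgetting scheme, not necessarily the paper's weighting), and the misclassification cost mechanism is modelled as a minimum-expected-cost decision rule that flags a message as spam only when that is the cheaper choice. The class `ForgettingNaiveBayes` and the parameters `decay`, `cost_fp`, `cost_fn` and `alpha` are hypothetical names; IGICF feature selection is not reproduced.

```python
import math
from collections import Counter, defaultdict

SPAM, HAM = "spam", "ham"


class ForgettingNaiveBayes:
    """Hypothetical sketch: incremental multinomial naive Bayes with gradual
    forgetting and a cost-sensitive decision rule. Not the paper's IFWB."""

    def __init__(self, decay=0.99, cost_fp=3.0, cost_fn=1.0, alpha=1.0):
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay      # assumed forgetting factor applied per new e-mail
        self.cost_fp = cost_fp  # assumed cost of flagging a legitimate e-mail as spam
        self.cost_fn = cost_fn  # assumed cost of letting a spam e-mail through
        self.alpha = alpha      # Laplace (add-alpha) smoothing constant
        self.doc_weight = {SPAM: 0.0, HAM: 0.0}
        self.token_weight = {SPAM: defaultdict(float), HAM: defaultdict(float)}
        self.total_tokens = {SPAM: 0.0, HAM: 0.0}
        self.vocab = set()

    def update(self, tokens, label):
        """Learn from one labeled e-mail, fading all earlier evidence first."""
        if label not in (SPAM, HAM):
            raise ValueError(f"unknown label: {label!r}")
        # Gradual forgetting: every stored count shrinks by `decay`, so an e-mail
        # learned k updates ago contributes with weight decay**k.
        for c in (SPAM, HAM):
            self.doc_weight[c] *= self.decay
            self.total_tokens[c] *= self.decay
            weights = self.token_weight[c]
            for t in weights:
                weights[t] *= self.decay
        self.doc_weight[label] += 1.0
        for t, n in Counter(tokens).items():
            self.token_weight[label][t] += n
            self.total_tokens[label] += n
            self.vocab.add(t)

    def _log_joint(self, tokens, c):
        """log P(c) + sum of log P(token | c), with smoothed estimates."""
        total_docs = self.doc_weight[SPAM] + self.doc_weight[HAM]
        prior = (self.doc_weight[c] + 1.0) / (total_docs + 2.0)
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen tokens
        denom = self.total_tokens[c] + self.alpha * v
        score = math.log(prior)
        for t in tokens:
            score += math.log((self.token_weight[c].get(t, 0.0) + self.alpha) / denom)
        return score

    def spam_log_odds(self, tokens):
        return self._log_joint(tokens, SPAM) - self._log_joint(tokens, HAM)

    def predict(self, tokens):
        # Minimum expected cost: choose spam only if
        # P(spam|x) * cost_fn > P(ham|x) * cost_fp.
        threshold = math.log(self.cost_fp / self.cost_fn)
        return SPAM if self.spam_log_odds(tokens) > threshold else HAM


if __name__ == "__main__":
    model = ForgettingNaiveBayes(decay=0.95, cost_fp=3.0)
    stream = [
        ("win cash prize now", SPAM),
        ("cheap pills win big", SPAM),
        ("free prize claim now", SPAM),
        ("meeting agenda for monday", HAM),
    ]
    for text, label in stream:
        model.update(text.split(), label)
    print(model.predict("claim your cash prize".split()))  # -> spam
    print(model.predict("monday meeting notes".split()))   # -> ham
```

In this sketch, raising `cost_fp` relative to `cost_fn` raises the log-odds threshold, so borderline messages stay in the inbox; this is one plausible way to counter the low recall on legitimate e-mail that skewed training data tends to cause. A `decay` close to 1 forgets slowly, while smaller values follow drift faster at the price of noisier estimates.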




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jou, C. (2013). Spam E-Mail Classification Based on the IFWB Algorithm. In: Selamat, A., Nguyen, N.T., Haron, H. (eds) Intelligent Information and Database Systems. ACIIDS 2013. Lecture Notes in Computer Science, vol 7802. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36546-1_33


  • DOI: https://doi.org/10.1007/978-3-642-36546-1_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36545-4

  • Online ISBN: 978-3-642-36546-1

  • eBook Packages: Computer Science, Computer Science (R0)
