Abstract
The problem of spam e-mail has been studied for some time, and most solutions are based on classifying and filtering spam. However, the content of spam e-mails drifts as new concepts and social events emerge. Thus, several spam classifiers perform effectively when their models are first established, but their performance deteriorates over time. A learning mechanism is required to adjust the classification parameters for new and old e-mails. In addition, because spam is so widespread, spam e-mails outnumber legitimate e-mails, so most classifiers achieve high recall for spam e-mails but low recall for legitimate e-mails. Building on the Bayesian algorithm, we propose an incremental forgetting weighted algorithm with a misclassification cost mechanism, which extracts features by IGICF (Information Gain and Inverse Class Frequency), to address the problems of concept drift and data skew in spam e-mail classification. We implemented the algorithm and performed detailed tests of the mechanism's effectiveness.
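The abstract does not give the algorithm's formulas, so the following is only a minimal sketch of the general idea, not the IFWB algorithm itself. It assumes a multinomial naive Bayes model whose stored counts are multiplied by a decay factor before each incremental update (one common way to forget old evidence), and a decision threshold that is raised by an assumed cost for misclassifying legitimate e-mail as spam. The class name `ForgettingNaiveBayes`, the parameters `decay`, `cost_legit_as_spam`, and `smoothing`, and their default values are all hypothetical. IGICF feature selection is not shown.

```python
import math
from collections import defaultdict


class ForgettingNaiveBayes:
    """Illustrative naive Bayes with decayed counts and a cost-sensitive threshold.

    Hypothetical sketch: the decay rule and the cost threshold are assumptions,
    not the formulas used in the paper.
    """

    def __init__(self, decay=0.9, cost_legit_as_spam=5.0, smoothing=1.0):
        self.decay = decay                  # assumed forgetting factor in (0, 1]
        self.cost = cost_legit_as_spam      # assumed cost ratio for false positives
        self.smoothing = smoothing          # Laplace smoothing constant
        self.class_weight = {"spam": 0.0, "legit": 0.0}
        self.token_weight = {"spam": defaultdict(float), "legit": defaultdict(float)}
        self.token_total = {"spam": 0.0, "legit": 0.0}
        self.vocabulary = set()

    def update(self, tokens, label):
        """Incrementally learn one labelled e-mail, fading older evidence first."""
        for c in self.class_weight:
            self.class_weight[c] *= self.decay
            self.token_total[c] *= self.decay
            for t in self.token_weight[c]:
                self.token_weight[c][t] *= self.decay
        self.class_weight[label] += 1.0
        for t in tokens:
            self.token_weight[label][t] += 1.0
            self.token_total[label] += 1.0
            self.vocabulary.add(t)

    def _log_score(self, tokens, c):
        """Log prior plus log likelihood of the known tokens under class c."""
        total = sum(self.class_weight.values())
        prior = (self.class_weight[c] + self.smoothing) / (total + 2 * self.smoothing)
        score = math.log(prior)
        denom = self.token_total[c] + self.smoothing * len(self.vocabulary)
        for t in tokens:
            if t in self.vocabulary:  # tokens never seen in training are ignored
                weight = self.token_weight[c].get(t, 0.0)
                score += math.log((weight + self.smoothing) / denom)
        return score

    def predict(self, tokens):
        """Label as spam only if the log odds exceed the log of the cost ratio."""
        log_odds = self._log_score(tokens, "spam") - self._log_score(tokens, "legit")
        return "spam" if log_odds > math.log(self.cost) else "legit"


clf = ForgettingNaiveBayes(decay=0.9, cost_legit_as_spam=2.0)
for _ in range(5):
    clf.update(["cheap", "pills"], "spam")
    clf.update(["meeting", "agenda"], "legit")
print(clf.predict(["cheap", "pills"]))
print(clf.predict(["meeting", "agenda"]))
```

In this sketch a smaller `decay` forgets old e-mails faster, and a larger `cost_legit_as_spam` makes the classifier more reluctant to label a message as spam, which trades spam recall for legitimate-mail recall.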
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jou, C. (2013). Spam E-Mail Classification Based on the IFWB Algorithm. In: Selamat, A., Nguyen, N.T., Haron, H. (eds) Intelligent Information and Database Systems. ACIIDS 2013. Lecture Notes in Computer Science, vol 7802. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36546-1_33
DOI: https://doi.org/10.1007/978-3-642-36546-1_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36545-4
Online ISBN: 978-3-642-36546-1
eBook Packages: Computer Science, Computer Science (R0)