EPIA 2007: Progress in Artificial Intelligence, pp 53-62
Relaxing Feature Selection in Spam Filtering by Using Case-Based Reasoning Systems
Conference paper
Abstract
This paper compares two alternative strategies for feature selection in SpamHunting, a well-known case-based reasoning spam filtering system. We present the use of the k most predictive features and a percentage-based strategy that exploits our amount of information measure. Finally, we confirm that the percentage-based feature selection method is better suited to the spam filtering domain.
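The contrast between the two strategies can be illustrated with a minimal sketch. It assumes each term already carries a non-negative relevance score; the `scores` dictionary, the function names, and the cumulative-coverage reading of "percentage-based" are all assumptions made for illustration, and the paper's own amount of information measure is not reproduced here.

```python
from typing import Dict, List


def _ranked(scores: Dict[str, float]) -> List[str]:
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda term: (-scores[term], term))


def top_k_features(scores: Dict[str, float], k: int) -> List[str]:
    """Fixed-size selection: keep the k highest-scoring terms."""
    return _ranked(scores)[:k]


def percentage_features(scores: Dict[str, float], fraction: float) -> List[str]:
    """Variable-size selection (assumed interpretation): keep top-ranked
    terms until they cover `fraction` of the total score mass."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(scores.values())
    selected: List[str] = []
    covered = 0.0
    for term in _ranked(scores):
        if total == 0 or covered >= fraction * total:
            break
        selected.append(term)
        covered += scores[term]
    return selected


# Hypothetical scores for the terms of one message.
example_scores = {"viagra": 5.0, "free": 3.0, "meeting": 1.0, "the": 1.0}
print(top_k_features(example_scores, 2))
print(percentage_features(example_scores, 0.75))
```

Under this reading, the fixed-k variant always returns the same number of terms, while the percentage variant returns fewer terms when a handful of them dominate the score mass and more when relevance is spread evenly.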
Keywords
Feature Selection; Feature Selection Method; Case Base Reasoning; Concept Drift; Feature Selection Approach
Copyright information
© Springer-Verlag Berlin Heidelberg 2007