An Analysis of Case-Base Editing in a Spam Filtering System

  • Conference paper
Advances in Case-Based Reasoning (ECCBR 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3155)

Abstract

Because of the volume of spam email and its evolving nature, any deployed machine learning-based spam filtering system will need procedures for case-base maintenance. Key to this will be procedures that edit the case-base to remove noise and eliminate redundancy. In this paper we present a two-stage process to do this. We present a new noise reduction algorithm called Blame-Based Noise Reduction that removes cases that are observed to cause misclassification. We also present an algorithm called Conservative Redundancy Reduction that is much less aggressive than the state-of-the-art alternatives and has significantly better generalisation performance in this domain. These new techniques are evaluated against the alternatives in the literature on four datasets of 1000 emails each (50% spam and 50% non-spam).
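
The abstract does not spell out how cases are "observed to cause misclassification", so the following is only a minimal sketch of the general idea, not the authors' published procedure. It assumes a leave-one-out k-nearest-neighbour classifier over binary feature vectors with Hamming distance; the function names, the blame/credit rule, and the toy data are all hypothetical choices made for illustration.

```python
from collections import Counter

def hamming(a, b):
    """Number of positions at which two equal-length feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def nearest_neighbours(cases, query_idx, k):
    """Indices of the k cases closest to cases[query_idx], excluding the query itself.
    Ties in distance are broken by the lower index so results are deterministic."""
    candidates = [i for i in range(len(cases)) if i != query_idx]
    candidates.sort(key=lambda i: (hamming(cases[i][0], cases[query_idx][0]), i))
    return candidates[:k]

def blame_based_filter(cases, k):
    """Illustrative noise filter (an assumption, not the paper's algorithm).

    Each case is classified by its k nearest neighbours with itself left out.
    A neighbour that voted for a wrong prediction collects blame; a neighbour
    that voted for a correct prediction collects credit. Cases with more blame
    than credit are dropped. Returns the indices of the cases that are kept."""
    blame, credit = Counter(), Counter()
    for q, (_, true_label) in enumerate(cases):
        neighbours = nearest_neighbours(cases, q, k)
        votes = Counter(cases[n][1] for n in neighbours)
        # Majority vote; sorting the labels first makes tie-breaking deterministic.
        predicted = max(sorted(votes), key=lambda label: votes[label])
        for n in neighbours:
            if cases[n][1] == predicted:
                if predicted == true_label:
                    credit[n] += 1
                else:
                    blame[n] += 1
    return [i for i in range(len(cases)) if blame[i] <= credit[i]]

# Hypothetical toy case-base: (binary features, label).
toy_cases = [
    ((1, 1, 1, 0, 0, 0), "spam"),
    ((1, 1, 0, 0, 0, 0), "spam"),
    ((1, 0, 1, 0, 0, 0), "spam"),
    ((0, 1, 1, 0, 0, 0), "spam"),
    ((0, 0, 0, 1, 1, 1), "ham"),
    ((0, 0, 0, 1, 1, 0), "ham"),
    ((0, 0, 0, 1, 0, 1), "ham"),
    ((0, 0, 0, 0, 1, 1), "ham"),
    ((1, 1, 1, 0, 0, 0), "ham"),  # deliberately mislabelled copy of case 0
]

kept = blame_based_filter(toy_cases, k=1)
print(kept)
```

In this toy run with `k=1`, the mislabelled case at index 8 is the only one that collects more blame than credit, so it is the only case removed. Case 0 is also blamed once (for misclassifying case 8), but it earns more credit than blame and is kept, which shows why a rule of this kind needs to weigh harm against usefulness rather than delete every blamed case.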

This research was supported by funding from Enterprise Ireland under grant no. CFTD/03/219 and funding from Science Foundation Ireland under grant no. SFI-02IN.1I111.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Delany, S.J., Cunningham, P. (2004). An Analysis of Case-Base Editing in a Spam Filtering System. In: Funk, P., González Calero, P.A. (eds) Advances in Case-Based Reasoning. ECCBR 2004. Lecture Notes in Computer Science, vol 3155. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28631-8_11

  • DOI: https://doi.org/10.1007/978-3-540-28631-8_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22882-0

  • Online ISBN: 978-3-540-28631-8

  • eBook Packages: Springer Book Archive
