A Personal Antispam System Based on a Behaviour-Knowledge Space Approach

  • Francesco Gargiulo
  • Antonio Penta
  • Antonio Picariello
  • Carlo Sansone
Part of the Studies in Computational Intelligence book series (SCI, volume 245)


Unsolicited Commercial E-mail (UCE), commonly known as spam, is a serious problem in both daily work and private life: individuals, small companies, and large public or private institutions feel that spam has weakened the reliability and effectiveness of e-mail as a communication tool. Simple, fast and effective countermeasures against spam are therefore a necessary part of a modern mail management system. In this chapter we describe a novel method for detecting spam messages that analyzes both the text and the attached images. In particular, we describe an architecture for deploying a personal antispam system able to overcome some problems that still beset state-of-the-art spam filters. Text analysis draws on recent advances in both semantic and syntactic analysis; in addition, spammers' tricks based on images are taken into account. A Behaviour Knowledge Space approach fuses the results coming from the analysis of the different parts of an e-mail and enhances the performance of the proposed system, as shown by the experiments we have carried out.
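
The Behaviour Knowledge Space (BKS) fusion named in the abstract can be illustrated with a minimal sketch. It assumes the usual formulation of BKS as a lookup table: each combination of classifier decisions is a cell, training counts the true classes that fall into each cell, and a new e-mail receives the most frequent class of its cell. The function names, the two hypothetical classifiers (one for text, one for images), the toy data and the fallback for unseen cells are all assumptions made for illustration and are not taken from the chapter.

```python
from collections import Counter, defaultdict


def train_bks(decisions, labels):
    """Build the BKS table: one cell per tuple of classifier decisions.

    decisions: iterable of tuples, one decision per classifier.
    labels:    iterable of true classes, aligned with `decisions`.
    """
    table = defaultdict(Counter)
    for cell, label in zip(decisions, labels):
        table[tuple(cell)][label] += 1
    return table


def predict_bks(table, cell, fallback=None):
    """Return the most frequent true class seen in this cell.

    If the cell was never observed during training, return `fallback`
    (an assumption of this sketch; rejecting the sample is another option).
    """
    counts = table.get(tuple(cell))
    if not counts:
        return fallback
    return counts.most_common(1)[0][0]


# Hypothetical training data: (text classifier, image classifier) decisions.
train_decisions = [
    ("spam", "ham"),
    ("spam", "ham"),
    ("spam", "ham"),
    ("ham", "spam"),
    ("ham", "spam"),
    ("spam", "spam"),
]
train_labels = ["spam", "spam", "ham", "ham", "ham", "spam"]

table = train_bks(train_decisions, train_labels)

# The classifiers disagree; the table resolves the conflict from past behaviour.
print(predict_bks(table, ("spam", "ham")))
print(predict_bks(table, ("ham", "spam")))
# This cell never occurred in training, so the fallback is returned.
print(predict_bks(table, ("ham", "ham"), fallback="ham"))
```

In this sketch the table decides disagreements between the two classifiers by how often each true class appeared with the same pattern of decisions, which is why the approach needs enough training data to populate its cells.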


Keywords: text-based and image-based spam; Singular Value Decomposition; Latent Semantic Analysis; Pattern Recognition; Image Analysis; combining classifiers; Behaviour Knowledge Space





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Francesco Gargiulo (1)
  • Antonio Penta (1)
  • Antonio Picariello (1)
  • Carlo Sansone (1)
  1. Dipartimento di Informatica e Sistemistica, University of Naples Federico II, Italy
