Does Sentiment Analysis Help in Bayesian Spam Filtering?

  • Enaitz Ezpeleta
  • Urko Zurutuza
  • José María Gómez Hidalgo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9648)


Unsolicited email campaigns remain one of the biggest threats, affecting millions of users per day. In recent years, several techniques to detect unsolicited emails have been developed. Among the proposed automatic classification techniques, machine learning algorithms have been the most successful, obtaining detection rates of up to 96 % [1]. This work provides a means to validate the assumption that, since spam is a commercial communication, the semantics of its content are usually shaped with a positive meaning. We produce the polarity score of each message using sentiment classifiers, and then compare spam filtering classifiers with and without the polarity score in terms of accuracy. The results show that the top 10 Bayesian filtering classifiers are improved, reaching an accuracy of 99.21 %.
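The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy lexicon, the `__polarity_*` token, the small `NaiveBayes` class and the helper names are all assumptions made for illustration, and the paper itself relies on trained sentiment classifiers rather than a word list. The sketch only shows the general idea of attaching a polarity feature to each message and measuring accuracy with and without it.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (an assumption; the paper uses sentiment classifiers).
POSITIVE = {"free", "win", "great", "best", "offer"}
NEGATIVE = {"problem", "late", "sorry", "bad"}

def polarity_label(text):
    """Return 'pos', 'neg' or 'neu' from a simple lexicon count."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "pos" if score > 0 else "neg" if score < 0 else "neu"

def features(text, use_polarity):
    """Tokenize a message and optionally append its polarity as an extra token."""
    tokens = text.lower().split()
    if use_polarity:
        tokens.append("__polarity_" + polarity_label(text))
    return tokens

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.priors = Counter(labels)
        self.total = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = {t for c in self.classes for t in self.counts[c]}
        return self

    def predict(self, tokens):
        def log_posterior(c):
            n = sum(self.counts[c].values())
            lp = math.log(self.priors[c] / self.total)
            for t in tokens:
                if t in self.vocab:  # tokens unseen in training are ignored
                    lp += math.log((self.counts[c][t] + 1) / (n + len(self.vocab)))
            return lp
        return max(self.classes, key=log_posterior)

def accuracy(train, test, use_polarity):
    """Train on (text, label) pairs and return the fraction of test pairs classified correctly."""
    docs = [features(text, use_polarity) for text, _ in train]
    labels = [label for _, label in train]
    model = NaiveBayes().fit(docs, labels)
    hits = sum(model.predict(features(text, use_polarity)) == label for text, label in test)
    return hits / len(test)
```

Calling `accuracy(train, test, False)` and `accuracy(train, test, True)` on the same split gives the two figures to compare. On a real corpus the polarity feature would come from a trained sentiment classifier, and whether it helps is an empirical question.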


Spam, Polarity, Security, Bayes, Sentiment analysis



This work has been partially funded by the Basque Department of Education, Language policy and Culture under the project SocialSPAM (PI_2014_1_102).


  1. Malarvizhi, R.: Content-based spam filtering and detection algorithms-an efficient analysis & comparison 1 (2013)
  2.
  3. Saadat, N.: Survey on spam filtering techniques. Commun. Netw. 3(3), 153–160 (2011)
  4. Cormack, G.V.: Email spam filtering: a systematic review. Found. Trends Inf. Retrieval 1(4), 335–455 (2007)
  5. Tretyakov, K.: Machine learning techniques in spam filtering. In: Data Mining Problem-oriented Seminar, MTAT, vol. 3, pp. 60–79 (2004)
  6. Sanz, E.P., Hidalgo, J.M.G., Cortizo, J.C.: Email spam filtering. Adv. Comput. 74, 45–114 (2008)
  7. Teli, S., Biradar, S.: Effective spam detection method for email. In: International Conference on Advances in Engineering & Technology (2014)
  8. Eberhardt, J.J.: Bayesian spam detection. University of Minnesota, Morris Undergraduate Journal, Scholarly Horizons (2015)
  9. Liddy, E.: Natural language processing (2001)
  10. Giyanani, R., Desai, M.: Spam detection using natural language processing. Int. J. Comput. Sci. Res. Technol. 1, 55–58 (2013)
  11. Echeverria Briones, P.F., Altamirano Valarezo, Z.V., Pinto Astudillo, A.B., Sanchez Guerrero, J.D.C.: Text mining aplicado a la clasificación y distribución automática de correo electrónico y detección de correo spam (2009)
  12. Lau, R.Y.K., Liao, S.Y., Kwok, R.C.W., Xu, K., Xia, Y., Li, Y.: Text mining and probabilistic language modeling for online review spam detection. ACM Trans. Manage. Inf. Syst. 2(4), 25:1–25:30 (2012)
  13. Liu, B., Zhang, L.: A survey of opinion mining and sentiment analysis. In: Aggarwal, C.C., Zhai, C. (eds.) Mining Text Data, pp. 415–463. Springer, New York (2012)
  14. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retrieval 2(1–2), 1–135 (2008)
  15. Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up?: sentiment classification using machine learning techniques. In: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, EMNLP 2002, Stroudsburg, PA, USA, vol. 10, pp. 79–86. Association for Computational Linguistics (2002)
  16. Turney, P.D.: Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL 2002, pp. 417–424. Association for Computational Linguistics, USA (2002)
  17. Esuli, A., Sebastiani, F.: Sentiwordnet: a publicly available lexical resource for opinion mining. In: Proceedings of LREC, vol. 6, pp. 417–422. Citeseer (2006)
  18. Baccianella, S., Esuli, A., Sebastiani, F.: Sentiwordnet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In: LREC, vol. 10, pp. 2200–2204 (2010)
  19. Ohana, B., Tierney, B.: Sentiment classification of reviews using sentiwordnet. In: 9th. IT & T Conference, p. 13 (2009)
  20. Pang, B., Lee, L.: A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of the ACL (2004)

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Enaitz Ezpeleta (1)
  • Urko Zurutuza (1)
  • José María Gómez Hidalgo (2)
  1. Electronics and Computing Department, Mondragon University, Arrasate-Mondragón, Spain
  2. Pragsis Technologies, Madrid, Spain
