
Soft Computing, Volume 21, Issue 1, pp 233–243

A hybrid spam detection method based on unstructured datasets

  • Yeqin Shao
  • Marcello Trovati (Email author)
  • Quan Shi
  • Olga Angelopoulou
  • Eleana Asimakopoulou
  • Nik Bessis
Methodologies and Application

Abstract

The identification of non-genuine or malicious messages poses a variety of challenges because the techniques used by cyber-criminals change continuously. In this article, we propose a hybrid detection method that combines image and text spam recognition techniques. The image component is based on sparse representation-based classification, which focuses on global and local image features, together with a dictionary learning technique that yields a spam sub-dictionary and a ham sub-dictionary. The textual analysis, in turn, uses semantic properties of documents to assess their level of maliciousness; in particular, it allows us to distinguish between meta-spam and real spam. Experimental results show the accuracy and potential of our approach.
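
As a rough illustration of the sparse representation-based classification idea mentioned in the abstract, the sketch below codes a feature vector over the concatenation of a spam and a ham sub-dictionary and assigns the label whose atoms leave the smallest reconstruction residual. It is a minimal sketch under stated assumptions, not the authors' implementation: plain matching pursuit stands in for whatever sparse coder the paper uses, the sub-dictionaries are hand-written rather than learned, and the names `matching_pursuit`, `src_classify`, and the toy vectors are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalise(atom):
    # Scale an atom to unit length so projections are comparable.
    norm = math.sqrt(dot(atom, atom))
    return [a / norm for a in atom]


def matching_pursuit(atoms, x, n_iter):
    """Greedy sparse coding: repeatedly project the residual onto
    its best-matching unit-norm atom (assumed stand-in coder)."""
    residual = list(x)
    coef = [0.0] * len(atoms)
    for _ in range(n_iter):
        scores = [dot(atom, residual) for atom in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        if abs(scores[k]) < 1e-12:
            break  # residual is already fully explained
        coef[k] += scores[k]
        residual = [r - scores[k] * a for r, a in zip(residual, atoms[k])]
    return coef


def src_classify(x, sub_dicts, n_iter=5):
    """Label x by the sub-dictionary whose atoms reconstruct it
    with the smallest residual."""
    labels, atoms = [], []
    for label, class_atoms in sub_dicts.items():
        for atom in class_atoms:
            labels.append(label)
            atoms.append(normalise(atom))
    coef = matching_pursuit(atoms, x, n_iter)
    residuals = {}
    for label in sub_dicts:
        # Reconstruct x using only the coefficients of this class.
        recon = [0.0] * len(x)
        for c, atom, atom_label in zip(coef, atoms, labels):
            if atom_label == label:
                recon = [r + c * a for r, a in zip(recon, atom)]
        residuals[label] = math.sqrt(
            sum((xi - ri) ** 2 for xi, ri in zip(x, recon))
        )
    return min(residuals, key=residuals.get), residuals


# Toy feature vectors, made up for illustration (not from the paper).
sub_dicts = {
    "spam": [[1, 0, 0, 0], [0, 1, 0, 0]],
    "ham": [[0, 0, 1, 0], [0, 0, 0, 1]],
}
label, residuals = src_classify([2.0, 1.0, 0.1, 0.0], sub_dicts)
print(label, residuals)
```

In a realistic setting the atoms would be learned from labelled spam and ham images (for example with a dictionary learning algorithm such as K-SVD) and the input would be a vector of extracted image features; the residual comparison step would stay the same.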

Keywords

Image spam, Text spam, Semantic networks, Classification, Subclass discriminant analysis, Feature selection, Sparse representation

Notes

Acknowledgments

The paper is supported by the National Science Foundation of China (61171132), the Natural Science Foundation of Jiangsu Province (BK2015022392), the Talent Project of Jiangsu Province of China (2014WLW029), and the Technology Platform Projects of Nantong (CP2013001).

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.


Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Yeqin Shao (1, 4)
  • Marcello Trovati (1), Email author
  • Quan Shi (1, 5)
  • Olga Angelopoulou (2)
  • Eleana Asimakopoulou (1)
  • Nik Bessis (3)

  1. Department of Computing and Mathematics, University of Derby, Derby, UK
  2. Computer Science Department, University of Hertfordshire, Hatfield, Hertfordshire, UK
  3. Department of Computing, Edge Hill University, Ormskirk, Lancashire, UK
  4. Institute of Image Processing & Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China
  5. School of Computer Science and Technology, Nantong University, Nantong, China
