A hybrid spam detection method based on unstructured datasets

Shao, Yeqin; Trovati, Marcello; Shi, Quan; Angelopoulou, Olga; Asimakopoulou, Eleana; Bessis, Nik

doi:10.1007/s00500-015-1959-z

Methodologies and Application
Published: 21 December 2015

Volume 21, pages 233–243, (2017)

Soft Computing

Yeqin Shao¹,⁴,
Marcello Trovati¹,
Quan Shi¹,⁵,
Olga Angelopoulou²,
Eleana Asimakopoulou¹ &
Nik Bessis³

Abstract

The identification of non-genuine or malicious messages poses a variety of challenges due to the continuous changes in the techniques utilised by cyber-criminals. In this article, we propose a hybrid detection method that combines image and text spam recognition techniques. The image component is based on sparse representation-based classification, which draws on both global and local image features, together with a dictionary learning technique that yields a spam sub-dictionary and a ham sub-dictionary. The textual analysis, in turn, uses semantic properties of documents to assess their level of maliciousness; in particular, it allows us to distinguish between meta-spam and real spam. Experimental results show the accuracy and potential of our approach.
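
The image side of the abstract can be illustrated with a minimal sketch of sparse representation-based classification over a spam and a ham sub-dictionary. This is not the authors' implementation: the greedy orthogonal matching pursuit solver, the function names omp and src_classify, the sparsity level n_nonzero, and the hand-made toy dictionaries are all assumptions made for illustration. In the article the sub-dictionaries are learned from global and local image features, which is not reproduced here.

```python
import numpy as np


def omp(D, x, n_nonzero):
    """Greedy orthogonal matching pursuit (an assumed solver, not from the paper).

    Approximates x using at most n_nonzero columns (atoms) of D and
    returns the full-length coefficient vector.
    """
    residual = x.copy()
    support = []
    coef = np.zeros(D.shape[1])
    sol = None
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break
        support.append(k)
        # Re-fit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    if support:
        coef[support] = sol
    return coef


def src_classify(x, D_spam, D_ham, n_nonzero=3):
    """Label x by the sub-dictionary that reconstructs it with the smaller residual."""
    D = np.hstack([D_spam, D_ham])
    coef = omp(D, x, n_nonzero)
    n_spam = D_spam.shape[1]
    # Class-wise residuals: keep only the coefficients of one sub-dictionary.
    r_spam = np.linalg.norm(x - D_spam @ coef[:n_spam])
    r_ham = np.linalg.norm(x - D_ham @ coef[n_spam:])
    return "spam" if r_spam < r_ham else "ham"


# Toy sub-dictionaries (hypothetical): unit-vector atoms in R^4.
D_spam = np.eye(4)[:, :2]
D_ham = np.eye(4)[:, 2:]

print(src_classify(np.array([0.9, 0.4, 0.05, 0.0]), D_spam, D_ham, n_nonzero=2))
print(src_classify(np.array([0.0, 0.1, 0.7, 0.6]), D_spam, D_ham, n_nonzero=2))
```

With these toy dictionaries the first vector lies almost entirely in the span of the spam atoms and the second in the span of the ham atoms, so the class-wise residual separates them. With learned dictionaries the same residual comparison applies, but the atoms would come from a dictionary learning step rather than being fixed by hand.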

Notes

This database is available at http://www.cs.jhu.edu/~mdredze/datasets/image_spam/.

References

Aharon M, Elad M, Bruckstein A (2006) K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. Signal Process IEEE Trans 54(11):4311–4322
Ahonen T, Hadid A, Pietikainen M (2006) Face description with local binary patterns: application to face recognition. Pattern Anal Mach Intell IEEE Trans 28(12):2037–2041
Al-Duwairi B, Khater I, Al-Jarrah O (2012) Detecting image spam using image texture features. Int J Inf Secur Res (IJISR) 2(3/4):344–353
Byun B, Lee C, Webb S, Pu C (2007) A discriminative classifier learning approach to image modeling and spam image identification. In: Proceedings of CEAS 2007
Castiglione A, De Santis A, Fiore U, Palmieri F (2012) An asynchronous covert channel using spam. Comput Math Appl 63(2):437–447
Castiglione A, De Santis A, Fiore U, Palmieri F (2011) E-mail-based covert channels for asynchronous message steganography. In: Proceedings of the Fifth International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS)
Davis S, Craney G (2004) How Do I Stop Spam? http://www.spamhelp.org/articles/HowDoIStopSpam
Dredze M, Gevaryahu R, Elias-Bachrach A (2007) Learning fast classifiers for image spam. In: Proceedings of the Conference on Email and Anti-Spam (CEAS 2007), pp 487–493
Drucker H, Wu S, Vapnik VN (1999) Support vector machines for spam categorization. IEEE Trans Neural Netw 10(5)
Frey BJ, Dueck D (2007) Clustering by passing messages between data points. Science 315(5814):972–976
Fumera G, Pillai I, Roli F (2006) Spam filtering based on the analysis of text information embedded into images. J Mach Learn Res 7:2699–2720
Ghit B, Voicu O, Pop F, Cristea V (2009) Distributed agent platform with intrusion detection capabilities. In: Proceedings of international conference on intelligent networking and collaborative systems (INCOS ’09)
Hare JS, Sinclair PAS, Lewis PH, Martinez K, Enser PGB, Sandom CJ (2006) Bridging the semantic gap in multimedia information retrieval: top-down and bottom-up approaches. In: Proceedings of mastering the gap: from information extraction to semantic representation/3rd European Semantic Web Conference
Issac B, Raman V (2006) Spam detection proposal in regular and text-based image emails. In: 2006 IEEE Region 10 Conference TENCON, Hong Kong, pp 1–4
Kellett S (2005) Legislative Definition of Spam for New Zealand. http://www.victoria.ac.nz/law/research/publications/vuwlr/prev-issues/pdf/vol-36-2005/issue-3/kellet
Kreutz-Delgado K et al (2003) Dictionary learning algorithms for sparse representation. Neural Comput 15(2):349–396
Kuropka D (2003) Modelle zur Repräsentation natürlichsprachlicher Dokumente. Logos Verlag, Berlin
Lee H et al (2006) Efficient sparse coding algorithms. In: Advances in neural information processing systems
Mairal J et al (2009) Online dictionary learning for sparse coding. In: Proceedings of the 26th Annual International Conference on Machine Learning. ACM
Mehta B et al (2008) Detecting image spam using visual features and near duplicate detection. In: Proceedings of the 17th international conference on World Wide Web. ACM
Menard S (2002) Applied logistic regression analysis. Vol. 106. Sage
Nhung NP, Phuong TM (2007) An efficient method for filtering image-based spam. In: Research, Innovation and Vision for the Future, 2007 IEEE International Conference on. IEEE
Nitin J, Bing L (2007) Review spam detection. In: Proceedings of the 16th International Conference on World Wide Web
Palmieri F, Fiore U, Castiglione A, De Santis A (2013) On the detection of card-sharing traffic through wavelet analysis and Support Vector Machines. Appl Soft Comput 13(1):615–627
Palmieri F, Fiore U, Castiglione A (2014) A distributed approach to network anomaly detection based on independent component analysis. Concurr Comput Pract Exp 26(5)
Schölkopf B, Müller KR (1999) Fisher discriminant analysis with kernels. Neural Netw Signal Process IX
Serbanescu V, Pop F, Cristea V, Antoniu G (2015) A formal method for rule analysis and validation in distributed data aggregation service. World Wide Web 18(6):1717–1736
Serbanescu V, Pop F, Cristea V, Antoniu G (2014) Architecture of distributed data aggregation service. In: Proceedings of IEEE 28th international conference on advanced information networking and applications (AINA)
Trovati M, Bessis N (2015) An influence assessment method based on co-occurrence for topologically reduced big data sets. Soft Comput. doi:10.1007/s00500-015-1621-9
Wertheimer M (2015) The Mathematics Community and the NSA. Notices of the AMS 62(2)
Win ZM, Aye N (2013) Identification of image spam by using histogram and hough transform. Int J Sci Res 2(11)
Youn S, McLeod D (2009) Improved spam filtering by extraction of information from text embedded image email. In: SAC 2009, ACM, Honolulu, pp 1754–1755
Zhang C (2009) Image spam clustering: an unsupervised approach. In: Proceedings of the First ACM workshop on Multimedia in forensics. ACM
Zhang C et al (2009) A multimodal data mining framework for revealing common sources of spam images. J Multimed 4(5):313–320
Zhong J, Zhou Y, Deng W (2013) Filtering image-based spam using multifractal analysis and active learning feedback-driven semi-supervised support vector machine. In: Conference Anthology, IEEE
Zhu M, Martinez AM (2006) Subclass discriminant analysis. Pattern Anal Mach Intell IEEE Trans 28(8):1274–1286

Acknowledgments

The paper is supported by the National Science Foundation of China (61171132), the Natural Science Foundation of Jiangsu Province (BK2015022392), the Talent Project of Jiangsu Province of China (2014WLW029), and the Technology Platform Projects of Nantong (CP2013001).

Author information

Authors and Affiliations

Department of Computing and Mathematics, University of Derby, Derby, UK
Yeqin Shao, Marcello Trovati, Quan Shi & Eleana Asimakopoulou
Computer Science Department, University of Hertfordshire, Hatfield, Hertfordshire, AL10 9AB, UK
Olga Angelopoulou
Department of Computing, Edge Hill University, Ormskirk, Lancashire, L39 4QP, UK
Nik Bessis
Institute of Image Processing & Pattern Recognition, Shanghai Jiao Tong University, Shanghai, 200240, China
Yeqin Shao
School of Computer Science and Technology, Nantong University, Nantong, 226019, Jiangsu, China
Quan Shi

Corresponding author

Correspondence to Marcello Trovati.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Shao, Y., Trovati, M., Shi, Q. et al. A hybrid spam detection method based on unstructured datasets. Soft Comput 21, 233–243 (2017). https://doi.org/10.1007/s00500-015-1959-z

Published: 21 December 2015
Issue Date: January 2017
DOI: https://doi.org/10.1007/s00500-015-1959-z