An Adaptive Concentration Selection Model for Spam Detection

Gao, Yang; Mi, Guyue; Tan, Ying

doi:10.1007/978-3-319-11857-4_26

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8794)

Included in the following conference series:

International Conference in Swarm Intelligence

Abstract

The concentration-based feature construction (CFC) approach has been proposed for spam detection. In the CFC approach, global concentration (GC) and local concentration (LC) are used independently to convert emails into 2-dimensional or 2n-dimensional feature vectors, respectively. In this paper, we propose a novel model that adaptively selects the concentration construction method according to how well a testing sample matches each kind of concentration feature. Once the method suited to the current sample is determined, the email is transformed into the corresponding concentration feature vector, which is then passed to a classifier to obtain its class. The k-nearest neighbor method is used in experiments to evaluate the proposed concentration selection model on the standard corpora PU1, PU2, PU3 and PUA. Experimental results show that the model performs better than using GC or LC separately, which supports the effectiveness of the proposed model and suggests its applicability in real-world settings.
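The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the concentration formulas or the selection criterion, so everything below is an assumption made for illustration. Concentration is taken here as the fraction of an email's distinct tokens found in a spam or ham word set (`spam_genes`, `ham_genes`, both hypothetical names), LC splits the email into `n` equal windows, and the selection rule (keep whichever feature type gives the larger k-nearest-neighbor vote share) is a hypothetical stand-in for the paper's matching step.

```python
from collections import Counter
from math import dist


def concentrations(tokens, spam_genes, ham_genes):
    """Return (spam, ham) concentration for one token list (assumed formula)."""
    distinct = set(tokens)
    if not distinct:
        return (0.0, 0.0)
    return (len(distinct & spam_genes) / len(distinct),
            len(distinct & ham_genes) / len(distinct))


def gc_features(tokens, spam_genes, ham_genes):
    """Global concentration: one pair over the whole email (2 dimensions)."""
    return concentrations(tokens, spam_genes, ham_genes)


def lc_features(tokens, spam_genes, ham_genes, n=3):
    """Local concentration: one pair per window (2n dimensions)."""
    size = max(1, -(-len(tokens) // n))  # ceiling division
    feats = []
    for i in range(n):
        window = tokens[i * size:(i + 1) * size]
        feats.extend(concentrations(window, spam_genes, ham_genes))
    return tuple(feats)


def knn_vote(query, train_vectors, train_labels, k=3):
    """Return (majority label, vote share) among the k nearest neighbours."""
    nearest = sorted(range(len(train_vectors)),
                     key=lambda i: dist(query, train_vectors[i]))[:k]
    label, count = Counter(train_labels[i] for i in nearest).most_common(1)[0]
    return label, count / len(nearest)


def classify_adaptive(tokens, train_emails, train_labels,
                      spam_genes, ham_genes, k=3):
    """Choose GC or LC per email by kNN vote share (hypothetical rule)."""
    best = None
    for name, extract in (("GC", gc_features), ("LC", lc_features)):
        train_vectors = [extract(e, spam_genes, ham_genes) for e in train_emails]
        query = extract(tokens, spam_genes, ham_genes)
        label, share = knn_vote(query, train_vectors, train_labels, k)
        if best is None or share > best[2]:
            best = (name, label, share)
    return best  # (chosen feature type, predicted label, vote share)


# Toy data, invented purely for illustration.
spam_genes = {"free", "win", "cash"}
ham_genes = {"meeting", "report", "lunch"}
train_emails = [
    ["win", "free", "cash", "now"],
    ["free", "cash", "offer", "win"],
    ["meeting", "report", "at", "noon"],
    ["lunch", "meeting", "tomorrow", "report"],
]
train_labels = ["spam", "spam", "ham", "ham"]

result = classify_adaptive(["free", "cash", "win", "today"],
                           train_emails, train_labels, spam_genes, ham_genes)
print(result)
```

In this sketch the choice between GC and LC is made separately for every test email, which mirrors the per-sample selection the abstract describes; any other confidence measure could be substituted for the vote share.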

Author information

Authors and Affiliations

Key Laboratory of Machine Perception (MOE), Department of Machine Intelligence, School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, China
Yang Gao, Guyue Mi & Ying Tan

Editor information

Editors and Affiliations

Key Laboratory of Machine Perception (MOE), Peking University, 100871, Beijing, China
Ying Tan
Department of Electrical & Electronic Engineering, Xi’an Jiaotong-Liverpool University, Suzhou, China
Yuhui Shi
Computer Science Department, CINVESTAV-IPN, Mexico
Carlos A. Coello Coello

About this paper

Cite this paper

Gao, Y., Mi, G., Tan, Y. (2014). An Adaptive Concentration Selection Model for Spam Detection. In: Tan, Y., Shi, Y., Coello, C.A.C. (eds) Advances in Swarm Intelligence. ICSI 2014. Lecture Notes in Computer Science, vol 8794. Springer, Cham. https://doi.org/10.1007/978-3-319-11857-4_26

DOI: https://doi.org/10.1007/978-3-319-11857-4_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11856-7
Online ISBN: 978-3-319-11857-4
eBook Packages: Computer Science, Computer Science (R0)
