Reference Data Sets for Spam Detection: Creation, Analysis, Propagation

Luckner, Marcin; Filasiak, Robert

doi:10.1007/978-3-642-40846-5_22

Reference Data Sets for Spam Detection: Creation, Analysis, Propagation

Marcin Luckner²⁴ &
Robert Filasiak²⁵

Conference paper

2446 Accesses

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 8073))

Abstract

A reference set is a set of data of network traffic whose form and content allows detecting an event or a group of events. Realistic and representative datasets based on real traffic can improve research in the fields of intruders and anomaly detection. Creating reference sets tackles a number of issues such as the collection and storage of large volumes of data, the privacy of information and the relevance of collected events. Moreover, rare events are hard to analyse among background traffic and need specialist detection tools. One of the common problems that can be detected in network traffic is spam. This paper presents the methodology for creating a network traffic reference set for spam detection. The methodology concerns the selection of significant features, the collection and storage of data, the analysis of the collected data, the enrichment of the data with additional events and the propagation of the set. Moreover, a hybrid classifier that detects spam on relatively high level is presented.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Behera, G.: Privacy preserving c4.5 using gini index, pp. 1–4 (March 2011)
Google Scholar
Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and Regression Trees. Wadsworth and Brooks, Monterey (1984)
MATH Google Scholar
Deri, L.: nprobe: an open source netflow probe for gigabit networks. In: Proc. of Terena TNC 2003 (2003)
Google Scholar
Fomenkov, M., Claffy, K.: Internet measurement data management challenges. In: Workshop on Research Data Lifecycle Management, Princeton, NJ (July 2011)
Google Scholar
Grzenda, M.: Towards the reduction of data used for the classification of network flows. In: Corchado, E., Snášel, V., Abraham, A., Woźniak, M., Graña, M., Cho, S.-B. (eds.) HAIS 2012, Part II. LNCS, vol. 7209, pp. 68–77. Springer, Heidelberg (2012), http://dx.doi.org/10.1007/978-3-642-28931-6_7
Chapter Google Scholar
Kim, H., Claffy, K., Fomenkov, M., Barman, D., Faloutsos, M., Lee, K.: Internet traffic classification demystified: myths, caveats, and the best practices. In: Proceedings of the 2008 ACM CoNEXT Conference, CoNEXT 2008, pp. 11:1–11:12. ACM, New York (2008)
Google Scholar
Kobiersky, P., Korenek, J., Polcak, L.: Packet header analysis and field extraction for multigigabit networks, pp. 96–101 (April 2009)
Google Scholar
Limwiwatkul, L., Rungsawang, A.: Distributed denial of service detection using tcp/ip header and traffic measurement analysis, vol. 1, pp. 605–610 (October 2004)
Google Scholar
Moore, A., Crogan, M., Moore, A.W., Mary, Q., Zuev, D., Zuev, D., Crogan, M.L.: Discriminators for use in flow-based classification. Tech. rep. (2005)
Google Scholar
Ouyang, T., Ray, S., Rabinovich, M., Allman, M.: Can network characteristics detect spam effectively in a stand-alone enterprise? In: Spring, N., Riley, G.F. (eds.) PAM 2011. LNCS, vol. 6579, pp. 92–101. Springer, Heidelberg (2011)
Chapter Google Scholar
Schatzmann, D., Burkhart, M., Spyropoulos, T.: Flow-level characteristics of spam and ham (291) (August 2008)
Google Scholar
Žádník, M., Michlovský, Z.: Is spam visible in flow-level statistics? Tech. rep. (2009), http://www.fit.vutbr.cz/research/view_pub.php?id=9277

Download references

Author information

Authors and Affiliations

Faculty of Mathematics and Information Science, Warsaw University of Technology, pl. Politechniki 1, 00-661, Warsaw, Poland
Marcin Luckner
Orange Labs Poland, 02-691, Warszawa, ul. Obrzena 7, Poland
Robert Filasiak

Authors

Marcin Luckner
View author publications
You can also search for this author in PubMed Google Scholar
Robert Filasiak
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, No. 415, Chien Kung Road, 80778, Kaohsiung, Taiwan
Jeng-Shyang Pan
Department of Electrical and Computer Engineering, University of Cyprus, 75 Kallipoleos Avenue, 1678, Nicosia, Cyprus
Marios M. Polycarpou
Department of Systems and Computer Networks, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370, Wroclaw, Poland
Michał Woźniak
Department of Computer Science, University of Sao Paulo at Sao Carlos, Sao Carlos, SP, Brazil
André C. P. L. F. de Carvalho
University of Salamanca, Plaza de la Merced S/N, 37008, Salamanca, Spain
Héctor Quintián & Emilio Corchado &

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Luckner, M., Filasiak, R. (2013). Reference Data Sets for Spam Detection: Creation, Analysis, Propagation. In: Pan, JS., Polycarpou, M.M., Woźniak, M., de Carvalho, A.C.P.L.F., Quintián, H., Corchado, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2013. Lecture Notes in Computer Science(), vol 8073. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40846-5_22

Download citation

DOI: https://doi.org/10.1007/978-3-642-40846-5_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40845-8
Online ISBN: 978-3-642-40846-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics