An Ensemble Learning Approach for Addressing the Class Imbalance Problem in Twitter Spam Detection

Liu, Shigang; Wang, Yu; Chen, Chao; Xiang, Yang

doi:10.1007/978-3-319-40253-6_13


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 9722)

Included in the following conference series:

Australasian Conference on Information Security and Privacy


Abstract

As an important source of real-time information in recent years, Twitter has inevitably become a prime target for spammers. It has been shown that the damage caused by Twitter spam can reach far beyond the social media platform itself. To mitigate the threat, many recent studies use machine learning techniques to classify Twitter spam and report very satisfactory results. However, most of these studies overlook a fundamental issue that is widely seen in real-world Twitter data: the class imbalance problem. In this paper, we show that the unequal distribution between the spam and non-spam classes in the data has a great impact on the spam detection rate. To address the problem, we propose an ensemble learning approach that involves three steps. In the first step, we adjust the class distribution of the imbalanced data set using various strategies, including random oversampling, random undersampling and fuzzy-based oversampling. In the second step, a classification model is built on each of the redistributed data sets. In the final step, a majority voting scheme combines all the classification models. Experimental results obtained on real-world Twitter data indicate that the proposed approach can significantly improve the spam detection rate in data sets with an imbalanced class distribution.
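The three steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses only the Python standard library, rebalances each data set to an assumed 1:1 class ratio, and uses a k-nearest-neighbour voter as a hypothetical stand-in for the paper's classifier. Fuzzy-based oversampling is not reproduced because its details are not given here; in the demo a second random-oversampling draw fills the third slot so that three voters cannot tie. All function names (`random_oversample`, `random_undersample`, `knn_predict`, `train_ensemble`, `ensemble_predict`) are illustrative.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# A labelled example: (feature vector, label) with 1 = spam, 0 = non-spam.
Sample = Tuple[Tuple[float, ...], int]


def _split_by_class(data: Sequence[Sample]):
    """Return (minority samples, majority samples) for a two-class data set."""
    counts = Counter(label for _, label in data)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    minority, majority = sorted(counts, key=counts.get)
    return ([s for s in data if s[1] == minority],
            [s for s in data if s[1] == majority])


def random_oversample(data: Sequence[Sample], rng: random.Random) -> List[Sample]:
    """Step 1a: duplicate random minority samples until both classes are equal in size."""
    minority, majority = _split_by_class(data)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return minority + extra + majority


def random_undersample(data: Sequence[Sample], rng: random.Random) -> List[Sample]:
    """Step 1b: keep a random majority subset as large as the minority class."""
    minority, majority = _split_by_class(data)
    return minority + rng.sample(majority, len(minority))


def knn_predict(train: Sequence[Sample], x: Sequence[float], k: int = 3) -> int:
    """Hypothetical base learner: k-NN vote using squared Euclidean distance."""
    nearest = sorted(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


Strategy = Callable[[Sequence[Sample], random.Random], List[Sample]]


def train_ensemble(data: Sequence[Sample], strategies: Sequence[Strategy], seed: int = 0):
    """Steps 1 and 2: one redistributed training set per strategy.

    k-NN is a lazy learner, so each "model" is simply its rebalanced training set.
    """
    rng = random.Random(seed)
    return [strategy(data, rng) for strategy in strategies]


def ensemble_predict(models: Sequence[List[Sample]], x: Sequence[float], k: int = 3) -> int:
    """Step 3: majority vote over the base models' predictions."""
    votes = [knn_predict(train, x, k) for train in models]
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy imbalanced data: 20 non-spam points at the origin, 2 spam points at (10, 10).
    data = [((0.0, 0.0), 0)] * 20 + [((10.0, 10.0), 1)] * 2
    # Hypothetical: a second oversampling draw stands in for fuzzy-based oversampling.
    models = train_ensemble(data, [random_oversample, random_undersample, random_oversample])
    print(ensemble_predict(models, (9.0, 9.0)))  # → 1 (spam region)
    print(ensemble_predict(models, (1.0, 1.0)))  # → 0 (non-spam region)
```

Any other base classifier could replace `knn_predict`; k-NN was chosen for the sketch only because its vote depends directly on how many minority (spam) points survive rebalancing, which makes the effect of each resampling strategy easy to see.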



Author information

Authors and Affiliations

School of Information Technology, Deakin University, Geelong, Australia
Shigang Liu, Yu Wang, Chao Chen & Yang Xiang


Corresponding author

Correspondence to Yu Wang.

Editor information

Editors and Affiliations

Monash University, Melbourne, Victoria, Australia
Joseph K. Liu
Monash University, Melbourne, Victoria, Australia
Ron Steinfeld


About this paper

Cite this paper

Liu, S., Wang, Y., Chen, C., Xiang, Y. (2016). An Ensemble Learning Approach for Addressing the Class Imbalance Problem in Twitter Spam Detection. In: Liu, J., Steinfeld, R. (eds) Information Security and Privacy. ACISP 2016. Lecture Notes in Computer Science, vol 9722. Springer, Cham. https://doi.org/10.1007/978-3-319-40253-6_13


DOI: https://doi.org/10.1007/978-3-319-40253-6_13
Published: 30 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40252-9
Online ISBN: 978-3-319-40253-6
eBook Packages: Computer Science (R0)
