Abstract
As Twitter has become an important channel for real-time information dissemination, it has inevitably become a prime target for spammers. It has been shown that the damage caused by Twitter spam can reach far beyond the social media platform itself. To mitigate the threat, many recent studies use machine learning techniques to classify Twitter spam and report very satisfactory results. However, most of these studies overlook a fundamental issue that is widespread in real-world Twitter data: the class imbalance problem. In this paper, we show that the unequal distribution between the spam and non-spam classes has a great impact on the spam detection rate. To address the problem, we propose an ensemble learning approach that involves three steps. In the first step, we adjust the class distribution of the imbalanced data set using several strategies: random oversampling, random undersampling and fuzzy-based oversampling. In the second step, a classification model is built on each of the redistributed data sets. In the final step, a majority voting scheme combines all the classification models. Experimental results obtained using real-world Twitter data indicate that the proposed approach can significantly improve the spam detection rate on data sets with an imbalanced class distribution.
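The three steps can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the base classifier or the fuzzy-based oversampling procedure, so the sketch assumes a nearest-centroid classifier as a placeholder and substitutes simple interpolation between minority samples for the fuzzy-based step. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def split_by_label(data):
    """Group (features, label) pairs by label."""
    groups = defaultdict(list)
    for features, label in data:
        groups[label].append((features, label))
    return groups


def random_oversample(data, rng):
    """Duplicate random samples of smaller classes up to the largest class size."""
    groups = split_by_label(data)
    target = max(len(rows) for rows in groups.values())
    out = []
    for rows in groups.values():
        out.extend(rows)
        out.extend(rng.choice(rows) for _ in range(target - len(rows)))
    return out


def random_undersample(data, rng):
    """Keep a random subset of each class, sized to the smallest class."""
    groups = split_by_label(data)
    target = min(len(rows) for rows in groups.values())
    out = []
    for rows in groups.values():
        out.extend(rng.sample(rows, target))
    return out


def interpolate_oversample(data, rng):
    """ASSUMPTION: stand-in for fuzzy-based oversampling, which the abstract
    does not describe. New points lie between two random same-class samples."""
    groups = split_by_label(data)
    target = max(len(rows) for rows in groups.values())
    out = []
    for label, rows in groups.items():
        out.extend(rows)
        for _ in range(target - len(rows)):
            a = rng.choice(rows)[0]
            b = rng.choice(rows)[0]
            t = rng.random()
            out.append(([x + t * (y - x) for x, y in zip(a, b)], label))
    return out


def train_centroid(data):
    """Placeholder classifier (an assumption): predict the nearest class centroid."""
    centroids = {}
    for label, rows in split_by_label(data).items():
        columns = zip(*(features for features, _ in rows))
        centroids[label] = [sum(col) / len(rows) for col in columns]

    def predict(x):
        return min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(x, centroids[c])),
        )

    return predict


def build_ensemble(data, seed=0):
    """Steps 1 and 2: redistribute the data three ways, train one model on each."""
    rng = random.Random(seed)
    strategies = (random_oversample, random_undersample, interpolate_oversample)
    return [train_centroid(strategy(data, rng)) for strategy in strategies]


def vote(models, x):
    """Step 3: majority vote over the individual model predictions."""
    return Counter(model(x) for model in models).most_common(1)[0][0]
```

Using an odd number of models on a two-class problem means the vote cannot tie. In practice the placeholder classifier would be swapped for a stronger learner, and the choice of resampling strategies would be tuned to the data.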
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Liu, S., Wang, Y., Chen, C., Xiang, Y. (2016). An Ensemble Learning Approach for Addressing the Class Imbalance Problem in Twitter Spam Detection. In: Liu, J., Steinfeld, R. (eds) Information Security and Privacy. ACISP 2016. Lecture Notes in Computer Science, vol 9722. Springer, Cham. https://doi.org/10.1007/978-3-319-40253-6_13
DOI: https://doi.org/10.1007/978-3-319-40253-6_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40252-9
Online ISBN: 978-3-319-40253-6
eBook Packages: Computer Science, Computer Science (R0)