Abstract
The popularity of Twitter depends greatly on the quality and integrity of the content its users contribute. Unfortunately, Twitter has also attracted spammers who post spam content that pollutes the community. By exploiting the social relationships between users, social spamming is more successful than traditional methods such as email spamming. Detecting spam is the first, crucial step in fighting it. Conventional detection methods check individual messages or accounts for spam. Our work instead takes a collective perspective and focuses on detecting spam campaigns that manipulate multiple accounts to spread spam on Twitter. As a complement to conventional detection methods, this perspective brings efficiency and robustness. More specifically, we design an automatic classification system based on machine learning that applies multiple features to classify spam campaigns. Our experimental evaluation demonstrates the efficacy of the proposed classification system.
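The abstract describes the approach only at a high level: treat groups of tweets as campaigns, describe each campaign with features, and classify it with a learned model. The following is a minimal sketch of that collective pipeline under stated assumptions, not the paper's method. It groups tweets by shared URL (an assumption), computes three hypothetical campaign-level features (distinct accounts on a log scale, tweets per account, and mean pairwise Jaccard similarity of tweet text), and substitutes a simple nearest-centroid classifier for whatever learner the full paper uses. The names `Tweet`, `group_into_campaigns`, `campaign_features`, and `NearestCentroid` are illustrative only.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Tweet:
    account: str  # posting account ID (hypothetical field)
    text: str
    url: str      # shared URL; used here as the campaign key (assumption)


def group_into_campaigns(tweets):
    """Collective step: gather tweets sharing a URL into one candidate campaign."""
    campaigns = defaultdict(list)
    for t in tweets:
        campaigns[t.url].append(t)
    return dict(campaigns)


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def campaign_features(tweets):
    """Hypothetical features for a non-empty campaign:
    [log(1 + distinct accounts), tweets per account, mean pairwise text similarity]."""
    accounts = {t.account for t in tweets}
    tokens = [set(t.text.lower().split()) for t in tweets]
    pairs = list(combinations(tokens, 2))
    similarity = sum(jaccard(a, b) for a, b in pairs) / len(pairs) if pairs else 1.0
    return [math.log1p(len(accounts)), len(tweets) / len(accounts), similarity]


class NearestCentroid:
    """Minimal supervised classifier: predict the label whose mean feature vector is closest."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] += 1
        # Average the accumulated vectors to obtain one centroid per label.
        self.centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}
        return self

    def predict(self, x):
        # Euclidean distance to each centroid; the closest label wins.
        return min(self.centroids, key=lambda lab: math.dist(x, self.centroids[lab]))
```

Because the classifier sees one feature vector per campaign rather than per tweet, many accounts posting near-identical text yield a high-similarity campaign even when each tweet looks innocuous on its own, which is the kind of collective signal the abstract argues for. A real system would scale features and use a stronger learner.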
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chu, Z., Widjaja, I., Wang, H. (2012). Detecting Social Spam Campaigns on Twitter. In: Bao, F., Samarati, P., Zhou, J. (eds) Applied Cryptography and Network Security. ACNS 2012. Lecture Notes in Computer Science, vol 7341. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31284-7_27
DOI: https://doi.org/10.1007/978-3-642-31284-7_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31283-0
Online ISBN: 978-3-642-31284-7
eBook Packages: Computer Science, Computer Science (R0)