Detecting Social Spam Campaigns on Twitter

  • Zi Chu
  • Indra Widjaja
  • Haining Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7341)


The popularity of Twitter depends heavily on the quality and integrity of the content contributed by its users. Unfortunately, Twitter has attracted spammers who post spam content that pollutes the community. Social spamming is more successful than traditional methods such as email spamming because it exploits the social relationships between users. Detecting spam is the first, critical step in fighting it. Conventional detection methods check individual messages or accounts for the existence of spam. Our work takes a collective perspective and focuses on detecting spam campaigns that manipulate multiple accounts to spread spam on Twitter. Complementary to conventional detection methods, our work brings efficiency and robustness. More specifically, we design an automatic classification system based on machine learning and apply multiple features for classifying spam campaigns. Our experimental evaluation demonstrates the efficacy of the proposed classification system.
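To make the collective perspective concrete, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that tweets sharing the same URL form one candidate campaign, and it computes three hypothetical campaign-level features. The names `Tweet`, `group_into_campaigns`, and `campaign_features` and the choice of features are assumptions, not taken from the paper; in a full system such feature vectors would be passed to a trained classifier, which is not shown here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    account: str  # posting account (hypothetical field)
    text: str     # tweet text
    url: str      # URL in the tweet; assumed here to be the grouping key


def group_into_campaigns(tweets):
    """Group tweets into candidate campaigns by shared URL (an assumption)."""
    campaigns = defaultdict(list)
    for tweet in tweets:
        campaigns[tweet.url].append(tweet)
    return dict(campaigns)


def campaign_features(tweets):
    """Compute illustrative campaign-level features for one non-empty group."""
    accounts = {t.account for t in tweets}
    texts = [t.text for t in tweets]
    return {
        "num_tweets": len(tweets),
        "num_accounts": len(accounts),
        # Fraction of tweets whose text repeats an earlier text in the group.
        "duplicate_text_ratio": 1 - len(set(texts)) / len(texts),
    }


if __name__ == "__main__":
    sample = [
        Tweet("a", "buy now", "http://x.example/1"),
        Tweet("b", "buy now", "http://x.example/1"),
        Tweet("c", "buy now!!", "http://x.example/1"),
        Tweet("d", "lunch pics", "http://y.example/2"),
    ]
    for url, group in group_into_campaigns(sample).items():
        print(url, campaign_features(group))
```

The point of the sketch is the unit of analysis: features are computed over a group of accounts and messages rather than over a single tweet or account, which is what distinguishes campaign-level detection from the conventional per-message checks described above.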


Keywords: Spam Detection, Anomaly Detection, Machine Learning, Twitter





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Zi Chu (1)
  • Indra Widjaja (2)
  • Haining Wang (1)
  1. Department of Computer Science, The College of William and Mary, Williamsburg, USA
  2. Bell Laboratories, Alcatel-Lucent, Murray Hill, USA
