Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach

  • Alex Hai Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6166)


As online social networking sites become more popular, they have also attracted the attention of spammers. In this paper, Twitter, a popular micro-blogging service, is studied as an example of spam bot detection in online social networking sites. A machine learning approach is proposed to distinguish spam bots from legitimate users. To facilitate spam bot detection, three graph-based features, including the number of friends and the number of followers, are extracted to explore the unique follower and friend relationships among users on Twitter. Three content-based features are also extracted from each user's 20 most recent tweets. A real data set is collected from Twitter's publicly available information using two different methods. Evaluation experiments show that the detection system is efficient and accurate at identifying spam bots on Twitter.
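The feature extraction outlined in the abstract can be illustrated with a minimal Python sketch. The abstract names only the friend and follower counts and the 20-tweet window, so everything else here is an assumption made for illustration: the `Account` container, the follower-share ratio, the near-duplicate count based on Levenshtein distance (a term that appears in the keywords), the link count, and the `max_distance` threshold are hypothetical and are not claimed to be the paper's actual features.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List


@dataclass
class Account:
    """Hypothetical container for one user's public profile data."""
    friends: int        # accounts this user follows
    followers: int      # accounts that follow this user
    tweets: List[str]   # recent tweets, newest first


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def near_duplicate_pairs(tweets: List[str], max_distance: int = 3) -> int:
    """Count tweet pairs among the 20 most recent that are nearly identical.

    The threshold max_distance is an illustrative assumption.
    """
    recent = tweets[:20]
    return sum(1 for x, y in combinations(recent, 2)
               if levenshtein(x, y) <= max_distance)


def extract_features(account: Account) -> Dict[str, float]:
    """Build a feature dictionary from graph and content information."""
    total = account.friends + account.followers
    recent = account.tweets[:20]
    return {
        # Graph-based: the two counts named in the abstract, plus an
        # assumed ratio as a stand-in for the third, unnamed feature.
        "friends": account.friends,
        "followers": account.followers,
        "follower_share": account.followers / total if total else 0.0,
        # Content-based: assumed examples over the 20 most recent tweets.
        "near_duplicate_pairs": near_duplicate_pairs(recent),
        "tweets_with_links": sum(
            1 for t in recent if "http://" in t or "https://" in t),
    }


if __name__ == "__main__":
    suspect = Account(
        friends=900,
        followers=100,
        tweets=["buy now http://a.example/1",
                "buy now http://a.example/2",
                "hello world"],
    )
    print(extract_features(suspect))
```

The resulting dictionary could be converted to a numeric vector and passed to any supervised classifier; which classifier to use is left open here.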


Social Graph, Levenshtein Distance, Spam Detection, Suspicious Behavior, Online Social Networking Site
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Alex Hai Wang (1)
  1. College of Information Sciences and Technology, The Pennsylvania State University, Dunmore, USA
