Spam Detection on Twitter Using Traditional Classifiers

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNPSE, volume 6906)

Abstract

Social networking sites have become very popular in recent years. Users use them to find new friends and to update their existing friends on their latest thoughts and activities. Among these sites, Twitter is the fastest growing. Its popularity also attracts many spammers, who infiltrate legitimate users’ accounts with large volumes of spam messages. In this paper, we discuss some user-based and content-based features that differ between spammers and legitimate users. We then use these features to facilitate spam detection. Using the API methods provided by Twitter, we crawled active Twitter users, their followers/following information, and their most recent 100 tweets. We then evaluated our detection scheme based on the suggested user-based and content-based features. Our results show that, among the four classifiers we evaluated, the Random Forest classifier produces the best results. Our spam detector achieves 95.7% precision and 95.7% F-measure using the Random Forest classifier.
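As a reading aid for the reported metrics, the following minimal Python sketch shows how precision, recall, and F-measure relate. It assumes the F-measure is the balanced F1 score computed for a single class; the abstract does not say how the figure was computed, and the relation need not hold exactly if the numbers are averages across classes or have been rounded. The function names and the example counts are hypothetical and do not come from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def recall_from_precision_and_f1(precision, f1):
    """Invert F1 = 2PR / (P + R) to recover R; requires 2P > F1."""
    return f1 * precision / (2 * precision - f1)


# Hypothetical counts, chosen only to illustrate the formulas.
p, r, f = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")

# Applying the inversion to the reported figures (both 95.7%).
print(f"implied recall ~ {recall_from_precision_and_f1(0.957, 0.957):.3f}")
```

Under the stated assumption, equal precision and F1 imply that recall takes the same value, so the reported figures would correspond to a recall of roughly 95.7% as well.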

Keywords

  • Social network security
  • spam detection
  • machine learning

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

McCord, M., Chuah, M. (2011). Spam Detection on Twitter Using Traditional Classifiers. In: Calero, J.M.A., Yang, L.T., Mármol, F.G., García Villalba, L.J., Li, A.X., Wang, Y. (eds) Autonomic and Trusted Computing. ATC 2011. Lecture Notes in Computer Science, vol 6906. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23496-5_13

  • DOI: https://doi.org/10.1007/978-3-642-23496-5_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23495-8

  • Online ISBN: 978-3-642-23496-5

  • eBook Packages: Computer Science, Computer Science (R0)