Abstract
Social networking sites have become very popular in recent years. Users rely on them to find new friends and to update existing friends on their latest thoughts and activities. Among these sites, Twitter is the fastest growing. Its popularity also attracts many spammers, who flood legitimate users’ accounts with large volumes of spam messages. In this paper, we discuss user-based and content-based features that differ between spammers and legitimate users, and we use these features to facilitate spam detection. Using the API methods provided by Twitter, we crawled active Twitter users, their follower and following information, and their 100 most recent tweets. We then evaluated our detection scheme based on the suggested user-based and content-based features. Our results show that, among the four classifiers we evaluated, the Random Forest classifier produces the best results: with it, our spam detector achieves 95.7% precision and a 95.7% F-measure.
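For readers less familiar with the reported metrics, the sketch below shows how precision, recall, and the F-measure are conventionally computed from the counts of a binary spam/non-spam classifier. It is an illustration only, not the authors' evaluation code: the function name and the example counts are hypothetical, and the abstract does not state whether its figures are per-class or averaged.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F-measure) for the spam class.

    tp: spam accounts correctly flagged as spam
    fp: legitimate accounts wrongly flagged as spam
    fn: spam accounts the classifier missed
    """
    # Guard against division by zero when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure (F1) is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, chosen only to illustrate the arithmetic.
p, r, f = precision_recall_f1(tp=90, fp=10, fn=10)
print(f"precision={p:.3f} recall={r:.3f} F-measure={f:.3f}")
```

Because the F-measure is a harmonic mean, it equals precision only when recall equals precision. If the two 95.7% figures describe the same class, recall would therefore also be close to 95.7%, up to rounding.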
Keywords
- Social network security
- spam detection
- machine learning
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
McCord, M., Chuah, M. (2011). Spam Detection on Twitter Using Traditional Classifiers. In: Calero, J.M.A., Yang, L.T., Mármol, F.G., García Villalba, L.J., Li, A.X., Wang, Y. (eds) Autonomic and Trusted Computing. ATC 2011. Lecture Notes in Computer Science, vol 6906. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23496-5_13
DOI: https://doi.org/10.1007/978-3-642-23496-5_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23495-8
Online ISBN: 978-3-642-23496-5
eBook Packages: Computer Science, Computer Science (R0)