Abstract
The popularity of social media platforms such as Twitter has led to the proliferation of automated bots, creating both opportunities and challenges in information dissemination, user engagement, and quality of service. Past work on profiling bots has focused largely on malicious bots, on the assumption that these bots should be removed. In this work, however, we find that many bots are benign, and we propose a new, broader categorization of bots based on their behaviors. This categorization includes broadcast, consumption, and spam bots. To facilitate comprehensive analyses of bots and how they compare to human accounts, we develop a systematic profiling framework that includes a rich set of features and a classifier bank. We conduct extensive experiments to evaluate the performance of different classifiers under varying time windows, identify the key features of bots, and draw inferences about bots in a larger Twitter population. Our analysis encompasses more than 159K bot and human (non-bot) accounts on Twitter. The results provide interesting insights into the behavioral traits of both benign and malicious bots.
Keywords
- Bot profiling
- Classification
- Feature extraction
- Social media
Notes
- 1.
- 2.
- 3.
- 4. The exceptionally low tweet frequencies in the first week of January and 12–14 February are due to major downtime of our servers.
- 5. Random guess w.r.t. a class c refers to a classifier that assigns a proportion \(p_c\) of the instances to class c, and \(1-p_c\) to classes other than c. In this case, \(Precision(c) = Recall(c) = F1(c) = p_c\), where \(p_c = \frac{P(c)}{P(c)+N(c)} = \frac{TP(c) + FN(c)}{TP(c) + FN(c) + TN(c) + FP(c)}\), and \(P(c)\) and \(N(c)\) are the numbers of positive and negative instances for class c.
Acknowledgments
This research is supported by the National Research Foundation, Prime Minister's Office, Singapore under its International Research Centres in Singapore Funding Initiative.
A Predictions on Unlabeled Twitter Accounts
To facilitate our study of a larger Twitter population, we first examined how well our best classifier (i.e., LR) predicts unlabeled data that it never saw in the labeled cross-validation (CV) data. Table 4 summarizes the top-K prediction results, where we varied K from 10 to 50 to verify the robustness of the predictions. For each class, we computed the number of correctly predicted instances (TP) as well as the precision at top K, i.e., \(Precision = \frac{TP}{K}\).
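The top-K precision defined above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' evaluation code: the function name `precision_at_k`, the scores, and the toy labels are all hypothetical, and the sketch assumes that accounts are ranked by the classifier's predicted score for the class in question.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], is_correct: Sequence[bool], k: int) -> float:
    """Fraction of the k highest-scored instances that truly belong to the class.

    scores[i] is the (assumed) classifier score of instance i for the class,
    and is_correct[i] says whether instance i really belongs to that class.
    """
    if len(scores) != len(is_correct):
        raise ValueError("scores and is_correct must have the same length")
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of instances")
    # Rank instance indices by descending score and keep the top k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = sum(1 for i in ranked[:k] if is_correct[i])
    return tp / k


# Hypothetical toy data: six accounts scored for a single class.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40]
is_correct = [True, True, False, True, False, False]
p_at_2 = precision_at_k(scores, is_correct, 2)  # 2 of the top 2 are correct
p_at_4 = precision_at_k(scores, is_correct, 4)  # 3 of the top 4 are correct
```

Repeating the call for several values of K, as in the 10 to 50 range used here, gives a quick check of how stable the ranking quality is.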
As shown in Table 4, our LR classifier produces fairly accurate and consistent predictions across different K values. For human accounts, it achieved perfect Precision for all K values. This is unsurprising: human accounts can be expected to constitute the largest proportion of the Twitter population, and thus should be the easiest to classify. We also obtained good results for the broadcast and consumption bots, with Precision scores greater than \(75\,\%\) and \(95\,\%\), respectively. On the other hand, we observe rather modest Precision scores for spam bots (i.e., 40–\(47.5\,\%\)). We can attribute this to the small number of spam bot instances, which form only \(\frac{105}{1,613} = 6.51\,\%\) of our labeled data (cf. Table 1). This may (again) be due to our data collection procedure, which used popular users as seeds, and/or to our relatively strict criteria for characterizing spam bot accounts (cf. Sect. 7.1). Nevertheless, Precision scores of 40–\(47.5\,\%\) remain relatively good when compared with that of a random guess on our labeled data (i.e., \(6.51\,\%\)).
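To make the random-guess comparison concrete, the sketch below computes the expected Precision, Recall and F1 of a classifier that assigns instances to a class at random with probability equal to the class proportion, and then expresses the reported spam-bot Precision as a multiple of that baseline. It is an illustrative calculation only: the helper names are hypothetical, and the only figures taken from the text are the 105 spam bots among 1,613 labeled accounts and the 40–47.5% Precision range.

```python
from fractions import Fraction
from typing import Tuple


def random_guess_metrics(positives: int, total: int) -> Tuple[Fraction, Fraction, Fraction]:
    """Expected precision, recall and F1 when a fraction p_c = positives / total
    of all instances is assigned to class c uniformly at random."""
    p_c = Fraction(positives, total)
    negatives = total - positives
    tp = p_c * positives        # expected true positives
    fp = p_c * negatives        # expected false positives
    fn = (1 - p_c) * positives  # expected false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def lift(observed_precision: float, baseline: Fraction) -> float:
    """How many times larger the observed precision is than the baseline."""
    return observed_precision / float(baseline)


precision, recall, f1 = random_guess_metrics(105, 1613)
low, high = lift(0.40, precision), lift(0.475, precision)
print(f"random-guess baseline: {float(precision):.2%}")  # 6.51%
print(f"lift over baseline: {low:.1f}x to {high:.1f}x")   # 6.1x to 7.3x
```

All three expected metrics reduce to the class proportion, so the spam-bot Precision is roughly six to seven times the baseline. Since the baseline comes from the labeled data, the ratio is only indicative if the unlabeled population has a different class mix.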
All in all, we find our top-K predictions on unlabeled data to be satisfactory. Based on this, we can use our predictions to infer the behavioral profiles of bots in a larger Twitter population, which in this case covers Singapore users overall. In particular, we analyze the entropy-based dynamic tweet features, namely the entropy distributions of the tweet, retweet, mention, hashtag and URL activities, which make up the majority of the top discriminative features in Fig. 5. Figure 6 presents the cumulative distribution functions of these features. The detailed analysis of the distributions can be found in Sect. 7.3.
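As a rough illustration of the kind of entropy-based feature and cumulative distribution mentioned above, the following sketch computes the Shannon entropy of a per-account activity histogram and the empirical CDF of the resulting values. The exact feature definitions are not reproduced in this appendix, so the binning (activity counts per time bin), the function names, and the toy counts are assumptions made purely for the example.

```python
import bisect
import math
from typing import List, Sequence, Tuple


def shannon_entropy(counts: Sequence[int]) -> float:
    """Shannon entropy (in bits) of the distribution obtained by normalizing counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # p * log2(1 / p), summed over non-empty bins.
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)


def empirical_cdf(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (x, F(x)) pairs, where F(x) is the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, bisect.bisect_right(ordered, x) / n) for x in sorted(set(ordered))]


# Hypothetical activity counts over four time bins for three accounts.
single_bin = [10, 0, 0, 0]  # all activity in one bin
uniform = [5, 5, 5, 5]      # activity spread evenly
skewed = [8, 4, 2, 2]       # somewhere in between

entropies = [shannon_entropy(c) for c in (single_bin, uniform, skewed)]
cdf = empirical_cdf(entropies)
```

Computing such a CDF separately for each predicted class would give curves that can be compared class by class, which is the comparison Fig. 6 is described as showing.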
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Oentaryo, R.J., Murdopo, A., Prasetyo, P.K., Lim, E.-P. (2016). On Profiling Bots in Social Media. In: Spiro, E., Ahn, Y.-Y. (eds) Social Informatics. SocInfo 2016. Lecture Notes in Computer Science, vol 10046. Springer, Cham. https://doi.org/10.1007/978-3-319-47880-7_6
DOI: https://doi.org/10.1007/978-3-319-47880-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47879-1
Online ISBN: 978-3-319-47880-7
eBook Packages: Computer Science, Computer Science (R0)