On Profiling Bots in Social Media

Oentaryo, Richard J.; Murdopo, Arinto; Prasetyo, Philips K.; Lim, Ee-Peng

doi:10.1007/978-3-319-47880-7_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10046)

Included in the following conference series:

International Conference on Social Informatics

Abstract

The popularity of social media platforms such as Twitter has led to the proliferation of automated bots, creating both opportunities and challenges in information dissemination, user engagement, and quality of service. Past work on profiling bots has focused largely on malicious bots, with the assumption that these bots should be removed. In this work, however, we find many bots that are benign, and propose a new, broader categorization of bots based on their behaviors: broadcast, consumption, and spam bots. To facilitate comprehensive analyses of bots and how they compare to human accounts, we develop a systematic profiling framework that includes a rich set of features and a classifier bank. We conduct extensive experiments to evaluate the performance of different classifiers under varying time windows, identify the key features of bots, and draw inferences about bots in a larger Twitter population. Our analysis encompasses more than 159K bot and human (non-bot) accounts on Twitter. The results provide insights into the behavioral traits of both benign and malicious bots.

Notes

1. https://ifttt.com.
2. https://business.twitter.com/solutions/promoted-tweets.
3. https://dev.twitter.com/overview/.
4. The exceptionally low tweet frequencies in the first week of January and 12-14 February are due to major downtime of our servers.
5. Random guess w.r.t. a class c refers to a classifier that assigns a proportion \(p_c\%\) of the instances to class c, and \((1-p_c)\%\) to classes other than c. In this case, \(Precision(c) = Recall(c) = F1(c) = p_c\), where \(p_c = \frac{P(c)}{P(c)+N(c)} = \frac{TP(c) + FN(c)}{TP(c) + FN(c) + TN(c) + FP(c)}\).
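
The identity in note 5 can be checked with a short calculation. The sketch below is illustrative only and is not taken from the paper: it assumes the random guesser labels each instance as class c independently of its true label, with probability equal to the class prevalence, and works with expected confusion-matrix cells. The function name `random_guess_metrics` is hypothetical; the counts 105 and 1,613 are the spam-bot and total labeled-account figures quoted in the appendix.

```python
from fractions import Fraction


def random_guess_metrics(positives: int, negatives: int):
    """Expected precision, recall and F1 of a random guesser for one class.

    Assumption (not from the paper): each instance is assigned to class c
    independently of its true label, with probability p_c.
    """
    total = positives + negatives
    p_c = Fraction(positives, total)  # class prevalence P(c) / (P(c) + N(c))

    # Expected confusion-matrix cells under the independence assumption.
    tp = positives * p_c        # true members of c that are assigned to c
    fp = negatives * p_c        # non-members that are assigned to c
    fn = positives * (1 - p_c)  # true members of c assigned elsewhere

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return p_c, precision, recall, f1


# Spam bots: 105 of the 1,613 labeled accounts.
p_c, precision, recall, f1 = random_guess_metrics(105, 1613 - 105)
print(float(p_c), float(precision), float(recall), float(f1))
```

Exact fractions are used so that the equality of the three metrics with the prevalence holds exactly rather than only up to floating-point error.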

Acknowledgments

This research is supported by the National Research Foundation, Prime Minister's Office, Singapore under its International Research Centres in Singapore Funding Initiative.

Author information

Authors and Affiliations

Living Analytics Research Centre, Singapore Management University, Singapore, Singapore
Richard J. Oentaryo, Arinto Murdopo, Philips K. Prasetyo & Ee-Peng Lim

Corresponding authors

Correspondence to Richard J. Oentaryo or Ee-Peng Lim.

Editor information

Editors and Affiliations

University of Washington, Seattle, Washington, USA
Emma Spiro
Indiana University, Bloomington, Indiana, USA
Yong-Yeol Ahn

A Predictions on Unlabeled Twitter Accounts

To facilitate our study on a larger Twitter population, we first examined how well our best classifier (i.e., LR) predicts on unlabeled data that it never saw in the (labeled) CV data. Table 4 summarizes the top K prediction results, where we varied K from 10 to 50 to verify the robustness of the predictions. For each class, we computed the number of correctly predicted instances (TP) as well as the precision at top K, i.e., \(Precision = \frac{TP}{K}\).
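
As an illustration of the precision-at-top-K measure, the following minimal sketch ranks accounts by a classifier score for one class and counts how many of the K highest-scoring accounts truly belong to that class. It is not the authors' code: the function name `precision_at_k`, the toy scores and the verified labels are all hypothetical, and how the top K accounts were verified is not described in this section.

```python
def precision_at_k(scored_accounts, target_class, k):
    """Return (TP, TP / K) for the k accounts with the highest scores.

    scored_accounts: list of (score_for_target_class, verified_label) pairs.
    """
    # Rank by score, highest first, and keep only the top k accounts.
    top_k = sorted(scored_accounts, key=lambda pair: pair[0], reverse=True)[:k]
    # Count accounts whose verified label matches the predicted class.
    tp = sum(1 for _, label in top_k if label == target_class)
    return tp, tp / k


# Hypothetical scores for the "spam" class with hypothetical verified labels.
toy_scores = [
    (0.95, "spam"),
    (0.90, "human"),
    (0.85, "spam"),
    (0.60, "spam"),
    (0.40, "broadcast"),
]

tp, precision = precision_at_k(toy_scores, "spam", k=4)
print(tp, precision)
```

Repeating the call for several values of `k` mirrors the robustness check across K described above.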

Table 4. Top K predictions on 158,111 unlabeled Twitter accounts

As shown in Table 4, our LR classifier produces fairly accurate and consistent predictions across different K values. For human accounts, it achieved perfect precision at every K. This is unsurprising: human accounts likely constitute the largest proportion of the Twitter population and should therefore be the easiest to classify. We also obtained good results for the broadcast and consumption bots, with precision scores greater than \(75\,\%\) and \(95\,\%\) respectively. On the other hand, we observe rather modest precision scores for spam bots (i.e., 40–\(47.5\,\%\)). We attribute this to the insufficient number of spam bot instances, which form only \(\frac{105}{1,613} = 6.51\,\%\) of our labeled data (cf. Table 1). This may (again) be due to our data collection procedure, which used popular users as seeds, and/or to our relatively strict criteria for characterizing spam bot accounts (cf. Sect. 7.1). Nevertheless, precision scores of 40–\(47.5\,\%\) are still well above that of a random guess on our labeled data (i.e., \(6.51\,\%\)).
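
To make the comparison with the random-guess baseline concrete, the arithmetic can be sketched as follows. This is only an illustrative calculation using the figures quoted above (105 spam bots among 1,613 labeled accounts, and spam-bot precision between 40 % and 47.5 %); the variable names are hypothetical, and the "lift" ratio is a convenience measure introduced here rather than one reported by the authors.

```python
# Figures quoted in the text above.
spam_labeled = 105
total_labeled = 1613
precision_low, precision_high = 0.40, 0.475

# Random-guess precision equals the class prevalence in the labeled data.
baseline = spam_labeled / total_labeled

# Ratio of the observed precision to the random-guess baseline.
lift_low = precision_low / baseline
lift_high = precision_high / baseline

print(f"baseline = {baseline:.2%}")
print(f"lift range = {lift_low:.2f}x to {lift_high:.2f}x")
```

Under these figures, the spam-bot precision is roughly six to seven times the baseline.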

All in all, we find our top K predictions on unlabeled data to be satisfactory. Based on this, we can use our predictions to infer the behavioral profiles of bots in a larger Twitter population, which in this case covers Singapore users overall. In particular, we analyze the entropy-based dynamic tweet features, namely the entropy distributions of the tweet, retweet, mention, hashtag and URL activities, which make up the majority of the top discriminative features in Fig. 5. Figure 6 presents the cumulative distribution functions of these features. The detailed analysis of the distributions can be found in Sect. 7.3.
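
The kind of analysis described here can be sketched in a few lines. The exact definition of the entropy features is given elsewhere in the paper, so this sketch rests on assumptions: it takes Shannon entropy (in bits) over a normalized histogram of activity counts, for example tweets per time bin, and builds an empirical cumulative distribution function over the per-account entropies. The function names and toy counts are hypothetical.

```python
import math
from bisect import bisect_right


def shannon_entropy(counts):
    """Shannon entropy in bits of a histogram of activity counts."""
    total = sum(counts)
    if total == 0:
        return 0.0  # no activity at all: treat as zero entropy
    probs = [c / total for c in counts if c > 0]
    return sum(p * math.log2(1 / p) for p in probs)


def empirical_cdf(values):
    """Return a function x -> fraction of values that are <= x."""
    ordered = sorted(values)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


# Hypothetical activity histograms (counts per time bin) for four accounts.
accounts = {
    "bursty": [20, 0, 0, 0],      # all activity in one bin -> entropy 0
    "two_bins": [10, 10, 0, 0],   # evenly split over two bins -> 1 bit
    "uniform_a": [5, 5, 5, 5],    # evenly spread over four bins -> 2 bits
    "uniform_b": [3, 3, 3, 3],
}

entropies = {name: shannon_entropy(c) for name, c in accounts.items()}
cdf = empirical_cdf(entropies.values())
print(entropies, cdf(1.0))
```

Computing one such curve per account class (human, broadcast, consumption, spam) would allow the distributions to be compared side by side.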

About this paper

Cite this paper

Oentaryo, R.J., Murdopo, A., Prasetyo, P.K., Lim, EP. (2016). On Profiling Bots in Social Media. In: Spiro, E., Ahn, YY. (eds) Social Informatics. SocInfo 2016. Lecture Notes in Computer Science, vol 10046. Springer, Cham. https://doi.org/10.1007/978-3-319-47880-7_6

DOI: https://doi.org/10.1007/978-3-319-47880-7_6
Published: 23 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47879-1
Online ISBN: 978-3-319-47880-7
eBook Packages: Computer Science, Computer Science (R0)
