
On Profiling Bots in Social Media

Part of the Lecture Notes in Computer Science book series (LNISA, volume 10046)


The popularity of social media platforms such as Twitter has led to the proliferation of automated bots, creating both opportunities and challenges in information dissemination, user engagement, and quality of service. Past work on profiling bots has focused largely on malicious bots, on the assumption that these bots should be removed. In this work, however, we find that many bots are benign, and we propose a new, broader categorization of bots based on their behaviors, comprising broadcast, consumption, and spam bots. To facilitate comprehensive analyses of bots and how they compare to human accounts, we develop a systematic profiling framework that includes a rich set of features and a classifier bank. We conduct extensive experiments to evaluate the performance of different classifiers under varying time windows, identify the key features of bots, and draw inferences about bots in a larger Twitter population. Our analysis encompasses more than 159K bot and human (non-bot) accounts on Twitter. The results provide insights into the behavioral traits of both benign and malicious bots.


  • Bot profiling
  • Classification
  • Feature extraction
  • Social media


  4. The exceptionally low tweet frequencies in the first week of January and on 12-14 February are due to major downtime of our servers.

  5. Random guess w.r.t. a class c refers to a classifier that randomly assigns a proportion \(p_c\) of the instances to class c, and the remaining \(1-p_c\) to classes other than c. In this case, \(Precision(c) = Recall(c) = F1(c) = p_c\), where \(p_c = \frac{P(c)}{P(c)+N(c)} = \frac{TP(c) + FN(c)}{TP(c) + FN(c) + TN(c) + FP(c)}\).




This research is supported by the National Research Foundation, Prime Minister's Office, Singapore under its International Research Centres in Singapore Funding Initiative.

Author information



Corresponding authors

Correspondence to Richard J. Oentaryo or Ee-Peng Lim.


A Predictions on Unlabeled Twitter Accounts


To facilitate our study on a larger Twitter population, we first examined how well our best classifier (i.e., LR) predicts on unlabeled data that it has never seen in the (labeled) CV data. Table 4 summarizes the top K prediction results, where we varied K from 10 to 50 to verify the robustness of the predictions. For each class, we computed the number of correctly predicted instances (TP) as well as the precision at top K, i.e., \(Precision = \frac{TP}{K}\).
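The precision at top K described above can be illustrated with a minimal Python sketch. The function name `precision_at_k`, the toy scores, and the `is_correct` flags are hypothetical; in particular, `is_correct` stands in for whatever verification establishes that a top-ranked prediction is right, and nothing here reproduces the values in Table 4.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], is_correct: Sequence[bool], k: int) -> float:
    """Precision among the k instances ranked highest by score for one class.

    scores[i]     -- classifier score of instance i for the class of interest
    is_correct[i] -- whether instance i truly belongs to that class
    """
    if len(scores) != len(is_correct):
        raise ValueError("scores and is_correct must have the same length")
    if k <= 0 or k > len(scores):
        raise ValueError("k must be between 1 and the number of instances")
    # Rank instance indices by descending score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # TP = number of correct instances among the top k.
    tp = sum(1 for i in ranked[:k] if is_correct[i])
    return tp / k


# Hypothetical toy data: six accounts scored for a single class.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40]
is_correct = [True, True, False, True, False, False]

for k in (2, 4):
    print(f"K={k}: precision = {precision_at_k(scores, is_correct, k):.2f}")
```

Varying `k` in this way mirrors the robustness check over several K values: a classifier whose ranking is reliable should keep a similar precision as K grows.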

Table 4. Top K predictions on 158,111 unlabeled Twitter accounts

As shown in Table 4, our LR classifier produces fairly accurate and consistent predictions across different K values. With respect to human accounts, our LR classifier achieved perfect Precision for all K values. This is unsurprising: human accounts can be expected to constitute the largest proportion of the Twitter population, and should thus be the easiest to classify. We also obtained good results for the broadcast and consumption bots, with Precision scores greater than \(75\,\%\) and \(95\,\%\), respectively. On the other hand, we observe rather modest Precision scores for spam bots (i.e., 40–\(47.5\,\%\)). We can attribute this to the insufficient number of instances for spam bots, which form only \(\frac{105}{1,613} = 6.51\,\%\) of our labeled data (cf. Table 1). This may (again) be due to our data collection procedure, which involved popular users as seeds, and/or to our relatively strict criteria for the characterization of spam bot accounts (cf. Sect. 7.1). Nevertheless, Precision scores of 40–\(47.5\,\%\) remain relatively good when compared with that of a random guess on our labeled data (i.e., \(6.51\,\%\)).
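The comparison with a random guess can be checked with a short Python sketch. It uses only the figures quoted above (105 spam bots among 1,613 labeled accounts, and Precision of 40 % and 47.5 %); the helper names `random_guess_baseline` and `f1` are illustrative assumptions, not part of our framework.

```python
def random_guess_baseline(positives: int, total: int) -> float:
    """Proportion p_c of instances in class c, i.e. the random-guess precision."""
    if total <= 0 or not 0 <= positives <= total:
        raise ValueError("need 0 <= positives <= total and total > 0")
    return positives / total


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Spam bots in the labeled data: 105 out of 1,613 accounts.
p_spam = random_guess_baseline(105, 1613)
print(f"random-guess precision for spam bots: {100 * p_spam:.2f} %")

# For a random guess, precision and recall both equal p_c, so F1 equals p_c too.
print(f"random-guess F1: {100 * f1(p_spam, p_spam):.2f} %")

# Ratio of the observed top-K precision range to the baseline.
for observed in (0.40, 0.475):
    print(f"precision {100 * observed:.1f} % is {observed / p_spam:.1f}x the baseline")
```

Expressing the observed scores as a multiple of the baseline makes the comparison independent of how rare the class is in the labeled data.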

All in all, we find our top K predictions on unlabeled data to be satisfactory. Based on this, we can use our predictions to infer the behavioral profiles of bots in a larger Twitter population, which in this case comprises Singapore users overall. In particular, we analyze the entropy-based dynamic tweet features, namely the entropy distributions of the tweet, retweet, mention, hashtag, and URL activities, which make up the majority of the top discriminative features in Fig. 5. Figure 6 presents the cumulative distribution functions of these features. The detailed analysis of the distributions can be found in Sect. 7.3.

Fig. 6. Distribution of entropy-based features for 158,111 Twitter accounts
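As a rough illustration of how an entropy-based feature and its cumulative distribution function can be computed, consider the following Python sketch. It assumes the standard Shannon entropy over the empirical distribution of items in one activity stream (here, hashtags); the exact feature definitions used in the paper are given in the main text and may differ. The function names and the toy accounts are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, List, Sequence, Tuple


def shannon_entropy(items: Iterable[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of -p * log2(p) over the observed items.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def empirical_cdf(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Pairs (v, fraction of values <= v) at each sorted sample point."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


# Hypothetical hashtag streams for three accounts.
accounts = {
    "repetitive": ["#sale", "#sale", "#sale", "#sale"],
    "mixed": ["#news", "#news", "#sg", "#food"],
    "varied": ["#news", "#sg", "#food", "#music"],
}

entropies = {name: shannon_entropy(tags) for name, tags in accounts.items()}
for name, h in entropies.items():
    print(f"{name}: {h:.2f} bits")

# Empirical CDF of the entropy feature across accounts.
for value, fraction in empirical_cdf(list(entropies.values())):
    print(f"P(entropy <= {value:.2f}) = {fraction:.2f}")
```

An account that always repeats the same item has zero entropy, while one that spreads its activity evenly over many items has high entropy; plotting the CDF per predicted class then shows how the classes separate on this feature.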


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Oentaryo, R.J., Murdopo, A., Prasetyo, P.K., Lim, EP. (2016). On Profiling Bots in Social Media. In: Spiro, E., Ahn, YY. (eds) Social Informatics. SocInfo 2016. Lecture Notes in Computer Science, vol 10046. Springer, Cham.


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47879-1

  • Online ISBN: 978-3-319-47880-7

  • eBook Packages: Computer Science, Computer Science (R0)