
BotFinder: a novel framework for social bots detection in online social networks based on graph embedding and community detection

World Wide Web

Abstract

With the widespread popularity of online social networks (OSNs), the number of users has grown rapidly in recent years. At the same time, social bots, i.e. accounts controlled by programs, are also on the rise. OSN service providers often use such bots to keep their networks active, but some social bots are registered for malicious purposes. These malicious bots must be detected to preserve an authentic public-opinion environment. We propose BotFinder, a framework for detecting malicious social bots in OSNs. It combines machine learning and graph methods so that the latent features of social bots can be extracted effectively. For feature engineering, we generate second-order features and apply encoding methods to high-cardinality variables; these features make full use of both labelled and unlabelled samples. On the graph side, we first generate node vectors with an embedding method and then compute the similarity between the vectors of humans and bots. Next, we use an unsupervised method to diffuse labels, which further improves performance. To validate the proposed method, we conduct extensive experiments on a dataset of over eight million user records provided by an artificial intelligence contest. Results show that our approach reaches an F1-score of 0.8850, clearly outperforming state-of-the-art methods.
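The abstract names several concrete processing steps. What follows is a minimal, standard-library-only Python sketch of what such steps can look like. It works under stated assumptions and is not the authors' implementation. All function names are hypothetical, and the technique choices are illustrative rather than taken from the paper:

- frequency encoding for high-cardinality variables;
- pair-frequency counts as one possible reading of "second-order features";
- cosine similarity to mean bot and mean human embeddings;
- seeded majority-vote label propagation.

Node embeddings are assumed to be given, for example from an embedding method such as node2vec. The sketch does not train them.

```python
from collections import Counter
import math


def frequency_encode(values):
    """Hypothetical helper: map each category to its count across ALL samples.

    Counting over labelled and unlabelled rows together is one way an encoding
    can use both kinds of samples; the paper's exact scheme is not shown here.
    """
    counts = Counter(values)
    return [counts[v] for v in values]


def pair_frequency_encode(col_a, col_b):
    """Assumed reading of a 'second-order' feature: count of each (a, b) pair."""
    counts = Counter(zip(col_a, col_b))
    return [counts[p] for p in zip(col_a, col_b)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    # Assumes at least one vector (i.e. at least one labelled bot and human).
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def similarity_features(emb, labels):
    """For each node, return (similarity to bot centroid, similarity to human centroid).

    emb: dict node -> embedding vector; labels: dict node -> 1 (bot) / 0 (human).
    """
    bots = centroid([emb[n] for n, y in labels.items() if y == 1])
    humans = centroid([emb[n] for n, y in labels.items() if y == 0])
    return {n: (cosine(v, bots), cosine(v, humans)) for n, v in emb.items()}


def propagate_labels(adj, seeds, max_iter=10):
    """Seeded label propagation (an illustrative stand-in for label diffusion).

    Unlabelled nodes adopt the majority label among their already-labelled
    neighbours; seed labels never change. Iterates until stable or max_iter.
    """
    labels = dict(seeds)
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):
            if node in seeds:
                continue
            votes = Counter(labels[n] for n in adj[node] if n in labels)
            if not votes:
                continue  # no labelled neighbour yet
            top = votes.most_common()
            if len(top) > 1 and top[0][1] == top[1][1]:
                continue  # tie: keep the node's current state
            if labels.get(node) != top[0][0]:
                labels[node] = top[0][0]
                changed = True
        if not changed:
            break
    return labels


def f1_score(y_true, y_pred):
    """F1 for the positive (bot) class, using F1 = 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    print(frequency_encode(["a", "b", "a", "c", "a"]))  # [3, 1, 3, 1, 3]
    emb = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "u3": [0.9, 0.1]}
    print(similarity_features(emb, {"u1": 1, "u2": 0})["u3"])
    adj = {"u1": ["u3"], "u2": [], "u3": ["u1"]}
    print(propagate_labels(adj, {"u1": 1, "u2": 0}))  # {'u1': 1, 'u2': 0, 'u3': 1}
    print(f1_score([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.8
```

In the toy example, node `u3` is linked only to the known bot `u1`, so propagation labels it a bot. Its embedding is also closer to the bot centroid than to the human centroid. On a graph with millions of users, a dedicated graph library would replace these pure-Python loops.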

Data availability

Not applicable.

Acknowledgements

This research was funded by NSFC (Grant Nos. U1803263, 62072131, 62073263), Science and Technology Projects in Guangzhou (Nos. 202206030001, 202102010442), Guangdong Basic and Applied Basic Research Foundation (No. 2022A1515011401), the Major Key Project of PCL (Grant Nos. PCL2021A09, PCL2021A02, PCL2022A03), Guangdong Higher Education Innovation Group (Grant No. 2020KCXTD007) and Guangzhou Higher Education Innovation Group (Grant No. 202032854).

Author information

Contributions

Conception and design of study: Shudong Li and Chuanyu Zhao

Data processing: Qing Li and Jiuming Huang

Analysis of experimental result: Dawei Zhao

Manuscript revision: Peican Zhu

Corresponding authors

Correspondence to Shudong Li or Qing Li.

Ethics declarations

Ethical approval and consent to participate

Not applicable.

Human and animal ethics

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, S., Zhao, C., Li, Q. et al. BotFinder: a novel framework for social bots detection in online social networks based on graph embedding and community detection. World Wide Web 26, 1793–1809 (2023). https://doi.org/10.1007/s11280-022-01114-2

  • DOI: https://doi.org/10.1007/s11280-022-01114-2
