Abstract
With the widespread popularity of online social networks (OSNs), the number of users has increased exponentially in recent years. At the same time, social bots, i.e., accounts controlled by programs, are also on the rise. Service providers of OSNs often use them to keep social networks active. Meanwhile, some social bots are registered for malicious purposes. Detecting these malicious social bots is necessary to preserve an authentic public opinion environment. We propose BotFinder, a framework for detecting malicious social bots in OSNs. Specifically, it combines machine learning and graph methods so that the latent features of social bots can be effectively extracted. For feature engineering, we generate second-order features and apply encoding methods to high-cardinality variables. These features make full use of both labelled and unlabelled samples. For the graphs, we first generate node vectors through an embedding method, after which the similarity between the vectors of humans and bots can be calculated; we then use an unsupervised method to diffuse labels, which further improves performance. To validate the performance of the proposed method, we conduct extensive experiments on a dataset provided by an artificial intelligence contest, which is composed of over eight million user records. Results show that our approach reaches an F1-score of 0.8850, which is much better than the state of the art.
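To make the label-diffusion step more concrete, the following is a minimal sketch of one generic way to spread known labels over a graph: each unlabelled node repeatedly takes the mean score of its neighbours while the labelled seed nodes stay fixed. This is an illustrative assumption and not necessarily the procedure used in BotFinder; the function name `diffuse_labels`, the toy graph, and the 0.5 starting score are all hypothetical.

```python
from typing import Dict, List


def diffuse_labels(
    adjacency: Dict[str, List[str]],
    seeds: Dict[str, float],
    iterations: int = 20,
    default: float = 0.5,
) -> Dict[str, float]:
    """Spread seed scores (1.0 = bot, 0.0 = human) over an undirected graph.

    Hypothetical helper for illustration only; every neighbour listed in
    `adjacency` must itself be a key of `adjacency`.
    """
    # Seeds start at their known score, all other nodes at a neutral default.
    scores = {node: seeds.get(node, default) for node in adjacency}
    for _ in range(iterations):
        updated = {}
        for node, neighbours in adjacency.items():
            if node in seeds or not neighbours:
                # Seed nodes are clamped; isolated nodes keep their score.
                updated[node] = scores[node]
            else:
                # Unlabelled nodes take the mean score of their neighbours.
                updated[node] = sum(scores[n] for n in neighbours) / len(neighbours)
        scores = updated
    return scores


# Toy graph (assumed data): one known bot, one known human, three unknown users.
graph = {
    "bot_a": ["u1", "u2"],
    "u1": ["bot_a", "u2"],
    "u2": ["bot_a", "u1"],
    "human_a": ["u3"],
    "u3": ["human_a"],
}
result = diffuse_labels(graph, seeds={"bot_a": 1.0, "human_a": 0.0})
```

In this toy graph, `u1` and `u2` drift towards the bot score because they are connected to the known bot and to each other, while `u3` inherits the human score from its only neighbour. A real pipeline would apply a threshold to such scores or feed them to a classifier as an additional feature.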
Data availability
Not applicable.
Acknowledgements
This research was funded by NSFC (Grant Nos. U1803263, 62072131, 62073263), Science and Technology Projects in Guangzhou (No.202206030001, 202102010442), Guangdong Basic and Applied Basic Research Foundation (No.2022A1515011401), the Major Key Project of PCL (Grant No. PCL2021A09, PCL2021A02, PCL2022A03), Guangdong Higher Education Innovation Group (Grant No.2020KCXTD007) and Guangzhou Higher Education Innovation Group (Grant No.202032854).
Author information
Contributions
Conception and design of study: Shudong Li and Chuanyu Zhao
Data processing: Qing Li and Jiuming Huang
Analysis of experimental results: Dawei Zhao
Manuscript revision: Peican Zhu
Ethics declarations
Ethical approval and consent to participate
Not applicable.
Human and animal ethics
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
About this article
Cite this article
Li, S., Zhao, C., Li, Q. et al. BotFinder: a novel framework for social bots detection in online social networks based on graph embedding and community detection. World Wide Web 26, 1793–1809 (2023). https://doi.org/10.1007/s11280-022-01114-2
DOI: https://doi.org/10.1007/s11280-022-01114-2