Abstract
Detecting fake users (also called Sybils) in online social networks is a basic security research problem. State-of-the-art approaches rely on a large amount of manually labeled users as a training set. These approaches suffer from three key limitations: (1) it is time-consuming and costly to manually label a large training set, (2) they cannot detect new Sybils in a timely fashion, and (3) they are vulnerable to Sybil attacks that leverage information of the training set. In this work, we propose SybilBlind, a structure-based Sybil detection framework that does not rely on a manually labeled training set. SybilBlind works under the same threat model as state-of-the-art structure-based methods. We demonstrate the effectiveness of SybilBlind using (1) a social network with synthetic Sybils and (2) two Twitter datasets with real Sybils. For instance, SybilBlind achieves an AUC of 0.98 on a Twitter dataset.
B. Wang and L. Zhang—Authors contributed equally to this work.
Notes
- 1. Our framework can also be generalized to directed social networks.
- 2. The local community detection method [26] requires labeled benign nodes and thus is inapplicable to detect Sybils without a manually labeled training set.
- 3.
- 4.
References
1 in 10 Twitter accounts is fake. http://goo.gl/qTYbyy
Alvisi, L., Clement, A., Epasto, A., Lattanzi, S., Panconesi, A.: SoK: the evolution of sybil defense via social networks. In: IEEE S & P (2013)
Barabási, A., Albert, R.: Emergence of scaling in random networks. Science 286, 509–512 (1999)
Benevenuto, F., Magno, G., Rodrigues, T., Almeida, V.: Detecting spammers on Twitter. In: CEAS (2010)
Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech.: Theory Exp. (2008)
Boshmaf, Y., Logothetis, D., Siganos, G., Leria, J., Lorenzo, J.: Integro: leveraging victim prediction for robust fake account detection in OSNs. In: NDSS (2015)
Cao, Q., Sirivianos, M., Yang, X., Pregueiro, T.: Aiding the detection of fake accounts in large scale social online services. In: NSDI (2012)
Danezis, G., Mittal, P.: SybilInfer: detecting Sybil nodes using social networks. In: NDSS (2009)
Fu, H., Xie, X., Rui, Y., Gong, N.Z., Sun, G., Chen, E.: Robust spammer detection in microblogs: leveraging user carefulness. ACM Trans. Intell. Syst. Technol. (TIST) (2017)
Gao, H., Chen, Y., Lee, K., Palsetia, D., Choudhary, A.: Towards online spam filtering in social networks. In: NDSS (2012)
Gao, P., Wang, B., Gong, N.Z., Kulkarni, S., Thomas, K., Mittal, P.: SybilFuse: Combining local attributes with global structure to perform robust Sybil detection. In: IEEE CNS (2018)
Ghosh, S., et al.: Understanding and combating link farming in the Twitter social network. In: WWW (2012)
Gilbert, E., Karahalios, K.: Predicting tie strength with social media. In: CHI (2009)
Gong, N.Z., Frank, M., Mittal, P.: SybilBelief: a semi-supervised learning approach for structure-based Sybil detection. IEEE TIFS 9(6), 976–987 (2014)
Hacking Election, May 2016. http://goo.gl/G8o9x0
Hacking Financial Market, May 2016. http://goo.gl/4AkWyt
Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)
Jia, J., Wang, B., Gong, N.Z.: Random walk based fake account detection in online social networks. In: IEEE DSN, pp. 273–284 (2017)
Kontaxis, G., Polakis, I., Ioannidis, S., Markatos, E.P.: Detecting social network profile cloning. In: IEEE PERCOM Workshops (2011)
Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: WWW, pp. 591–600. ACM (2010)
Liu, C., Gao, P., Wright, M., Mittal, P.: Exploiting temporal dynamics in Sybil defenses. In: ACM CCS, pp. 805–816 (2015)
Song, J., Lee, S., Kim, J.: Spam filtering in Twitter using sender-receiver relationship. In: Sommer, R., Balzarotti, D., Maier, G. (eds.) RAID 2011. LNCS, vol. 6961, pp. 301–317. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23644-0_16
Stringhini, G., Kruegel, C., Vigna, G.: Detecting spammers on social networks. In: ACSAC (2010)
Thomas, K., Grier, C., Ma, J., Paxson, V., Song, D.: Design and evaluation of a real-time URL spam filtering service. In: IEEE S & P (2011)
Thomas, K., McCoy, D., Grier, C., Kolcz, A., Paxson, V.: Trafficking fraudulent accounts: the role of the underground market in Twitter spam and abuse. In: USENIX Security Symposium (2013)
Viswanath, B., Post, A., Gummadi, K.P., Mislove, A.: An analysis of social network-based Sybil defenses. In: ACM SIGCOMM (2010)
Wang, A.H.: Don’t follow me - spam detection in Twitter. In: SECRYPT (2010)
Wang, B., Gong, N.Z., Fu, H.: GANG: detecting fraudulent users in online social networks via guilt-by-association on directed graphs. In: IEEE ICDM (2017)
Wang, B., Jia, J., Zhang, L., Gong, N.Z.: Structure-based Sybil detection in social networks via local rule-based propagation. IEEE Transactions on Network Science and Engineering (2018)
Wang, B., Zhang, L., Gong, N.Z.: SybilSCAR: Sybil detection in online social networks via local rule based propagation. In: IEEE INFOCOM (2017)
Wang, G., Konolige, T., Wilson, C., Wang, X.: You are how you click: clickstream analysis for Sybil detection. In: USENIX Security Symposium (2013)
Wang, G., et al.: Social turing tests: crowdsourcing Sybil detection. In: NDSS (2013)
Wei, W., Xu, F., Tan, C., Li, Q.: SybilDefender: defend against Sybil attacks in large social networks. In: IEEE INFOCOM (2012)
Wilson, C., Boe, B., Sala, A., Puttaswamy, K.P., Zhao, B.Y.: User interactions in social networks and their implications. In: EuroSys (2009)
Yang, C., Harkreader, R.C., Gu, G.: Die free or live hard? Empirical evaluation and new design for fighting evolving Twitter spammers. In: Sommer, R., Balzarotti, D., Maier, G. (eds.) RAID 2011. LNCS, vol. 6961, pp. 318–337. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23644-0_17
Yang, C., Harkreader, R., Zhang, J., Shin, S., Gu, G.: Analyzing spammer’s social networks for fun and profit. In: WWW (2012)
Yang, Z., Wilson, C., Wang, X., Gao, T., Zhao, B.Y., Dai, Y.: Uncovering social network Sybils in the wild. In: IMC (2011)
Yu, H., Gibbons, P.B., Kaminsky, M., Xiao, F.: SybilLimit: a near-optimal social network defense against Sybil attacks. In: IEEE S & P (2008)
Yu, H., Kaminsky, M., Gibbons, P.B., Flaxman, A.: SybilGuard: defending against Sybil attacks via social networks. In: ACM SIGCOMM (2006)
Acknowledgements
We thank the anonymous reviewers and our shepherd Jason Polakis for their constructive comments. This work was supported by NSF under grant CNS-1750198 and a research gift from JD.com.
Appendices
A Performance of the Average Aggregator
Theorem 2
When SybilBlind uses the average aggregator, the expected aggregated probability is 0.5 for every node.
Proof
Suppose in some sampling trial, the sampled subsets are B and S, and SybilSCAR halts after T iterations. We denote by \(q_u\) the prior probability and by \({p_u}^{(t)}\) the probability in the tth iteration for u, respectively. Note that the subsets \(B'=S\) and \(S'=B\) are sampled by the sampler with the same probability. We denote by \(q_u'\) the prior probability and by \({p_u}^{(t)'}\) the probability in the tth iteration for u, respectively, when SybilSCAR uses the subsets \(B'\) and \(S'\). We prove that \({q}_u'=1 -{q}_u\) and \({p}_u^{(t)'} = 1-{p}_u^{(t)}\) for every node u and iteration t. First, we have:
which means that \({{q}_u}'=1-{q}_u\) for every node.
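To make this step concrete, the following sketch assumes the three-valued prior used by SybilSCAR-style methods: nodes sampled into S receive \(0.5+\theta\), nodes sampled into B receive \(0.5-\theta\), and all other nodes receive 0.5. The symbol \(\theta\) and this sign convention are assumptions, since the prior is not defined in this appendix. Under that assumption, swapping the subsets gives:

```latex
q_u' =
\begin{cases}
0.5+\theta & \text{if } u \in S' = B\\
0.5-\theta & \text{if } u \in B' = S\\
0.5        & \text{otherwise}
\end{cases}
\;=\; 1 - q_u .
```

The identity can be checked case by case: \(1-(0.5-\theta)=0.5+\theta\), \(1-(0.5+\theta)=0.5-\theta\), and \(1-0.5=0.5\).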
We have \({p_u}^{(0)'} = {q_u}'\) and \({p_u}^{(0)} = {q_u}\). Therefore, \({p}_u^{(0)'} =1 -{p}_u^{(0)}\) holds for every node in the 0th iteration. We can also show that \({p}_u^{(t)'} = 1-{p}_u^{(t)}\) holds for every node in the tth iteration if \({p}_u^{(t-1)'} = 1-{p}_u^{(t-1)}\) holds for every node. Therefore, \({p}_u^{(t)'} = 1-{p}_u^{(t)}\) holds for every node u and iteration t. As a result, with the sampled subsets \(B'\) and \(S'\), SybilSCAR also halts after T iterations. Moreover, the average probability in the two sampling trials (i.e., the sampled subsets are B and S, and \(B'=S\) and \(S'=B\)) is 0.5 for every node. For each pair of sampled subsets B and S, there is a pair of subsets \(B'=S\) and \(S'=B\) that are sampled by our sampler with the same probability. Therefore, the expected aggregated probability is 0.5 for every node.
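The symmetry argument can be checked numerically. The sketch below is not the authors' implementation: it assumes a simplified linear update of the form \(p_u^{(t)} = q_u + 2(w-0.5)\sum_{v}(p_v^{(t-1)}-0.5)\) over the neighbors v of u, a three-valued prior with a hypothetical parameter `theta`, a fixed edge weight `weight`, and a small hand-made graph `ADJ`. It runs the propagation once with subsets (B, S) and once with the swapped subsets, then confirms that the two probabilities sum to 1 in every iteration.

```python
import math


def propagate(adj, benign, sybil, theta=0.1, weight=0.6, iters=5):
    """Simplified SybilSCAR-style propagation (assumed form, for illustration).

    adj    -- adjacency list; adj[u] lists the neighbors of node u
    benign -- nodes sampled into B (prior 0.5 - theta, assumed convention)
    sybil  -- nodes sampled into S (prior 0.5 + theta, assumed convention)
    Returns the probability vector after every iteration, starting at t = 0.
    """
    n = len(adj)
    q = [0.5] * n
    for u in benign:
        q[u] = 0.5 - theta
    for u in sybil:
        q[u] = 0.5 + theta

    p = list(q)                      # p^(0) = q
    history = [list(p)]
    for _ in range(iters):
        # Each node adds the weighted residuals of its neighbors to its prior.
        p = [q[u] + 2 * (weight - 0.5) * sum(p[v] - 0.5 for v in adj[u])
             for u in range(n)]
        history.append(list(p))
    return history


# Hypothetical 6-node graph: two triangles joined by the edge (2, 3).
ADJ = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]

run = propagate(ADJ, benign=[0, 1], sybil=[4, 5])
swapped = propagate(ADJ, benign=[4, 5], sybil=[0, 1])   # B' = S, S' = B

# p'^(t) = 1 - p^(t) for every node and every iteration.
for p_t, p_t_swapped in zip(run, swapped):
    for a, b in zip(p_t, p_t_swapped):
        assert math.isclose(a + b, 1.0, abs_tol=1e-12)

# Averaging the two trials therefore yields 0.5 for every node.
averages = [(a + b) / 2 for a, b in zip(run[-1], swapped[-1])]
print(averages)
```

Because the update is linear in the residuals \(p_u - 0.5\), negating every prior residual negates every later residual, which is the inductive step used in the proof.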
B Proof of Theorem 1
Lower Bound: We have:
We note that this lower bound is very loose because it ignores every outcome with nonzero label noise, such as those counted by \(\text {Pr}(0<\alpha _b \le \tau , 0<\alpha _s \le \tau )\). However, it is sufficient to give a qualitative understanding.
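One lower bound that matches this description keeps only the noise-free outcome. The sketch below assumes that r denotes the fraction of Sybils, that B and S each contain n sampled nodes, and that each sampled node is benign independently with probability \(1-r\). These modelling assumptions are inferred from the definitions later in this appendix rather than stated at this point.

```latex
\text{Pr}(\alpha_b \le \tau,\ \alpha_s \le \tau)
\;\ge\; \text{Pr}(\alpha_b = 0,\ \alpha_s = 0)
\;=\; (1-r)^n \, r^n .
```

Under these assumptions the noise-free outcome requires all n nodes in B to be benign and all n nodes in S to be Sybils, which gives the two factors.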
Upper Bound: The probability that the label noise in both the benign region and the Sybil region is no larger than \(\tau \) is bounded by the probability that the label noise in either region alone is no larger than \(\tau \). Formally, we have:
\[\text {Pr}(\alpha _b \le \tau , \alpha _s \le \tau ) \le \min \{\text {Pr}(\alpha _b \le \tau ), \text {Pr}(\alpha _s \le \tau )\}.\]
Next, we will bound the probabilities \(\text {Pr}(\alpha _b \le \tau )\) and \(\text {Pr}( \alpha _s \le \tau )\) separately. We will take \(\text {Pr}(\alpha _b \le \tau )\) as an example to show the derivations, and similar derivations can be used to bound \(\text {Pr}( \alpha _s \le \tau )\).
We observe the following equivalent equations:
We define n random variables \(X_1, X_2, \cdots , X_n\) and n random variables \(Y_1, Y_2, \cdots , Y_n\) as follows:
where \(i=1,2,\cdots , n\). According to our definitions, we have \(\text {Pr}(X_i=\tau )=1 - r\) and \(\text {Pr}(Y_i=\tau - 1)=1 - r\), where \(i=1,2,\cdots , n\). Moreover, we denote S as the sum of these random variables, i.e., \(S=\sum _{i=1}^n X_i + \sum _{i=1}^n Y_i\). Then, the expected value of S is \(E(S)=-(1-2\tau )(1-r)n\). With the variables S and E(S), we can further rewrite Eq. 6 as follows:
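The stated probabilities and the value of \(E(S)\) are consistent with the following definitions, in which each variable takes the value 0 when the sampled node is a Sybil. The interpretation of the events (the ith node sampled into B or into S being benign) and the equivalence between \(\alpha _b \le \tau \) and \(S \ge 0\) are assumptions made for this sketch. The expectation can then be checked term by term, and the event can be written as a deviation from the mean:

```latex
\begin{align*}
X_i &= \begin{cases} \tau & \text{if the $i$th node in } B \text{ is benign}\\ 0 & \text{otherwise} \end{cases}
&
Y_i &= \begin{cases} \tau - 1 & \text{if the $i$th node in } S \text{ is benign}\\ 0 & \text{otherwise} \end{cases}
\\[4pt]
E(S) &= n(1-r)\tau + n(1-r)(\tau-1) = -(1-2\tau)(1-r)n
\\[4pt]
\text{Pr}(\alpha_b \le \tau) &= \text{Pr}(S \ge 0)
 = \text{Pr}\bigl(S - E(S) \ge (1-2\tau)(1-r)n\bigr)
\end{align*}
```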
According to Hoeffding’s inequality [17], we have
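As a worked instance, assume that \(X_i \in [0, \tau ]\) and \(Y_i \in [\tau - 1, 0]\), that the 2n variables are independent, and that \(\tau < 0.5\), so that the deviation \(t=(1-2\tau )(1-r)n\) implied by \(S \ge 0\) and the stated \(E(S)\) is positive. Hoeffding's inequality in the form \(\text {Pr}(S-E(S)\ge t)\le \exp (-2t^2/\sum _k (b_k-a_k)^2)\) then gives:

```latex
\begin{align*}
\text{Pr}(\alpha_b \le \tau)
 &\le \exp\left(-\frac{2(1-2\tau)^2(1-r)^2 n^2}{n\tau^2 + n(1-\tau)^2}\right) \\
 &= \exp\left(-\frac{2(1-2\tau)^2(1-r)^2\, n}{\tau^2 + (1-\tau)^2}\right)
\end{align*}
```

The denominator sums the squared range widths: n terms of \(\tau ^2\) from the \(X_i\) and n terms of \((1-\tau )^2\) from the \(Y_i\).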
Similarly, we can derive an upper bound of \(\text {Pr}( \alpha _s \le \tau )\) as follows:
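By symmetry, the same argument applies with the roles of benign and Sybil nodes exchanged, so the benign probability \(1-r\) is replaced by the Sybil probability r. Assuming the same range widths and independence as for the benign region, a bound of the following form results:

```latex
\text{Pr}(\alpha_s \le \tau)
 \le \exp\left(-\frac{2(1-2\tau)^2 r^2\, n}{\tau^2 + (1-\tau)^2}\right)
```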
Since we consider \(r<0.5\) in this work, we have:
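Because \(1-r > r\) when \(r<0.5\), the smaller of the two exponential bounds is the one for the benign region, so it is the one that survives in the final upper bound. The sketch below evaluates that upper bound together with a noise-free lower bound of \((1-r)^n r^n\), and compares both with the exact probability under a simplified model. The model is an assumption: B and S each hold n nodes, every sampled node is benign independently with probability \(1-r\), and label noise is the fraction of a region's sampled nodes that landed in the wrong subset. The function names and parameter values are hypothetical.

```python
from math import comb, exp

EPS = 1e-12  # tolerance for floating-point comparisons against tau


def binom_pmf(k, n, p):
    """Probability of k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def lower_bound(n, r):
    """Noise-free outcome: all of B benign and all of S Sybil (assumed model)."""
    return ((1 - r) * r) ** n


def upper_bound(n, r, tau):
    """Hoeffding-style bound for the benign region, valid for tau < 0.5."""
    rate = 2 * (1 - 2 * tau) ** 2 * (1 - r) ** 2 / (tau**2 + (1 - tau) ** 2)
    return exp(-rate * n)


def exact_probability(n, r, tau):
    """Exact Pr(alpha_b <= tau, alpha_s <= tau) under the assumed i.i.d. model."""
    total = 0.0
    for a in range(n + 1):          # benign nodes sampled into B (correct)
        for b in range(n + 1):      # benign nodes sampled into S (noisy)
            # alpha_b = b / (a + b), written without division.
            benign_ok = b <= tau * (a + b) + EPS
            # alpha_s = (n - a) / (2n - a - b): Sybils in B over all Sybils.
            sybil_ok = (n - a) <= tau * (2 * n - a - b) + EPS
            if benign_ok and sybil_ok:
                total += binom_pmf(a, n, 1 - r) * binom_pmf(b, n, 1 - r)
    return total


n, r, tau = 10, 0.3, 0.2            # hypothetical example values
lo = lower_bound(n, r)
ex = exact_probability(n, r, tau)
up = upper_bound(n, r, tau)
print(f"lower={lo:.3e}  exact={ex:.3e}  upper={up:.3e}")
```

Both bounds shrink exponentially in n, which is the qualitative point of the theorem: a single sampling trial rarely yields small label noise in both regions at once.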
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, B., Zhang, L., Gong, N.Z. (2018). SybilBlind: Detecting Fake Users in Online Social Networks Without Manual Labels. In: Bailey, M., Holz, T., Stamatogiannakis, M., Ioannidis, S. (eds) Research in Attacks, Intrusions, and Defenses. RAID 2018. Lecture Notes in Computer Science(), vol 11050. Springer, Cham. https://doi.org/10.1007/978-3-030-00470-5_11
DOI: https://doi.org/10.1007/978-3-030-00470-5_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00469-9
Online ISBN: 978-3-030-00470-5
eBook Packages: Computer Science, Computer Science (R0)