SybilBlind: Detecting Fake Users in Online Social Networks Without Manual Labels

Wang, Binghui; Zhang, Le; Gong, Neil Zhenqiang

doi:10.1007/978-3-030-00470-5_11

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11050)

Included in the following conference series:

International Symposium on Research in Attacks, Intrusions, and Defenses

Abstract

Detecting fake users (also called Sybils) in online social networks is a basic security research problem. State-of-the-art approaches rely on a large amount of manually labeled users as a training set. These approaches suffer from three key limitations: (1) it is time-consuming and costly to manually label a large training set, (2) they cannot detect new Sybils in a timely fashion, and (3) they are vulnerable to Sybil attacks that leverage information of the training set. In this work, we propose SybilBlind, a structure-based Sybil detection framework that does not rely on a manually labeled training set. SybilBlind works under the same threat model as state-of-the-art structure-based methods. We demonstrate the effectiveness of SybilBlind using (1) a social network with synthetic Sybils and (2) two Twitter datasets with real Sybils. For instance, SybilBlind achieves an AUC of 0.98 on a Twitter dataset.

B. Wang and L. Zhang—Authors contributed equally to this work.

Notes

1. Our framework can also be generalized to directed social networks.
2. The local community detection method [26] requires labeled benign nodes and thus cannot be used to detect Sybils without a manually labeled training set.
3. http://home.engineering.iastate.edu/~neilgong/dataset.html.
4. https://sites.google.com/site/findcommunities/.

References

1. 1 in 10 Twitter accounts is fake. http://goo.gl/qTYbyy
2. Alvisi, L., Clement, A., Epasto, A., Lattanzi, S., Panconesi, A.: SoK: the evolution of sybil defense via social networks. In: IEEE S & P (2013)
3. Barabási, A., Albert, R.: Emergence of scaling in random networks. Science 286, 509–512 (1999)
4. Benevenuto, F., Magno, G., Rodrigues, T., Almeida, V.: Detecting spammers on Twitter. In: CEAS (2010)
5. Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. Stat. Mech.: Theory Exp. (2008)
6. Boshmaf, Y., Logothetis, D., Siganos, G., Leria, J., Lorenzo, J.: Integro: leveraging victim prediction for robust fake account detection in OSNs. In: NDSS (2015)
7. Cao, Q., Sirivianos, M., Yang, X., Pregueiro, T.: Aiding the detection of fake accounts in large scale social online services. In: NSDI (2012)
8. Danezis, G., Mittal, P.: SybilInfer: detecting Sybil nodes using social networks. In: NDSS (2009)
9. Fu, H., Xie, X., Rui, Y., Gong, N.Z., Sun, G., Chen, E.: Robust spammer detection in microblogs: leveraging user carefulness. ACM Trans. Intell. Syst. Technol. (TIST) (2017)
10. Gao, H., Chen, Y., Lee, K., Palsetia, D., Choudhary, A.: Towards online spam filtering in social networks. In: NDSS (2012)
11. Gao, P., Wang, B., Gong, N.Z., Kulkarni, S., Thomas, K., Mittal, P.: SybilFuse: Combining local attributes with global structure to perform robust Sybil detection. In: IEEE CNS (2018)
12. Ghosh, S., et al.: Understanding and combating link farming in the Twitter social network. In: WWW (2012)
13. Gilbert, E., Karahalios, K.: Predicting tie strength with social media. In: CHI (2009)
14. Gong, N.Z., Frank, M., Mittal, P.: SybilBelief: a semi-supervised learning approach for structure-based Sybil detection. IEEE TIFS 9(6), 976–987 (2014)
15. Hacking Election, May 2016. http://goo.gl/G8o9x0
16. Hacking Financial Market, May 2016. http://goo.gl/4AkWyt
17. Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)
18. Jia, J., Wang, B., Gong, N.Z.: Random walk based fake account detection in online social networks. In: IEEE DSN, pp. 273–284 (2017)
19. Kontaxis, G., Polakis, I., Ioannidis, S., Markatos, E.P.: Detecting social network profile cloning. In: IEEE PERCOM Workshops (2011)
20. Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: WWW, pp. 591–600. ACM (2010)
21. Liu, C., Gao, P., Wright, M., Mittal, P.: Exploiting temporal dynamics in Sybil defenses. In: ACM CCS, pp. 805–816 (2015)
22. Song, J., Lee, S., Kim, J.: Spam filtering in Twitter using sender-receiver relationship. In: Sommer, R., Balzarotti, D., Maier, G. (eds.) RAID 2011. LNCS, vol. 6961, pp. 301–317. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23644-0_16
23. Stringhini, G., Kruegel, C., Vigna, G.: Detecting spammers on social networks. In: ACSAC (2010)
24. Thomas, K., Grier, C., Ma, J., Paxson, V., Song, D.: Design and evaluation of a real-time URL spam filtering service. In: IEEE S & P (2011)
25. Thomas, K., McCoy, D., Grier, C., Kolcz, A., Paxson, V.: Trafficking fraudulent accounts: the role of the underground market in Twitter spam and abuse. In: USENIX Security Symposium (2013)
26. Viswanath, B., Post, A., Gummadi, K.P., Mislove, A.: An analysis of social network-based Sybil defenses. In: ACM SIGCOMM (2010)
27. Wang, A.H.: Don’t follow me - spam detection in Twitter. In: SECRYPT (2010)
28. Wang, B., Gong, N.Z., Fu, H.: GANG: detecting fraudulent users in online social networks via guilt-by-association on directed graphs. In: IEEE ICDM (2017)
29. Wang, B., Jia, J., Zhang, L., Gong, N.Z.: Structure-based Sybil detection in social networks via local rule-based propagation. IEEE Transactions on Network Science and Engineering (2018)
30. Wang, B., Zhang, L., Gong, N.Z.: SybilSCAR: Sybil detection in online social networks via local rule based propagation. In: IEEE INFOCOM (2017)
31. Wang, G., Konolige, T., Wilson, C., Wang, X.: You are how you click: clickstream analysis for Sybil detection. In: Usenix Security (2013)
32. Wang, G., et al.: Social turing tests: crowdsourcing Sybil detection. In: NDSS (2013)
33. Wei, W., Xu, F., Tan, C., Li, Q.: SybilDefender: defend against Sybil attacks in large social networks. In: IEEE INFOCOM (2012)
34. Wilson, C., Boe, B., Sala, A., Puttaswamy, K.P., Zhao, B.Y.: User interactions in social networks and their implications. In: EuroSys (2009)
35. Yang, C., Harkreader, R.C., Gu, G.: Die free or live hard? Empirical evaluation and new design for fighting evolving Twitter spammers. In: Sommer, R., Balzarotti, D., Maier, G. (eds.) RAID 2011. LNCS, vol. 6961, pp. 318–337. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23644-0_17
36. Yang, C., Harkreader, R., Zhang, J., Shin, S., Gu, G.: Analyzing spammer’s social networks for fun and profit. In: WWW (2012)
37. Yang, Z., Wilson, C., Wang, X., Gao, T., Zhao, B.Y., Dai, Y.: Uncovering social network Sybils in the wild. In: IMC (2011)
38. Yu, H., Gibbons, P.B., Kaminsky, M., Xiao, F.: SybilLimit: a near-optimal social network defense against Sybil attacks. In: IEEE S & P (2008)
39. Yu, H., Kaminsky, M., Gibbons, P.B., Flaxman, A.: SybilGuard: defending against Sybil attacks via social networks. In: ACM SIGCOMM (2006)

Acknowledgements

We thank the anonymous reviewers and our shepherd Jason Polakis for their constructive comments. This work was supported by NSF under grant CNS-1750198 and a research gift from JD.com.

Author information

Authors and Affiliations

ECE Department, Iowa State University, Ames, USA
Binghui Wang, Le Zhang & Neil Zhenqiang Gong

Corresponding author

Correspondence to Binghui Wang.

Editor information

Editors and Affiliations

University of Illinois at Urbana-Champaign, Urbana, IL, USA
Michael Bailey
Ruhr-Universität Bochum, Bochum, Germany
Thorsten Holz
Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
Manolis Stamatogiannakis
Foundation for Research & Technology – Hellas, Heraklion, Crete, Greece
Sotiris Ioannidis

Appendices

A Performance of the Average Aggregator

Theorem 2

When SybilBlind uses the average aggregator, the expected aggregated probability is 0.5 for every node.

Proof

Suppose that in some sampling trial the sampled subsets are $B$ and $S$, and SybilSCAR halts after $T$ iterations. We denote by $q_u$ the prior probability of node $u$ and by $p_u^{(t)}$ its probability in the $t$th iteration. Note that the subsets $B'=S$ and $S'=B$ are sampled by the sampler with the same probability. We denote by $q_u'$ the prior probability and by $p_u^{(t)'}$ the probability in the $t$th iteration of node $u$ when SybilSCAR uses the subsets $B'$ and $S'$. We prove that $q_u'=1-q_u$ and $p_u^{(t)'} = 1-p_u^{(t)}$ for every node $u$ and iteration $t$. First, we have:

$$q_u' = \begin{cases} 0.5 - \theta = 1 - q_u & \text{if } u \in S \\ 0.5 + \theta = 1 - q_u & \text{if } u \in B \\ 0.5 = 1 - q_u & \text{otherwise,} \end{cases}$$

which means that ${{q}_u}'=1-{q}_u$ for every node.

We have $p_u^{(0)'} = q_u'$ and $p_u^{(0)} = q_u$. Therefore, $p_u^{(0)'} = 1 - p_u^{(0)}$ holds for every node in the 0th iteration. We can also show that $p_u^{(t)'} = 1-p_u^{(t)}$ holds for every node in the $t$th iteration if $p_u^{(t-1)'} = 1-p_u^{(t-1)}$ holds for every node. Therefore, by induction, $p_u^{(t)'} = 1-p_u^{(t)}$ holds for every node $u$ and iteration $t$. As a result, with the sampled subsets $B'$ and $S'$, SybilSCAR also halts after $T$ iterations. Moreover, averaging the probabilities from the two sampling trials (one with subsets $B$ and $S$, the other with $B'=S$ and $S'=B$) gives 0.5 for every node. For each pair of sampled subsets $B$ and $S$, there is a pair of subsets $B'=S$ and $S'=B$ that is sampled by our sampler with the same probability. Therefore, the expected aggregated probability is 0.5 for every node.
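The symmetry argument can be checked numerically on a toy graph. The sketch below is only an illustration: the update rule, the parameter values (`theta`, `weight`, `iters`), and the 6-node path graph are assumptions made for this example and are not taken from the text above. It uses a simple residual-form propagation in which swapping the two sampled subsets flips the sign of every residual, which is the property the proof relies on.

```python
def propagate(adj, benign, sybil, theta=0.4, weight=0.6, iters=5):
    """Toy residual-form propagation (an assumed rule for illustration only).

    adj[u] lists the neighbours of node u. Returns, for every node,
    the probability of being a Sybil.
    """
    n = len(adj)
    # Residual priors q_u - 0.5: +theta for sampled Sybils,
    # -theta for sampled benign nodes, 0 for all other nodes.
    q = [0.0] * n
    for u in benign:
        q[u] = -theta
    for u in sybil:
        q[u] = theta
    p = q[:]
    for _ in range(iters):
        new_p = []
        for u in range(n):
            x = q[u] + 2 * (weight - 0.5) * sum(p[v] for v in adj[u])
            # Clipping is symmetric around 0, so it preserves the sign flip.
            new_p.append(max(-0.5, min(0.5, x)))
        p = new_p
    return [0.5 + x for x in p]


# Hypothetical 6-node path graph 0-1-2-3-4-5.
adj = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
p = propagate(adj, benign=[0], sybil=[5])
p_swapped = propagate(adj, benign=[5], sybil=[0])
# Average aggregator over the two mirrored trials.
average = [(a + b) / 2 for a, b in zip(p, p_swapped)]
```

Under these assumptions every entry of `average` equals 0.5 up to floating-point rounding, which mirrors why the average aggregator carries no information about which nodes are Sybils.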

B Proof of Theorem 1

Lower Bound: We have:

$$\text{Pr}(\alpha_b \le \tau, \alpha_s \le \tau) \ge \text{Pr}(\alpha_b = \alpha_s = 0) = (1-r)^n r^n. \tag{4}$$

We note that this lower bound is very loose because it ignores every case in which $\alpha_b$ or $\alpha_s$ is positive but still no bigger than $\tau$. However, this lower bound is sufficient to give us a qualitative understanding.

Upper Bound: We observe that the probability that the label noise in both the benign region and the Sybil region is no bigger than $\tau$ is bounded by the probability that the label noise in either one of the two regions is no bigger than $\tau$. Formally, we have:

$$\text{Pr}(\alpha_b \le \tau, \alpha_s \le \tau) \le \min\{\text{Pr}(\alpha_b \le \tau), \text{Pr}(\alpha_s \le \tau)\} \tag{5}$$

Next, we will bound the probabilities $\text {Pr}(\alpha _b \le \tau )$ and $\text {Pr}( \alpha _s \le \tau )$ separately. We will take $\text {Pr}(\alpha _b \le \tau )$ as an example to show the derivations, and similar derivations can be used to bound $\text {Pr}( \alpha _s \le \tau )$.

We observe the following equivalent equations:

$$\text{Pr}(\alpha_b \le \tau) = \text{Pr}\Big(\frac{n_{sb}}{n_{sb} + n_{bb}} \le \tau\Big) = \text{Pr}(\tau n_{bb} + (\tau - 1) n_{sb} \ge 0) \tag{6}$$

We define n random variables $X_1, X_2, \cdots , X_n$ and n random variables $Y_1, Y_2, \cdots , Y_n$ as follows:

$$\begin{aligned} X_i &= \begin{cases} \tau & \text{if the } i\text{th node in } B \text{ is benign} \\ 0 & \text{otherwise} \end{cases} \\ Y_i &= \begin{cases} \tau - 1 & \text{if the } i\text{th node in } S \text{ is benign} \\ 0 & \text{otherwise,} \end{cases} \end{aligned}$$

where $i=1,2,\cdots,n$. According to our definitions, we have $\text{Pr}(X_i=\tau)=1-r$ and $\text{Pr}(Y_i=\tau-1)=1-r$ for $i=1,2,\cdots,n$. Moreover, we denote by $S$ the sum of these random variables (with a slight abuse of notation, since $S$ also names the sampled subset), i.e., $S=\sum_{i=1}^n X_i + \sum_{i=1}^n Y_i$. Then, the expected value of $S$ is $E(S)=-(1-2\tau)(1-r)n$. With $S$ and $E(S)$, we can further rewrite Eq. 6 as follows:

$$\begin{aligned} \text {Pr}(\alpha _b \le \tau )= \text {Pr}( S -E(S) \ge -E(S)) \end{aligned}$$
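For readers who want the intermediate step, the value of $E(S)$ used here follows from linearity of expectation and the definitions of $X_i$ and $Y_i$ alone; no further assumption is needed:

$$\begin{aligned} E(S) &= \sum_{i=1}^n E(X_i) + \sum_{i=1}^n E(Y_i) = n\tau(1-r) + n(\tau-1)(1-r) \\ &= (2\tau-1)(1-r)n = -(1-2\tau)(1-r)n. \end{aligned}$$

Hence $-E(S) = (1-2\tau)(1-r)n$ is positive whenever $\tau < 0.5$ and $r < 1$, so the event $S - E(S) \ge -E(S)$ is an upward deviation of $S$ from its mean.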

According to Hoeffding’s inequality [17], we have

$$\begin{aligned} \text{Pr}(S - E(S) \ge -E(S)) &\le \text{exp}\Big(-\frac{2E(S)^2}{(\tau^2 + (1-\tau)^2)n}\Big) = \text{exp}\Big(-\frac{2(1-2\tau)^2(1-r)^2 n}{\tau^2 + (1-\tau)^2}\Big) \end{aligned}$$
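The denominator comes from the ranges of the summands: each $X_i$ lies in $[0, \tau]$ and each $Y_i$ in $[\tau-1, 0]$, so the squared ranges sum to $(\tau^2 + (1-\tau)^2)n$. As a sanity check, the following Monte Carlo sketch compares an empirical estimate of $\text{Pr}(\alpha_b \le \tau)$ with this bound. It assumes that each sampled node is benign independently with probability $1-r$; the function names and the parameter values ($r=0.2$, $\tau=0.2$, $n=10$, 20000 trials) are hypothetical choices for illustration.

```python
import math
import random


def hoeffding_bound(r, tau, n):
    """Upper bound on Pr(alpha_b <= tau); meaningful for tau < 0.5."""
    return math.exp(-2 * (1 - 2 * tau) ** 2 * (1 - r) ** 2 * n
                    / (tau ** 2 + (1 - tau) ** 2))


def estimate_prob(r, tau, n, trials=20000, seed=0):
    """Monte Carlo estimate of Pr(alpha_b <= tau) under independent sampling."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Assumption: every sampled node is benign with probability 1 - r.
        n_bb = sum(rng.random() < 1 - r for _ in range(n))  # benign nodes in B
        n_sb = sum(rng.random() < 1 - r for _ in range(n))  # benign nodes in S
        # Event of Eq. 6, written without the division.
        if tau * n_bb + (tau - 1) * n_sb >= 0:
            hits += 1
    return hits / trials


r, tau, n = 0.2, 0.2, 10
empirical = estimate_prob(r, tau, n)
bound = hoeffding_bound(r, tau, n)
```

With these settings the empirical frequency should sit below the bound, typically by a wide margin, since Hoeffding's inequality does not use the exact distribution of the summands.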

Similarly, we can derive an upper bound of $\text{Pr}(\alpha_s \le \tau)$ as follows:

$$\text{Pr}(\alpha_s \le \tau) \le \text{exp}\Big(-\frac{2(1-2\tau)^2 r^2 n}{\tau^2 + (1-\tau)^2}\Big) \tag{7}$$

Since we consider $r<0.5$ in this work, we have $(1-r)^2 > r^2$, so the bound on $\text{Pr}(\alpha_b \le \tau)$ is the smaller of the two bounds. Therefore:

$$\min\{\text{Pr}(\alpha_b \le \tau), \text{Pr}(\alpha_s \le \tau)\} \le \text{exp}\Big(-\frac{2(1-2\tau)^2(1-r)^2 n}{\tau^2 + (1-\tau)^2}\Big) \tag{8}$$

By combining Eqs. 5 and 8, we obtain Eq. 3.
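To get a feel for how far apart the two bounds are, the short sketch below evaluates the lower bound of Eq. 4 and the upper bound of Eq. 8 for a few sample sizes. The function names and the settings ($r=0.2$, $\tau=0.2$) are hypothetical and chosen only for illustration.

```python
import math


def lower_bound(r, n):
    """Eq. 4: probability that B is entirely benign and S is entirely Sybil."""
    return ((1 - r) * r) ** n


def upper_bound(r, tau, n):
    """Right-hand side of Eq. 8 (assumes tau < 0.5 and r < 0.5)."""
    return math.exp(-2 * (1 - 2 * tau) ** 2 * (1 - r) ** 2 * n
                    / (tau ** 2 + (1 - tau) ** 2))


# Hypothetical setting: 20% Sybils, tolerated label noise 0.2.
for n in (5, 10, 20, 40):
    print(n, lower_bound(0.2, n), upper_bound(0.2, 0.2, n))
```

Both quantities decay exponentially in $n$, which matches the qualitative reading of the bounds: the larger the sampled subsets, the less likely a single trial is to have small label noise in both regions.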

Cite this paper

Wang, B., Zhang, L., Gong, N.Z. (2018). SybilBlind: Detecting Fake Users in Online Social Networks Without Manual Labels. In: Bailey, M., Holz, T., Stamatogiannakis, M., Ioannidis, S. (eds) Research in Attacks, Intrusions, and Defenses. RAID 2018. Lecture Notes in Computer Science, vol 11050. Springer, Cham. https://doi.org/10.1007/978-3-030-00470-5_11

DOI: https://doi.org/10.1007/978-3-030-00470-5_11
Published: 07 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00469-9
Online ISBN: 978-3-030-00470-5
eBook Packages: Computer Science, Computer Science (R0)
