Abstract
In this paper, we address the problem of classifying online social network (OSN) users using a naively anonymized version of a social graph. We build an initial classifier from two user attributes defined by the graph structure, node degree and clustering coefficient, and then exploit the relationships between users to build a second classifier. We describe how to combine these two classifiers into a single OSN user classifier, and we evaluate the resulting architecture on two classification problems, one binary and one multiclass, using data extracted from Twitter. The results show that the proposed classifier is sound and that an attacker who is able to obtain a naively anonymized version of the social graph can feasibly solve both classification problems.
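The two structural attributes named in the abstract can be read directly off an anonymized graph, since neither depends on node labels. The following is a minimal sketch, not the authors' implementation: it assumes an undirected graph stored as a dictionary of neighbour sets, and the function name `degree_and_clustering` and the toy graph are hypothetical. The abstract does not say whether edge direction is taken into account, so the undirected definition of the local clustering coefficient is an assumption here.

```python
from itertools import combinations


def degree_and_clustering(adj):
    """Return {node: (degree, local clustering coefficient)}.

    Assumption: `adj` maps each node to the set of its neighbours in an
    undirected graph (a hypothetical input format, not from the paper).
    """
    features = {}
    for node, neighbours in adj.items():
        k = len(neighbours)
        if k < 2:
            # With fewer than two neighbours no triangle is possible.
            features[node] = (k, 0.0)
            continue
        # Count the edges that exist between pairs of neighbours.
        links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
        # Divide by the number of possible neighbour pairs, k * (k - 1) / 2.
        features[node] = (k, 2.0 * links / (k * (k - 1)))
    return features


# Toy anonymized graph: node identifiers carry no user information.
toy_graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
features = degree_and_clustering(toy_graph)
```

Each node's pair of values could then serve as the feature vector for a first, attribute-based classifier, with the edges themselves left for a second, relationship-based one.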
This work was partially supported by the Spanish MCYT and the FEDER funds under grants TSI2007-65406-C03-03 “E-AEGIS”, TIN2010-15764 “N-KHRONOUS”, and CONSOLIDER CSD2007-00004 “ARES”.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pérez-Solà, C., Herrera-Joancomartí, J. (2013). Classifying Online Social Network Users through the Social Graph. In: Garcia-Alfaro, J., Cuppens, F., Cuppens-Boulahia, N., Miri, A., Tawbi, N. (eds) Foundations and Practice of Security. FPS 2012. Lecture Notes in Computer Science, vol 7743. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37119-6_8
DOI: https://doi.org/10.1007/978-3-642-37119-6_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37118-9
Online ISBN: 978-3-642-37119-6
eBook Packages: Computer Science, Computer Science (R0)