Classifying Online Social Network Users through the Social Graph

  • Cristina Pérez-Solà
  • Jordi Herrera-Joancomartí
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7743)


In this paper, we address the problem of classifying online social network users using a naively anonymized version of a social graph. We first build an initial classifier from two user attributes defined by the graph structure, node degree and clustering coefficient, and then exploit user relationships to build a second classifier. We describe how to combine these two classifiers into an Online Social Network (OSN) user classifier, and we evaluate the performance of our architecture on two different classification problems (one binary, one multiclass) using data extracted from Twitter. Results show that the proposed classifier is sound and that an attacker who is able to obtain a naively anonymized version of the social graph can feasibly solve both classification problems.
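As a rough illustration of the two structural attributes named above, the following sketch computes node degree and the local clustering coefficient on a toy graph. It is not the authors' implementation: the adjacency-set representation, the assumption of an undirected graph without self-loops, and the function names `structural_features` and `neighbour_vote` are all hypothetical. The neighbour vote is only a simple stand-in for a classifier that exploits user relationships; the abstract does not specify the relational classifier actually used.

```python
from collections import Counter


def structural_features(graph, node):
    """Return (degree, local clustering coefficient) for `node`.

    `graph` maps each node to the set of its neighbours
    (assumed undirected, no self-loops).
    """
    neighbours = graph[node]
    degree = len(neighbours)
    if degree < 2:
        # The coefficient is undefined for fewer than two neighbours; use 0.0.
        return degree, 0.0
    # Count edges among the neighbours; each edge is seen twice, so halve it.
    links = sum(len(graph[n] & neighbours) for n in neighbours) // 2
    return degree, 2.0 * links / (degree * (degree - 1))


def neighbour_vote(graph, labels, node):
    """Most common known label among the neighbours of `node`, or None."""
    votes = Counter(labels[n] for n in graph[node] if n in labels)
    return votes.most_common(1)[0][0] if votes else None


# Toy graph with pseudonymous integer identifiers, as in naive anonymization.
graph = {
    0: {1, 2, 3},
    1: {0, 2},
    2: {0, 1},
    3: {0},
}
labels = {1: "a", 2: "a", 3: "b"}  # hypothetical class labels

print(structural_features(graph, 0))
print(neighbour_vote(graph, labels, 0))
```

Both functions rely only on graph structure and known neighbour labels, which is the information that remains available once user identifiers have been replaced by pseudonyms.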


Online Social Networks, Relational Classifiers, Graph Anonymization





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Cristina Pérez-Solà (1)
  • Jordi Herrera-Joancomartí (1, 2)

  1. Dept. d’Enginyeria de la Informació i les Comunicacions, Universitat Autònoma de Barcelona, Bellaterra, Spain
  2. Internet Interdisciplinary Institute (IN3), UOC, Spain
