Node Classification in Social Networks

  • Smriti Bhagat
  • Graham Cormode
  • S. Muthukrishnan


In large graphs, such as those that arise in online social networks, a subset of nodes may be labeled. These labels can indicate demographic values, interests, beliefs, or other characteristics of the nodes (users). A core problem is to use this information to extend the labeling so that every node is assigned a label (or labels).

In this chapter, we survey classification techniques that have been proposed for this problem. We consider two broad categories: methods based on iterative application of traditional classifiers using graph information as features, and methods that propagate the existing labels via random walks. We adopt a common perspective on these methods to highlight the similarities between different approaches within and across the two categories. We also describe some extensions and related directions to the central problem of node classification.
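To make the second category concrete, the following is a minimal sketch of label propagation on an undirected graph. It is a generic illustration, not a specific algorithm from the chapter, and it rests on several assumptions: the function name `propagate_labels`, the edge-list input format, the fixed iteration count, and the choice to keep the known labels clamped are all hypothetical. Each unlabeled node repeatedly replaces its class distribution with the average of its neighbors' distributions, which matches the view of a random walk that starts at the node and stops when it reaches a labeled node.

```python
from collections import defaultdict


def propagate_labels(edges, seed_labels, num_iters=50):
    """Extend seed labels to all nodes by repeated neighbor averaging.

    edges: iterable of (u, v) pairs for an undirected graph (assumed format).
    seed_labels: dict mapping some nodes to their known class.
    Returns a dict mapping every node to its most likely class.
    """
    # Build an adjacency list for the undirected graph.
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    nodes = sorted(set(neighbors) | set(seed_labels))
    classes = sorted(set(seed_labels.values()))

    # Each node holds a distribution over classes.
    # Labeled nodes are one-hot; unlabeled nodes start uniform.
    dist = {}
    for node in nodes:
        if node in seed_labels:
            dist[node] = {c: 1.0 if c == seed_labels[node] else 0.0
                          for c in classes}
        else:
            dist[node] = {c: 1.0 / len(classes) for c in classes}

    for _ in range(num_iters):
        new_dist = {}
        for node in nodes:
            if node in seed_labels:
                # Known labels stay fixed (clamped) in this sketch.
                new_dist[node] = dist[node]
                continue
            nbrs = neighbors[node]
            # Average the neighbors' current distributions.
            new_dist[node] = {
                c: sum(dist[n][c] for n in nbrs) / len(nbrs)
                for c in classes
            }
        dist = new_dist

    # Assign each node the class with the highest score.
    return {node: max(classes, key=lambda c: dist[node][c])
            for node in nodes}


# Usage: a four-node path with one labeled node at each end.
path_edges = [("a", "b"), ("b", "c"), ("c", "d")]
labels = propagate_labels(path_edges, {"a": "x", "d": "y"})
print(labels)
```

On this path, the unlabeled node next to each labeled endpoint ends up with that endpoint's class. An iterative-classifier method from the first category would differ mainly in the update step: instead of averaging, it would retrain or reapply a conventional classifier whose features are computed from the neighbors' current labels.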


Node classification, Graph labeling, Semi-supervised learning, Iterative methods





Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Smriti Bhagat (1)
  • Graham Cormode (2)
  • S. Muthukrishnan (1)
  1. Rutgers University, Piscataway, NJ, USA
  2. AT&T Labs–Research, Florham Park, NJ, USA
