Abstract
In large graphs, such as those that arise in online social networks, a subset of the nodes may be labeled. These labels can indicate demographic values, interests, beliefs or other characteristics of the nodes (users). A core problem is to use this information to extend the labeling so that every node is assigned a label (or labels).
In this chapter, we survey classification techniques that have been proposed for this problem. We consider two broad categories: methods based on iterative application of traditional classifiers using graph information as features, and methods which propagate the existing labels via random walks. We adopt a common perspective on these methods to highlight the similarities between different approaches within and across the two categories. We also describe some extensions and related directions to the central problem of node classification.
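As an illustration of the second category, the following is a minimal sketch of label propagation on an undirected graph. It is not taken from the chapter: the function name `propagate_labels`, its parameters, and the fixed iteration count are assumptions made for this example. Each unlabeled node repeatedly takes the average of its neighbors' label distributions while the initially labeled nodes stay fixed. At convergence, a node's score for a label corresponds to the probability that a random walk started at that node reaches a node with that label before any other labeled node.

```python
from collections import defaultdict


def propagate_labels(edges, seed_labels, num_iters=50):
    """Hypothetical helper: extend seed_labels to every node that appears in edges."""
    # Build an undirected adjacency structure.
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    labels = sorted(set(seed_labels.values()))

    # Each node holds a distribution over labels.
    # Seed nodes put all mass on their known label; others start uniform.
    dist = {}
    for node in neighbors:
        if node in seed_labels:
            dist[node] = {l: 1.0 if l == seed_labels[node] else 0.0 for l in labels}
        else:
            dist[node] = {l: 1.0 / len(labels) for l in labels}

    for _ in range(num_iters):
        new_dist = {}
        for node, nbrs in neighbors.items():
            if node in seed_labels:
                # Known labels are clamped and never overwritten.
                new_dist[node] = dist[node]
            else:
                # Average the neighbors' current distributions.
                new_dist[node] = {
                    l: sum(dist[n][l] for n in nbrs) / len(nbrs) for l in labels
                }
        dist = new_dist

    # Assign each node its highest-scoring label.
    return {node: max(labels, key=lambda l: dist[node][l]) for node in dist}


# Usage: a path a - b - c - d with the two endpoints labeled.
result = propagate_labels(
    [("a", "b"), ("b", "c"), ("c", "d")],
    {"a": "x", "d": "y"},
)
```

In this small example each interior node ends up with the label of the closer endpoint. A fixed number of iterations is used for simplicity; a convergence test on the change in scores would be a natural refinement.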
Copyright information
© 2011 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Bhagat, S., Cormode, G., Muthukrishnan, S. (2011). Node Classification in Social Networks. In: Aggarwal, C. (eds) Social Network Data Analytics. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-8462-3_5
DOI: https://doi.org/10.1007/978-1-4419-8462-3_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-8461-6
Online ISBN: 978-1-4419-8462-3
eBook Packages: Computer Science (R0)