Applying Link-Based Classification to Label Blogs

  • Smriti Bhagat
  • Graham Cormode
  • Irina Rozenbaum
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5439)

Abstract

In analyzing data from social and communication networks, we encounter the problem of classifying objects where there is explicit link structure amongst the objects. We study the problem of inferring the classification of all the objects from a labeled subset, using only link-based information between objects.

We abstract this as a labeling problem on multigraphs with weighted edges. We present two classes of algorithms, based on local and global similarities. We then focus on multigraphs induced by blog data, and carefully apply our general algorithms to infer labels such as the age, gender and location associated with each blog, based only on the link structure among the blogs. We perform a comprehensive set of experiments with real, large-scale blog data sets and show that significant accuracy is possible from little or no non-link information, and that our methods scale to millions of nodes and edges.
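To make the problem setting concrete, the following is a minimal sketch of one simple local approach: each unlabeled node takes the label carrying the largest total edge weight among its already-labeled neighbors. It is a generic weighted neighbor vote, not the algorithms of the paper, and the function name `propagate_labels`, the `max_rounds` parameter, and the toy labels are illustrative assumptions.

```python
from collections import defaultdict


def propagate_labels(edges, seed_labels, max_rounds=10):
    """Weighted neighbor vote (illustrative, not the paper's method).

    edges: iterable of (u, v, weight); parallel edges are allowed.
    seed_labels: dict mapping some nodes to known labels.
    """
    # Build an undirected adjacency map. Parallel edges of the
    # multigraph are merged by summing their weights.
    adj = defaultdict(lambda: defaultdict(float))
    for u, v, w in edges:
        adj[u][v] += w
        adj[v][u] += w

    labels = dict(seed_labels)  # once assigned, a label is kept fixed
    for _ in range(max_rounds):
        updates = {}
        for node in adj:
            if node in labels:
                continue
            # Sum edge weights per label over labeled neighbors only.
            votes = defaultdict(float)
            for nbr, w in adj[node].items():
                if nbr in labels:
                    votes[labels[nbr]] += w
            if votes:
                # Ties are broken by taking the smallest label.
                updates[node] = max(sorted(votes), key=votes.get)
        if not updates:
            break  # no unlabeled node has a labeled neighbor
        labels.update(updates)  # apply the whole round at once
    return labels
```

Calling it with a few weighted links and two seed nodes, for example `propagate_labels([("a", "b", 2.0), ("b", "c", 1.0)], {"a": "teen"})`, labels the remaining nodes round by round, moving outward from the seeds. Nodes with no path to a labeled node stay unlabeled.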

Keywords

Graph labeling, Relational learning, Social networks

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Smriti Bhagat (1)
  • Graham Cormode (1)
  • Irina Rozenbaum (1)
  1. Rutgers University, USA
