Graffiti: graph-based classification in heterogeneous networks

Abstract

We address the problem of multi-label classification in heterogeneous graphs, where nodes belong to different types and different types have different sets of classification labels. We present a novel approach that aims to classify nodes based on their neighborhoods. We model the mutual influence of nodes as a random walk in which the random surfer aims at distributing class labels to nodes while walking through the graph. When viewing class labels as “colors”, the random surfer is essentially spraying different node types with different color palettes; hence the name Graffiti of our method. In contrast to previous work on topic-based random surfer models, our approach captures and exploits the mutual influence of nodes of the same type based on their connections to nodes of other types. We show important properties of our algorithm such as convergence and scalability. We also confirm the practical viability of Graffiti by an experimental study on subsets of the popular social networks Flickr and LibraryThing. We demonstrate the superiority of our approach by comparing it to three other state-of-the-art techniques for graph-based classification.
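The core idea, that a node receives labels from nodes of its own type reached through nodes of other types, can be illustrated with a small sketch. This is not the Graffiti algorithm as published: the function name `propagate_labels`, the two-hop transition weights, the damping value `alpha`, and the toy photo/user graph are all assumptions made for illustration only.

```python
from collections import defaultdict


def propagate_labels(edges, node_type, seeds, alpha=0.85, iters=50):
    """Toy label propagation in a heterogeneous graph (illustrative only).

    edges:     iterable of undirected (u, v) pairs
    node_type: dict mapping node -> type name
    seeds:     dict mapping node -> known label (label sets differ per type)
    Returns a dict mapping node -> {label: score}.
    """
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)

    # Two-hop walk: from v, step to a neighbor of another type, then on to
    # a different node of v's own type. Weights are the walk probabilities.
    trans = {}
    for v in node_type:
        weights = defaultdict(float)
        bridges = [b for b in nbrs[v] if node_type[b] != node_type[v]]
        for b in bridges:
            targets = [u for u in nbrs[b]
                       if u != v and node_type[u] == node_type[v]]
            for u in targets:
                weights[u] += 1.0 / (len(bridges) * len(targets))
        trans[v] = dict(weights)

    prior = {v: ({seeds[v]: 1.0} if v in seeds else {}) for v in node_type}
    scores = {v: dict(p) for v, p in prior.items()}
    for _ in range(iters):
        new_scores = {}
        for v in node_type:
            if v in seeds:
                new_scores[v] = dict(prior[v])  # labeled nodes stay fixed
                continue
            acc = defaultdict(float)
            for u, w in trans[v].items():
                for label, s in scores[u].items():
                    acc[label] += alpha * w * s  # damped label mass from u
            new_scores[v] = dict(acc)
        scores = new_scores
    return scores


# Hypothetical toy graph: photos and users, each type with its own labels.
node_type = {"p1": "photo", "p2": "photo", "p3": "photo",
             "u1": "user", "u2": "user", "u3": "user"}
edges = [("p1", "u1"), ("p2", "u1"), ("p2", "u2"),
         ("p3", "u2"), ("p1", "u3"), ("p2", "u3")]
seeds = {"p1": "nature", "p3": "city", "u1": "hobbyist"}

scores = propagate_labels(edges, node_type, seeds)
best_p2 = max(scores["p2"], key=scores["p2"].get)
```

In this toy graph the unlabeled photo `p2` shares two users with `p1` and one with `p3`, so it collects more "nature" than "city" mass, while user nodes only ever receive user labels. Because transitions stay within a node's own type, each type keeps its own color palette, in the spirit of the abstract's description.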

Author information

Correspondence to Ralitsa Angelova.

About this article

Cite this article

Angelova, R., Kasneci, G. & Weikum, G. Graffiti: graph-based classification in heterogeneous networks. World Wide Web 15, 139–170 (2012). https://doi.org/10.1007/s11280-011-0126-4

Keywords

  • graph-based classification
  • social networks
  • heterogeneous networks