World Wide Web, Volume 15, Issue 2, pp 139–170

Graffiti: graph-based classification in heterogeneous networks

  • Ralitsa Angelova
  • Gjergji Kasneci
  • Gerhard Weikum

Abstract

We address the problem of multi-label classification in heterogeneous graphs, where nodes belong to different types and different types have different sets of classification labels. We present a novel approach that aims to classify nodes based on their neighborhoods. We model the mutual influence of nodes as a random walk in which the random surfer aims at distributing class labels to nodes while walking through the graph. When viewing class labels as “colors”, the random surfer is essentially spraying different node types with different color palettes; hence the name Graffiti of our method. In contrast to previous work on topic-based random surfer models, our approach captures and exploits the mutual influence of nodes of the same type based on their connections to nodes of other types. We show important properties of our algorithm such as convergence and scalability. We also confirm the practical viability of Graffiti by an experimental study on subsets of the popular social networks Flickr and LibraryThing. We demonstrate the superiority of our approach by comparing it to three other state-of-the-art techniques for graph-based classification.
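The random-surfer idea described above can be illustrated with a small sketch. This is not the Graffiti algorithm itself, whose update rules are not given in the abstract; it is a generic random walk with restart, run once per class label, on a toy two-type graph of photos and tags. The function names `label_scores` and `classify`, the node names, the damping factor of 0.85 and the iteration count are all assumptions made for illustration. Label mass flows from labeled photos through shared tags to unlabeled photos, and each node is only offered labels from the palette of its own type.

```python
from collections import defaultdict


def label_scores(edges, seeds, damping=0.85, iterations=50):
    """Run one random walk with restart per class label (illustrative only).

    edges: undirected (u, v) pairs; seeds: {node: label} for labeled nodes.
    Returns {label: {node: score}}.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = list(neighbors)

    scores = {}
    for label in sorted(set(seeds.values())):
        seed_nodes = [n for n in nodes if seeds.get(n) == label]
        if not seed_nodes:
            continue  # this label has no seed inside the graph
        # The surfer restarts uniformly at the nodes carrying this label.
        restart = {n: 0.0 for n in nodes}
        for n in seed_nodes:
            restart[n] = 1.0 / len(seed_nodes)
        rank = dict(restart)
        for _ in range(iterations):
            nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
            for n in nodes:
                # Split the damped mass of n evenly among its neighbors.
                share = damping * rank[n] / len(neighbors[n])
                for m in neighbors[n]:
                    nxt[m] += share
            rank = nxt
        scores[label] = rank
    return scores


def classify(node, node_type, seeds, scores):
    """Pick the best-scoring label from the palette of the node's own type."""
    palette = sorted({lab for n, lab in seeds.items()
                      if node_type[n] == node_type[node] and lab in scores})
    if not palette:
        return None  # no labeled seed of this node type
    return max(palette, key=lambda lab: scores[lab][node])


# Hypothetical toy data: photos are linked only through the tags they share.
edges = [("p1", "tree"), ("p3", "tree"), ("p3", "sky"),
         ("p4", "sky"), ("p4", "street"), ("p2", "street")]
node_type = {"p1": "photo", "p2": "photo", "p3": "photo", "p4": "photo",
             "tree": "tag", "sky": "tag", "street": "tag"}
seeds = {"p1": "nature", "p2": "city"}  # photo palette only

scores = label_scores(edges, seeds)
predicted = {n: classify(n, node_type, seeds, scores)
             for n in ("p3", "p4", "sky")}
```

In this toy graph the unlabeled photo `p3` shares a tag with the "nature" seed and `p4` shares one with the "city" seed, so each inherits the label of its nearer seed, while the tag `sky` receives no label because no tag palette was seeded. A second palette for tags could be added by seeding tag nodes in the same way.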

Keywords

  • graph-based classification
  • social networks
  • heterogeneous networks

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Ralitsa Angelova (1)
  • Gjergji Kasneci (2)
  • Gerhard Weikum (3)
  1. Google, Zurich, Switzerland
  2. Microsoft Research, Cambridge, UK
  3. Max-Planck Institute for Informatics, Saarbrücken, Germany
