Information Marginalization on Subgraphs

  • Jiayuan Huang
  • Tingshao Zhu
  • Russell Greiner
  • Dengyong Zhou
  • Dale Schuurmans
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4213)


Real-world data often involves objects that exhibit multiple relationships; for example, ‘papers’ and ‘authors’ exhibit both paper-author interactions and paper-paper citation relationships. A typical learning problem requires one to make inferences about a subclass of objects (e.g. ‘papers’), while using the remaining objects and relations to provide relevant information. We present a simple, unified mechanism for incorporating information from multiple object types and relations when learning on a targeted subset. In this scheme, all sources of relevant information are marginalized onto the target subclass via random walks. We show that marginalized random walks can be used as a general technique for combining multiple sources of information in relational data. With this approach, we formulate new algorithms for transduction and ranking in relational data, and quantify the performance of the new schemes on real-world data, achieving good results in many problems.
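One way to picture marginalizing a relation onto the target subclass is a two-step random walk from papers to authors and back, with the author nodes summed out. The sketch below is only an illustration under stated assumptions, not the formulation from the paper: the toy matrices, the function names (row_normalize, marginalize_bipartite, combine_walks), and the equal mixing weights are all hypothetical, and the combination rule (a convex mixture of transition matrices) is an assumed choice.

```python
import numpy as np


def row_normalize(weights):
    """Turn a nonnegative weight matrix into a row-stochastic transition matrix."""
    weights = np.asarray(weights, dtype=float)
    row_sums = weights.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every node needs at least one outgoing edge")
    return weights / row_sums


def marginalize_bipartite(paper_author):
    """Two-step walk paper -> author -> paper, with the author nodes summed out."""
    paper_author = np.asarray(paper_author, dtype=float)
    to_authors = row_normalize(paper_author)      # shape: papers x authors
    to_papers = row_normalize(paper_author.T)     # shape: authors x papers
    return to_authors @ to_papers                 # shape: papers x papers


def combine_walks(walks, mix):
    """Convex combination of transition matrices defined on the same node set."""
    mix = np.asarray(mix, dtype=float)
    if np.any(mix < 0) or not np.isclose(mix.sum(), 1.0):
        raise ValueError("mixing weights must be nonnegative and sum to 1")
    return sum(m * w for m, w in zip(mix, walks))


# Hypothetical toy data: 3 papers, 2 authors (1 = the author wrote the paper).
paper_author = np.array([[1, 0],
                         [1, 1],
                         [0, 1]])

# Hypothetical citation links among the same 3 papers.
citations = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]])

via_authors = marginalize_bipartite(paper_author)   # walk induced by shared authors
via_citations = row_normalize(citations)            # walk induced by citations
combined = combine_walks([via_authors, via_citations], mix=[0.5, 0.5])
```

Under these assumptions, `combined` is a single row-stochastic matrix over papers only, so any method that operates on a random walk over the target set could consume it without referring to the author nodes.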



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jiayuan Huang (1, 2)
  • Tingshao Zhu (2)
  • Russell Greiner (2)
  • Dengyong Zhou (3)
  • Dale Schuurmans (2)
  1. University of Waterloo, Waterloo, Canada
  2. University of Alberta, Edmonton, Canada
  3. NEC Laboratories America, Inc.
