Finding Itemset-Sharing Patterns in a Large Itemset-Associated Graph

  • Mutsumi Fukuzaki
  • Mio Seki
  • Hisashi Kashima
  • Jun Sese
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6119)

Abstract

Itemset mining and graph mining have attracted considerable attention in the field of data mining, since they have many important applications in areas such as biology, marketing, and social network analysis. However, most existing studies focus on either itemset mining or graph mining alone, and only a few have addressed a combination of the two. In this paper, we introduce a new problem, which we call itemset-sharing subgraph (ISS) set enumeration: given a large graph in which each vertex has an associated itemset, the task is to find sets of subgraphs that share common itemsets. The problem has various potential applications, such as side-effect analysis in drug discovery and the analysis of the influence of word-of-mouth communication in marketing on social networks. We propose ROBIN, an efficient algorithm for finding ISS sets in such a graph; it enumerates connected subgraphs having common itemsets and then finds their combinations. Experiments using a synthetic network verify that our method can efficiently process networks with more than one million edges. Experiments using a real biological network show that our algorithm can find biologically interesting patterns. We also apply ROBIN to a citation network and identify successful collaborative research.
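The abstract does not describe ROBIN's internals, so the following is only a minimal brute-force sketch of the ISS set problem itself, not the authors' algorithm. It assumes that "subgraphs sharing an itemset" means the connected components of the subgraph induced by the vertices whose itemsets contain that itemset, and it uses two hypothetical thresholds, `min_vertices` (smallest subgraph kept) and `min_subgraphs` (smallest set of subgraphs reported). Candidate itemsets are grown depth-first, and a branch is pruned once no component reaches `min_vertices`, which is safe because adding items only shrinks the induced subgraph.

```python
from typing import Dict, FrozenSet, List, Set

Graph = Dict[str, Set[str]]     # undirected adjacency lists
Itemsets = Dict[str, Set[str]]  # itemset associated with each vertex


def sharing_components(graph: Graph, itemsets: Itemsets,
                       pattern: FrozenSet[str]) -> List[FrozenSet[str]]:
    """Connected components induced by vertices whose itemset contains `pattern`."""
    keep = {v for v, items in itemsets.items() if pattern <= items}
    seen: Set[str] = set()
    comps: List[FrozenSet[str]] = []
    for start in sorted(keep):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:  # iterative DFS restricted to the kept vertices
            v = stack.pop()
            comp.add(v)
            for w in graph.get(v, ()):
                if w in keep and w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(frozenset(comp))
    return comps


def naive_iss_sets(graph: Graph, itemsets: Itemsets, min_vertices: int = 2,
                   min_subgraphs: int = 2) -> Dict[FrozenSet[str], List[FrozenSet[str]]]:
    """Hypothetical baseline: map each itemset to the connected subgraphs
    (>= min_vertices vertices) sharing it, if there are >= min_subgraphs of them."""
    items = sorted(set().union(*itemsets.values()))
    results: Dict[FrozenSet[str], List[FrozenSet[str]]] = {}

    def grow(pattern: FrozenSet[str], start: int) -> None:
        for i in range(start, len(items)):
            cand = pattern | {items[i]}
            comps = [c for c in sharing_components(graph, itemsets, cand)
                     if len(c) >= min_vertices]
            if not comps:
                # Supersets induce subgraphs of this one, so none of their
                # components can reach min_vertices either: prune the branch.
                continue
            if len(comps) >= min_subgraphs:
                results[cand] = comps
            grow(cand, i + 1)

    grow(frozenset(), 0)
    return results


# Toy example: two connected pairs share {a, b}; vertex 5 bridges them but lacks a and b.
toy_graph: Graph = {"1": {"2"}, "2": {"1", "5"}, "5": {"2", "3"},
                    "3": {"5", "4"}, "4": {"3"}}
toy_itemsets: Itemsets = {"1": {"a", "b"}, "2": {"a", "b"}, "5": {"c"},
                          "3": {"a", "b", "c"}, "4": {"a", "b"}}
result = naive_iss_sets(toy_graph, toy_itemsets)
```

On the toy graph, `{a, b}` is shared by the two subgraphs `{1, 2}` and `{3, 4}`, while `{c}` yields only one qualifying subgraph and is not reported. The enumeration is exponential in the number of distinct items, so it is only useful for checking small cases, not for the million-edge networks mentioned above.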



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Mutsumi Fukuzaki (1)
  • Mio Seki (1)
  • Hisashi Kashima (2)
  • Jun Sese (1)
  1. Dept. of Computer Science, Ochanomizu Univ., Tokyo, Japan
  2. Dept. of Math. Informatics, Univ. of Tokyo, Tokyo, Japan
