Mining Chains of Relations

  • Foto Afrati
  • Gautam Das
  • Aristides Gionis
  • Heikki Mannila
  • Taneli Mielikäinen
  • Panayiotis Tsaparas
Part of the Intelligent Systems Reference Library book series (ISRL, volume 24)

Abstract

Traditional data mining methods consider the problem of mining a single relation that relates two different attributes. For example, in a scientific bibliography database, authors are related to papers, and we may be interested in discovering association rules between authors based on the papers that they have co-authored. However, in real life it is often the case that we have multiple attributes related through chains of relations. For example, authors write papers, and papers belong to one or more topics, defining a three-level chain of relations.

In this paper we consider the problem of mining such relational chains. We formulate a generic problem of finding selector sets (subsets of objects from one of the attributes) such that the projected dataset, that is, the part of the dataset determined by the selector set, satisfies a specific property. The motivation for our approach is that a given property might not hold on the whole dataset but may hold when the data is projected onto a subset of objects. We show that many existing and new data mining problems can be formulated in this framework. We discuss various algorithms and identify the conditions under which the Apriori technique can be used. We experimentally demonstrate the effectiveness and efficiency of our methods.
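
To make the selector-set idea concrete, the following is a minimal Python sketch on the authors, papers, and topics chain used as the running example above. It is not the authors' algorithm: the toy data, the function names (project, relative_support, selectors_satisfying), the disjunctive reading of projection (a paper is kept if it has at least one selected topic), and the choice of "an author set is frequent" as the property are all assumptions made for illustration. The search is plain enumeration of small selector sets, not an Apriori-style levelwise search.

```python
from itertools import combinations

# Hypothetical toy data: a three-level chain authors -> papers -> topics.
paper_authors = {
    "p1": {"alice", "bob"},
    "p2": {"alice", "bob"},
    "p3": {"alice", "carol"},
    "p4": {"bob", "carol"},
}
paper_topics = {
    "p1": {"databases"},
    "p2": {"databases", "theory"},
    "p3": {"theory"},
    "p4": {"graphics"},
}


def project(selector):
    """Papers linked to at least one selected topic (assumed disjunctive reading)."""
    return {p for p, topics in paper_topics.items() if topics & selector}


def relative_support(authors, papers):
    """Fraction of the given papers that were co-written by all the given authors."""
    if not papers:
        return 0.0
    hits = sum(1 for p in papers if authors <= paper_authors[p])
    return hits / len(papers)


def selectors_satisfying(authors, min_support, max_size=2):
    """Enumerate topic subsets whose projection makes `authors` frequent."""
    all_topics = sorted(set().union(*paper_topics.values()))
    found = []
    for k in range(1, max_size + 1):
        for combo in combinations(all_topics, k):
            papers = project(set(combo))
            if relative_support(authors, papers) >= min_support:
                found.append(combo)
    return found


pair = {"alice", "bob"}
# On the whole dataset the pair co-authors only half of the papers ...
print(relative_support(pair, set(paper_authors)))
# ... but some topic selectors yield projections where the pair is frequent.
print(selectors_satisfying(pair, min_support=0.6))
```

In this toy instance the pair falls below the 0.6 threshold on the full dataset, yet the property holds once the data is projected onto selectors that include the "databases" topic, which is the situation the abstract describes.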

Keywords

Bipartite Graph; Association Rule; Frequent Itemsets; Frequent Subgraph; Data Mining Problem
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Foto Afrati (1)
  • Gautam Das (2)
  • Aristides Gionis (3)
  • Heikki Mannila (4)
  • Taneli Mielikäinen (5)
  • Panayiotis Tsaparas (6)

  1. National Technical University of Athens, Greece
  2. University of Texas at Arlington, USA
  3. Yahoo! Research, Barcelona
  4. University of Helsinki, Finland
  5. Nokia Research Center, Palo Alto
  6. Microsoft Research, Mountain View
