Annals of Mathematics and Artificial Intelligence, Volume 69, Issue 4, pp 315–342

Mining closed patterns in relational, graph and network data

Article

Abstract

Recent theoretical insights have led to the introduction of efficient algorithms for mining closed item-sets. This paper investigates potential generalizations of this paradigm to mine closed patterns in relational, graph and network databases. Several semantics and associated definitions for closed patterns in relational data have been introduced in previous work, but the differences among them and the implications of the choice of semantics were not clear. The paper investigates these implications in the context of generalizing LCM, an algorithm for enumerating closed item-sets. LCM is attractive because its run time is linear in the number of closed patterns and because it does not need to store the patterns it has output in order to avoid duplicates, which further reduces its memory footprint and run time. Our investigation shows that the choice of semantics has a dramatic effect on the properties of closed patterns and that, as a result, a generalization of the LCM algorithm is not possible in some settings. On the other hand, we provide a full generalization of LCM for the semantic setting previously used by the Claudien system.
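
As a rough illustration of the plain item-set setting that the paper starts from (not the relational generalization it develops), the following Python sketch enumerates closed item-sets using a prefix-preserving closure extension in the style of LCM. The function name `closed_itemsets`, the `min_support` parameter, and the toy database are assumptions made for this example. The sketch rescans the database for every extension, so it shows how duplicates are avoided without storing earlier outputs, but it does not reproduce LCM's run-time guarantees.

```python
from typing import FrozenSet, Iterable, Iterator, List, Optional


def closed_itemsets(db: Iterable[Iterable[int]],
                    min_support: int = 1) -> Iterator[FrozenSet[int]]:
    """Yield each closed item-set with support >= min_support exactly once.

    Hypothetical helper for illustration; items are assumed to be
    integers so that they have a fixed total order.
    """
    transactions: List[FrozenSet[int]] = [frozenset(t) for t in db]
    if not transactions or len(transactions) < min_support:
        return
    items = sorted(set().union(*transactions))

    def expand(closed: FrozenSet[int],
               core: Optional[int]) -> Iterator[FrozenSet[int]]:
        yield closed
        for i in items:
            # Only extend with items larger than the core item of `closed`.
            if (core is not None and i <= core) or i in closed:
                continue
            # Transactions that contain closed + {i}.
            occ = [t for t in transactions if closed <= t and i in t]
            if len(occ) < min_support:
                continue
            # Closure: items shared by every covering transaction.
            new = frozenset.intersection(*occ)
            # Prefix-preserving test: the closure must not add any item
            # smaller than i. Otherwise `new` is reached from another
            # parent, and skipping it here avoids duplicate output.
            if all(j in closed for j in new if j < i):
                yield from expand(new, i)

    # The root is the closure of the empty item-set.
    root = frozenset.intersection(*transactions)
    yield from expand(root, None)


toy_db = [{1, 2, 3}, {1, 2}, {2, 3}, {2}]
for pattern in closed_itemsets(toy_db):
    print(sorted(pattern))
```

No set of previously seen patterns is kept: the prefix test alone decides whether an extension is emitted, which is the property the abstract credits for LCM's low memory use.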

Keywords

Closed relational patterns, Relational data, Graphs, Networks, Algorithms

Mathematics Subject Classifications (2010)

97R40, 68T27, 68Q55

Copyright information

© Springer Science+Business Media Dordrecht 2012

Authors and Affiliations

  1. INRIA Lille Nord Europe, Lille, France
  2. Department of Computer Science, Tufts University, Medford, USA
  3. Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium
