Mining closed patterns in relational, graph and network data

Annals of Mathematics and Artificial Intelligence

Abstract

Recent theoretical insights have led to the introduction of efficient algorithms for mining closed item-sets. This paper investigates potential generalizations of this paradigm to mine closed patterns in relational, graph and network databases. Several semantics and associated definitions for closed patterns in relational data have been introduced in previous work, but the differences among them and the implications of the choice of semantics were not clear. The paper investigates these implications in the context of generalizing the LCM algorithm, an algorithm for enumerating closed item-sets. LCM is attractive because its run time is linear in the number of closed patterns and because it does not need to store the patterns it has output in order to avoid duplicates, which further reduces memory footprint and run time. Our investigation shows that the choice of semantics has a dramatic effect on the properties of closed patterns and, as a result, in some settings a generalization of the LCM algorithm is not possible. On the other hand, we provide a full generalization of LCM for the semantic setting that has been previously used by the Claudien system.
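For readers unfamiliar with the item-set setting that the abstract starts from, the following minimal sketch illustrates what "closed" means there: an item-set is closed when it equals the intersection of all transactions that contain it. This is not the LCM algorithm and not code from the paper; the function names `closure` and `is_closed`, the toy database `db`, and the convention of returning an unsupported item-set unchanged are all assumptions made for illustration.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def closure(itemset: FrozenSet[str], transactions: List[Transaction]) -> FrozenSet[str]:
    """Intersect all transactions that contain `itemset`.

    Assumption: if no transaction contains the item-set, it is returned
    unchanged (other conventions exist, e.g. returning the set of all items).
    """
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return frozenset(itemset)
    result = set(covering[0])
    for t in covering[1:]:
        result &= t
    return frozenset(result)


def is_closed(itemset: FrozenSet[str], transactions: List[Transaction]) -> bool:
    """An item-set is closed if it is a fixed point of the closure operator."""
    return closure(itemset, transactions) == frozenset(itemset)


# Hypothetical toy database of four transactions over items a, b, c, d.
db = [frozenset("abc"), frozenset("abd"), frozenset("ab"), frozenset("cd")]
```

In this toy database every transaction containing `a` also contains `b`, so `{a}` is not closed while `{a, b}` is. The paper's question is how this notion, and an enumeration scheme built on it, carries over when patterns are relational or graph-structured rather than plain sets.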

Author information

Corresponding author

Correspondence to Gemma C. Garriga.

About this article

Cite this article

Garriga, G.C., Khardon, R. & De Raedt, L. Mining closed patterns in relational, graph and network data. Ann Math Artif Intell 69, 315–342 (2013). https://doi.org/10.1007/s10472-012-9324-8

  • DOI: https://doi.org/10.1007/s10472-012-9324-8
