The Model of Most Informative Patterns and Its Application to Knowledge Extraction from Graph Databases

  • Frédéric Pennerath
  • Amedeo Napoli
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5782)


This article introduces the class of Most Informative Patterns (MIPs) for characterizing a given dataset. MIPs form a reduced subset of non redundant closed patterns that are extracted from data thanks to a scoring function depending on domain knowledge. Accordingly, MIPs are designed for providing experts good insights on the content of datasets during data analysis. The article presents the model of MIPs and their formal properties wrt other kinds of patterns. Then, two algorithms for extracting MIPs are detailed: the first directly searches for MIPs in a dataset while the second screens MIPs from frequent patterns. The efficiencies of both algorithms are compared when applied to reference datasets. Finally the application of MIPs to labelled graphs, here molecular graphs, is discussed.


Association Rule Frequent Pattern Frequent Itemset Direct Extraction Knowledge Extraction 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


  1. 1.
    Agrawal, R., Imielinski, T., Swami, A.N.: Mining association rules between sets of items in large databases. In: Buneman, P., Jajodia, S. (eds.) Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, D.C., pp. 207–216. ACM Press, New York (1993)CrossRefGoogle Scholar
  2. 2.
    Nijssen, S., Kok, J.N.: A quickstart in frequent structure mining can make a difference. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, Washington, USA, pp. 647–652. ACM Press, New York (2004)Google Scholar
  3. 3.
    Soulet, A., Crémilleux, B.: Extraction des top-k motifs par approximer-et-pousser. In: Noirhomme-Fraiture, M., Venturini, G. (eds.) Extraction et gestion des connaissances (EGC 2007), Actes des cinquièmes journées Extraction et Gestion des Connaissances, Namur, Belgique. Volume RNTI-E-9 of Revue des Nouvelles Technologies de l’Information, Cépaduès-Éditions, January 23-26, vol. 2, pp. 271–282 (2007)Google Scholar
  4. 4.
    Siebes, A., Vreeken, J., van Leeuwen, M.: Item sets that compress. In: Ghosh, J., Lambert, D., Skillicorn, D.B., Srivastava, J. (eds.) SDM. SIAM, Philadelphia (2006)Google Scholar
  5. 5.
    Cook, D.J., Holder, L.B.: Substructure discovery using minimum description length and background knowledge. J. of Art. Intell. Res. 1, 231–255 (1994)Google Scholar
  6. 6.
    Raedt, L.D., Guns, T., Nijssen, S.: Constraint programming for itemset mining. In: Li, Y., Liu, B., Sarawagi, S. (eds.) KDD, pp. 204–212. ACM, New York (2008)CrossRefGoogle Scholar
  7. 7.
    Pennerath, F., Napoli, A.: La famille des motifs les plus informatifs. application à l’extraction de graphes en chimie organique. Revue I3 8(2), 252 (2008)Google Scholar
  8. 8.
    McKay, B.D.: Practical graph isomorphism. Congr. Numer. 30, 45–87 (1981)MathSciNetzbMATHGoogle Scholar
  9. 9.
    Zaki, M.J.: Scalable algorithms for association mining. IEEE T. Knowl. Data. En. 12(3), 372–390 (2000)CrossRefGoogle Scholar
  10. 10.
    Borgelt, C., Berthold, M.R.: Mining molecular fragments: Finding relevant substructures of molecules. In: Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), Maebashi City, Japan, vol. 51. IEEE Computer Society, Los Alamitos (2002)Google Scholar
  11. 11.
    Tarjan, R.E.: Depth-first search and linear graph algorithms. SIAM J. Comput. 1(2), 146–160 (1972)MathSciNetCrossRefzbMATHGoogle Scholar
  12. 12.
    Han, J., Pei, J., Yin, Y., Mao, R.: Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Min. Knowl. Discov. 8(1), 53–87 (2004)MathSciNetCrossRefGoogle Scholar
  13. 13.
    Brijs, T., Swinnen, G., Vanhoof, K., Wets, G.: Using association rules for product assortment decisions: A case study. In: KDD, pp. 254–260 (1999)Google Scholar
  14. 14.
    Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Efficient mining of association rules using closed itemset lattices. International Journal of Information Systems 24(1), 25–46 (1999)zbMATHGoogle Scholar
  15. 15.
    Boulicaut, J.F., Bykowski, A., Rigotti, C.: Free-sets: A condensed representation of boolean data for the approximation of frequency queries. Data Min. Knowl. Discov. 7(1), 5–22 (2003)MathSciNetCrossRefGoogle Scholar
  16. 16.
    Soulet, A., Crémilleux, B.: Adequate condensed representations of patterns. Data Min. Knowl. Discov. 17(1), 94–110 (2008)MathSciNetCrossRefGoogle Scholar
  17. 17.
    Yan, X., Cheng, H., Han, J., Xin, D.: Summarizing itemset patterns: a profile-based approach. In: Grossman, R., Bayardo, R.J., Bennett, K.P. (eds.) KDD, pp. 314–323. ACM, New York (2005)Google Scholar
  18. 18.
    Chen, C., Lin, C.X., Yan, X., Han, J.: On effective presentation of graph patterns: a structural representative approach. In: Shanahan, J.G., Amer-Yahia, S., Manolescu, I., Zhang, Y., Evans, D.A., Kolcz, A., Choi, K.S., Chowdhury, A. (eds.) CIKM, pp. 299–308. ACM, New York (2008)CrossRefGoogle Scholar
  19. 19.
    Gallo, A., Bie, T.D., Cristianini, N.: Mini: Mining informative non-redundant itemsets. In: Kok, J.N., Koronacki, J., Lopez de Mantaras, R., Matwin, S., Mladenič, D., Skowron, A. (eds.) PKDD 2007. LNCS (LNAI), vol. 4702, pp. 438–445. Springer, Heidelberg (2007)CrossRefGoogle Scholar
  20. 20.
    Hasan, M.A., Chaoji, V., Salem, S., Besson, J., Zaki, M.J.: Origami: Mining representative orthogonal graph patterns. In: Proceedings of the 7th IEEE International Conference on Data Mining (ICDM 2007), pp. 153–162. IEEE Computer Society, Los Alamitos (2007)CrossRefGoogle Scholar
  21. 21.
    Bringmann, B., Zimmermann, A.: One in a million: picking the right patterns. Knowledge and Information Systems (2008)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Frédéric Pennerath
    • 1
    • 2
  • Amedeo Napoli
    • 2
  1. 1.SupélecMetzFrance
  2. 2.Orpailleur team, LORIA, BP 239Vandoeuvre-lès-Nancy CedexFrance

Personalised recommendations