Extracting and summarizing the frequent emerging graph patterns from a dataset of graphs

Abstract

Emerging patterns are of great interest for discovering information in data and for characterizing classes. Mining them remains a challenge, especially on graph data. In this paper, we propose a method that mines the whole set of frequent emerging graph patterns, given a frequency threshold and an emergence threshold. The method relies on a reformulation of the initial problem, which lets us design a process that combines efficient algorithmic and data mining methods. Moreover, we show that the closed graph patterns are a condensed representation of the frequent emerging graph patterns, and we propose a new condensed representation based on the representative pruned graph patterns: because it provides shorter patterns, it is especially suited to summarizing a set of graph patterns. Experiments on a real-world database of chemicals show the feasibility and the efficiency of our approach.
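
As an illustration of the two thresholds mentioned in the abstract, the following minimal sketch tests whether a pattern is both frequent and emerging. It is not the authors' algorithm. It assumes the usual growth-rate definition of emerging patterns (Dong and Li, 1999), it assumes that frequency is measured in the target class, and it replaces graphs with sets of fragment labels, so that a plain subset test stands in for subgraph isomorphism. All function names and the toy data are hypothetical.

```python
from math import inf


def frequency(pattern, dataset):
    """Fraction of examples in `dataset` that contain `pattern`.

    Assumption: examples are frozensets of labels, and containment is a
    subset test. For real graph data this would be subgraph isomorphism.
    """
    if not dataset:
        return 0.0
    return sum(1 for example in dataset if pattern <= example) / len(dataset)


def growth_rate(pattern, target, rest):
    """Ratio of the pattern's frequency in `target` to its frequency in `rest`."""
    f_target = frequency(pattern, target)
    f_rest = frequency(pattern, rest)
    if f_rest == 0:
        # Absent from the rest: infinite growth rate if present in the target.
        return inf if f_target > 0 else 0.0
    return f_target / f_rest


def is_frequent_emerging(pattern, target, rest, min_freq, min_growth):
    """True if the pattern passes both the frequency and emergence thresholds."""
    return (frequency(pattern, target) >= min_freq
            and growth_rate(pattern, target, rest) >= min_growth)


# Toy data (hypothetical): each molecule is reduced to a set of fragment labels.
toxic = [
    frozenset({"C-Cl", "C=O"}),
    frozenset({"C-Cl", "C-N"}),
    frozenset({"C-Cl"}),
    frozenset({"C=O"}),
]
non_toxic = [
    frozenset({"C=O"}),
    frozenset({"C-N"}),
    frozenset({"C-Cl", "C-N"}),
    frozenset({"C=O", "C-N"}),
]

for labels in ({"C-Cl"}, {"C=O"}, {"C-Cl", "C=O"}):
    pattern = frozenset(labels)
    print(sorted(pattern),
          growth_rate(pattern, toxic, non_toxic),
          is_frequent_emerging(pattern, toxic, non_toxic,
                               min_freq=0.5, min_growth=2.0))
```

In this toy setting, the last pattern has an infinite growth rate because it never occurs outside the target class, yet it is rejected because it falls below the frequency threshold. This shows why the two thresholds are applied together.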

Notes

  1. http://www.gnu.org/licenses/gpl/html

  2. http://www.info.univ-tours.fr/~soulet/music-dfs/music-dfs.html

Acknowledgements

The authors would like to thank Arnaud Soulet for very fruitful discussions and for the Music-dfs prototype, and the CERMN lab for its invaluable help with the data and the chemical knowledge. This work is partly supported by the ANR (French National Research Agency) through the Innotox and Bingo2 projects, and by the Region Basse-Normandie (Innotox2 project).

Author information

Correspondence to Bruno Crémilleux.

About this article

Cite this article

Poezevara, G., Cuissart, B. & Crémilleux, B. Extracting and summarizing the frequent emerging graph patterns from a dataset of graphs. J Intell Inf Syst 37, 333 (2011). https://doi.org/10.1007/s10844-011-0168-1

Keywords

  • Data mining
  • Emerging patterns
  • Condensed representation
  • Subgraph isomorphism
  • Chemical information