Extracting and summarizing the frequent emerging graph patterns from a dataset of graphs

  • Guillaume Poezevara
  • Bertrand Cuissart
  • Bruno Crémilleux
Abstract

Emerging patterns are of great interest for discovering information from data and for characterizing classes. Mining them remains a challenge, especially with graph data. In this paper, we propose a method to mine the whole set of frequent emerging graph patterns, given a frequency threshold and an emergence threshold. We achieve this by reformulating the initial problem, which lets us design a process that combines efficient algorithmic and data mining methods. Moreover, we show that the closed graph patterns are a condensed representation of the frequent emerging graph patterns, and we propose a new condensed representation based on the representative pruned graph patterns: because it provides shorter patterns, it is especially suited to representing a set of graph patterns. Experiments on a real-world database of chemicals show the feasibility and the efficiency of our approach.
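
To make the two thresholds concrete, the following minimal sketch tests whether a single pattern is frequent and emerging. It is an illustration under stated assumptions, not the mining method of the paper: each graph is reduced to a set of labeled edges and containment is plain set inclusion (a real graph miner needs a subgraph isomorphism test); emergence is measured with the usual growth rate, the ratio of the pattern's frequency in the target class to its frequency in the other class; and the frequency threshold is assumed to apply to the target class. All function names and the toy data are hypothetical.

```python
from math import inf


def frequency(pattern, graphs):
    """Fraction of graphs containing the pattern.

    Assumption: graphs and patterns are frozensets of labeled edges and
    containment is set inclusion, a stand-in for subgraph isomorphism.
    """
    if not graphs:
        return 0.0
    return sum(pattern <= g for g in graphs) / len(graphs)


def growth_rate(pattern, target, rest):
    """Frequency in the target class divided by frequency in the other class."""
    f_target = frequency(pattern, target)
    f_rest = frequency(pattern, rest)
    if f_rest == 0:
        # Absent from the other class: infinite growth rate if it occurs
        # in the target class at all, otherwise 0 by convention.
        return inf if f_target > 0 else 0.0
    return f_target / f_rest


def is_frequent_emerging(pattern, target, rest, min_freq, min_growth):
    """True if the pattern meets both the frequency and emergence thresholds."""
    return (frequency(pattern, target) >= min_freq
            and growth_rate(pattern, target, rest) >= min_growth)


# Toy data (hypothetical): each molecule is a set of (atom, atom) bonds.
toxic = [
    frozenset({("C", "N"), ("C", "O")}),
    frozenset({("C", "N")}),
    frozenset({("C", "Cl"), ("C", "N")}),
]
non_toxic = [
    frozenset({("C", "O")}),
    frozenset({("C", "N"), ("C", "O")}),
    frozenset({("C", "O")}),
    frozenset({("C", "C")}),
]

cn = frozenset({("C", "N")})
ccl = frozenset({("C", "Cl")})
```

With these toy classes, the C-N pattern occurs in every toxic molecule and in one of four non-toxic ones, so it passes a frequency threshold of 0.5 and a growth-rate threshold of 3. The C-Cl pattern never occurs in the non-toxic class, so its growth rate is infinite, but it is too rare in the toxic class to pass the same frequency threshold.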

Keywords

Data mining, Emerging patterns, Condensed representation, Subgraph isomorphism, Chemical information

Notes

Acknowledgements

The authors would like to thank Arnaud Soulet for very fruitful discussions and for the Music-dfs prototype, and the CERMN lab for its invaluable help with the data and the chemical knowledge. This work is partly supported by the ANR (French National Research Agency) through the Innotox and Bingo2 projects, and by the Region Basse-Normandie (Innotox2 project).

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Guillaume Poezevara (1)
  • Bertrand Cuissart (1)
  • Bruno Crémilleux (1)
  1. Laboratoire GREYC-CNRS UMR 6072, Université de Caen Basse-Normandie, Caen, France