MOSubdue: a Pareto dominance-based multiobjective Subdue algorithm for frequent subgraph mining

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Graph-based data mining approaches have mainly been proposed for the task popularly known as frequent subgraph mining, subject to a single user preference such as frequency or size. In this work, we propose to deal with the frequent subgraph mining problem from a multiobjective optimization viewpoint, where a subgraph (or solution) is evaluated on several user-defined preferences (or objectives) that are conflicting in nature. For example, mined subgraphs with high frequency are often of small size, and vice versa. Using such objectives in the multiobjective subgraph mining process generates Pareto-optimal subgraphs, where no subgraph is better than another subgraph in all objectives. We apply a Pareto dominance approach to evaluate and search for subgraphs with regard to both proximity and diversity in the multiobjective sense, and incorporate it into the framework of the Subdue algorithm for subgraph mining. The method is called multiobjective subgraph mining by Subdue (MOSubdue) and has several advantages: (i) generation of Pareto-optimal subgraphs in a single run, (ii) selection of subgraph seeds from the candidate subgraphs based on all objectives, (iii) search in the multiobjective subgraph lattice space, and (iv) the capability to deal with different multiobjective frequent subgraph mining tasks by customizing the tackled objectives. The good performance of MOSubdue is shown by performing multiobjective subgraph mining defined by two and three objectives on two real-life datasets.
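
The Pareto dominance idea described above can be illustrated with a minimal sketch. It is not the MOSubdue implementation: it assumes the standard definition of dominance (at least as good in every objective and strictly better in at least one), assumes that both objectives are maximised, and uses hypothetical names (Candidate, dominates, pareto_front) and invented (frequency, size) values.

```python
from typing import List, NamedTuple, Sequence


class Candidate(NamedTuple):
    """A mined subgraph summarised by its objective values (all maximised)."""
    name: str
    objectives: Sequence[float]  # assumed here to be (frequency, size)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(o.objectives, c.objectives) for o in candidates)
    ]


# Hypothetical (frequency, size) values, invented purely for illustration.
candidates = [
    Candidate("g1", (40, 2)),
    Candidate("g2", (25, 4)),
    Candidate("g3", (20, 3)),  # dominated by g2
    Candidate("g4", (5, 9)),
    Candidate("g5", (5, 7)),   # dominated by g4
]
front = pareto_front(candidates)
print([c.name for c in front])
```

In this toy example the frequent but small g1, the intermediate g2, and the large but rare g4 all survive, which mirrors the frequency versus size trade-off mentioned in the abstract. The quadratic filter is chosen for clarity only.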

Author information

Corresponding author

Correspondence to Óscar Cordón.

About this article

Cite this article

Shelokar, P., Quirin, A. & Cordón, Ó. MOSubdue: a Pareto dominance-based multiobjective Subdue algorithm for frequent subgraph mining. Knowl Inf Syst 34, 75–108 (2013). https://doi.org/10.1007/s10115-011-0452-y

  • DOI: https://doi.org/10.1007/s10115-011-0452-y
