Knowledge and Information Systems, Volume 34, Issue 1, pp 75–108

MOSubdue: a Pareto dominance-based multiobjective Subdue algorithm for frequent subgraph mining

Regular Paper

Abstract

Graph-based data mining approaches have mainly been proposed for the task popularly known as frequent subgraph mining, subject to a single user preference such as frequency or size. In this work, we propose to address the frequent subgraph mining problem from a multiobjective optimization viewpoint, where a subgraph (or solution) is defined by several user-defined preferences (or objectives) that are conflicting in nature. For example, mined subgraphs with high frequency are often of small size, and vice versa. Using such objectives in the multiobjective subgraph mining process generates Pareto-optimal subgraphs, where no subgraph is better than another subgraph in all objectives. We have applied a Pareto dominance approach to the evaluation and search of subgraphs with regard to both proximity and diversity in the multiobjective sense, and have incorporated it in the framework of the Subdue algorithm for subgraph mining. The method is called multiobjective subgraph mining by Subdue (MOSubdue) and has several advantages: (i) generation of Pareto-optimal subgraphs in a single run; (ii) selection of subgraph seeds from the candidate subgraphs based on all objectives; (iii) search in the multiobjective subgraph lattice space; and (iv) the capability to deal with different multiobjective frequent subgraph mining tasks by customizing the objectives tackled. The good performance of MOSubdue is shown by performing multiobjective subgraph mining defined by two and three objectives on two real-life datasets.
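The Pareto dominance idea in the abstract can be illustrated with a minimal sketch. This is not the MOSubdue implementation: it assumes the standard textbook definition of dominance (no worse in every objective, strictly better in at least one), treats both objectives as maximized, and uses hypothetical candidate subgraphs `g1`–`g5` described only by made-up (frequency, size) pairs. The helper names `dominates` and `pareto_front` are illustrative.

```python
from typing import Dict, List, Tuple

# Objective vector of a candidate subgraph, e.g. (frequency, size).
# Assumption: every objective is to be maximized.
Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a Pareto-dominates b.

    a dominates b when a is no worse than b in every objective
    and strictly better in at least one.
    """
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: Dict[str, Objectives]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    front = []
    for name, objs in candidates.items():
        dominated = any(
            dominates(other, objs)
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical candidates with made-up (frequency, size) values.
candidates = {
    "g1": (10, 2),  # very frequent, small
    "g2": (6, 4),
    "g3": (3, 7),   # rare, large
    "g4": (5, 3),   # dominated by g2
    "g5": (3, 5),   # dominated by g3
}

front = pareto_front(candidates)
print(front)
```

In this toy data, the frequent-but-small and rare-but-large candidates both survive, which mirrors the frequency/size trade-off the abstract describes. The pairwise comparison is quadratic in the number of candidates, which is acceptable for a sketch but is only one of several ways to compute a non-dominated set.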

Keywords

Graph-based data mining; Frequent subgraph mining; Subdue; Gaston; Multiobjective graph-based data mining; Pareto-based multiobjective optimization; Evolutionary multiobjective optimization

Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  • Prakash Shelokar (1)
  • Arnaud Quirin (1)
  • Óscar Cordón (1, 2)

  1. European Centre for Soft Computing, Mieres, Spain
  2. Department of Computer Science and Artificial Intelligence (DECSAI) and the Research Centre on Information and Communication Technologies (CITIC-UGR), University of Granada, Granada, Spain
