Abstract
Graph-based data mining approaches have mainly been proposed for the task popularly known as frequent subgraph mining, subject to a single user preference such as frequency or size. In this work, we propose to tackle the frequent subgraph mining problem from a multiobjective optimization viewpoint, where a subgraph (or solution) is characterized by several user-defined preferences (or objectives) that are conflicting in nature. For example, mined subgraphs with high frequency are often of small size, and vice versa. Using such objectives in the multiobjective subgraph mining process yields a set of Pareto-optimal subgraphs, i.e., subgraphs that are not dominated by any other subgraph: no other subgraph is at least as good in every objective and strictly better in at least one. We apply a Pareto dominance approach to evaluate and search subgraphs with regard to both proximity and diversity in the multiobjective sense, and we incorporate it into the framework of the Subdue algorithm for subgraph mining. The resulting method, called multiobjective subgraph mining by Subdue (MOSubdue), has several advantages: (i) generation of Pareto-optimal subgraphs in a single run, (ii) selection of subgraph seeds from the candidate subgraphs based on all objectives, (iii) search in the multiobjective subgraph lattice space, and (iv) the capability to deal with different multiobjective frequent subgraph mining tasks by customizing the objectives tackled. The good performance of MOSubdue is demonstrated on multiobjective subgraph mining tasks defined by two and three objectives on two real-life datasets.
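The Pareto dominance test underlying this kind of evaluation fits in a few lines of Python. This is a minimal sketch, not the MOSubdue implementation. It assumes each candidate subgraph has already been scored on two objectives, frequency and size (in edges), and that both are maximized. The names `Candidate`, `dominates`, and `pareto_front` are hypothetical, and the subgraph labels and scores are toy values invented for illustration. The sketch covers only the dominance check and extraction of the non-dominated set. It omits Subdue's beam search and any diversity-preservation mechanism.

```python
from typing import NamedTuple, Sequence


class Candidate(NamedTuple):
    """Hypothetical candidate subgraph, summarized only by its objective scores."""
    name: str
    objectives: tuple  # assumed: every objective is to be maximized


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b under maximization: a is at least as
    good as b in every objective and strictly better in at least one."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(candidates: Sequence[Candidate]) -> list:
    """Keep the candidates that no other candidate dominates.
    Simple O(n^2) pairwise check; dominates(c, c) is False, so a
    candidate never eliminates itself."""
    return [
        c for c in candidates
        if not any(dominates(o.objectives, c.objectives) for o in candidates)
    ]


# Toy scores (frequency, size in edges); hypothetical, not from the paper.
subgraphs = [
    Candidate("edge A-B", (120, 1)),
    Candidate("path A-B-C", (80, 2)),
    Candidate("triangle A-B-C", (30, 3)),
    Candidate("star B(A,C,D)", (25, 3)),  # dominated by the triangle
    Candidate("path A-B-D", (70, 2)),     # dominated by path A-B-C
]

front = pareto_front(subgraphs)
print([c.name for c in front])
# ['edge A-B', 'path A-B-C', 'triangle A-B-C']
```

The three surviving subgraphs show the trade-off the abstract describes. The most frequent one is the smallest, and the largest one is the least frequent. None of the three is better than another in both objectives. Because the dominance test compares tuples of any length, a third objective can be handled by adding a third score to each tuple.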
Cite this article
Shelokar, P., Quirin, A. & Cordón, Ó. MOSubdue: a Pareto dominance-based multiobjective Subdue algorithm for frequent subgraph mining. Knowl Inf Syst 34, 75–108 (2013). https://doi.org/10.1007/s10115-011-0452-y