Abstract
In many recent applications, graphs are used to model complex systems such as social networks, traffic networks, and biological data, and the underlying graphs for these systems are very large. Algorithms for mining all frequent subgraphs from a single large graph have therefore attracted considerable attention and have been studied in increasing detail. Frequent subgraph mining is defined as finding all subgraphs whose number of occurrences in a dataset is greater than or equal to a given frequency threshold. Among frequent subgraph mining algorithms, GraMi is considered the state-of-the-art approach. However, GraMi has a huge search space and therefore still needs a lot of time and memory to process a large graph. In this paper, we propose two effective strategies to balance and reduce the search space of GraMi. These strategies decrease the number of candidate subgraphs generated and prune a large portion of the domain of each candidate early. Our experiments were performed on four real datasets, and the results show that our balanced version of GraMi outperforms both the original GraMi algorithm and its optimized version, SoGraMi.
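The frequency-threshold definition above can be illustrated with a minimal sketch for the simplest case, single-edge patterns in an undirected, node-labeled graph. It assumes minimum image-based (MNI) support, the measure used in the original GraMi paper; the abstract itself does not name a support measure. The function name `frequent_edge_patterns`, the parameter `tau`, and the toy graph are hypothetical and chosen only for illustration. This is not the authors' algorithm and does not implement the two proposed strategies.

```python
from collections import defaultdict


def frequent_edge_patterns(labels, edges, tau):
    """Return {(label_a, label_b): support} for single-edge patterns
    whose MNI support is at least tau (hypothetical helper).

    labels: dict mapping vertex id -> label
    edges:  iterable of undirected edges (u, v)
    """
    # For each pattern (label_a, label_b) with label_a <= label_b, keep the
    # domain of each pattern node: the graph vertices it can be mapped to.
    domains = defaultdict(lambda: (set(), set()))
    for u, v in edges:
        # An undirected edge can match the pattern in either orientation.
        for a, b in ((u, v), (v, u)):
            key = (labels[a], labels[b])
            if key[0] <= key[1]:
                left, right = domains[key]
                left.add(a)
                right.add(b)
    # MNI support is the size of the smallest domain over the pattern nodes.
    support = {k: min(len(l), len(r)) for k, (l, r) in domains.items()}
    return {k: s for k, s in support.items() if s >= tau}


# Toy graph (assumed data): two A-B edge occurrences with distinct images,
# one B-C edge.
toy_labels = {1: "A", 2: "B", 3: "A", 4: "B", 5: "C"}
toy_edges = [(1, 2), (3, 2), (3, 4), (4, 5)]
result = frequent_edge_patterns(toy_labels, toy_edges, tau=2)
```

With a threshold of 2, only the A-B pattern is kept, because each of its two pattern nodes has a domain of two vertices, while the B-C pattern has support 1. Larger patterns require subgraph isomorphism checks over such domains, which is where the search space discussed in the abstract grows.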
Acknowledgements
This work was supported by the Institute for Computational Science and Technology (ICST), Ho Chi Minh City, and the Department of Science and Technology (DOST), Ho Chi Minh City, under grant no. 23/2021/HĐ-QKHCN.
We are especially thankful to Mohammed Elseidy, who provided the GraMi source code and two datasets, MiCo and CiteSeer.
About this article
Cite this article
Nguyen, L.B.Q., Nguyen, L.T.T., Vo, B. et al. An efficient and scalable approach for mining subgraphs in a single large graph. Appl Intell 52, 17881–17895 (2022). https://doi.org/10.1007/s10489-022-03164-5
DOI: https://doi.org/10.1007/s10489-022-03164-5