Performance Evaluation of Approximate Pattern Mining Based on Probabilistic and Statistical Techniques
Approximate frequent pattern mining finds approximate patterns rather than exact frequent patterns, accepting tolerable variations in exchange for greater efficiency. As databases grow, much faster mining techniques are needed to handle them. Moreover, inherent noise and data diversity make exact mining results harder to obtain. In these cases, mining approximate frequent patterns can be more efficient in terms of runtime, memory usage, and scalability. In this paper, we benchmark efficient algorithms for mining approximate frequent patterns based on statistical and probabilistic methods. We study the characteristics of approximate mining algorithms and evaluate the performance of state-of-the-art approximate mining algorithms. Finally, we analyze the test results to identify directions for further improvement.
Keywords: Approximate frequent pattern mining; Lossy Counting technique; Chernoff technique; Probabilistic technique; Statistical technique; Performance evaluation
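For readers unfamiliar with the Lossy Counting technique named in the keywords, the following is a minimal sketch of the single-item variant described by Manku and Motwani (2002). It is not the implementation benchmarked in this paper. The function names `lossy_count` and `frequent_items`, the parameter names, and the sample stream are illustrative assumptions.

```python
import math

def lossy_count(stream, epsilon):
    """Approximate item counts over a stream with error at most epsilon * N.

    Hypothetical helper for illustration; single items only, not itemsets.
    """
    width = math.ceil(1 / epsilon)   # bucket width
    counts = {}                      # item -> (estimated count, max undercount)
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)        # id of the current bucket
        if item in counts:
            f, delta = counts[item]
            counts[item] = (f + 1, delta)
        else:
            # A new entry may have been missed in up to (bucket - 1) earlier buckets.
            counts[item] = (1, bucket - 1)
        if n % width == 0:
            # At each bucket boundary, drop entries that cannot be frequent.
            counts = {k: (f, d) for k, (f, d) in counts.items() if f + d > bucket}
    return counts, n

def frequent_items(counts, n, support, epsilon):
    """Report items whose estimated count is at least (support - epsilon) * n."""
    return {k for k, (f, _) in counts.items() if f >= (support - epsilon) * n}

# Illustrative stream: "a" is 50% of items, "b" 20%, the rest 10% each.
stream = ["a", "b", "a", "c", "a", "b", "a", "d", "a", "e"] * 10
counts, n = lossy_count(stream, epsilon=0.1)
result = frequent_items(counts, n, support=0.2, epsilon=0.1)
```

The sketch trades exactness for bounded memory: each estimated count undercounts the true count by at most `epsilon * n`, so every item with true frequency at least `support * n` is reported, while some items slightly below that threshold may also appear.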
This research was supported by the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF No. 2012-0003740 and 2012-0000478).