Performance Evaluation of Approximate Pattern Mining Based on Probabilistic and Statistical Techniques

Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 215)

Abstract

Approximate frequent pattern mining finds patterns that are frequent within a tolerable error, rather than exact frequent patterns, in order to gain efficiency. As databases grow, much faster mining techniques are needed to handle them. Moreover, exact mining results are harder to obtain when the data contain inherent noise or are highly diverse. In these cases, mining approximate frequent patterns can improve efficiency in terms of runtime, memory usage, and scalability. In this paper, we benchmark efficient algorithms for mining approximate frequent patterns that are based on statistical and probabilistic methods. We study the characteristics of approximate mining algorithms and evaluate the performance of state-of-the-art approximate mining algorithms. Finally, we analyze the test results to identify opportunities for further improvement.
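The abstract does not spell out how a statistical technique trades exactness for efficiency. As an illustration only, here is a minimal sketch of the classic Lossy Counting algorithm (Manku and Motwani, 2002) for single items in a stream. It is not one of the benchmarked implementations, and the function names `lossy_counting` and `frequent_items` are hypothetical.

With error parameter ε, each stored count underestimates the true count by at most εN. Reporting every item whose stored count is at least (s - ε)N therefore produces no false negatives. Memory stays bounded because entries are pruned at every bucket boundary.

```python
import math
from typing import Dict, Hashable, Iterable, List, Tuple

Entry = Tuple[int, int]  # (f, delta): counted frequency and maximum undercount


def lossy_counting(stream: Iterable[Hashable], epsilon: float) -> Tuple[Dict[Hashable, Entry], int]:
    """One pass of Lossy Counting over single items; returns (entries, N)."""
    width = math.ceil(1 / epsilon)  # bucket width w = ceil(1/epsilon)
    entries: Dict[Hashable, Entry] = {}
    n = 0
    for item in stream:
        n += 1
        bucket = (n + width - 1) // width  # id of the current bucket, starting at 1
        if item in entries:
            f, delta = entries[item]
            entries[item] = (f + 1, delta)
        else:
            # delta = bucket - 1 bounds how often the item could have
            # occurred in earlier buckets before it was (re)inserted.
            entries[item] = (1, bucket - 1)
        if n % width == 0:
            # Bucket boundary: drop entries that cannot be frequent.
            entries = {k: (f, d) for k, (f, d) in entries.items() if f + d > bucket}
    return entries, n


def frequent_items(entries: Dict[Hashable, Entry], n: int, support: float, epsilon: float) -> List[Hashable]:
    """Report every item whose counted frequency is at least (support - epsilon) * N."""
    return sorted(k for k, (f, _) in entries.items() if f >= (support - epsilon) * n)
```

A smaller ε gives tighter counts at the cost of keeping more entries between prunes.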

Keywords

Approximate frequent pattern mining, Lossy Counting technique, Chernoff technique, Probabilistic technique, Statistical technique, Performance evaluation
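The Chernoff technique named in the keywords is a probabilistic way to bound estimation error. The paper's exact Chernoff-based algorithm is not described here, so the following is only a sketch under that assumption. It uses the standard additive Chernoff-Hoeffding bound, P(|p̂ - p| ≥ ε) ≤ 2exp(-2nε²), to choose a sample size n ≥ ln(2/δ)/(2ε²). It then estimates itemset supports from a random sample of transactions. The names `chernoff_sample_size` and `approximate_frequent_itemsets`, and the parameter `max_len`, are hypothetical.

The guarantee holds per itemset: each estimated support is within ε of the true support with probability at least 1 - δ. Lowering the threshold to (min_support - ε) therefore misses a given truly frequent itemset with probability at most δ.

```python
import math
import random
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, List, Sequence


def chernoff_sample_size(epsilon: float, delta: float) -> int:
    """Smallest n with 2 * exp(-2 * n * epsilon**2) <= delta (additive Chernoff-Hoeffding bound)."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))


def approximate_frequent_itemsets(
    db: List[Sequence[Hashable]],
    min_support: float,
    epsilon: float,
    delta: float,
    max_len: int = 2,  # hypothetical cap on itemset size, to keep enumeration cheap
    seed: int = 0,
) -> Dict[FrozenSet[Hashable], float]:
    """Estimate supports from a random sample and keep itemsets above a lowered threshold."""
    n = chernoff_sample_size(epsilon, delta)
    rng = random.Random(seed)
    # Sample transactions uniformly with replacement, so the bound's i.i.d. assumption holds.
    sample = rng.choices(db, k=n)
    counts: Counter = Counter()
    for transaction in sample:
        items = list(set(transaction))
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    # Lower the threshold by epsilon so that a truly frequent itemset is
    # missed with probability at most delta (per itemset).
    threshold = (min_support - epsilon) * n
    return {itemset: c / n for itemset, c in counts.items() if c >= threshold}
```

Brute-force enumeration of all itemsets up to `max_len` per sampled transaction keeps the sketch short. A real miner would run Apriori or FP-growth on the sample instead.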

Notes

Acknowledgments

This research was supported by the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (NRF Nos. 2012-0003740 and 2012-0000478).

Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  1. Department of Computer Engineering, Chungbuk National University, Heungdeok-gu, Republic of Korea