HashEclat: an efficient frequent itemset algorithm

  • Chunkai Zhang
  • Panbo Tian
  • Xudong Zhang
  • Qing Liao
  • Zoe L. Jiang
  • Xuan Wang
Original Article

Abstract

The Eclat algorithm is one of the most widely used frequent itemset mining methods. However, computing the intersection of itemsets is inefficient, which makes Eclat time-consuming, especially when the dataset contains a large number of transactions. To improve efficiency, we propose HashEclat, an approximate Eclat algorithm based on MinHash that quickly estimates the size of an intersection set. Its parameters k, E and minSup can be adjusted to trade off the accuracy of the mining results against execution time: k is the top-k parameter of the one-permutation MinHash algorithm, E is the estimation error of a single intersection size, and minSup is the minimum support threshold. In many real situations, an approximate result obtained quickly may be more useful than an ‘exact’ one. The theoretical analysis and experimental results that we present demonstrate that the proposed algorithm can output almost all of the frequent itemsets with faster speed and less memory.
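
The core idea of estimating an intersection size from MinHash sketches, rather than intersecting full transaction-ID sets, can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a generic bottom-k (single hash function) MinHash, and the names `bottom_k_sketch` and `estimate_intersection`, the BLAKE2b hash, and the `seed` parameter are all hypothetical choices made for illustration. The parameter E from the abstract is not modelled here.

```python
import hashlib
import heapq


def _hash(tid, seed=0):
    # Hypothetical stand-in for a random permutation: a 64-bit hash of the ID.
    data = f"{seed}:{tid}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def bottom_k_sketch(tids, k, seed=0):
    """Return the k smallest hash values of a transaction-ID set."""
    return heapq.nsmallest(k, {_hash(t, seed) for t in tids})


def estimate_intersection(sketch_a, size_a, sketch_b, size_b, k):
    """Estimate |A ∩ B| from two bottom-k sketches and the exact set sizes.

    Assumption: both sketches were built with the same k and seed.
    """
    set_a, set_b = set(sketch_a), set(sketch_b)
    # The k smallest hashes of A ∪ B can be recovered from the two sketches.
    union_k = heapq.nsmallest(k, set_a | set_b)
    if not union_k:
        return 0.0
    # Fraction of those hashes present in both sets estimates the Jaccard index.
    shared = sum(1 for h in union_k if h in set_a and h in set_b)
    jaccard = shared / len(union_k)
    # |A ∩ B| = J * |A ∪ B| and |A ∪ B| = (|A| + |B|) / (1 + J).
    return jaccard * (size_a + size_b) / (1.0 + jaccard)


if __name__ == "__main__":
    tids_a = set(range(0, 1000))     # transactions containing itemset A
    tids_b = set(range(500, 1500))   # transactions containing itemset B
    k = 256
    est = estimate_intersection(
        bottom_k_sketch(tids_a, k), len(tids_a),
        bottom_k_sketch(tids_b, k), len(tids_b), k,
    )
    print(f"estimated support of A ∪ B itemset: {est:.1f} (exact: 500)")
```

In an Eclat-style search, such an estimate would be compared against the minimum support count in place of the exact intersection size. A larger k costs more time and memory per sketch but reduces the variance of the estimate; when k is at least the size of the union, the estimate is exact barring hash collisions.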

Keywords

Frequent itemset, MinHash, Approximate algorithm, Eclat

Notes

Acknowledgements

This study was supported by the Shenzhen Research Council (nos. JSGG20170822160842949, JCYJ20170307151518535).

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Chunkai Zhang (1)
  • Panbo Tian (1)
  • Xudong Zhang (1)
  • Qing Liao (1)
  • Zoe L. Jiang (1)
  • Xuan Wang (1)

  1. Department of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China
