Knowledge and Information Systems, Volume 15, Issue 2, pp 233–257

Computing the minimum-support for mining frequent patterns

  • Shichao Zhang
  • Xindong Wu
  • Chengqi Zhang
  • Jingli Lu
Regular Paper

Abstract

Frequent pattern mining is based on the assumption that users can specify the minimum-support for mining their databases. It has been recognized, however, that setting the minimum-support is a difficult task for users, which can hinder the widespread application of these algorithms. In this paper we propose a computational strategy for identifying frequent itemsets, consisting of polynomial approximation and fuzzy estimation. More specifically, these two algorithms automatically generate actual minimum-supports (appropriate to the database to be mined) according to users’ mining requirements. We experimentally examine the algorithms on different datasets and demonstrate that our fuzzy estimation algorithm approximates actual minimum-supports well from commonly used requirements.
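To make the role of the minimum-support concrete, the following is a minimal sketch of how the threshold decides which itemsets count as frequent. It is not the polynomial approximation or fuzzy estimation algorithm of the paper; the toy transactions and the helper names `support` and `frequent_itemsets` are assumptions introduced only for illustration, and the brute-force enumeration is suitable for tiny examples only.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all itemsets with support >= min_support.

    Illustrative only: the search is exponential in the number of items.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[candidate] = s
    return result


# Hypothetical toy database of four transactions.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

# Number of frequent itemsets found at each user-specified threshold.
counts = {ms: len(frequent_itemsets(transactions, ms)) for ms in (0.25, 0.5, 0.75)}
print(counts)
```

Even on this toy database the number of frequent itemsets changes sharply with the threshold, which suggests why choosing a suitable minimum-support by hand can be hard for users.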

Keywords

Data mining; Minimum support; Frequent patterns; Association rules

Copyright information

© Springer-Verlag London Limited 2007

Authors and Affiliations

  • Shichao Zhang (1)
  • Xindong Wu (2)
  • Chengqi Zhang (3)
  • Jingli Lu (4)
  1. Faculty of Computer Science and Information Technology, Guangxi Normal University, Guilin, People’s Republic of China
  2. Department of Computer Science, University of Vermont, Burlington, USA
  3. Faculty of Information Technology, University of Technology, Sydney, Broadway, Australia
  4. Institute of Information Sciences and Technology, Massey University, Palmerston North, New Zealand