Approximate Frequent Itemset Mining In the Presence of Random Noise

  • Hong Cheng
  • Philip S. Yu
  • Jiawei Han

Frequent itemset mining has been a central theme in data mining research and is an important first step in analyzing data from a broad range of applications. The traditional exact model of a frequent itemset requires that every item occur in each supporting transaction. Real application data, however, is usually subject to random noise or measurement error, which poses new challenges for efficiently discovering frequent itemsets in noisy data. Mining approximate frequent itemsets in the presence of noise involves two key issues: the definition of a noise-tolerant mining model and the design of an efficient mining algorithm. In this chapter, we give an overview of approximate itemset mining algorithms in the presence of random noise and examine several noise-tolerant mining approaches.

Key words: error-tolerant itemset, approximate frequent itemset, core pattern recovery
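
To make the contrast between the exact model and a noise-tolerant model concrete, the following minimal Python sketch compares exact support with one simple relaxation. The relaxation used here (a transaction still counts as a supporter if it misses at most `max_missing` items of the itemset) is an assumption chosen for illustration only and is not necessarily the definition used by any approach surveyed in the chapter. The function names `exact_support` and `tolerant_support` are hypothetical.

```python
from typing import FrozenSet, Iterable, List


def exact_support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def tolerant_support(
    itemset: Iterable[str],
    transactions: List[FrozenSet[str]],
    max_missing: int,
) -> float:
    """Fraction of transactions missing at most `max_missing` items.

    Illustrative relaxation only (an assumption, not the chapter's model):
    a transaction supports the itemset if it lacks no more than
    `max_missing` of its items.
    """
    items = frozenset(itemset)
    count = sum(1 for t in transactions if len(items - t) <= max_missing)
    return count / len(transactions)


# Toy data: the pattern {a, b, c} with one item dropped by noise in three rows.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
    frozenset({"d"}),
]

exact = exact_support({"a", "b", "c"}, transactions)
relaxed = tolerant_support({"a", "b", "c"}, transactions, max_missing=1)
print(exact, relaxed)
```

On this toy data only one of the five transactions contains all three items, so the exact model would report a low support for the pattern, while tolerating a single missing item per transaction recovers four supporters. Setting `max_missing=0` reduces the relaxed measure to exact support.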

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Hong Cheng (1)
  • Philip S. Yu (2)
  • Jiawei Han (1)
  1. University of Illinois at Urbana-Champaign, USA
  2. IBM T. J. Watson Research Center, USA