## Abstract

This paper presents a method for discovering approximate frequent itemsets of interest in large-scale databases. The method uses the *central limit theorem* to increase efficiency, reducing the required sample size by about half compared with previous approximations. Further efficiency is gained by pruning uninteresting frequent itemsets from the search space. In addition to improving efficiency, this pruning reduces the number of itemsets that the user needs to consider. The model and algorithm have been implemented and evaluated on both synthetic and real-world databases. Our experimental results demonstrate the efficiency of the approach.

Keywords: data mining, sampling, approximate frequent itemset
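
To make the sample-size claim concrete, here is a minimal sketch that compares a central-limit-theorem sample size with a Chernoff/Hoeffding-style one. The abstract does not give the paper's formulas, so everything here is an assumption: the function names, the error tolerance `eps`, the failure probability `delta`, and the two textbook bounds, which assume independent random sampling of transactions and the worst-case support variance of 1/4.

```python
import math
from statistics import NormalDist


def clt_sample_size(eps: float, delta: float) -> int:
    """Sample size from the normal approximation (assumed textbook form).

    The sample support of an itemset is a mean of 0/1 indicators with
    variance p(1 - p) <= 1/4, so n >= z^2 / (4 * eps^2) keeps the
    estimate within eps of the true support with probability of about
    1 - delta, where z is the (1 - delta/2) standard normal quantile.
    """
    z = NormalDist().inv_cdf(1 - delta / 2)
    return math.ceil(z * z / (4 * eps * eps))


def hoeffding_sample_size(eps: float, delta: float) -> int:
    """Sample size from the Hoeffding (additive Chernoff) bound:
    n >= ln(2 / delta) / (2 * eps^2)."""
    return math.ceil(math.log(2 / delta) / (2 * eps * eps))


if __name__ == "__main__":
    eps, delta = 0.01, 0.05  # hypothetical tolerance and failure probability
    n_clt = clt_sample_size(eps, delta)
    n_bound = hoeffding_sample_size(eps, delta)
    print(f"CLT-based sample size:       {n_clt}")
    print(f"Hoeffding-based sample size: {n_bound}")
    print(f"ratio: {n_clt / n_bound:.2f}")
```

With these assumed inputs the CLT-based size comes out at roughly half the Hoeffding-based one, in line with the reduction stated in the abstract. The ratio depends on `delta` and grows as `delta` shrinks, and the CLT figure is an approximation rather than a guaranteed bound.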

## Copyright information

© Kluwer Academic Publishers 2003