
Cardinality Statistics Based Maximal Frequent Itemsets Mining

  • Conference paper
Information Systems, Technology and Management (ICISTM 2012)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 285)

Abstract

Extracting frequent itemsets is an important task in many data mining applications. Since the set of all frequent itemsets is likely to be undesirably large, condensed representations such as maximal and closed frequent itemsets are used instead. The set of closed frequent itemsets uniquely determines the exact frequency of all frequent itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. However, whenever very long patterns are present in the data, it is often impractical to generate even the entire set of closed frequent itemsets; in such domains, the only recourse is to mine the maximal frequent itemsets. In this paper, we propose a new approach for mining all maximal frequent itemsets that introduces and uses a compact data structure, the Reduced Transaction Pattern List (RTPL), to represent the database. Our implementation exploits the advantages of combining the RTPL representation with statistical information about the cardinality of each item in the database. We devise a pruning strategy that substantially reduces the combinatorial search space by using this statistical information about items at two levels. Our experiments on synthetic and real-world standard benchmark datasets show that the proposed algorithm outperforms algorithms that do not use cardinality statistics.
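To make the distinction between frequent, closed and maximal itemsets concrete, here is a brute-force sketch on a small made-up database. The data and the function names (`support`, `mine`, `support_from_closed`) are illustrative only and do not come from the paper. It enumerates every candidate itemset, so it is only practical for a handful of items; that exponential blow-up is exactly what motivates condensed representations.

```python
from itertools import combinations

def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def mine(transactions, min_sup):
    """Brute-force enumeration; returns dicts mapping itemset -> support."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(frozenset(combo), transactions)
            if s >= min_sup:
                frequent[frozenset(combo)] = s
    # Closed: no proper superset has the same support
    # (such a superset would itself be frequent, so checking `frequent` suffices).
    closed = {x: s for x, s in frequent.items()
              if not any(x < y and frequent[y] == s for y in frequent)}
    # Maximal: no proper superset is frequent at all.
    maximal = {x: s for x, s in frequent.items()
               if not any(x < y for y in frequent)}
    return frequent, closed, maximal

def support_from_closed(itemset, closed):
    """Recover a frequent itemset's support from the closed itemsets alone:
    it equals the largest support among its closed supersets."""
    return max(s for c, s in closed.items() if itemset <= c)

db = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b", "d"}, {"a", "b"}, {"c", "d"}]
frequent, closed, maximal = mine(db, min_sup=2)
print(len(frequent), len(closed), len(maximal))          # 8 4 2
print(sorted("".join(sorted(x)) for x in maximal))       # ['abc', 'd']
```

With `min_sup=2` this toy database has 8 frequent, 4 closed and 2 maximal itemsets. `support_from_closed` recovers the support of every frequent itemset from the closed sets alone, whereas the maximal sets only tell you *which* itemsets are frequent (every subset of a maximal set), not how often they occur.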

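The abstract does not describe how RTPL is laid out or how the two-level cardinality pruning works, so the following is only a hypothetical sketch of the general idea, not the authors' algorithm. It assumes that "reduced" means infrequent items are dropped and identical transactions are merged into (pattern, count) pairs, and it uses item cardinalities (supports) only to order a depth-first search for maximal itemsets. The pruning shown is the generic subsumption check and superset lookahead common to depth-first maximal-itemset miners. All names (`build_pattern_list`, `support`, `maximal_itemsets`) are illustrative.

```python
from collections import Counter

def build_pattern_list(transactions, min_sup):
    """Hypothetical stand-in for RTPL (not the paper's definition):
    drop infrequent items, then merge identical reduced transactions
    into (pattern, count) pairs. Also returns each frequent item's cardinality."""
    item_card = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, c in item_card.items() if c >= min_sup}
    patterns = Counter(frozenset(t) & frequent_items for t in transactions)
    patterns.pop(frozenset(), None)  # rows with no frequent item carry no information
    card = {i: item_card[i] for i in frequent_items}
    return list(patterns.items()), card

def support(itemset, pattern_list):
    """Support counted over the compact list: one subset test per distinct pattern."""
    return sum(count for pattern, count in pattern_list if itemset <= pattern)

def maximal_itemsets(transactions, min_sup):
    pattern_list, card = build_pattern_list(transactions, min_sup)
    # Assumed heuristic: visit items in ascending cardinality (ties broken by name).
    order = sorted(card, key=lambda i: (card[i], i))
    maximal = []

    def is_subsumed(itemset):
        return any(itemset <= m for m in maximal)

    def search(head, tail):
        full = head | frozenset(tail)
        # Subsumption pruning: every set in this subtree is a subset of `full`.
        if is_subsumed(full):
            return
        # Lookahead: if head plus the whole tail is frequent, it covers the subtree.
        if support(full, pattern_list) >= min_sup:
            maximal.append(full)
            return
        extended = False
        for idx, item in enumerate(tail):
            new_head = head | {item}
            if support(new_head, pattern_list) >= min_sup:
                extended = True
                search(new_head, tail[idx + 1:])
        # Leaf: no frequent extension from the tail and no known maximal superset.
        if not extended and head and not is_subsumed(head):
            maximal.append(head)

    search(frozenset(), order)
    return maximal

db = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b", "d"}, {"a", "b"}, {"c", "d"}]
print(sorted("".join(sorted(x)) for x in maximal_itemsets(db, min_sup=2)))  # ['abc', 'd']
print(sorted("".join(sorted(x)) for x in maximal_itemsets(db, min_sup=3)))  # ['ab', 'c']
```

Because sets are explored in a fixed item order, any frequent proper superset of a candidate lies in a subtree that was searched earlier, so the `is_subsumed` check is enough to keep only maximal sets. Merging duplicate rows means each support count scans distinct patterns rather than raw transactions, which is the kind of saving a compact database representation aims for.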



Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dhabu, M.M., Deshpande, P.S. (2012). Cardinality Statistics Based Maximal Frequent Itemsets Mining. In: Dua, S., Gangopadhyay, A., Thulasiraman, P., Straccia, U., Shepherd, M., Stein, B. (eds) Information Systems, Technology and Management. ICISTM 2012. Communications in Computer and Information Science, vol 285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29166-1_3


  • DOI: https://doi.org/10.1007/978-3-642-29166-1_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29165-4

  • Online ISBN: 978-3-642-29166-1

  • eBook Packages: Computer Science, Computer Science (R0)
