Abstract
Extracting frequent itemsets is an important task in many data mining applications. Since the set of all frequent itemsets is likely to be undesirably large, condensed representations such as maximal and closed frequent itemsets are used. The set of closed frequent itemsets uniquely determines the exact frequency of every frequent itemset, yet it can be orders of magnitude smaller than the set of all frequent itemsets. However, when the data contain very long patterns, it is often impractical to generate the entire set of closed frequent itemsets. In such domains the only recourse is to mine the maximal frequent itemsets. In this paper, we propose a new approach for mining all maximal frequent itemsets that introduces a compact data structure, the Reduced Transaction Pattern List (RTPL), to represent the database. Our implementation combines the RTPL representation with statistical information on the cardinality of each item in the database. We devise a pruning strategy that uses this statistical information at two levels to substantially reduce the combinatorial search space. Our experiments on synthetic and real-world standard benchmark datasets show that the proposed algorithm outperforms algorithms that do not use cardinality statistics.
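To make the relationship between frequent, closed, and maximal itemsets concrete, the following minimal sketch enumerates all three on a toy database by brute force. It is not the RTPL-based algorithm proposed in the paper and uses no cardinality statistics; the function names, the toy transactions, and the absolute support threshold of 2 are assumptions chosen only for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute force: map every itemset with support >= min_support to its support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(1 for t in transactions if cset <= t)
            if support >= min_support:
                result[cset] = support
    return result


def closed_itemsets(frequent):
    """Keep frequent itemsets that have no proper superset with the same support."""
    return {s: c for s, c in frequent.items()
            if not any(s < other and c == frequent[other] for other in frequent)}


def maximal_itemsets(frequent):
    """Keep frequent itemsets that have no frequent proper superset."""
    return {s: c for s, c in frequent.items()
            if not any(s < other for other in frequent)}


def support_from_closed(itemset, closed):
    """Recover the support of a frequent itemset from the closed itemsets alone."""
    return max(c for s, c in closed.items() if itemset <= s)


# Hypothetical toy database of four transactions.
transactions = [frozenset(t) for t in (
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "d"},
)]

frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)
maximal = maximal_itemsets(frequent)

print(len(frequent), len(closed), len(maximal))  # 7 3 1
```

On this toy input the seven frequent itemsets condense to three closed itemsets ({a}, {a, b}, {a, b, c}) and a single maximal itemset ({a, b, c}). The closed itemsets still recover every support through `support_from_closed`, whereas the maximal itemset only indicates which itemsets are frequent, not how often they occur.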
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dhabu, M.M., Deshpande, P.S. (2012). Cardinality Statistics Based Maximal Frequent Itemsets Mining. In: Dua, S., Gangopadhyay, A., Thulasiraman, P., Straccia, U., Shepherd, M., Stein, B. (eds) Information Systems, Technology and Management. ICISTM 2012. Communications in Computer and Information Science, vol 285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29166-1_3
DOI: https://doi.org/10.1007/978-3-642-29166-1_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29165-4
Online ISBN: 978-3-642-29166-1
eBook Packages: Computer Science (R0)