Cardinality Statistics Based Maximal Frequent Itemsets Mining

Dhabu, Meera M.; Deshpande, Parag S.

doi:10.1007/978-3-642-29166-1_3


Part of the book series: Communications in Computer and Information Science (CCIS, volume 285)

Included in the following conference series:

International Conference on Information Systems, Technology and Management


Abstract

Extracting frequent itemsets is an important task in many data mining applications. Since the set of all frequent itemsets is likely to be undesirably large, condensed representations, such as maximal and closed frequent itemsets, are used. The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. However, when the data contain very long patterns, it is often impractical to generate the entire set of closed frequent itemsets. In domains with very long patterns, the only recourse is to mine the maximal frequent itemsets. In this paper, we propose a new approach for mining all maximal frequent itemsets that introduces a compact data structure, the Reduced Transaction Pattern List (RTPL), for representing the database. Our implementation combines the RTPL representation with statistical information about the cardinality of each item in the database. We devise a pruning strategy that substantially reduces the combinatorial search space by using statistical information about items at two levels. Our experiments on synthetic and real-world standard benchmark datasets show that the proposed algorithm outperforms algorithms that do not use cardinality statistics.
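
The relationship between frequent, closed, and maximal frequent itemsets described in the abstract can be illustrated on a toy database. The following is a minimal brute-force sketch, not the RTPL-based algorithm of the paper (which the abstract does not specify). The function names, the four-transaction database, and the absolute support threshold of 2 are all assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration: map each frequent itemset to its support count."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                result[candidate] = support
                found = True
        if not found:
            # No frequent k-itemset means no frequent (k+1)-itemset either.
            break
    return result


def closed_itemsets(frequent):
    """Frequent itemsets that have no proper superset with the same support."""
    return {
        s for s, sup in frequent.items()
        if not any(s < t and frequent[t] == sup for t in frequent)
    }


def maximal_itemsets(frequent):
    """Frequent itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < t for t in frequent)}


# Hypothetical toy database (an assumption, not data from the paper).
transactions = [
    frozenset(t)
    for t in ({"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a", "d"})
]

frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)
maximal = maximal_itemsets(frequent)

print(len(frequent), len(closed), len(maximal))
```

On this toy input the three sets shrink in turn: every maximal itemset is closed, and every closed itemset is frequent. The maximal set alone identifies which itemsets are frequent but no longer records their exact supports, which is the trade-off the abstract refers to.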



Author information

Authors and Affiliations

Department of Computer Science, Visvesvaraya National Institute of Technology, Nagpur, India
Meera M. Dhabu & Parag S. Deshpande


Editor information

Editors and Affiliations

Computer Science, College of Engineering and Science, Louisiana Tech University, 71272, Ruston, LA, USA
Sumeet Dua
Department of Information Systems, College of Engineering and Information Technology, UMBC, 1000 Hilltop Circle, 2125, Baltimore, MD, USA
Aryya Gangopadhyay
Department of Computer Science, The University of Manitoba, Winnipeg, MB, Canada
Parimala Thulasiraman
ISTI - CNR, Pisa, Italy
Umberto Straccia
Faculty of Computer Science, Dalhousie University Halifax, B3H 1W5, Nova Scotia, Canada
Michael Shepherd
Faculty of Media: Media Systems, Bauhaus University Weimar, 99421, Weimar, Germany
Benno Stein

Cite this paper

Dhabu, M.M., Deshpande, P.S. (2012). Cardinality Statistics Based Maximal Frequent Itemsets Mining. In: Dua, S., Gangopadhyay, A., Thulasiraman, P., Straccia, U., Shepherd, M., Stein, B. (eds) Information Systems, Technology and Management. ICISTM 2012. Communications in Computer and Information Science, vol 285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29166-1_3

DOI: https://doi.org/10.1007/978-3-642-29166-1_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29165-4
Online ISBN: 978-3-642-29166-1
eBook Packages: Computer Science (R0)
