Deducing Bounds on the Support of Itemsets

Calders, Toon

doi:10.1007/978-3-540-44497-8_11


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2682)


Abstract

Mining frequent itemsets is the core operation of many data mining algorithms. This operation, however, is very data intensive and sometimes produces a prohibitively large output. In this paper we give a complete set of rules for deducing tight bounds on the support of an itemset if the supports of all its subsets are known. Based on the derived bounds [l,u] on the support of a candidate itemset I, we can decide not to access the database to count the support of I if l is larger than the support threshold (I will certainly be frequent), or if u is below the threshold (I will certainly fail the frequency test). We can also use the deduction rules to reduce the size of an adequate representation of the collection of frequent sets: itemsets I with bounds [l,u] where l = u do not need to be stored explicitly, since their support can be derived. To assess the usability in practice, we implemented the deduction rules and we present experiments on real-life data sets.
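
The abstract does not spell out the deduction rules themselves. As a rough illustration only, the sketch below assumes they take the inclusion-exclusion form: for every X ⊆ I, the signed sum of supp(J) over all J with X ⊆ J ⊂ I is an upper bound on supp(I) when |I \ X| is odd and a lower bound when it is even. The function names `support_bounds` and `can_skip_counting`, the dictionary layout of `supp`, and the convention that "frequent" means support at or above the threshold are all assumptions made for this example, not taken from the chapter.

```python
from itertools import combinations


def _subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = sorted(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def support_bounds(itemset, supp):
    """Return (l, u) bounding supp(itemset) from the supports of its proper subsets.

    `supp` maps frozenset -> support and must contain every proper subset of
    `itemset`, including the empty set (whose support is the database size).
    Assumes the inclusion-exclusion form of the rules described above.
    """
    i = frozenset(itemset)
    lower, upper = 0, supp[frozenset()]  # trivial bounds to start from
    for x in _subsets(i):
        if x == i:
            continue  # this rule only yields the trivial bound supp(I) >= 0
        # Signed sum over all J with X <= J < I (J is X plus part of I \ X).
        bound = 0
        for extra in _subsets(i - x):
            j = x | extra
            if j != i:
                bound += (-1) ** (len(i - j) + 1) * supp[j]
        if len(i - x) % 2 == 1:
            upper = min(upper, bound)
        else:
            lower = max(lower, bound)
    return lower, upper


def can_skip_counting(lower, upper, threshold):
    """True if the bounds alone decide the frequency test (assumed: frequent
    means support >= threshold), so no database scan is needed for this set."""
    return lower >= threshold or upper < threshold
```

For a two-item set this reduces to the familiar bounds max(0, supp(a) + supp(b) - n) ≤ supp(ab) ≤ min(supp(a), supp(b)). When the two bounds coincide, the support is fully determined by the subsets, which is the case the abstract describes as not needing explicit storage.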



Author information

Authors and Affiliations

University of Antwerp, Universiteitsplein 1, B-2610, Wilrijk, Belgium
Toon Calders


Editor information

Editors and Affiliations

Dipartimento di Informatica, Università di Torino, Italy
Rosa Meo
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy
Pier Luca Lanzi
Nokia Research Center, Nokia Group, P.O.Box 407, FIN-00045, Finland
Mika Klemettinen


About this chapter

Cite this chapter

Calders, T. (2004). Deducing Bounds on the Support of Itemsets. In: Meo, R., Lanzi, P.L., Klemettinen, M. (eds) Database Support for Data Mining Applications. Lecture Notes in Computer Science, vol 2682. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-44497-8_11


DOI: https://doi.org/10.1007/978-3-540-44497-8_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22479-2
Online ISBN: 978-3-540-44497-8
eBook Packages: Springer Book Archive
