Abstract
Mining frequent itemsets is the core operation of many data mining algorithms. This operation, however, is very data-intensive and sometimes produces a prohibitively large output. In this paper we give a complete set of rules for deducing tight bounds on the support of an itemset when the supports of all its subsets are known. Based on the derived bounds [l, u] on the support of a candidate itemset I, we can decide not to access the database to count the support of I if l is larger than the support threshold (I will certainly be frequent) or if u is below the threshold (I will certainly fail the frequency test). We can also use the deduction rules to reduce the size of an adequate representation of the collection of frequent sets: all itemsets I with bounds [l, u], where l = u, do not need to be stored explicitly. To assess the usability in practice, we implemented the deduction rules and present experiments on real-life data sets.
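The abstract does not spell out the deduction rules, so the following is only a minimal sketch of how such bounds could be computed, assuming the standard inclusion-exclusion formulation: for every X ⊆ I, the alternating sum of supp(J) over all J with X ⊆ J ⊊ I is an upper bound on supp(I) when |I \ X| is odd and a lower bound when it is even. The names `support_bounds` and `decide`, the dictionary layout of `supp`, and the convention that "frequent" means support >= `minsup` are illustrative assumptions, not the chapter's implementation.

```python
from itertools import combinations


def support_bounds(itemset, supp):
    """Return (lower, upper) bounds on the support of `itemset`.

    `supp` (hypothetical layout) maps frozenset -> absolute support for
    every proper subset of `itemset`, including the empty set, whose
    support is the number of transactions.
    """
    items = frozenset(itemset)
    ordered = sorted(items)
    lower = 0                   # a support is never negative
    upper = supp[frozenset()]   # nor larger than the number of transactions
    for k in range(len(ordered) + 1):
        for x_items in combinations(ordered, k):
            x = frozenset(x_items)
            rest = sorted(items - x)
            # Alternating sum over all J with X <= J < I (J = I is excluded
            # because fewer than len(rest) items are added to X).
            delta = 0
            for m in range(len(rest)):
                for extra in combinations(rest, m):
                    j = x | frozenset(extra)
                    sign = (-1) ** (len(items - j) + 1)
                    delta += sign * supp[j]
            if len(rest) % 2 == 1:
                upper = min(upper, delta)   # odd |I \ X|: upper bound
            else:
                lower = max(lower, delta)   # even |I \ X|: lower bound
    return lower, upper


def decide(lower, upper, minsup):
    """Decide whether the database must be scanned for this candidate.

    Assumes an itemset is frequent when its support is >= minsup.
    """
    if lower >= minsup:
        return "frequent"      # certainly frequent, no counting needed
    if upper < minsup:
        return "infrequent"    # certainly fails the frequency test
    return "count"             # bounds are inconclusive


# 10 transactions, supp(a) = 8, supp(b) = 7
supp = {frozenset(): 10, frozenset("a"): 8, frozenset("b"): 7}
print(support_bounds("ab", supp))   # (5, 7)

# supp(a) = 10 forces supp(ab) = supp(b): the bounds coincide
supp = {frozenset(): 10, frozenset("a"): 10, frozenset("b"): 6}
print(support_bounds("ab", supp))   # (6, 6)
```

In the second example l = u, so under this sketch the itemset {a, b} would not need to be counted or stored explicitly, since its support can be recomputed from its subsets.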
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Calders, T. (2004). Deducing Bounds on the Support of Itemsets. In: Meo, R., Lanzi, P.L., Klemettinen, M. (eds) Database Support for Data Mining Applications. Lecture Notes in Computer Science, vol 2682. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-44497-8_11
DOI: https://doi.org/10.1007/978-3-540-44497-8_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22479-2
Online ISBN: 978-3-540-44497-8
eBook Packages: Springer Book Archive