Abstract
In this paper we identify and explore an important phenomenon: a power-law relationship appears in the distribution of itemset supports. Characterizing this relationship benefits many applications, for example by guiding performance tuning of frequent-itemset mining. However, because the number of itemsets is explosive, retrieving the characteristics of the power-law relationship in the itemset support distribution is prohibitively expensive. We therefore also propose a valid and cost-effective algorithm, called PPL, that extracts the characteristics of the distribution without discovering all itemsets in advance. Experimental results demonstrate that algorithm PPL efficiently extracts the characteristics of the power-law relationship with high accuracy.
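To make the notion of a power-law relationship in the support distribution concrete, the following is a minimal sketch of the expensive baseline that the abstract argues against: enumerate itemsets, count their supports, and fit a straight line on a log-log scale. It is not algorithm PPL, whose details are not given here. The function names, the `max_size` cap, and the least-squares fit are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def itemset_supports(transactions, max_size=2):
    """Count the support of every itemset with up to max_size items.

    Brute-force enumeration (an illustrative assumption, not PPL):
    the cost grows explosively with transaction length and max_size.
    """
    supports = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                supports[itemset] += 1
    return supports


def support_histogram(supports):
    """Map each support value to the number of itemsets with that support."""
    return Counter(supports.values())


def fit_power_law(histogram):
    """Fit log(count) = exponent * log(support) + log_constant.

    Ordinary least squares on the log-log points; requires at least
    two distinct support values. Returns (exponent, log_constant).
    """
    xs = [math.log(support) for support in histogram]
    ys = [math.log(count) for count in histogram.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx
    return exponent, mean_y - exponent * mean_x
```

Calling `fit_power_law(support_histogram(itemset_supports(transactions)))` on a transaction list yields an estimated exponent and constant. The enumeration step is what becomes infeasible on real data, which is the cost the abstract says PPL is designed to avoid.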
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chuang, KT., Huang, JL., Chen, MS. (2006). On Exploring the Power-Law Relationship in the Itemset Support Distribution. In: Ioannidis, Y., et al. Advances in Database Technology - EDBT 2006. EDBT 2006. Lecture Notes in Computer Science, vol 3896. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11687238_41
DOI: https://doi.org/10.1007/11687238_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32960-2
Online ISBN: 978-3-540-32961-9
eBook Packages: Computer Science (R0)