On Exploring the Power-Law Relationship in the Itemset Support Distribution

  • Kun-Ta Chuang
  • Jiun-Long Huang
  • Ming-Syan Chen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3896)

Abstract

In this paper, we identify and explore an important phenomenon: a power-law relationship appears in the distribution of itemset supports. Characterizing this relationship benefits many applications, for example by guiding performance tuning in frequent-itemset mining. However, because the number of itemsets grows explosively, retrieving the characteristics of the power-law relationship directly from the distribution of itemset supports is prohibitively expensive. We therefore also propose a valid and cost-effective algorithm, called algorithm PPL, that extracts the characteristics of the distribution without discovering all itemsets in advance. Experimental results demonstrate that algorithm PPL extracts the characteristics of the power-law relationship efficiently and with high accuracy.
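To make the phenomenon in the abstract concrete, the following is a minimal sketch of how one might check for a power-law relationship on a small dataset. It is not algorithm PPL: it enumerates itemsets by brute force, which is exactly the cost the abstract describes as prohibitive at scale. The function names `support_histogram` and `fit_power_law`, the `max_size` cap, and the use of a least-squares fit in log-log space are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def support_histogram(transactions, max_size=2):
    """Map each support count s to the number of itemsets with support s.

    Brute force: every itemset of up to max_size items is enumerated per
    transaction, so this is only feasible for small inputs.
    """
    support = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            support.update(combinations(items, k))
    return Counter(support.values())


def fit_power_law(histogram):
    """Fit count = c * support**(-alpha) by least squares in log-log space.

    Returns (alpha, c). Assumes positive supports and counts, and at
    least two distinct support values.
    """
    xs = [math.log(s) for s in histogram]
    ys = [math.log(n) for n in histogram.values()]
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct support values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)
```

A typical use would be `alpha, c = fit_power_law(support_histogram(transactions))`. A roughly straight line on a log-log plot of the histogram is only suggestive of a power law; the fitted exponent is sensitive to binning and to the sparse tail of high-support itemsets.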

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Kun-Ta Chuang (1)
  • Jiun-Long Huang (2)
  • Ming-Syan Chen (1)
  1. Graduate Institute of Communication Engineering, National Taiwan University, Taipei, Taiwan, ROC
  2. Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan, ROC
