Realistic Synthetic Data for Testing Association Rule Mining Algorithms for Market Basket Databases

  • Colin Cooper
  • Michele Zito
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4702)

Abstract

We investigate the statistical properties of the databases generated by the IBM QUEST program. Motivated by the claim (also supported by empirical evidence) that item occurrences in real-life market basket databases follow a rather different pattern, we propose an alternative model for generating artificial data.

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Colin Cooper (1)
  • Michele Zito (2)
  1. Department of Computer Science, King’s College, London WC2R 2LS, UK
  2. Department of Computer Science, University of Liverpool, Liverpool L69 3BX, UK