
DepMiner: A Method and a System for the Extraction of Significant Dependencies

  • Rosa Meo
  • Leonardo D’Ambrosi
Part of the Intelligent Systems Reference Library book series (ISRL, volume 23)

Abstract

We propose DepMiner, a method implementing a simple but effective model for the evaluation of itemsets and, more generally, of the dependencies between the values assumed by a set of variables over a finite domain. The method is based on Δ, the departure of the observed probability of an event from a referential probability of the same event. The observed probability is the probability, measured in the database, that the variables take the given values; the referential probability is the probability of the same event estimated under the condition of maximum entropy.
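As a minimal illustration of the measure (a sketch under simplifying assumptions, not the system's implementation): for a pair of items, the maximum-entropy referential probability constrained only by the single-item marginals is the independence estimate P(A)P(B), so Δ reduces to the gap between the observed joint probability and that estimate. Larger itemsets would require a referential probability constrained by all proper subsets, typically computed iteratively. The Python sketch below, with illustrative names, covers the two-item case only.

```python
# Illustrative sketch: departure measure for a 2-itemset {a, b}.
# Under maximum entropy constrained only by the single-item marginals,
# the referential probability of seeing a and b together is the
# independence estimate P(a) * P(b); Delta is the gap between the
# observed joint probability and that estimate. Larger itemsets would
# need an iteratively computed estimate and are not covered here.

def delta_pair(transactions, a, b):
    """Departure of the observed P(a, b) from its independence estimate."""
    n = len(transactions)
    p_a = sum(1 for t in transactions if a in t) / n
    p_b = sum(1 for t in transactions if b in t) / n
    p_ab = sum(1 for t in transactions if a in t and b in t) / n
    return p_ab - p_a * p_b  # > 0: positive dependence, < 0: negative

# Example: a weak positive dependence between 'bread' and 'butter'.
baskets = [{"bread", "butter"}, {"bread", "butter", "milk"},
           {"milk"}, {"bread"}, {"butter", "milk"}]
print(delta_pair(baskets, "bread", "butter"))  # 0.4 - 0.6 * 0.6 = 0.04
```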

DepMiner distinguishes between dependencies intrinsic to the itemset and dependencies “inherited” from its subsets: it is therefore suitable for evaluating the added value of an itemset with respect to its subsets. The method is powerful: it detects significant positive dependencies as well as negative ones, the latter useful for identifying rare itemsets. Since Δ is anti-monotonic, it can be embedded efficiently in mining algorithms. The system returns the itemsets ranked by Δ and presents the histogram of the Δ distribution. The parameters that govern the method, such as the minimum support for itemsets and the thresholds on Δ, are determined automatically by the system, which uses the Δ thresholds to identify the statistically significant itemsets. As a result, it reduces the volume of results more than competing methods.
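Because Δ is anti-monotonic, it can in principle drive the same level-wise pruning that minimum support enables in Apriori-style algorithms: once an itemset falls below the threshold, none of its supersets needs to be generated. The sketch below only illustrates that pruning pattern, not the DepMiner algorithm; the `score` function and `threshold` value are placeholders for an anti-monotonic measure and its automatically determined cut-off.

```python
# Illustrative level-wise enumeration pruned by an anti-monotonic score,
# in the style of Apriori. `score` and `threshold` are placeholders.

def levelwise_mine(items, score, threshold):
    """Return all itemsets whose anti-monotonic score reaches the threshold."""
    frontier = [frozenset([i]) for i in items if score(frozenset([i])) >= threshold]
    result = list(frontier)
    while frontier:
        # Join frontier itemsets that differ by one item to build candidates.
        candidates = {a | b for a in frontier for b in frontier
                      if len(a | b) == len(a) + 1}
        # Anti-monotonicity: a candidate below the threshold is dropped,
        # and its supersets are never generated.
        frontier = [c for c in candidates if score(c) >= threshold]
        result.extend(frontier)
    return result

if __name__ == "__main__":
    baskets = [{"bread", "butter"}, {"bread", "butter", "milk"},
               {"milk"}, {"bread"}, {"butter", "milk"}]
    # Plain support is the classic anti-monotonic score.
    def support(s):
        return sum(1 for t in baskets if s <= t) / len(baskets)
    for itemset in levelwise_mine({"bread", "butter", "milk"}, support, 0.4):
        print(sorted(itemset), support(itemset))
```

The surviving itemsets could then be sorted by the score in decreasing order, mirroring the ranking by Δ that the system reports.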

Keywords

Association Rule · Maximum Entropy · Minimum Support · Frequent Itemsets · Association Rule Mining

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Rosa Meo (1)
  • Leonardo D’Ambrosi (2)
  1. University of Torino, Italy
  2. Regional Agency for Health Care Services - A.Re.S.S. Piemonte, Italy
