Machine Learning

, Volume 71, Issue 2–3, pp 307–323 | Cite as

Layered critical values: a powerful direct-adjustment approach to discovering significant patterns

Technical Note

Abstract

Standard pattern discovery techniques, such as association rules, suffer an extreme risk of finding very large numbers of spurious patterns for many knowledge discovery tasks. The direct-adjustment approach to controlling this risk applies a statistical test during the discovery process, using a critical value adjusted to take account of the size of the search space. However, a problem with the direct-adjustment strategy is that it may discard numerous true patterns. This paper investigates the assignment of different critical values to different areas of the search space as an approach to alleviating this problem, using a variant of a technique originally developed for other purposes. This approach is shown to be effective at increasing the number of discoveries while still maintaining strict control over the risk of false discoveries.

Keywords

Pattern discovery Significant patterns Significant rules Layered critical values Association rules Statistical procedures 

References

  1. Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining associations between sets of items in massive databases. In Proceedings of the 1993 ACM-SIGMOD international conference on management of data (pp. 207–216). Washington, DC, May 1993. Google Scholar
  2. Aumann, Y., & Lindell, Y. (1999). A statistical theory for quantitative association rules. In Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining (KDD-99) (pp. 261–270). Google Scholar
  3. Bastide, Y., Pasquier, N., Taouil, R., Stumme, G., & Lakhal, L. (2000). Mining minimal non-redundant association rules using frequent closed itemsets. In First international conference on computational logic—CL 2000 (pp. 972–986). Berlin: Springer. Google Scholar
  4. Bay, S. D., & Pazzani, M. J. (2001). Detecting group differences: Mining contrast sets. Data Mining and Knowledge Discovery, 5(3), 213–246. MATHCrossRefGoogle Scholar
  5. Bayardo, R. J. Jr., Agrawal, R., & Gunopulos, D. (2000). Constraint-based rule mining in large, dense databases. Data Mining and Knowledge Discovery, 4(2/3), 217–240. CrossRefGoogle Scholar
  6. Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A new and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B, 57, 289–300. MATHMathSciNetGoogle Scholar
  7. Brijs, T., Swinnen, G., Vanhoof, K., & Wets, G. (1999). Using association rules for product assortment decisions: A case study. In Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 254–260). New York: ACM. CrossRefGoogle Scholar
  8. Brin, S., Motwani, R., & Silverstein, C. (1997). Beyond market baskets: Generalizing association rules to correlations. In J. Peckham (Ed.), SIGMOD 1997, Proceedings ACM SIGMOD international conference on management of data, May 1997 (pp. 265–276). New York: ACM. CrossRefGoogle Scholar
  9. Dong, G., & Li, J. (1999). Efficient mining of emerging patterns: Discovering trends and differences. In Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining (KDD-99) (pp. 15–18). New York: ACM. Google Scholar
  10. DuMouchel, W., & Pregibon, D. (2001). Empirical Bayes screening for multi-item associations. In KDD-2001: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining, August 2001 (pp. 76–76). New York: ACM. Google Scholar
  11. Gionis, A., Mannila, H., Mielikainen, T., & Tsaparas, P. (2006). Assessing data mining results via swap randomization. In 12th international conference on knowledge discovery and data mining (KDD) (pp. 167–176). Google Scholar
  12. Han, J., Wang, J., Lu, Y., & Tzvetkov, P. (2002). Mining top-K frequent closed patterns without minimum support. In International conference on data mining (pp. 211–218). Google Scholar
  13. Hettich, S., & Bay, S. D. (2007). The UCI KDD archive. Department of Information and Computer Science, University of California, Irvine, CA. http://kdd.ics.uci.edu.
  14. Holland, B. S., & Copenhaver, M. D. (1988). Improved Bonferroni-type multiple testing procedures. Psychological Bulletin, 104(1), 145–149. CrossRefGoogle Scholar
  15. Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65–70. MathSciNetGoogle Scholar
  16. International Business Machines (1996). IBM intelligent miner user’s guide, version 1, release 1. Google Scholar
  17. Jaroszewicz, S., & Simovici, D. A. (2004). Interestingness of frequent itemsets using Bayesian networks as background knowledge. In R. Kohavi, J. Gehrke, & J. Ghosh (Eds.), KDD-2004: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining, August 2004 (pp. 178–186). New York: ACM. CrossRefGoogle Scholar
  18. Klösgen, W. (1996). Explora: A multipattern and multistrategy discovery assistant. In U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, & R. Uthurusamy (Eds.), Advances in knowledge discovery and data mining (pp. 249–271). Menlo Park: AAAI. Google Scholar
  19. Liu, B., Hsu, W., & Ma, Y. (1999). Pruning and summarizing the discovered associations. In Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining (KDD-99), August 1999 (pp. 125–134). New York: AAAI. CrossRefGoogle Scholar
  20. Megiddo, N., & Srikant, R. (1998). Discovering predictive association rules. In Proceedings of the fourth international conference on knowledge discovery and data mining (KDD-98) (pp. 27–78). Menlo Park: AAAI. Google Scholar
  21. Možina, M., Demšar, J., Žabkar, J., & Bratko, I. (2006). Why is rule learning optimistic and how to correct it. In Machine learning: ECML 2006 (pp. 330–340). Berlin: Springer. CrossRefGoogle Scholar
  22. Newman, D. J., Hettich, S., Blake, C., & Merz, C. J. (2007). UCI repository of machine learning databases (Machine-readable data repository). Department of Information and Computer Science, University of California, Irvine, CA. Google Scholar
  23. Piatetsky-Shapiro, G. (1991). Discovery, analysis, and presentation of strong rules. In G. Piatetsky-Shapiro & J. Frawley (Eds.), Knowledge discovery in databases (pp. 229–248). Menlo Park: AAAI/MIT Press. Google Scholar
  24. Scheffer, T. (1995). Finding association rules that trade support optimally against confidence. Intelligent Data Analysis, 9(4), 381–395. Google Scholar
  25. Scheffer, T., & Wrobel, S. (2002). Finding the most interesting patterns in a database quickly by using sequential sampling. Journal of Machine Learning Research, 3, 833–862. CrossRefMathSciNetGoogle Scholar
  26. Webb, G. I. (1995). OPUS: An efficient admissible algorithm for unordered search. Journal of Artificial Intelligence Research, 3, 431–465. MATHMathSciNetGoogle Scholar
  27. Webb, G. I. (2001). Discovering associations with numeric variables. In Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining (KDD-2001) (pp. 383–388). New York: ACM. CrossRefGoogle Scholar
  28. Webb, G. I. (2002). Magnum opus, version 1.3. Software. Melbourne: G.I. Webb & Associates. Google Scholar
  29. Webb, G. I. (2006). Discovering significant rules. In L. Ungar, M. Craven, D. Gunopulos, & T. Eliassi-Rad (Eds.), Proceedings of the twelfth ACM SIGKDD international conference on knowledge discovery and data mining, KDD-2006 (pp. 434–443). New York: ACM. CrossRefGoogle Scholar
  30. Webb, G. I. (2007). Discovering significant patterns. Machine Learning, 68(1), 1–33. CrossRefMathSciNetGoogle Scholar
  31. Webb, G. I., & Zhang, S. (2005). K-optimal rule discovery. Data Mining and Knowledge Discovery, 10(1), 39–79. CrossRefMathSciNetGoogle Scholar
  32. Zaki, M. J. (2004). Mining non-redundant association rules. Data Mining and Knowledge Discovery, 9(3), 223–248. CrossRefMathSciNetGoogle Scholar
  33. Zhang, H., Padmanabhan, B., & Tuzhilin, A. (2004). On the discovery of significant statistical quantitative rules. In Proceedings of the tenth international conference on knowledge discovery and data mining (KDD-2004), August 2004 (pp. 374–383). New York: ACM. CrossRefGoogle Scholar
  34. Zheng, Z., Kohavi, R., & Mason, L. (2001). Real world performance of association rule algorithms. In Proceedings of the seventh international conference on knowledge discovery and data mining (KDD-2001), August 2001 (pp. 401–406). New York: ACM. CrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. 1.Faculty of Information TechnologyMonash UniversityClaytonAustralia

Personalised recommendations