
Journal of Intelligent Information Systems, Volume 40, Issue 3, pp 431–454

BruteSuppression: a size reduction method for Apriori rule sets

  • Jon Hills
  • Anthony Bagnall
  • Beatriz de la Iglesia
  • Graeme Richards

Abstract

Association rule mining can provide genuine insight into the data being analysed; however, rule sets can be extremely large, and therefore difficult and time-consuming for the user to interpret. We propose reducing the size of Apriori rule sets by removing overlapping rules, and compare this approach with two standard methods for reducing rule set size: increasing the minimum confidence parameter, and increasing the minimum antecedent support parameter. We evaluate the rule sets in terms of confidence and coverage, as well as two rule interestingness measures that favour rules with antecedent conditions that are poor individual predictors of the target class, as we assume that these represent potentially interesting rules. We also examine the distribution of the rules graphically, to assess whether particular classes of rules are eliminated. We show that removing overlapping rules substantially reduces rule set size in most cases, and alters the character of a rule set less than if the standard parameters are used to constrain the rule set to the same size. Based on our results, we aim to extend the Apriori algorithm to incorporate the suppression of overlapping rules.
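The two standard size-reduction parameters mentioned above can be illustrated with a minimal Python sketch. It assumes the usual textbook definitions: antecedent support is the fraction of records matching a rule's antecedent, and confidence is the fraction of those matching records that belong to the target class. The toy records, the rule encoding, and the helper names (`matches`, `antecedent_support`, `confidence`, `filter_rules`) are hypothetical and are not taken from the paper; the sketch does not implement the paper's overlap suppression.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical encoding: a record maps attribute names to values, and a rule
# antecedent is a set of (attribute, value) conditions. The consequent is a
# fixed target class, as in partial classification.
Record = Dict[str, str]
Rule = FrozenSet[Tuple[str, str]]


def matches(rule: Rule, record: Record) -> bool:
    """True if the record satisfies every condition in the antecedent."""
    return all(record.get(attr) == val for attr, val in rule)


def antecedent_support(rule: Rule, records: List[Record]) -> float:
    """Fraction of all records that match the antecedent."""
    return sum(matches(rule, r) for r in records) / len(records)


def confidence(rule: Rule, records: List[Record],
               target_attr: str, target_val: str) -> float:
    """Fraction of antecedent-matching records that are in the target class."""
    covered = [r for r in records if matches(rule, r)]
    if not covered:
        return 0.0
    return sum(r[target_attr] == target_val for r in covered) / len(covered)


def filter_rules(rules: List[Rule], records: List[Record],
                 target_attr: str, target_val: str,
                 min_support: float, min_confidence: float) -> List[Rule]:
    """Keep only rules that meet both minimum thresholds."""
    return [
        rule for rule in rules
        if antecedent_support(rule, records) >= min_support
        and confidence(rule, records, target_attr, target_val) >= min_confidence
    ]


# Toy data (made up for illustration only).
records = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
]
r1: Rule = frozenset({("windy", "no")})
r2: Rule = frozenset({("outlook", "sunny")})
r3: Rule = frozenset({("outlook", "sunny"), ("windy", "no")})
rules = [r1, r2, r3]

# Raising either threshold shrinks the rule set, but removes different rules.
by_support = filter_rules(rules, records, "play", "yes", 0.5, 0.5)
by_confidence = filter_rules(rules, records, "play", "yes", 0.2, 0.9)
```

In this toy data, raising the minimum antecedent support discards the more specific rule `r3`, while raising the minimum confidence discards the weaker predictor `r2`. This shows how each parameter can change the character of the rule set as well as its size.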

Keywords

Apriori, Data mining, Interestingness, Partial classification, Rules


Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Jon Hills (1)
  • Anthony Bagnall (1)
  • Beatriz de la Iglesia (1)
  • Graeme Richards (1)

  1. School of Computing Sciences, University of East Anglia, Norwich, UK
