Computational Statistics, Volume 23, Issue 2, pp 303–315

Selective association rule generation

Original Paper

Abstract

Mining association rules is a popular and well-researched method for discovering interesting relations between variables in large databases. A practical problem is that at medium to low support values, a large number of frequent itemsets and an even larger number of association rules are often found in a database. A widely used approach is to gradually increase minimum support and minimum confidence, or to filter the found rules using increasingly strict constraints on additional measures of interestingness, until the set of rules is reduced to a manageable size. In this paper we describe a different approach, which is based on the idea of first defining a set of “interesting” itemsets (e.g., by a mixture of mining and expert knowledge) and then, in a second step, selectively generating rules for only these itemsets. The main advantage of this approach over increasing thresholds or filtering rules is that the number of rules found is significantly reduced, while at the same time it is not necessary to increase the support and confidence thresholds, which might lead to missing important information in the database.
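The two-step idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`support`, `rules_for_itemsets`), the toy transactions, and the choice to emit every non-empty split of an itemset into antecedent and consequent are assumptions made only for illustration. The set of "interesting" itemsets is supplied by hand here, standing in for whatever mix of mining and expert knowledge produces it.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rules_for_itemsets(itemsets, transactions, min_confidence=0.0):
    """Generate rules X -> Y only from the supplied itemsets.

    For each itemset Z, every non-empty proper subset X becomes an
    antecedent and Y = Z - X the consequent. No global support
    threshold is applied; only the given itemsets are considered.
    """
    rules = []
    for itemset in map(frozenset, itemsets):
        supp_z = support(itemset, transactions)
        if supp_z == 0:
            continue  # itemset never occurs, so no rule can be formed
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                lhs = frozenset(lhs)
                # confidence(X -> Y) = support(X ∪ Y) / support(X)
                conf = supp_z / support(lhs, transactions)
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp_z, conf))
    return rules


# Hypothetical toy data: four market-basket transactions.
transactions = [frozenset(t) for t in [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]]

# Step 1 (assumed done elsewhere): a hand-picked set of interesting itemsets.
interesting = [{"bread", "butter"}]

# Step 2: generate rules for these itemsets only.
rules = rules_for_itemsets(interesting, transactions)
for lhs, rhs, supp, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), f"support={supp:.2f} confidence={conf:.2f}")
```

Because rules are generated only for the supplied itemsets, the number of rules depends on the size of that set rather than on how low the support threshold would otherwise have to be set.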

Keywords

Data mining, Association rules, Rule generation

Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  • Michael Hahsler (1)
  • Christian Buchta (2)
  • Kurt Hornik (3)

  1. Department of Information Systems and Operations, Institut für Informationswirtschaft, Wirtschaftsuniversität Wien, Wien, Austria
  2. Institute for Tourism and Leisure Studies, Wirtschaftsuniversität Wien, Wien, Austria
  3. Department of Statistics and Mathematics, Wirtschaftsuniversität Wien, Wien, Austria