Knowledge and Information Systems, Volume 38, Issue 2, pp 391–418

On the adaptability of G3PARM to the extraction of rare association rules

Regular Paper

Abstract

To date, association rule mining has mainly focused on the discovery of frequent patterns. Nevertheless, patterns that occur infrequently are often of interest as well. Existing algorithms for mining infrequent patterns of this kind are mainly based on exhaustive search methods and can be applied only over categorical domains. In a previous work, the use of grammar-guided genetic programming for the discovery of frequent association rules was introduced, showing that this proposal was competitive in terms of scalability, expressiveness, flexibility and the ability to restrict the search space. The goal of this work is to demonstrate that this proposal is also appropriate for the discovery of rare association rules. This approach allows one to obtain solutions within specified time limits and, unlike current algorithms, does not require large amounts of memory. It also provides mechanisms to discard noise from the rare association rule set by applying four different and specific fitness functions, which are compared and studied in depth. Finally, this approach is compared with other existing algorithms for mining rare association rules, and an analysis of the mined rules is performed. The results indicate that this approach mines rare rules with low and consistent execution times. The experimental study shows that this proposal obtains a small and accurate set of rules, close in size to the number specified by the data miner.
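To make the notion of a rare association rule concrete, the following is a minimal sketch using the standard definitions of support and confidence. It is not the G3PARM algorithm and does not reproduce any of the four fitness functions mentioned above. The toy transactions, the function names, and the thresholds `max_support` and `min_confidence` are assumptions chosen purely for illustration.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               transactions: List[Transaction]) -> float:
    """Support of the whole rule divided by support of the antecedent."""
    antecedent_support = support(antecedent, transactions)
    if antecedent_support == 0.0:
        return 0.0
    return support(antecedent | consequent, transactions) / antecedent_support


def is_rare_rule(antecedent: FrozenSet[str], consequent: FrozenSet[str],
                 transactions: List[Transaction],
                 max_support: float = 0.2,       # assumed threshold
                 min_confidence: float = 0.8     # assumed threshold
                 ) -> bool:
    """A rule counts as rare here if it occurs seldom but is reliable."""
    rule_support = support(antecedent | consequent, transactions)
    rule_confidence = confidence(antecedent, consequent, transactions)
    return 0.0 < rule_support <= max_support and rule_confidence >= min_confidence


# Hypothetical toy data: 10 transactions.
transactions: List[Transaction] = (
    [frozenset({"bread", "milk"})] * 8
    + [frozenset({"caviar", "champagne"})]
    + [frozenset({"bread"})]
)

rare = is_rare_rule(frozenset({"caviar"}), frozenset({"champagne"}), transactions)
frequent = is_rare_rule(frozenset({"bread"}), frozenset({"milk"}), transactions)
print(rare, frequent)
```

In this toy data set the rule caviar → champagne appears in only one of ten transactions but always holds when its antecedent is present, so it passes the assumed thresholds. The rule bread → milk is reliable too, but it is too frequent to count as rare.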

Keywords

Rare association rules · Grammar-guided genetic programming · Evolutionary computation

Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  1. Department of Computer Science and Numerical Analysis, University of Cordoba, Cordoba, Spain