Mining Efficiently Significant Classification Association Rules

Chapter in: Data Mining: Foundations and Practice

Part of the book series: Studies in Computational Intelligence (SCI, volume 118)

Summary

Classification Rule Mining (CRM) is a well-known Data Mining technique for extracting hidden Classification Rules (CRs) from a given database that is coupled with a set of pre-defined classes, the objective being to build a classifier to classify “unseen” data-records. One recent approach to CRM is to employ Association Rule Mining (ARM) techniques to identify the desired CRs, i.e. Classification Association Rule Mining (CARM). Although the advantages of accuracy and efficiency offered by CARM have been established in many papers, one major drawback is the large number of Classification Association Rules (CARs) that may be generated: up to a maximum of 2^n − 1 in the worst case, where n represents the number of data-attributes in a database. However, only a limited number of CARs, say at most k̂ in each class, are required to distinguish between classes. The problem addressed in this chapter is how to identify such CARs efficiently. Given a CAR list generated from a database under the well-established “Support-Confidence” framework, this chapter proposes a rule weighting scheme that assigns each CAR a score evaluating how significantly it contributes to a single pre-defined class. A rule mining approach is then presented that addresses the above problem and operates in time O(k^2 n^2) in its deterministic fashion and O(kn) in its randomised fashion, where k represents the number of CARs in each class that are potentially significant to distinguish between classes and k ≥ k̂. This contrasts with the exponential time O(2^n) required for score computation when all CARs are mined in a “one-by-one” manner. The experimental results show good classification accuracy when the proposed rule weighting scheme is used with a suggested rule ordering mechanism, and provide evidence that the proposed rule mining approach is computationally efficient.
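
As a rough illustration of the “Support-Confidence” framework mentioned in the summary, the sketch below computes the support and confidence of a single CAR (antecedent → class) on a toy dataset, and evaluates the worst-case rule count of 2^n − 1. It uses the standard textbook definitions of support and confidence. The record layout, the function names and the toy data are assumptions made for this example; the chapter's own rule weighting scheme is not described in the summary and is not reproduced here.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical record layout: (set of attribute items, class label).
Record = Tuple[FrozenSet[str], str]


def support_confidence(
    records: List[Record], antecedent: FrozenSet[str], label: str
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule `antecedent -> label`.

    support    = fraction of all records that contain the antecedent
                 and carry the class label
    confidence = fraction of records containing the antecedent that
                 carry the class label
    """
    total = len(records)
    # Class labels of every record whose items include the antecedent.
    covered = [cls for items, cls in records if antecedent <= items]
    hits = sum(1 for cls in covered if cls == label)
    support = hits / total if total else 0.0
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


def worst_case_rule_count(n_attributes: int) -> int:
    """Number of non-empty attribute subsets, i.e. 2^n - 1."""
    return 2 ** n_attributes - 1


# Toy data, invented purely for illustration.
toy_records: List[Record] = [
    (frozenset({"a", "b"}), "yes"),
    (frozenset({"a", "c"}), "yes"),
    (frozenset({"a", "b", "c"}), "no"),
    (frozenset({"b", "c"}), "no"),
]

support, confidence = support_confidence(toy_records, frozenset({"a"}), "yes")
print(f"support={support:.2f}, confidence={confidence:.2f}")
print("worst-case rules for n=10:", worst_case_rule_count(10))
```

On the toy data, the rule {a} → yes covers three records, two of which have class “yes”, giving a support of 2/4 and a confidence of 2/3. With only ten attributes the worst-case count is already 1023 candidate antecedents, which is why scoring every rule one by one does not scale.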

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Wang, Y.J., Xin, Q., Coenen, F. (2008). Mining Efficiently Significant Classification Association Rules. In: Lin, T.Y., Xie, Y., Wasilewska, A., Liau, CJ. (eds) Data Mining: Foundations and Practice. Studies in Computational Intelligence, vol 118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78488-3_26

  • DOI: https://doi.org/10.1007/978-3-540-78488-3_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78487-6

  • Online ISBN: 978-3-540-78488-3

  • eBook Packages: Engineering, Engineering (R0)
