Mining Efficiently Significant Classification Association Rules

Wang, Yanbo J.; Xin, Qin; Coenen, Frans

doi:10.1007/978-3-540-78488-3_26


Part of the book series: Studies in Computational Intelligence (SCI, volume 118)


Summary

Classification Rule Mining (CRM) is a well-known Data Mining technique for extracting hidden Classification Rules (CRs) from a database that is coupled with a set of pre-defined classes; the objective is to build a classifier that classifies “unseen” data-records. One recent approach to CRM, Classification Association Rule Mining (CARM), employs Association Rule Mining (ARM) techniques to identify the desired CRs. Although the accuracy and efficiency advantages of CARM have been established in many papers, one major drawback is the large number of Classification Association Rules (CARs) that may be generated: up to 2ⁿ − n − 1 in the worst case, where n is the number of data-attributes in the database. However, only a limited number of CARs, say at most k̂ in each class, are required to distinguish between classes. The problem addressed in this chapter is how to identify these k̂ CARs efficiently. Given a CAR list generated from a database under the well-established “Support-Confidence” framework, this chapter proposes a rule weighting scheme that assigns each CAR a score evaluating how significantly it contributes to a single pre-defined class. A rule mining approach is then presented that operates in time O(k²n²) in its deterministic form and O(kn) in its randomised form, where k ≥ k̂ is the number of CARs in each class that are potentially significant for distinguishing between classes. This contrasts with the exponential time O(2ⁿ) required by score computation to mine all k̂ CARs in a “one-by-one” manner. The experimental results show good classification accuracy when the proposed rule weighting scheme is used with a suggested rule ordering mechanism, and provide evidence that the proposed rule mining approach is computationally efficient.
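The two quantities the summary relies on can be illustrated with a small Python sketch. It is not the chapter's algorithm: the rule weighting scheme and the deterministic and randomised mining procedures are not specified in this summary, so none of them is reproduced here. The sketch only checks, by brute force, that 2ⁿ − n − 1 equals the number of attribute subsets containing at least two attributes (one reading of the worst-case bound, assumed here), and computes support and confidence for a rule on a toy dataset. The function names, the record format and the data are all hypothetical.

```python
from itertools import combinations


def count_itemsets_with_at_least_two(n: int) -> int:
    """Count subsets of n attributes that contain at least two attributes.

    Brute-force enumeration, used only to check the closed form 2**n - n - 1
    (all subsets, minus the empty set, minus the n singletons).
    """
    return sum(
        1
        for size in range(2, n + 1)
        for _ in combinations(range(n), size)
    )


def support_and_confidence(records, antecedent, class_label):
    """Return (support, confidence) of the rule `antecedent -> class_label`.

    `records` is a list of (set_of_items, class_label) pairs; this toy format
    is an assumption made for illustration.
    """
    antecedent = frozenset(antecedent)
    total = len(records)
    # Class labels of the records whose items include the whole antecedent.
    covered = [label for items, label in records if antecedent <= items]
    hits = sum(1 for label in covered if label == class_label)
    support = hits / total if total else 0.0
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


# Hypothetical toy database: four records, three attributes, two classes.
RECORDS = [
    ({"a", "b"}, "x"),
    ({"a", "b", "c"}, "x"),
    ({"a", "c"}, "y"),
    ({"b", "c"}, "y"),
]

if __name__ == "__main__":
    for n in range(1, 8):
        assert count_itemsets_with_at_least_two(n) == 2**n - n - 1
    print(count_itemsets_with_at_least_two(4))                 # 11
    print(support_and_confidence(RECORDS, {"a", "b"}, "x"))    # (0.5, 1.0)
```

With only four attributes there are already 11 candidate itemsets, and the count roughly doubles with each added attribute, which is why scoring every candidate one by one becomes impractical for realistic n.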



Author information

Authors and Affiliations

Department of Computer Science, University of Liverpool, Ashton Building, Ashton Street, Liverpool, L69 3BX, UK
Yanbo J. Wang & Frans Coenen
Department of Informatics, University of Bergen, P.B.7800, N-5020, Bergen, Norway
Qin Xin


Editor information

Editors and Affiliations

Department of Computer Science, San Jose State University, San Jose, CA, 95192, USA
Tsau Young Lin
Department of Computer Science and Information Systems, Kennesaw State University, Building 11, Room 3060 1000 Chastain Road, Kennesaw, GA, 30144, USA
Ying Xie
Department of Computer Science, The University at Stony Brook, Stony Brook, New York, 11794-4400, USA
Anita Wasilewska
Institute of Information Science, Academia Sinica, No 128, Academia Road, Section 2 Nankang, Taipei, 11529, Taiwan
Churn-Jung Liau


About this chapter

Cite this chapter

Wang, Y.J., Xin, Q., Coenen, F. (2008). Mining Efficiently Significant Classification Association Rules. In: Lin, T.Y., Xie, Y., Wasilewska, A., Liau, CJ. (eds) Data Mining: Foundations and Practice. Studies in Computational Intelligence, vol 118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78488-3_26

DOI: https://doi.org/10.1007/978-3-540-78488-3_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78487-6
Online ISBN: 978-3-540-78488-3
eBook Packages: Engineering, Engineering (R0)
