An information-theoretic approach to quantitative association rule mining

Ke, Yiping; Cheng, James; Ng, Wilfred

doi:10.1007/s10115-007-0104-4

Regular Paper
Published: 18 August 2007

Volume 16, pages 213–244 (2008)
Knowledge and Information Systems

Yiping Ke¹, James Cheng¹ & Wilfred Ng¹

Abstract

Quantitative association rule (QAR) mining has been recognized as an influential research problem over the last decade, owing to the popularity of quantitative databases and the usefulness of association rules in real life. Unlike boolean association rules (BARs), which consider only boolean attributes, QARs involve quantitative attributes, which carry much richer information than boolean attributes. However, combining these quantitative attributes with their value intervals gives rise to an explosively large number of itemsets, which severely degrades mining efficiency. In this paper, we propose an information-theoretic approach that avoids generating unrewarding combinations of attributes and their value intervals during the mining process. We study the mutual information between the attributes in a quantitative database and devise a normalization of the mutual information that makes it applicable in the context of QAR mining. To capture the strong informative relationships among the attributes, we construct a mutual information graph (MI graph), whose edges are the attribute pairs whose normalized mutual information is no less than a predefined information threshold. We find that the cliques in the MI graph represent a majority of the frequent itemsets. We also show that the frequent itemsets that do not form a clique in the MI graph are those whose attributes are not informatively correlated with each other. By utilizing the cliques in the MI graph, we devise an efficient algorithm that significantly reduces the number of value intervals of the attribute sets to be joined during the mining process. Extensive experiments show that our algorithm speeds up the mining process by up to two orders of magnitude. Most importantly, we are able to obtain most of the high-confidence QARs, whereas the QARs that our approach (MIC) does not return are shown to be less interesting.
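The MI-graph idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: the attributes are taken to be already discretized into interval labels, the normalization used is I(X;Y) / min(H(X), H(Y)) (the abstract does not state the paper's normalization), and the function names, the toy table, and the threshold of 0.5 are all hypothetical. Cliques are enumerated with a plain Bron-Kerbosch search, a standard technique swapped in here rather than one taken from the paper.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(column):
    """Shannon entropy H(X), in bits, of a sequence of discrete values."""
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in Counter(column).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two aligned discrete columns."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def normalized_mi(xs, ys):
    """Assumed normalization (not from the paper): I(X;Y) / min(H(X), H(Y))."""
    denom = min(entropy(xs), entropy(ys))
    return mutual_information(xs, ys) / denom if denom > 0 else 0.0


def mi_graph(table, threshold):
    """Undirected graph with an edge {a, b} iff normalized MI >= threshold."""
    adj = {a: set() for a in table}
    for a, b in combinations(table, 2):
        if normalized_mi(table[a], table[b]) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical toy table: each attribute is a column of interval labels.
table = {
    "age":    ["young", "young", "mid", "mid", "old", "old", "young", "mid"],
    "income": ["low", "low", "mid", "mid", "high", "high", "low", "mid"],
    "shoe":   ["a", "b", "a", "b", "a", "b", "b", "a"],
}

graph = mi_graph(table, threshold=0.5)
candidate_attribute_sets = maximal_cliques(graph)
```

In this toy table `income` is fully determined by `age`, so that pair is the only edge above the threshold and `shoe` ends up isolated. A miner built along these lines would restrict interval joins to attribute sets that form cliques, which is the pruning effect the abstract attributes to the MI graph.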

Author information

Authors and Affiliations

Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
Yiping Ke, James Cheng & Wilfred Ng

Corresponding author

Correspondence to Yiping Ke.

About this article

Cite this article

Ke, Y., Cheng, J. & Ng, W. An information-theoretic approach to quantitative association rule mining. Knowl Inf Syst 16, 213–244 (2008). https://doi.org/10.1007/s10115-007-0104-4

Received: 20 September 2006
Revised: 19 June 2007
Accepted: 22 July 2007
Published: 18 August 2007
Issue Date: August 2008
DOI: https://doi.org/10.1007/s10115-007-0104-4
