Association Rule Discovery with Unbalanced Class Distributions

Gu, Lifang; Li, Jiuyong; He, Hongxing; Williams, Graham; Hawkins, Simon; Kelman, Chris

doi:10.1007/978-3-540-24581-0_19

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2903)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

Many methods exist for finding association rules in very large data sets. However, it is well known that most general association rule discovery methods find too many rules, many of which are uninteresting. Furthermore, the performance of many such algorithms deteriorates when the minimum support is low, and they fail to find many interesting rules even then, particularly when the classes are significantly unbalanced. In this paper we present an algorithm that finds association rules based on a set of new interestingness criteria. The algorithm is applied to a real-world health data set and successfully identifies groups of patients at high risk of adverse reactions to certain drugs. A statistically guided method for selecting appropriate features has also been developed. Initial results show that the proposed algorithm can find interesting patterns in data sets with unbalanced class distributions without performance loss.
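
To make the support problem mentioned in the abstract concrete, the following minimal Python sketch computes three standard measures for a class association rule "pattern → target class" on toy data. It is not the algorithm or the interestingness criteria proposed in the paper, which this page does not describe. The function name `rule_stats`, the measure called `class_support` (support counted within the target class only), and all record values are hypothetical and chosen purely for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

# A record is (set of items, class label). All values below are made-up toy data.
Record = Tuple[FrozenSet[str], str]


def rule_stats(records: List[Record], pattern: FrozenSet[str], target: str) -> Dict[str, float]:
    """Compute basic measures for the rule `pattern -> target` (illustrative only)."""
    n = len(records)
    n_target = sum(1 for _, label in records if label == target)
    n_pattern = sum(1 for items, _ in records if pattern <= items)
    n_both = sum(1 for items, label in records if pattern <= items and label == target)
    return {
        # Fraction of ALL records matching both pattern and class.
        "global_support": n_both / n if n else 0.0,
        # Fraction of the TARGET-CLASS records matching the pattern (assumed measure).
        "class_support": n_both / n_target if n_target else 0.0,
        # Fraction of pattern-matching records that fall in the target class.
        "confidence": n_both / n_pattern if n_pattern else 0.0,
    }


# Toy data: 1000 records, of which only 10 belong to the rare class "reaction".
records: List[Record] = (
    [(frozenset({"drug_a", "age_60_plus"}), "reaction")] * 8
    + [(frozenset({"drug_b"}), "reaction")] * 2
    + [(frozenset({"drug_a", "age_60_plus"}), "none")] * 12
    + [(frozenset({"drug_b"}), "none")] * 978
)

stats = rule_stats(records, frozenset({"drug_a", "age_60_plus"}), "reaction")
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

In this toy setting the pattern covers most of the rare class, yet its support over the whole data set stays below one percent, so a miner with a typical global minimum support threshold would prune it. Lowering the threshold far enough to keep it is what tends to inflate the number of candidate rules and the running time.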

Author information

Authors and Affiliations

CSIRO Mathematical and Information Sciences, GPO Box 664, Canberra, ACT 2601, Australia
Lifang Gu, Hongxing He, Graham Williams & Simon Hawkins
Department of Mathematics and Computing, The University of Southern Queensland
Jiuyong Li
Commonwealth Department of Health and Ageing
Chris Kelman

Editor information

Editors and Affiliations

Department of Computer Science, Australian National University, ACT 0200, Acton, Australia
Tamás (Tom) Domonkos Gedeon
Murdoch University
Lance Chun Che Fung

About this paper

Cite this paper

Gu, L., Li, J., He, H., Williams, G., Hawkins, S., Kelman, C. (2003). Association Rule Discovery with Unbalanced Class Distributions. In: Gedeon, T.D., Fung, L.C.C. (eds) AI 2003: Advances in Artificial Intelligence. AI 2003. Lecture Notes in Computer Science, vol 2903. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24581-0_19

DOI: https://doi.org/10.1007/978-3-540-24581-0_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20646-0
Online ISBN: 978-3-540-24581-0
eBook Packages: Springer Book Archive
