Data Mining of Association Rules and the Process of Knowledge Discovery in Databases

Hipp, Jochen; Güntzer, Ulrich; Nakhaeizadeh, Gholamreza

doi:10.1007/3-540-46131-0_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2394)

Abstract

In this paper we deal with association rule mining in the context of a complex, interactive, and iterative knowledge discovery process. After a general introduction covering the basics of association rule mining and of the knowledge discovery process in databases, we draw attention to the problems that arise when the two are integrated. We conclude that, with regard to human involvement and interactivity, the current situation is far from satisfactory. We tackle this problem from three sides. First, there is the algorithmic complexity: although today’s algorithms prune the immense search space efficiently, the achieved run times do not allow true interactivity. We therefore present a rule caching schema that significantly reduces the number of mining runs and helps to gain interactivity even when the mining algorithms have extreme run times. Second, mining data today is typically stored in a relational database management system. We present an efficient integration with modern database systems, which is one of the key factors in practical mining applications. Third, interesting rules must be picked from the set of generated rules. This can be costly because the generated rule sets are normally large, whereas useful rules typically make up only a very small fraction. We enhance the traditional association rule mining framework to cope with this situation.
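
The abstract does not spell out how the rule caching schema works. As a rough illustration of the general idea only, the following Python sketch mines rules once at deliberately low thresholds and then answers stricter support and confidence queries by filtering the cached rules instead of starting a new mining run. The names `mine_rules` and `RuleCache`, the brute-force miner, and the threshold handling are all assumptions made for this example and are not taken from the chapter.

```python
from itertools import combinations


def mine_rules(transactions, min_support, min_confidence):
    """Brute-force miner (illustrative only, exponential in the number of items)."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    # Relative support of every itemset that reaches min_support.
    support = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            count = sum(1 for t in transactions if itemset <= t)
            if count / n >= min_support:
                support[itemset] = count / n
    # Split each frequent itemset into body -> head and keep confident rules.
    rules = []
    for itemset, supp in support.items():
        for k in range(1, len(itemset)):
            for body_items in combinations(sorted(itemset), k):
                body = frozenset(body_items)
                # Every subset of a frequent itemset is frequent, so the lookup is safe.
                conf = supp / support[body]
                if conf >= min_confidence:
                    rules.append((body, itemset - body, supp, conf))
    return rules


class RuleCache:
    """Hypothetical cache: one expensive mining run, many cheap threshold queries."""

    def __init__(self, transactions, base_support, base_confidence):
        self.base_support = base_support
        self.base_confidence = base_confidence
        self.rules = mine_rules(transactions, base_support, base_confidence)

    def query(self, min_support, min_confidence):
        # The cache can only answer thresholds at least as strict as the cached run.
        if min_support < self.base_support or min_confidence < self.base_confidence:
            raise ValueError("thresholds below the cached run; a new mining run is needed")
        return [r for r in self.rules
                if r[2] >= min_support and r[3] >= min_confidence]
```

In this sketch, a call such as `RuleCache(transactions, 0.25, 0.5).query(0.5, 0.6)` returns the same rules as a fresh run at the stricter thresholds, because raising either threshold can only remove rules from the cached set.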

Author information

Authors and Affiliations

DaimlerChrysler AG, Research & Technology, 89081, Ulm, Germany
Jochen Hipp & Gholamreza Nakhaeizadeh
Wilhelm Schickard-Institute, University of Tübingen, 72076, Tübingen, Germany
Ulrich Güntzer

Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Science, August-Bebel-Str. 16-20, 04275, Leipzig, Germany
Petra Perner

About this chapter

Cite this chapter

Hipp, J., Güntzer, U., Nakhaeizadeh, G. (2002). Data Mining of Association Rules and the Process of Knowledge Discovery in Databases. In: Perner, P. (ed.) Advances in Data Mining. Lecture Notes in Computer Science, vol 2394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46131-0_2

DOI: https://doi.org/10.1007/3-540-46131-0_2
Published: 21 August 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44116-8
Online ISBN: 978-3-540-46131-9
eBook Packages: Springer Book Archive