Abstract
We consider the problem of finding two-dimensional association rules for categorical attributes. Suppose we have two conditional attributes A and B, both of whose domains are categorical, and one binary target attribute whose domain is {“positive”, “negative”}. We want to split the Cartesian product of the domains of A and B into two subsets so that a certain objective function is optimized, i.e., we want to find a good segmentation of the domains of A and B. In this paper we consider the objective of maximizing the confidence subject to an upper bound on the support size. We first prove that the problem is NP-hard, and then propose an approximation algorithm based on semidefinite programming. To evaluate the effectiveness and efficiency of the proposed algorithm, we carry out computational experiments on problem instances generated from real sales data, consisting of attributes whose domain sizes are at most a few hundred. The approximation ratios of the solutions obtained, measured against the solutions of the semidefinite programming relaxation, range from 76% to 95%. We observe that the generated association rules perform significantly better than one-dimensional rules.
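To make the objective concrete, the following is a minimal sketch that reads the abstract literally, not the semidefinite programming algorithm of the paper. It assumes the chosen region is a product S × T of a subset S of the domain of A and a subset T of the domain of B, that support is the number of records in the region, and that confidence is the fraction of those records whose target is “positive”. The names `region_stats`, `best_region`, and the toy `counts` table are hypothetical, and the exhaustive search is only feasible for tiny domains.

```python
from itertools import chain, combinations


def nonempty_subsets(items):
    """Yield every non-empty subset of items as a tuple."""
    items = list(items)
    return chain.from_iterable(
        combinations(items, k) for k in range(1, len(items) + 1)
    )


def region_stats(counts, rows, cols):
    """Return (support, confidence) of the region rows x cols.

    counts[(a, b)] = (number of records, number of positive records).
    This data layout is an assumption made for illustration.
    """
    support = sum(counts.get((a, b), (0, 0))[0] for a in rows for b in cols)
    hits = sum(counts.get((a, b), (0, 0))[1] for a in rows for b in cols)
    confidence = hits / support if support else 0.0
    return support, confidence


def best_region(counts, max_support):
    """Exhaustively search product regions with 0 < support <= max_support.

    Returns (confidence, support, rows, cols) of the first region that
    attains the maximum confidence, or None if no region is feasible.
    """
    dom_a = sorted({a for a, _ in counts})
    dom_b = sorted({b for _, b in counts})
    best = None
    for rows in nonempty_subsets(dom_a):
        for cols in nonempty_subsets(dom_b):
            support, conf = region_stats(counts, rows, cols)
            if 0 < support <= max_support and (best is None or conf > best[0]):
                best = (conf, support, rows, cols)
    return best


# Toy data: two values per attribute, ten records per cell.
counts = {
    ("x", "p"): (10, 9),
    ("x", "q"): (10, 2),
    ("y", "p"): (10, 8),
    ("y", "q"): (10, 1),
}
result = best_region(counts, max_support=20)
print(result)
```

The search enumerates exponentially many regions, which is consistent with the NP-hardness result motivating an approximation algorithm for domains with hundreds of values.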
Research of this paper is partly supported by the Grant-in-Aid for Scientific Research on Priority Areas (A) by the Ministry of Education, Science, Sports and Culture of Japan.
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fujisawa, K., Hamuro, Y., Katoh, N., Tokuyama, T., Yada, K. (1999). Approximation of Optimal Two-Dimensional Association Rules for Categorical Attributes Using Semidefinite Programming. In: Arikawa, S., Furukawa, K. (eds) Discovery Science. DS 1999. Lecture Notes in Computer Science, vol 1721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46846-3_14
DOI: https://doi.org/10.1007/3-540-46846-3_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66713-1
Online ISBN: 978-3-540-46846-2
eBook Packages: Springer Book Archive