Abstract
Given a family of transaction databases, data mining methods for extracting patterns that distinguish one database from another have been studied extensively. This paper focuses on the problem of finding patterns that are nearly uncorrelated in one database, called the base, and begin to be correlated to some extent in another database, called the target. The detected patterns are not highly correlated at the target. Despite their weak correlation at the target, they are regarded as indicative because they are uncorrelated at the base.
We design a search procedure for such patterns by casting the task as an optimization problem under constraints. More precisely, the objective is to minimize the correlation of a pattern at the base, subject to an upper bound on its correlation at the target and a lower bound on the change in correlation between the two databases. Because there are many potential solutions, we apply top-N control, which retains the patterns attaining the N lowest correlation values at the base among all patterns satisfying the constraints.
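One way to write this problem down is sketched below. None of the symbols come from the original text; they are assumptions for illustration: $I_B(X)$ and $I_T(X)$ denote the correlation of a pattern $X$ at the base and at the target, $\delta$ is the upper bound on correlation at the target, and $\rho$ is the lower bound on the correlation change. The change is written here as a difference, although the abstract does not state whether a difference, a ratio, or some other form is used.

```latex
\begin{aligned}
\min_{X}\quad & I_B(X) \\
\text{subject to}\quad & I_T(X) \le \delta, \\
& I_T(X) - I_B(X) \ge \rho .
\end{aligned}
```

Under this reading, top-N control returns the N feasible patterns with the smallest values of $I_B(X)$ rather than a single minimizer.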
We measure the degree of correlation by k-way mutual information, which is monotonically increasing with respect to item addition, so we can design a dynamic pruning method that discards useless items under top-N control. This greatly reduces the computational cost, over the whole search process, of calculating correlation values over several items treated as random variables. As a result, we present a complete search procedure that outputs only the top-N solution patterns from the set of all patterns satisfying the constraints, and we show its effectiveness and efficiency through experiments.
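The role of monotonicity in pruning can be illustrated with a minimal sketch. It is not the authors' procedure. It assumes that k-way mutual information is taken as the total correlation (sum of marginal entropies minus joint entropy), which never decreases when an item is added, and that the correlation change is measured as a difference. The names `total_correlation`, `top_n_patterns`, `delta`, and `rho`, as well as the toy databases, are hypothetical.

```python
from collections import Counter
from math import log2
import heapq


def entropy(db, items):
    """Joint entropy (in bits) of the binary presence variables of `items`."""
    n = len(db)
    counts = Counter(tuple(i in t for i in items) for t in db)
    return -sum(c / n * log2(c / n) for c in counts.values())


def total_correlation(db, items):
    """Assumed correlation measure: marginal entropies minus joint entropy."""
    return sum(entropy(db, (i,)) for i in items) - entropy(db, items)


def top_n_patterns(base, target, items, n, delta, rho):
    """Up to n patterns with the smallest base correlation such that
    target correlation <= delta and (target - base) >= rho."""
    heap = []  # max-heap on base correlation (stored negated), size <= n

    def expand(pattern, start):
        for idx in range(start, len(items)):
            cand = pattern + (items[idx],)
            if len(cand) >= 2:
                cb = total_correlation(base, cand)
                ct = total_correlation(target, cand)
                # Monotonicity: every superset has target correlation >= ct,
                # so once the upper bound is violated the subtree is skipped.
                if ct > delta:
                    continue
                # Monotonicity again: no superset can have a base correlation
                # below cb, so it cannot beat the current N-th best value.
                if len(heap) == n and cb >= -heap[0][0]:
                    continue
                if ct - cb >= rho:
                    heapq.heappush(heap, (-cb, cand))
                    if len(heap) > n:
                        heapq.heappop(heap)
            expand(cand, idx + 1)

    expand((), 0)
    return sorted((-neg, pat) for neg, pat in heap)


# Toy data (hypothetical): a and b are independent at the base and
# weakly correlated at the target; c duplicates a at the base.
base = [{"a", "b", "c"}, {"a", "c"}, {"b"}, set()]
target = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"b"},
          {"c"}, {"c"}, set(), {"a"}]
result = top_n_patterns(base, target, ["a", "b", "c"], n=1, delta=0.5, rho=0.1)
print(result)
```

The threshold used in the second pruning test only tightens as better solutions are found, which is why it is described as dynamic: the more good patterns are already in the top-N list, the more of the search space can be skipped.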
Keywords
- Information-theoretic correlation
- Correlation change mining
- Top-N minimization for correlation change
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, A., Haraguchi, M., Okubo, Y. (2012). Top-N Minimization Approach for Indicative Correlation Change Mining. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2012. Lecture Notes in Computer Science, vol 7376. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31537-4_9
DOI: https://doi.org/10.1007/978-3-642-31537-4_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31536-7
Online ISBN: 978-3-642-31537-4
eBook Packages: Computer Science, Computer Science (R0)
