Top-N Minimization Approach for Indicative Correlation Change Mining

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 7376)

Abstract

Given a family of transaction databases, various data mining methods for extracting patterns that distinguish one database from another have been extensively studied. This paper focuses on the problem of finding patterns that are less correlated in one database, called the base, and begin to be correlated to some extent in another database, called the target. The detected patterns are not highly correlated at the target. Despite this weak correlation at the target, they are regarded as indicative because they are uncorrelated in the base.

We design our search procedure for those patterns by applying an optimization strategy under some constraints. More precisely, the objective is to minimize the correlation of patterns at the base, subject to an upper bound on the correlation at the target and a lower bound on the correlation change between the two databases. As there exist many potential solutions, we apply top-N control, which retains the N lowest correlation values at the base among all patterns satisfying the constraints.
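One way to write this optimization down is sketched below. The notation is an assumption made for illustration and is not taken from the paper: \mathcal{I} is the item set, corr_B and corr_T are the correlation of a pattern X at the base and at the target, \varepsilon is the upper bound at the target, and \delta is the lower bound on the change. The abstract does not say how the change is measured, so treating it as a difference is also an assumption.

```latex
\begin{aligned}
\min_{X \subseteq \mathcal{I},\ |X| \ge 2} \quad & \mathrm{corr}_B(X) \\
\text{subject to} \quad & \mathrm{corr}_T(X) \le \varepsilon, \\
                        & \mathrm{corr}_T(X) - \mathrm{corr}_B(X) \ge \delta
\end{aligned}
```

Under top-N control, the output would then be the N feasible patterns with the smallest values of corr_B(X).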

We measure the degree of correlation by k-way mutual information, which is monotonically increasing with respect to item addition, so we can design a dynamic pruning method that disregards useless items under the top-N control. This greatly reduces the computational cost, over the whole search process, of calculating correlation values over several items treated as random variables. As a result, we present a complete search procedure that produces only the top-N solution patterns from the set of all patterns satisfying the constraints, and show its effectiveness and efficiency through experiments.
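The pruning idea can be illustrated with a small sketch. It is not the authors' procedure: it assumes that k-way mutual information means the sum of single-item entropies minus the joint entropy (a quantity that never decreases when an item is added), that the correlation change is the target value minus the base value, and that a plain depth-first enumeration is acceptable. The names `entropy`, `total_correlation`, `top_n_patterns`, `eps`, and `delta` are hypothetical.

```python
import heapq
from collections import Counter
from math import log2


def entropy(transactions, items):
    """Shannon entropy (bits) of the joint presence/absence of `items`."""
    n = len(transactions)
    counts = Counter(tuple(i in t for i in items) for t in transactions)
    return -sum(c / n * log2(c / n) for c in counts.values())


def total_correlation(transactions, items):
    """Assumed k-way measure: sum of item entropies minus joint entropy."""
    items = tuple(items)
    marginals = sum(entropy(transactions, (i,)) for i in items)
    return marginals - entropy(transactions, items)


def top_n_patterns(base, target, items, n, eps, delta):
    """Return up to n feasible patterns with the lowest base correlation."""
    best = []  # max-heap through negation: entries are (-corr_base, pattern)

    def expand(pattern, start):
        for idx in range(start, len(items)):
            cand = pattern + (items[idx],)
            if len(cand) >= 2:
                c_base = total_correlation(base, cand)
                # Monotonicity: extensions of cand cannot have a lower base
                # value, so none of them can improve a full top-N list.
                if len(best) == n and c_base >= -best[0][0]:
                    continue
                c_target = total_correlation(target, cand)
                # Monotonicity again: extensions stay above the target bound.
                if c_target > eps:
                    continue
                if c_target - c_base >= delta:
                    heapq.heappush(best, (-c_base, cand))
                    if len(best) > n:
                        heapq.heappop(best)  # drop the current worst entry
            expand(cand, idx + 1)

    expand((), 0)
    ranked = sorted((-neg, pat) for neg, pat in best)
    return [(pat, corr) for corr, pat in ranked]


# Toy data (made up): item "c" always co-occurs with "a".
base = [{"a", "b", "c"}, {"a", "c"}, {"b"}, set()]
target = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "c"}, set()]
result = top_n_patterns(base, target, ["a", "b", "c"], n=2, eps=0.5, delta=0.2)
```

In this toy run, the pair ("a", "c") exceeds the target bound, so the search never evaluates the constraint for its extension ("a", "c", "b"); the same superset is still reached and checked through the branch ("a", "b"), which keeps the enumeration complete. Patterns that tie with the current N-th value are dropped here, which is a simplification.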

Keywords

  • Information-theoretic correlation
  • Correlation change mining
  • Top-N minimization for correlation change

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, A., Haraguchi, M., Okubo, Y. (2012). Top-N Minimization Approach for Indicative Correlation Change Mining. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2012. Lecture Notes in Computer Science (LNAI), vol 7376. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31537-4_9

  • DOI: https://doi.org/10.1007/978-3-642-31537-4_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31536-7

  • Online ISBN: 978-3-642-31537-4

  • eBook Packages: Computer Science, Computer Science (R0)