Hyperclique pattern discovery

Xiong, Hui; Tan, Pang-Ning; Kumar, Vipin

doi:10.1007/s10618-006-0043-9

Published: 26 May 2006

Data Mining and Knowledge Discovery, volume 13, pages 219–242 (2006)

Abstract

Existing algorithms for mining association patterns often rely on the support-based pruning strategy to prune a combinatorial search space. However, this strategy is not effective for discovering potentially interesting patterns at low levels of support. Also, it tends to generate too many spurious patterns involving items which are from different support levels and are poorly correlated. In this paper, we present a framework for mining highly-correlated association patterns called hyperclique patterns. In this framework, an objective measure called h-confidence is applied to discover hyperclique patterns. We prove that the items in a hyperclique pattern have a guaranteed level of global pairwise similarity to one another as measured by the cosine similarity (uncentered Pearson's correlation coefficient). Also, we show that the h-confidence measure satisfies a cross-support property which can help efficiently eliminate spurious patterns involving items with substantially different support levels. Indeed, this cross-support property is not limited to h-confidence and can be generalized to some other association measures. In addition, an algorithm called hyperclique miner is proposed to exploit both cross-support and anti-monotone properties of the h-confidence measure for the efficient discovery of hyperclique patterns. Finally, our experimental results show that hyperclique miner can efficiently identify hyperclique patterns, even at extremely low levels of support.
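The abstract names the h-confidence measure without stating its formula. The sketch below assumes the definition commonly used in the hyperclique literature: the support of the pattern divided by the largest support of any single item in it. The function names and the toy transactions are made up for illustration and are not taken from the paper. Under that assumption, the sketch shows the three properties the abstract mentions: the cross-support upper bound (smallest item support over largest item support), the anti-monotone behavior (adding an item never raises h-confidence), and the pairwise cosine similarity being at least the h-confidence of the pattern.

```python
from itertools import combinations
from math import sqrt


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def h_confidence(itemset, transactions):
    """Assumed definition: supp(itemset) / max over items of supp({item})."""
    max_item_supp = max(support([i], transactions) for i in itemset)
    if max_item_supp == 0:
        return 0.0
    return support(itemset, transactions) / max_item_supp


def cross_support_bound(itemset, transactions):
    """Upper bound on h-confidence: min item support / max item support."""
    supps = [support([i], transactions) for i in itemset]
    return min(supps) / max(supps) if max(supps) else 0.0


def cosine(a, b, transactions):
    """Cosine similarity of two items viewed as binary transaction vectors."""
    sa, sb = support([a], transactions), support([b], transactions)
    return support([a, b], transactions) / sqrt(sa * sb)


# Hypothetical toy data, chosen only to make the numbers easy to follow.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "d"},
    {"d"},
    {"d"},
    {"d"},
    {"d"},
]

pattern = ["a", "b", "c"]
hc = h_confidence(pattern, transactions)
pairwise = {p: cosine(p[0], p[1], transactions) for p in combinations(pattern, 2)}

print("h-confidence:", hc)
print("cross-support bound:", cross_support_bound(pattern, transactions))
print("pairwise cosine:", pairwise)
```

A miner built on these properties could discard a candidate as soon as its cross-support bound falls below the chosen h-confidence threshold, without counting the support of the full pattern.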

Notes

It is available at http://www.almaden.ibm.com/software/quest/resources.
This was observed on a Sun Ultra 10 workstation with a 440 MHz CPU and 128 Mbytes of memory.
When computing Pearson's correlation coefficient, the data mean is not subtracted.
Note that the numbers of items shown in Table 4 for pumsb and pumsb* differ somewhat from the numbers reported in Zaki and Hsiao (2002), because we only consider item IDs for which the count is at least one. For example, although the minimum item ID in pumsb is 0 and the maximum item ID is 7116, there are only 2113 distinct item IDs that appear in the data set.
The data set is available at http://trec.nist.gov.

References

Agarwal R, Aggarwal C, Prasad V (2000) A tree projection algorithm for generation of frequent itemsets. Journal of Parallel and Distributed Computing
Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data. pp 207–216
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases
Bayardo R, Agrawal R, Gunopulos D (1999) Constraint-based rule mining in large, dense databases. In: Proceedings of the Int'l Conference on Data Engineering
Bayardo RJ (1998) Efficiently mining long patterns from databases. In: Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data
Brin S, Motwani R, Silverstein C (1997) Beyond market baskets: Generalizing association rules to correlations. In: Proceedings of ACM SIGMOD Int'l Conference on Management of Data. pp 265–276
Burdick D, Calimlim M, Gehrke J (2001) MAFIA: A maximal frequent itemset algorithm for transactional databases. In: Proceedings of the 2001 Int'l Conference on Data Engineering (ICDE)
Cohen E, Datar M, Fujiwara S, Gionis A, Indyk P, Motwani R, Ullman J, Yang C (2000) Finding interesting associations without support pruning. In: Proceedings of the 2000 Int'l Conference on Data Engineering (ICDE). pp 489–499
Feder T, Motwani R (1995) Clique partitions, graph compression and speeding-up algorithms. Special Issue for the STOC conference. Journal of Computer and System Sciences 51:261–272
Grahne G, Lakshmanan LVS, Wang X (2000) Efficient mining of constrained correlated sets. In: Proceedings of the Int'l Conference on Data Engineering
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of ACM SIGMOD Int'l Conference on Management of Data
Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning: Data mining, inference, and prediction, Springer
Liu B, Hsu W, Ma Y (1999) Mining association rules with multiple minimum supports. In: Proceedings of the 1999 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Omiecinski E (2003) Alternative interest measures for mining associations. IEEE Transactions on Knowledge and Data Engineering 15(1)
Pei J, Han J, Mao R (2000) CLOSET: An efficient algorithm for mining frequent closed itemsets. In: ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery
Reynolds HT (1977) The analysis of cross-classifications. The Free Press
Rijsbergen CJV (1979) Information retrieval, 2nd edn., Butterworths, London
Tan P, Kumar V, Srivastava J (2002) Selecting the right interestingness measure for association patterns. In: Proceedings of the 2002 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Wang K, He Y, Cheung D, Chin Y (2001) Mining confident rules without support requirement. In: Proceedings of the 2001 ACM International Conference on Information and Knowledge Management (CIKM)
Xiong H, He X, Ding C, Zhang Y, Kumar V, Holbrook S (2005) Identification of functional modules in protein complexes via hyperclique pattern discovery. In: Proceedings of the Pacific Symposium on Biocomputing (PSB)
Xiong H, Steinbach M, Tan P-N, Kumar V (2004) HICAP: Hierarchical clustering with pattern preservation. In: Proceedings of 2004 SIAM Int'l Conference on Data Mining (SDM). pp 279–290
Xiong H, Tan P, Kumar V (2003a) Mining hyperclique patterns with confidence pruning. In: Technical Report 03-006, January, Department of Computer Science, University of Minnesota, Twin Cities
Xiong H, Tan P, Kumar V (2003b) Mining strong affinity association patterns in data sets with skewed support distribution. In: Proceedings of the 3rd IEEE International Conference on Data Mining. pp 387–394
Yang C, Fayyad UM, Bradley PS (2001) Efficient discovery of error-tolerant frequent itemsets in high dimensions. In: Proceedings of the 2001 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Zaki M, Hsiao C-J (2002) CHARM: An efficient algorithm for closed itemset mining. In: Proceedings of 2002 SIAM International Conference on Data Mining
Zipf G (1949) Human behavior and the principle of least effort: An introduction to human ecology. Addison-Wesley, Cambridge, Massachusetts

Acknowledgments

This work was partially supported by NSF grant # IIS-0308264, DOE/LLNL W-7045-ENG-48, and by Army High Performance Computing Research Center under the auspices of the Department of the Army, Army Research Laboratory cooperative agreement number DAAD19-01-2-0014. The content of this work does not necessarily reflect the position or policy of the government and no official endorsement should be inferred. Access to computing facilities was provided by the AHPCRC and the Minnesota Supercomputing Institute. Finally, we would like to thank Dr. Mohammed J. Zaki for providing us the CHARM code. Also, we would like to thank Dr. Shashi Shekhar, Dr. Ke Wang, and Michael Steinbach for valuable comments.

Author information

Authors and Affiliations

Department of Management Science & Information Systems, Rutgers University, Ackerson Hall, 180 University Avenue, Newark, NJ, 07102, USA
Hui Xiong
Department of Computer Science & Engineering, Michigan State University, 3115 Engineering Building, East Lansing, MI, 48824-1226, USA
Pang-Ning Tan
Department of Computer Science and Engineering, University of Minnesota, 4-192, EE/CSci Building, Minneapolis, MN, 55455, USA
Vipin Kumar

Corresponding author

Correspondence to Hui Xiong.

About this article

Cite this article

Xiong, H., Tan, PN. & Kumar, V. Hyperclique pattern discovery. Data Min Knowl Disc 13, 219–242 (2006). https://doi.org/10.1007/s10618-006-0043-9

Received: 11 February 2004
Accepted: 06 February 2006
Published: 26 May 2006
Issue Date: September 2006
DOI: https://doi.org/10.1007/s10618-006-0043-9
