Skip to main content
Log in

Hyperclique pattern discovery

  • Published:
Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Abstract

Existing algorithms for mining association patterns often rely on the support-based pruning strategy to prune a combinatorial search space. However, this strategy is not effective for discovering potentially interesting patterns at low levels of support. Also, it tends to generate too many spurious patterns involving items which are from different support levels and are poorly correlated. In this paper, we present a framework for mining highly-correlated association patterns called hyperclique patterns. In this framework, an objective measure called h-confidence is applied to discover hyperclique patterns. We prove that the items in a hyperclique pattern have a guaranteed level of global pairwise similarity to one another as measured by the cosine similarity (uncentered Pearson's correlation coefficient). Also, we show that the h-confidence measure satisfies a cross-support property which can help efficiently eliminate spurious patterns involving items with substantially different support levels. Indeed, this cross-support property is not limited to h-confidence and can be generalized to some other association measures. In addition, an algorithm called hyperclique miner is proposed to exploit both cross-support and anti-monotone properties of the h-confidence measure for the efficient discovery of hyperclique patterns. Finally, our experimental results show that hyperclique miner can efficiently identify hyperclique patterns, even at extremely low levels of support.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

Notes

  1. It is available at http://www.almaden.ibm.com/ software/quest/resources.

  2. This is observed on Sun Ultra 10 work station with a 440 MHz CPU and 128 Mbytes of memory.

  3. When computing Pearson's correlation coefficient, the data mean is not subtracted.

  4. Note that the number of items shown in Table 4 for pumsb, pumsb * are somewhat different from the numbers reported in Zaki and Hsiao (2002), because we only consider item IDs for which the count is at least one. For example, although the minimum item ID in pumsb is 0 and the maximum item ID is 7116, there are only 2113 distinct item IDs that appear in the data set.

  5. The data set is available at http://trec.nist.gov.

References

  • Agarwal R, Aggarwal C, Prasad V (2000) A tree projection algorithm for generation of frequent itemsets. Journal of Parallel and Distributed Computing

  • Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data. pp 207–216

  • Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases

  • Bayardo R, Agrawal R, Gunopulous D (1999) Constraint-based rule mining in large, dense databases. In: Proceedings of the Int'l Conference on Data Engineering

  • Bayardo RJ (1998) Efficiently mining long patterns from databases. In: Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data

  • Brin S, Motwani R, Silverstein C (1997) Beyond market baskets: Generalizing association rules to correlations. In: Proceedings of ACM SIGMOD Int'l Conference on Management of Data. pp 265–276

  • Burdick D, Calimlim M, Gehrke J (2001) MAFIA: A maximal frequent itemset algorithm for transactional databases. In: Proceedings of the 2001 Int'l Conference on Data Engineering (ICDE)

  • Cohen E, Datar M, Fujiwara S, Gionis A, Indyk P, Motwani R, Ullman J, Yang C (2000) Finding interesting associations without support pruning. In: Proceedings of the 2000 Int'l Conference on Data Engineering (ICDE). pp 489–499

  • Feder T, Motwani R (1995) Clique partitions, graph compression and speeding-up algorithms. Special Issue for the STOC conference. Journal of Computer and System Sciences 51:261–272

    Google Scholar 

  • Grahne G, Lakshmanan LVS, Wang X (2000) Efficient mining of constrained correlated sets. In: Proceedings of the Int'l Conference on Data Engineering

  • Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of ACM SIGMOD Int'l Conference on Management of Data

  • Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning: Data mining, inference, and prediction, Springer

  • Liu B, Hsu W, Ma Y (1999) Mining association rules with multiple minimum supports. In: Proceedings of the 1999 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

  • Omiecinski E (2003) Alternative interest measures for mining associations. IEEE Transactions on Knowledge and Data Engineering 15(1)

  • Pei J, Han J, Mao R (2000) CLOSET: An efficient algorithm for mining frequent closed itemsets. In: ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery

  • Reynolds HT (1977) The analysis of cross-classifications. The Free Press

  • Rijsbergen CJV (1979) Information retrieval, 2nd edn., Butterworths, London

  • Tan P, Kumar V, Srivastava J (2002) Selecting the right interestingness measure for association patterns. In: Proceedings of the 1999 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

  • Wang K, He Y, Cheung D, Chin Y (2001) Mining confident rules without support requirement. In: Proceedings of the 2001 ACM International Conference on Information and Knowledge Management (CIKM)

  • Xiong H, He X, Ding C, Zhang Y, Kumar V, Holbrook S (2005) Identification of functional modules in protein complexes via hyperclique pattern discovery. In: Proceedings of the Pacific Symposium on Biocomputing (PSB)

  • Xiong H, Steinbach M, Tan P-N, Kumpar V (2004) HICAP: Hierarchial clustering with pattern preservation. In: Proceedings of 2004 SIAM Int'l Conference on Data Mining (SDM). pp 279–290

  • Xiong H, Tan P, Kumar V (2003a) Mining hyperclique patterns with confidence pruning. In: Technical Report 03-006, January, Department of computer science, University of Minnesota, Twin Cities

  • Xiong H, Tan P, Kumar V (2003b) Mining strong affinity association patterns in data sets with skewed support distribution. In: Proceedings of the 3rd IEEE International Conference on Data Mining. pp 387–394

  • Yang C, Fayyad UM, Bradley PS (2001) Efficient discovery of error-tolerant frequent itemsets in high dimensions. In: Proceedings of the 1999 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

  • Zaki M, Hsiao C-J (2002) CHARM: An efficient algorithm for closed itemset mining. In: Proceedings of 2002 SIAM International Conference on Data Mining

  • Zipf G (1949) Human behavior and principle of least effort: An introudction to human ecology. Addison Wesley, Cambridge, Massachusetts

Download references

Acknowledgments

This work was partially supported by NSF grant # IIS-0308264, DOE/LLNL W-7045-ENG-48, and by Army High Performance Computing Research Center under the auspices of the Department of the Army, Army Research Laboratory cooperative agreement number DAAD19-01-2-0014. The content of this work does not necessarily reflect the position or policy of the government and no official endorsement should be inferred. Access to computing facilities was provided by the AHPCRC and the Minnesota Supercomputing Institute. Finally, we would like to thank Dr. Mohammed J. Zaki for providing us the CHARM code. Also, we would like to thank Dr. Shashi Shekhar, Dr. Ke Wang, and Michael Steinbach for valuable comments.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Hui Xiong.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Xiong, H., Tan, PN. & Kumar, V. Hyperclique pattern discovery. Data Min Knowl Disc 13, 219–242 (2006). https://doi.org/10.1007/s10618-006-0043-9

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-006-0043-9

Keywords

Navigation