Abstract
Deriving frequent itemsets from databases is an important research issue in data mining. The number of frequent itemsets can become extremely large when the minimum support threshold is low, so designing a compact representation that compresses and describes them is a worthwhile topic. Most previous research on compact representations has focused on frequent closed itemsets and frequent maximal itemsets. The former is a lossless representation from which all frequent itemsets and their frequencies can be fully recovered. In contrast, the latter loses some information, because it preserves only which itemsets are frequent and cannot identify their frequencies. In this paper, we propose a new compact representation that lies between closed itemsets and maximal itemsets. It preserves all frequent itemsets and identifies their approximate frequencies. In addition, an efficient algorithm corresponding to this new concept is designed to find the related key information in databases. Finally, a series of experiments is conducted to show the effectiveness of the compact representation and the performance of the proposed algorithm.
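The contrast between the two existing representations can be illustrated with a minimal Python sketch. It uses the standard textbook definitions (a closed itemset has no proper superset with the same support; a maximal itemset has no frequent proper superset) and a brute-force enumeration. The toy database, the function names, and the threshold are assumptions made for illustration only; this is not the algorithm proposed in the paper, and it does not compute the proposed representation.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration: map each frequent itemset to its support count."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                result[itemset] = support
    return result

def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with the same support."""
    return {s: c for s, c in frequent.items()
            if not any(s < t and c == frequent[t] for t in frequent)}

def maximal_itemsets(frequent):
    """Keep itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < t for t in frequent)}

def support_from_closed(itemset, closed):
    """Recover a frequent itemset's support as the largest support
    among the closed itemsets that contain it."""
    return max(c for s, c in closed.items() if itemset <= s)

# Hypothetical toy database (illustrative data, not from the paper).
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a"}]
frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)
maximal = maximal_itemsets(frequent)

print(len(frequent), "frequent,", len(closed), "closed,", len(maximal), "maximal")
for itemset in sorted(frequent, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), frequent[itemset], support_from_closed(itemset, closed))
```

In this sketch the closed itemsets reproduce every support count exactly, whereas the maximal itemsets only allow a reader to decide whether an itemset is frequent (by checking whether it is a subset of some maximal itemset), which is the information loss the abstract describes.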
Acknowledgments
This research was supported by the Ministry of Science and Technology of the Republic of China under contract number MOST 103-2221-E-390-014-MY2.
Ethics declarations
Conflict of interest
None.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
About this article
Cite this article
Huang, JN., Hong, TP. & Chiang, MC. Reference itemsets: useful itemsets to approximate the representation of frequent itemsets. Soft Comput 21, 6143–6157 (2017). https://doi.org/10.1007/s00500-016-2172-4
DOI: https://doi.org/10.1007/s00500-016-2172-4