Reference itemsets: useful itemsets to approximate the representation of frequent itemsets

Huang, Jheng-Nan; Hong, Tzung-Pei; Chiang, Ming-Chao

doi:10.1007/s00500-016-2172-4

Methodologies and Application
Published: 20 May 2016

Soft Computing, Volume 21, pages 6143–6157 (2017)

Abstract

Deriving frequent itemsets from databases is an important research issue in data mining. The number of frequent itemsets can become extremely large when a low minimum support threshold is given. The design of a compact representation that compresses and describes them is therefore an interesting topic. Most previous research on compact representations has focused on frequent closed itemsets and frequent maximal itemsets. The former is a lossless representation from which all frequent itemsets and their frequencies can be fully recovered. In contrast, the latter may lose some information about frequent itemsets, because it preserves only which itemsets are frequent and cannot identify their frequencies. In this paper, we propose a new compact representation that lies between closed itemsets and maximal itemsets. It preserves all frequent itemsets and identifies their approximate frequencies. In addition, an efficient algorithm corresponding to this new concept is designed to find the related key information in databases. Finally, a series of experiments is conducted to show the effectiveness of the compact representation and the performance of the proposed algorithm.
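
The contrast between the two existing representations can be illustrated on a toy example. The sketch below is not the reference-itemset method proposed in the paper, which this page does not describe. It only shows, by brute force, why closed itemsets are lossless while maximal itemsets lose frequency information. The transaction data, the threshold, and all function names are assumptions made for illustration.

```python
from itertools import combinations

# Toy transaction database (hypothetical data, not taken from the paper).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a"},
]
MIN_SUPPORT = 2  # absolute minimum support threshold (assumed value)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of every frequent itemset with its support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                result[candidate] = support
    return result


def closed_itemsets(frequent):
    """Frequent itemsets that have no proper superset with the same support."""
    return {
        s: sup
        for s, sup in frequent.items()
        if not any(s < t and sup == frequent[t] for t in frequent)
    }


def maximal_itemsets(frequent):
    """Frequent itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < t for t in frequent)}


def support_from_closed(itemset, closed):
    """Recover the support of a frequent itemset from the closed itemsets:
    it equals the largest support among its closed supersets."""
    return max(sup for s, sup in closed.items() if itemset <= s)


frequent = frequent_itemsets(TRANSACTIONS, MIN_SUPPORT)
closed = closed_itemsets(frequent)
maximal = maximal_itemsets(frequent)
```

In this toy database all seven non-empty subsets of {a, b, c} are frequent. Three closed itemsets ({a}, {a, b}, {a, b, c}) are enough to recover every support exactly, whereas the single maximal itemset {a, b, c} only tells us which itemsets are frequent, not how often they occur.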

Acknowledgments

This research was supported by the Ministry of Science and Technology of the Republic of China under contract number MOST 103-2221-E-390-014-MY2.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, National Sun Yat-Sen University, Kaohsiung, Taiwan
Jheng-Nan Huang, Tzung-Pei Hong & Ming-Chao Chiang
Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan
Tzung-Pei Hong

Corresponding author

Correspondence to Jheng-Nan Huang.

Ethics declarations

Conflict of interest

None.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Huang, JN., Hong, TP. & Chiang, MC. Reference itemsets: useful itemsets to approximate the representation of frequent itemsets. Soft Comput 21, 6143–6157 (2017). https://doi.org/10.1007/s00500-016-2172-4

Published: 20 May 2016
Issue Date: October 2017
DOI: https://doi.org/10.1007/s00500-016-2172-4
