
Reference itemsets: useful itemsets to approximate the representation of frequent itemsets

  • Methodologies and Application
Soft Computing

Abstract

Deriving frequent itemsets from databases is an important research issue in data mining. The number of frequent itemsets can become extremely large when the minimum support threshold is low, so designing a compact representation that compresses and describes them is a worthwhile topic. Most previous research on compact representations has focused on frequent closed itemsets and frequent maximal itemsets. The former is a lossless representation from which all frequent itemsets and their frequencies can be fully recovered. By contrast, the latter loses some information: it retains which itemsets are frequent but cannot identify their frequencies. In this paper, we propose a new compact representation that lies between closed itemsets and maximal itemsets. It retains all frequent itemsets and identifies their approximate frequencies. In addition, an efficient algorithm corresponding to this new concept is designed to find the related key information in databases. Finally, a series of experiments is conducted to show the effectiveness of the compact representation and the performance of the proposed algorithm.
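
The contrast between the two existing representations can be illustrated with a small brute-force sketch. It is not the algorithm proposed in the paper, and the toy database, the threshold `MIN_SUPPORT`, and all function names are assumptions made purely for illustration. It uses the standard definitions: a frequent itemset is closed if no proper superset has the same support, and maximal if no proper superset is frequent.

```python
from itertools import combinations

# Hypothetical toy database and threshold; not taken from the paper.
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "d"},
]
MIN_SUPPORT = 2  # absolute support count


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Enumerate all frequent itemsets by brute force (toy data only)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = support(candidate, transactions)
            if count >= min_support:
                result[candidate] = count
    return result


def closed_itemsets(frequent):
    """Keep itemsets with no proper superset of equal support."""
    return {
        s: c
        for s, c in frequent.items()
        if not any(s < t and c == frequent[t] for t in frequent)
    }


def maximal_itemsets(frequent):
    """Keep itemsets with no frequent proper superset."""
    return {s for s in frequent if not any(s < t for t in frequent)}


def support_from_closed(itemset, closed):
    """Recover an exact support as the largest support among closed supersets."""
    return max(c for s, c in closed.items() if itemset <= s)


frequent = frequent_itemsets(TRANSACTIONS, MIN_SUPPORT)
closed = closed_itemsets(frequent)
maximal = maximal_itemsets(frequent)

# Closed itemsets are lossless: every frequent itemset's support is recovered.
recovered = {s: support_from_closed(s, closed) for s in frequent}
```

On this toy data the maximal representation is the smallest but only tells us which itemsets are frequent (every subset of a maximal itemset), whereas the closed representation is larger but reproduces every support exactly. A representation that lies between the two, as the abstract describes, would trade some of that exactness for a smaller size.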





Acknowledgments

This research was supported by the Ministry of Science and Technology of the Republic of China under contract number MOST 103-2221-E-390-014-MY2.

Author information

Corresponding author

Correspondence to Jheng-Nan Huang.

Ethics declarations

Conflict of interest

None.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Huang, JN., Hong, TP. & Chiang, MC. Reference itemsets: useful itemsets to approximate the representation of frequent itemsets. Soft Comput 21, 6143–6157 (2017). https://doi.org/10.1007/s00500-016-2172-4

  • DOI: https://doi.org/10.1007/s00500-016-2172-4
