Soft Computing, Volume 21, Issue 20, pp 6143–6157

Reference itemsets: useful itemsets to approximate the representation of frequent itemsets

  • Jheng-Nan Huang
  • Tzung-Pei Hong
  • Ming-Chao Chiang
Methodologies and Application

Abstract

Deriving frequent itemsets from databases is an important research issue in data mining. The number of frequent itemsets can become very large when a low minimum support threshold is used, so designing a compact representation that compresses and describes them is an interesting topic. Most earlier research on compact representations has focused on frequent closed itemsets and frequent maximal itemsets. The former is a lossless representation from which all frequent itemsets and their frequencies can be fully recovered. In contrast, the latter may lose some information about the frequent itemsets, because it preserves only which itemsets are frequent and cannot identify their frequencies. In this paper, we propose a new compact representation that lies between closed itemsets and maximal itemsets. It preserves all frequent itemsets and identifies their approximate frequencies. In addition, an efficient algorithm corresponding to this new concept is designed to find the related key information in databases. Finally, a series of experiments is conducted to show the effectiveness of the compact representation and the performance of the proposed algorithm.
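
The contrast between the two existing representations can be illustrated with a small sketch. This is not the algorithm proposed in the paper. It is a brute-force illustration of the standard definitions of closed and maximal itemsets. The toy database, the threshold, and all function names are assumptions made for this example.

```python
from itertools import combinations

# Hypothetical toy database; each transaction is a set of items.
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a"},
]
MIN_SUPPORT = 2  # absolute count threshold, chosen only for this sketch


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration: map each frequent itemset to its support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                result[candidate] = support
    return result


def closed_itemsets(frequent):
    """Keep frequent itemsets that have no proper superset with equal support."""
    return {
        s: sup
        for s, sup in frequent.items()
        if not any(s < t and sup == frequent[t] for t in frequent)
    }


def maximal_itemsets(frequent):
    """Keep frequent itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < t for t in frequent)}


def support_from_closed(itemset, closed):
    """Recover an itemset's support as the largest support among its closed supersets."""
    return max(sup for s, sup in closed.items() if itemset <= s)


frequent = frequent_itemsets(TRANSACTIONS, MIN_SUPPORT)
closed = closed_itemsets(frequent)
maximal = maximal_itemsets(frequent)
```

On this toy database, seven itemsets are frequent, three of them are closed ({a}, {a, b}, {a, b, c}), and one is maximal ({a, b, c}). The closed itemsets are enough to recover every support exactly, for example the support of {b} through its closed superset {a, b}. The maximal itemset tells us only that every subset of {a, b, c} is frequent, not how frequent each one is.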

Keywords

Data mining, Frequent itemset, Closed itemset, Maximal itemset, Approximate representation, Reference itemset

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Jheng-Nan Huang (1)
  • Tzung-Pei Hong (1, 2)
  • Ming-Chao Chiang (1)

  1. Department of Computer Science and Engineering, National Sun Yat-Sen University, Kaohsiung, Taiwan
  2. Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan