Frequent Itemset Border Approximation by Dualization

Chapter

Abstract

We introduce FIBAD, an approach that computes approximate borders of frequent itemsets by leveraging dualization and the computation of approximate minimal transversals of hypergraphs. The distinctive feature of FIBAD's theoretical foundations is approximate dualization, in which a new function \(\widetilde{f}\) is defined to compute the approximate negative border. Methodologically, the function \(\widetilde{f}\) is implemented by the AMTHR method, which consists of a reduction of the hypergraph followed by a computation of its minimal transversals. For evaluation purposes, we study the sensitivity of FIBAD to AMTHR by replacing the latter with two other algorithms that compute approximate minimal transversals. We also compare our approximate dualization-based method with an existing approach that computes the approximate borders directly, without dualization. The experimental results show that our method outperforms the other methods, as it produces the borders of highest quality.
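To make the dualization step concrete, the following is a minimal sketch of the exact (not approximate) case, based on the standard result that the negative border is the set of minimal transversals of the hypergraph formed by the complements of the positive border. It is not AMTHR and does not implement \(\widetilde{f}\), whose hypergraph reduction is not detailed in this abstract. The function names `minimal_transversals` and `negative_border` and the toy data are assumptions made for illustration, and the brute-force enumeration is only suitable for very small item sets.

```python
from itertools import combinations


def minimal_transversals(edges, vertices):
    """Brute-force minimal transversals (minimal hitting sets) of a hypergraph.

    Illustrative helper only: exponential in the number of vertices.
    """
    edges = [frozenset(e) for e in edges]
    found = []
    # Enumerate candidates by increasing size, so that any candidate
    # containing an already found transversal cannot be minimal.
    for size in range(len(vertices) + 1):
        for cand in combinations(sorted(vertices), size):
            cand = frozenset(cand)
            if any(t <= cand for t in found):
                continue  # a smaller transversal is included: not minimal
            if all(cand & e for e in edges):
                found.append(cand)  # cand intersects every hyperedge
    return found


def negative_border(positive_border, items):
    """Exact dualization: complement each maximal frequent itemset,
    then take the minimal transversals of the resulting hypergraph."""
    items = frozenset(items)
    complements = [items - frozenset(m) for m in positive_border]
    return minimal_transversals(complements, items)


# Toy example (assumed data): maximal frequent itemsets {a,b,c} and {c,d}.
border = negative_border([{"a", "b", "c"}, {"c", "d"}], "abcd")
print(sorted("".join(sorted(s)) for s in border))  # ['ad', 'bd']
```

In this toy example the minimal infrequent itemsets are {a,d} and {b,d}: each is contained in no maximal frequent itemset, while all of their proper subsets are. An approximate variant in the spirit of the abstract would replace `minimal_transversals` with a routine that returns approximate minimal transversals.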

Keywords

Frequent itemsets, Borders, Hypergraph transversals, Dualization, Approximation

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. Aix Marseille Université, CNRS, ENSAM, Université de Toulon, LSIS UMR 7296, Marseille, France
