Frequent Itemset Border Approximation by Dualization

Transactions on Large-Scale Data- and Knowledge-Centered Systems XXVI

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 9670)

Abstract

We introduce FIBAD, an approach that computes approximate borders of frequent itemsets by combining dualization with the computation of approximate minimal transversals of hypergraphs. The distinctive feature of FIBAD's theoretical foundations is approximate dualization: a new function \(\widetilde{f}\) is defined to compute the approximate negative border. Methodologically, \(\widetilde{f}\) is implemented by the AMTHR method, which reduces the hypergraph and then computes the minimal transversals of the reduced hypergraph. For evaluation, we study the sensitivity of FIBAD to AMTHR by replacing the latter with two other algorithms that compute approximate minimal transversals. We also compare our approximate dualization-based method with an existing approach that computes the approximate borders directly, without dualization. The experimental results show that our method outperforms the other methods, as it produces the borders of highest quality.
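
The dualization idea can be illustrated with the exact (non-approximate) case, which is a standard result on borders: the negative border (the minimal infrequent itemsets) equals the set of minimal transversals of the hypergraph whose hyperedges are the complements of the maximal frequent itemsets. The Python sketch below only illustrates that exact relation. It is neither FIBAD's approximate function \(\widetilde{f}\) nor AMTHR, which the abstract does not specify in detail. The function names minimal_transversals and negative_border are hypothetical, and the brute-force enumeration is chosen for readability; it is exponential in the number of items.

```python
from itertools import combinations


def minimal_transversals(hyperedges, vertices):
    """Enumerate the minimal transversals (hitting sets) by brute force.

    Illustrative only: candidates are visited by increasing size, so any
    candidate containing an already-found transversal is non-minimal.
    """
    edges = [frozenset(e) for e in hyperedges]
    found = []
    for size in range(len(vertices) + 1):
        for cand in combinations(sorted(vertices), size):
            c = frozenset(cand)
            if any(t <= c for t in found):
                continue  # a smaller transversal is contained in c
            if all(c & e for e in edges):
                found.append(c)  # c intersects every hyperedge
    return found


def negative_border(positive_border, items):
    """Exact dualization: positive border -> negative border.

    positive_border: the maximal frequent itemsets.
    The hyperedges are the complements of those itemsets; their minimal
    transversals are exactly the minimal infrequent itemsets.
    """
    items = frozenset(items)
    complements = [items - frozenset(m) for m in positive_border]
    return minimal_transversals(complements, items)


# Toy example (made-up data): items a, b, c, d; maximal frequent
# itemsets {a, b, c} and {c, d}.
nb = negative_border([{"a", "b", "c"}, {"c", "d"}], "abcd")
print(sorted("".join(sorted(t)) for t in nb))  # ['ad', 'bd']
```

In this toy example the complements are {d} and {a, b}, so every transversal must contain d together with a or b. The itemsets ad and bd are indeed the minimal itemsets contained in neither abc nor cd.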


Notes

  1. Frequent Itemset Border Approximation by Dualization.

  2. Approximate Minimal Transversals by Hypergraph Reduction.

  3. Frequent Itemset Mining Implementations, http://fimi.ua.ac.be/data/.

  4. http://www.cs.kent.edu/~lliu/sourceCode.html.

Author information

Corresponding author

Correspondence to Nicolas Durand.

Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Durand, N., Quafafou, M. (2016). Frequent Itemset Border Approximation by Dualization. In: Hameurlain, A., Küng, J., Wagner, R., Bellatreche, L., Mohania, M. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXVI. Lecture Notes in Computer Science, vol 9670. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49784-5_2

  • DOI: https://doi.org/10.1007/978-3-662-49784-5_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-49783-8

  • Online ISBN: 978-3-662-49784-5

  • eBook Packages: Computer Science, Computer Science (R0)
