Frequent Itemset Border Approximation by Dualization

Durand, Nicolas; Quafafou, Mohamed

doi:10.1007/978-3-662-49784-5_2

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 9670)

Abstract

The FIBAD approach is introduced to compute approximate borders of frequent itemsets by leveraging dualization and the computation of approximate minimal transversals of hypergraphs. The distinctive feature of FIBAD's theoretical foundations is approximate dualization, in which a new function \(\widetilde{f}\) is defined to compute the approximate negative border. From a methodological point of view, \(\widetilde{f}\) is implemented by the AMTHR method, which consists of a reduction of the hypergraph and a computation of its minimal transversals. For evaluation purposes, we study the sensitivity of FIBAD to AMTHR by replacing the latter with two other algorithms that compute approximate minimal transversals. We also compare our approximate dualization-based method with an existing approach that computes the approximate borders directly, without dualization. The experimental results show that our method outperforms the other methods, as it produces the borders of highest quality.
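To make the notion of dualization concrete, the following minimal Python sketch illustrates the classical exact relation that border dualization relies on: the negative border (minimal infrequent itemsets) equals the set of minimal transversals of the hypergraph whose hyperedges are the complements of the positive border (maximal frequent itemsets). This is not the FIBAD method, the function \(\widetilde{f}\), or AMTHR, whose details are not given in the abstract. The function names, the toy transactions, and the brute-force enumeration are all assumptions chosen for illustration only.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def maximal_frequent(transactions, items, minsup):
    """Positive border by brute force: frequent itemsets with no frequent proper superset."""
    frequent = [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(sorted(items), k)
        if support(frozenset(c), transactions) >= minsup
    ]
    return {s for s in frequent if not any(s < t for t in frequent)}


def minimal_transversals(edges, vertices):
    """Minimal vertex sets that intersect every hyperedge (exact, exponential time)."""
    found = []
    # Candidates are visited by increasing size, so a transversal is minimal
    # exactly when no previously found transversal is contained in it.
    for k in range(len(vertices) + 1):
        for c in combinations(sorted(vertices), k):
            cand = frozenset(c)
            if all(cand & e for e in edges) and not any(m <= cand for m in found):
                found.append(cand)
    return set(found)


def negative_border_by_dualization(positive_border, items):
    """Exact dualization: minimal transversals of the complemented positive border."""
    complements = [frozenset(items) - m for m in positive_border]
    return minimal_transversals(complements, items)


# Hypothetical toy data: four transactions over items a, b, c, d; minimum support 2.
items = frozenset("abcd")
transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bd")]

positive = maximal_frequent(transactions, items, minsup=2)
negative = negative_border_by_dualization(positive, items)
print(sorted("".join(sorted(s)) for s in positive))  # ['ab', 'ac']
print(sorted("".join(sorted(s)) for s in negative))  # ['bc', 'd']
```

In this toy example the maximal frequent itemsets are {a, b} and {a, c}; their complements {c, d} and {b, d} have minimal transversals {d} and {b, c}, which are exactly the minimal infrequent itemsets. An approximate dualization, as described in the abstract, would replace the exact transversal computation with an approximate one.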

Notes

1. Frequent Itemset Border Approximation by Dualization.
2. Approximate Minimal Transversals by Hypergraph Reduction.
3. Frequent Itemset Mining Implementations, http://fimi.ua.ac.be/data/.
4. http://www.cs.kent.edu/~lliu/sourceCode.html.

Author information

Authors and Affiliations

Aix Marseille Université, CNRS, ENSAM, Université de Toulon, LSIS UMR 7296, 13397, Marseille, France
Nicolas Durand & Mohamed Quafafou

Corresponding author

Correspondence to Nicolas Durand.

Editor information

Editors and Affiliations

IRIT, Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
FAW, University of Linz, Linz, Austria
Josef Küng
FAW, University of Linz, Linz, Austria
Roland Wagner
LIAS/ISAE-ENSMA, Chasseneuil, France
Ladjel Bellatreche
IBM India Research Lab, New Delhi, India
Mukesh Mohania

About this chapter

Cite this chapter

Durand, N., Quafafou, M. (2016). Frequent Itemset Border Approximation by Dualization. In: Hameurlain, A., Küng, J., Wagner, R., Bellatreche, L., Mohania, M. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXVI. Lecture Notes in Computer Science, vol 9670. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49784-5_2

DOI: https://doi.org/10.1007/978-3-662-49784-5_2
Published: 18 March 2016
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-49783-8
Online ISBN: 978-3-662-49784-5
eBook Packages: Computer Science, Computer Science (R0)