On the Complexity of Generating Maximal Frequent and Minimal Infrequent Sets

Boros, E.; Gurvich, V.; Khachiyan, L.; Makino, K.

doi:10.1007/3-540-45841-7_10

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2285)

Included in the following conference series:

Annual Symposium on Theoretical Aspects of Computer Science

Abstract

Let A be an m × n binary matrix, t ∈ {1, …, m} be a threshold, and ε > 0 be a positive parameter. We show that given a family of O(n^ε) maximal t-frequent column sets for A, it is NP-complete to decide whether A has any further maximal t-frequent sets, or not, even when the number of such additional maximal t-frequent column sets may be exponentially large. In contrast, all minimal t-infrequent sets of columns of A can be enumerated in incremental quasi-polynomial time. The proof of the latter result follows from the inequality α ≤ (m − t + 1)β, where α and β are respectively the numbers of all maximal t-frequent and all minimal t-infrequent sets of columns of the matrix A. We also discuss the complexity of generating all closed t-frequent column sets for a given binary matrix.
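
The abstract does not spell out the definitions, so the following sketch assumes the usual one: a set C of columns is t-frequent if at least t rows of A have a 1 in every column of C, and t-infrequent otherwise. Under that assumption, a brute-force enumeration over all column subsets illustrates the two families and checks the inequality α ≤ (m − t + 1)β on one small matrix. The matrix and the helper names `support` and `borders` are hypothetical, and this exhaustive search is only an illustration, not the generation procedure of the paper.

```python
from itertools import combinations

def support(A, cols):
    """Count the rows of A that have a 1 in every column of `cols`."""
    return sum(all(row[j] for j in cols) for row in A)

def borders(A, t):
    """Return (maximal t-frequent sets, minimal t-infrequent sets) by brute force."""
    n = len(A[0])
    subsets = [frozenset(c) for k in range(n + 1) for c in combinations(range(n), k)]
    frequent = {C for C in subsets if support(A, C) >= t}
    # Frequency is monotone: every subset of a frequent set is frequent, so
    # checking one-column extensions and one-column removals is enough.
    max_frequent = [C for C in subsets if C in frequent
                    and all(C | {j} not in frequent for j in range(n) if j not in C)]
    min_infrequent = [C for C in subsets if C not in frequent
                      and all(C - {j} in frequent for j in C)]
    as_tuples = lambda family: sorted(tuple(sorted(C)) for C in family)
    return as_tuples(max_frequent), as_tuples(min_infrequent)

# Hypothetical 4 x 4 example matrix (not taken from the paper).
A = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
]
m, t = len(A), 2
max_frequent, min_infrequent = borders(A, t)
alpha, beta = len(max_frequent), len(min_infrequent)
print(max_frequent)    # [(0, 1), (0, 2), (1, 2), (2, 3)]
print(min_infrequent)  # [(0, 1, 2), (0, 3), (1, 3)]
print(alpha <= (m - t + 1) * beta)  # True: 4 <= 3 * 3
```

The example is chosen so that at least one t-infrequent set exists. If the full column set were itself t-frequent, it would be the only maximal t-frequent set and β would be 0, so that degenerate case is outside what this check covers.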

This research is supported in part by the National Science Foundation (Grant IIS-0118635), the Office of Naval Research (Grant N00014-92-J-1375), and Grants-in-Aid for Scientific Research of the Ministry of Education, Culture, Sports, Science and Technology of Japan. Visits of the second author to Rutgers University were also supported by DIMACS, the National Science Foundation’s Center for Discrete Mathematics and Theoretical Computer Science.

Author information

Authors and Affiliations

RUTCOR, Rutgers University, 640 Bartholomew Road, Piscataway, 08854-8003, New Jersey
E. Boros & V. Gurvich
Department of Computer Science, Rutgers University, 110 Frelinghuysen Road, Piscataway, 08854-8019, New Jersey
L. Khachiyan
Division of Systems Science, Graduate School of Engineering Science, Osaka University, 560-8531, Toyonaka, Osaka, Japan
K. Makino

Editor information

Editors and Affiliations

Institute for Computer Science, Free University of Berlin, Takustraße 9, 14195, Berlin, Germany
Helmut Alt
CNRS - I3S - INRIA Sophia Antipolis, 2004 Route des Lucioles, 06902, Sophia Antipolis, France
Afonso Ferreira

Cite this paper

Boros, E., Gurvich, V., Khachiyan, L., Makino, K. (2002). On the Complexity of Generating Maximal Frequent and Minimal Infrequent Sets. In: Alt, H., Ferreira, A. (eds) STACS 2002. Lecture Notes in Computer Science, vol 2285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45841-7_10

DOI: https://doi.org/10.1007/3-540-45841-7_10
Published: 21 February 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43283-8
Online ISBN: 978-3-540-45841-8
eBook Packages: Springer Book Archive
