On the Complexity of Generating Maximal Frequent and Minimal Infrequent Sets

  • Conference paper
STACS 2002

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2285)

Abstract

Let A be an m × n binary matrix, t ∈ {1, …, m} be a threshold, and ε > 0 be a positive parameter. We show that given a family of O(n^ε) maximal t-frequent column sets for A, it is NP-complete to decide whether A has any further maximal t-frequent sets, even when the number of such additional maximal t-frequent column sets may be exponentially large. In contrast, all minimal t-infrequent sets of columns of A can be enumerated in incremental quasi-polynomial time. The proof of the latter result follows from the inequality α ≤ (m − t + 1)β, where α and β are respectively the numbers of all maximal t-frequent and all minimal t-infrequent sets of columns of the matrix A. We also discuss the complexity of generating all closed t-frequent column sets for a given binary matrix.
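
To make the objects in the abstract concrete, the following is a minimal brute-force sketch, not the paper's algorithm. It assumes the usual definition from the frequent-itemset literature, which the abstract itself does not spell out: a column set is t-frequent if at least t rows of A have a 1 in every column of the set, and t-infrequent otherwise. The function names and the example matrix are illustrative only. The sketch enumerates all column subsets, so it runs in exponential time and is meant only for tiny inputs; it checks the inequality α ≤ (m − t + 1)β on one small matrix with β ≥ 1 and says nothing about the degenerate case in which every column set is frequent.

```python
from itertools import combinations


def support(matrix, cols):
    """Count the rows that have a 1 in every column of `cols`."""
    return sum(all(row[j] for j in cols) for row in matrix)


def is_frequent(matrix, cols, t):
    """Assumed definition: a column set is t-frequent if its support is >= t."""
    return support(matrix, cols) >= t


def all_column_sets(n):
    """Yield every subset of {0, ..., n-1} as a frozenset (exponential in n)."""
    for k in range(n + 1):
        for cols in combinations(range(n), k):
            yield frozenset(cols)


def maximal_frequent(matrix, t):
    """Frequent sets that become infrequent when any single column is added."""
    n = len(matrix[0])
    return [
        c for c in all_column_sets(n)
        if is_frequent(matrix, c, t)
        and not any(is_frequent(matrix, c | {j}, t) for j in range(n) if j not in c)
    ]


def minimal_infrequent(matrix, t):
    """Infrequent sets that become frequent when any single column is removed."""
    n = len(matrix[0])
    return [
        c for c in all_column_sets(n)
        if not is_frequent(matrix, c, t)
        and all(is_frequent(matrix, c - {j}, t) for j in c)
    ]


# Illustrative 4 x 3 matrix (hypothetical data, not taken from the paper).
A = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
    [1, 0, 1],
]
m, t = len(A), 2

max_freq = maximal_frequent(A, t)
min_infreq = minimal_infrequent(A, t)
alpha, beta = len(max_freq), len(min_infreq)

print(sorted(sorted(c) for c in max_freq))    # [[0, 1], [0, 2], [1, 2]]
print(sorted(sorted(c) for c in min_infreq))  # [[0, 1, 2]]
print(alpha <= (m - t + 1) * beta)            # True, since 3 <= 3 * 1
```

Because support can only shrink when columns are added, it is enough to test single-column extensions and removals when checking maximality and minimality. On this matrix the bound is attained with equality: α = 3 and (m − t + 1)β = 3.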

This research is supported in part by the National Science Foundation (Grant IIS-0118635), the Office of Naval Research (Grant N00014-92-J-1375), and Grants-in-Aid for Scientific Research of the Ministry of Education, Culture, Sports, Science and Technology of Japan. Visits of the second author to Rutgers University were also supported by DIMACS, the National Science Foundation’s Center for Discrete Mathematics and Theoretical Computer Science.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Boros, E., Gurvich, V., Khachiyan, L., Makino, K. (2002). On the Complexity of Generating Maximal Frequent and Minimal Infrequent Sets. In: Alt, H., Ferreira, A. (eds) STACS 2002. Lecture Notes in Computer Science, vol 2285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45841-7_10

  • DOI: https://doi.org/10.1007/3-540-45841-7_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43283-8

  • Online ISBN: 978-3-540-45841-8

  • eBook Packages: Springer Book Archive
