Boolean Formulas and Frequent Sets

  • Jouni K. Seppänen
  • Heikki Mannila
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3848)

Abstract

We consider the problem of estimating the support of Boolean queries from a collection of frequent itemsets. We describe an algorithm that truncates the inclusion-exclusion sum to include only the frequencies of known itemsets, give a bound on its performance for disjunctions of attributes that is tighter than the previously known bound, and show that this bound is in fact achievable. We also show how to generalize the algorithm to approximate arbitrary Boolean queries.
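As a rough illustration of the truncated inclusion-exclusion idea described in the abstract, the Python sketch below estimates the frequency of a disjunction A_1 ∨ ... ∨ A_k as the sum over nonempty subsets S of (-1)^(|S|+1) fr(S), keeping only the terms whose frequency is known. The function names (`itemset_frequencies`, `truncated_disjunction_support`), the dictionary representation of the frequent itemsets, and the brute-force frequency counting are assumptions made for illustration, not the authors' implementation. The sketch does not reproduce the paper's error bound or its generalization to arbitrary Boolean queries.

```python
from fractions import Fraction
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set

# Hypothetical representation: itemset -> relative frequency (support fraction).
Frequencies = Dict[FrozenSet[str], Fraction]


def itemset_frequencies(transactions: List[Set[str]], min_support: Fraction) -> Frequencies:
    """Brute-force frequencies of all itemsets with support >= min_support.

    Exponential in the number of items; only meant for tiny examples.
    """
    items = sorted(set().union(*transactions))
    n = len(transactions)
    freqs: Frequencies = {}
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            candidate = set(subset)
            count = sum(1 for t in transactions if candidate <= t)
            fr = Fraction(count, n)
            if fr >= min_support:
                freqs[frozenset(subset)] = fr
    return freqs


def truncated_disjunction_support(attributes: Iterable[str], known: Frequencies) -> Fraction:
    """Estimate fr(A_1 or ... or A_k) by inclusion-exclusion, skipping unknown terms."""
    attrs = sorted(set(attributes))
    estimate = Fraction(0)
    for size in range(1, len(attrs) + 1):
        sign = 1 if size % 2 == 1 else -1  # (-1)^(|S|+1)
        for subset in combinations(attrs, size):
            key = frozenset(subset)
            if key in known:  # truncation: only itemsets whose frequency is known
                estimate += sign * known[key]
    return estimate


if __name__ == "__main__":
    transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a"}, {"b"}]
    all_freqs = itemset_frequencies(transactions, Fraction(0))
    frequent = itemset_frequencies(transactions, Fraction(1, 3))
    print(truncated_disjunction_support(["a", "b", "c"], all_freqs))  # 1 (exact)
    print(truncated_disjunction_support(["a", "b", "c"], frequent))   # 5/6 (truncated)
```

In this toy data every transaction contains a, b, or c, so the exact support is 1. With a support threshold of 1/3, the itemset {a, b, c} (frequency 1/6) is not frequent, its +1/6 term is dropped, and the truncated estimate is 5/6. Exact `Fraction` arithmetic is used here only so that the example's numbers are free of rounding noise.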



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jouni K. Seppänen (1)
  • Heikki Mannila (1)
  1. HIIT Basic Research Unit, Lab. Computer and Information Science, Helsinki University of Technology, Finland
