An Exhaustive Covering Approach to Parameter-Free Mining of Non-redundant Discriminative Itemsets

  • Conference paper
Big Data Analytics and Knowledge Discovery (DaWaK 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9829)

Abstract

Discriminative pattern mining is a promising extension of frequent pattern mining. This paper proposes an algorithm called ExCover, short for exhaustive covering, for finding non-redundant discriminative itemsets. ExCover outputs non-redundant patterns, each of which is a best-covering pattern for at least one positive transaction. With no control parameters limiting the search space, ExCover efficiently performs an exhaustive search for best-covering patterns using branch-and-bound pruning. During the search, candidate best-covering patterns are collected concurrently for each positive transaction. Formal discussion and experimental results show that ExCover efficiently finds a more compact set of patterns than previous methods.
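
To make the notion of a best-covering pattern concrete, here is a minimal Python sketch under stated assumptions. It is not the ExCover algorithm: it enumerates every itemset by brute force instead of using branch-and-bound pruning, it skips any closedness-based redundancy check, and it assumes the F-score as the quality measure, which the abstract does not specify. The function names and the toy transactions are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def f_score(tp, fp, n_pos):
    """Exact F1 of the rule 'pattern => positive class' (assumed quality measure).

    F1 = 2*TP / (2*TP + FP + FN), and FN = n_pos - TP.
    """
    return Fraction(2 * tp, tp + fp + n_pos)


def best_covering_patterns(positives, negatives):
    """For each positive transaction, keep the top-scoring patterns covering it.

    Brute-force enumeration of all itemsets; ties on the best score are kept.
    """
    items = sorted(set().union(*positives, *negatives))
    best = {i: (Fraction(0), set()) for i in range(len(positives))}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            covered = [i for i, t in enumerate(positives) if pattern <= t]
            if not covered:
                continue  # a pattern covering no positive can never be selected
            fp = sum(1 for t in negatives if pattern <= t)
            score = f_score(len(covered), fp, len(positives))
            for i in covered:
                top, patterns = best[i]
                if score > top:
                    best[i] = (score, {pattern})
                elif score == top:
                    patterns.add(pattern)
    return best


# Hypothetical toy data: three positive and two negative transactions.
positives = [{"A", "B"}, {"A", "C"}, {"B", "C"}]
negatives = [{"C"}, {"B", "D"}]

best = best_covering_patterns(positives, negatives)
output = set().union(*(patterns for _, patterns in best.values()))
for i, (score, patterns) in best.items():
    print(i, score, sorted(sorted(p) for p in patterns))
```

The final output set is the union of the per-transaction winners, so a pattern that is best for several positive transactions appears only once. This is one way to see why a covering-style output can stay compact without a user-supplied threshold.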

Notes

  1. In the previous example with the target class \(c=+\), a pattern \(\varvec{x}=\{\mathsf {D}\}\) is excluded, since \(p(\varvec{x}\mid c)=3/5=0.6\) and \(p(\varvec{x}\mid \lnot c)=5/5=1\).

  2. Equivalent substitutions are also possible: \(p(c\mid \varvec{x}):=1\), \(p(\lnot \varvec{x}\mid \lnot c):=1\), and so on.

  3. In other words, ExCover outputs all patterns having the same best score. We only exclude apparently redundant patterns to avoid the loss of crucial information.

  4. Recall that \(\mathcal{D}_c(\varvec{x})\) denotes the set of positive transactions covered by a pattern \(\varvec{x}\).

  5. More formally, a visited pattern is a pattern closed on the positives produced at Line 5 in the Grow procedure.

Author information

Correspondence to Yoshitaka Kameya.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Kameya, Y. (2016). An Exhaustive Covering Approach to Parameter-Free Mining of Non-redundant Discriminative Itemsets. In: Madria, S., Hara, T. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2016. Lecture Notes in Computer Science, vol 9829. Springer, Cham. https://doi.org/10.1007/978-3-319-43946-4_10

  • DOI: https://doi.org/10.1007/978-3-319-43946-4_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-43945-7

  • Online ISBN: 978-3-319-43946-4

  • eBook Packages: Computer Science, Computer Science (R0)
