Balancing the Analysis of Frequent Patterns

  • Arnaud Giacometti
  • Dominique H. Li
  • Arnaud Soulet
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8443)

Abstract

A main challenge in pattern mining is to focus the discovery on high-quality patterns. One popular solution is to compute a numerical score of how well each discovered pattern describes the data. The best-rated patterns are then the ones the data expert analyzes most. In this paper, we evaluate the quality of discovered patterns by anticipating how the user analyzes them. We show that examining frequent patterns with the notion of support leads to an unbalanced analysis of the dataset: certain transactions are completely ignored. Hence, we propose the notion of balanced support, which weights the transactions so that each of them receives a user-specified amount of attention. We also develop an algorithm, Absolute, that calculates these weights, which are then used to evaluate the quality of patterns. Our experiments on frequent itemsets validate its effectiveness and show the relevance of the balanced support.
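
The imbalance described in the abstract can be illustrated with a small sketch. It is not the paper's method: the function names, the toy data, and the weighted-support formula (total weight of the matching transactions divided by total weight) are all assumptions made for illustration, and the weights are chosen by hand rather than computed by Absolute.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def support(pattern: Itemset, transactions: Sequence[Itemset]) -> float:
    """Classical relative support: fraction of transactions containing the pattern."""
    return sum(1 for t in transactions if pattern <= t) / len(transactions)


def weighted_support(pattern: Itemset,
                     transactions: Sequence[Itemset],
                     weights: Sequence[float]) -> float:
    """Assumed weighted variant: matching weight divided by total weight.

    This formula is an illustrative assumption, not the paper's definition
    of balanced support.
    """
    covered = sum(w for t, w in zip(transactions, weights) if pattern <= t)
    return covered / sum(weights)


def ignored_transactions(transactions: Sequence[Itemset],
                         patterns: Sequence[Itemset]) -> List[int]:
    """Indices of transactions matched by none of the given patterns."""
    return [i for i, t in enumerate(transactions)
            if not any(p <= t for p in patterns)]


# Toy dataset (hypothetical): three similar transactions and one outlier.
transactions = [frozenset("ab"), frozenset("ab"), frozenset("abc"), frozenset("d")]

# Patterns whose classical support reaches 0.5.
candidates = [frozenset("a"), frozenset("b"), frozenset("ab"),
              frozenset("c"), frozenset("d")]
frequent = [p for p in candidates if support(p, transactions) >= 0.5]

# Hand-picked weights (an assumption) that give the outlier more attention.
weights = [1.0, 1.0, 1.0, 3.0]
```

With these values, `ignored_transactions(transactions, frequent)` reports the last transaction as matched by no frequent pattern, while `weighted_support(frozenset("d"), transactions, weights)` reaches the 0.5 threshold that the pattern misses under classical support.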

Keywords

Pattern mining, stochastic model, interestingness measure

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Arnaud Giacometti (1)
  • Dominique H. Li (1)
  • Arnaud Soulet (1)
  1. Université François Rabelais Tours, LI EA 6300, Blois, France
