How Your Supporters and Opponents Define Your Interestingness

  • Bruno Crémilleux
  • Arnaud Giacometti
  • Arnaud Soulet
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11051)

Abstract

How can one determine whether a data mining method extracts interesting patterns? This paper addresses that core question in the context of unsupervised problems with binary data. We formalize the quality of a data mining method by identifying the patterns, called supporters and opponents, that are related to a pattern extracted by the method. We define a typology that offers a global picture of the methods, based on two complementary criteria for evaluating and interpreting their interest. The quality of a data mining method is quantified via an evaluation complexity analysis based on the number of supporters and opponents of a pattern extracted by the method. We provide an experimental study evaluating the quality of the methods.

Notes

Acknowledgements

The authors thank Albrecht Zimmermann for highly valuable discussions. This work has been partly supported by the QCM-BioChem project (CNRS Mastodons).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Bruno Crémilleux (1)
  • Arnaud Giacometti (2)
  • Arnaud Soulet (2)
  1. Normandie Univ, UNICAEN, ENSICAEN, CNRS – UMR GREYC, Caen, France
  2. Université de Tours – LIFAT EA 6300, Blois, France
