MOCA-I: Discovering Rules and Guiding Decision Maker in the Context of Partial Classification in Large and Imbalanced Datasets

  • Julie Jacques
  • Julien Taillard
  • David Delerue
  • Laetitia Jourdan
  • Clarisse Dhaenens
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7997)

Abstract

This paper models and implements, as a multi-objective optimization problem, a Pittsburgh classification rule mining algorithm adapted to large and imbalanced datasets such as those encountered in hospital data. We pair this algorithm with an original post-processing method based on the ROC curve to help the decision maker choose the most interesting rules. After introducing the problems raised by hospital data, such as class imbalance, data volume, and inconsistency, we present MOCA-I, a Pittsburgh model adapted to this kind of problem. We propose implementing it as a dominance-based local search, in contrast to existing multi-objective approaches based on genetic algorithms. We then introduce the post-processing method used to sort and filter the obtained classifiers. Compared with state-of-the-art classification rule mining algorithms, our approach gives results that are as good or better while using fewer parameters. Compared with C4.5 and C4.5-CS on hospital data with a larger set of attributes, it gives the best results.
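To make the idea of sorting and filtering classifiers with a ROC-based criterion more concrete, here is a minimal sketch. It is not the post-processing method of the paper, whose details the abstract does not give. It assumes each candidate classifier is summarized by hypothetical confusion-matrix counts `(tp, fp, tn, fn)`, places it in ROC space, discards candidates dominated by another one (lower or equal false positive rate and higher or equal true positive rate), and sorts the rest. The names `roc_point`, `dominates`, and `filter_and_sort` are illustrative only.

```python
from typing import List, Tuple

# Hypothetical representation: each candidate classifier is summarized by
# its confusion-matrix counts on some evaluation set.
Counts = Tuple[int, int, int, int]  # (tp, fp, tn, fn)
Point = Tuple[float, float]         # (false positive rate, true positive rate)


def roc_point(c: Counts) -> Point:
    """Return the (FPR, TPR) position of one classifier in ROC space."""
    tp, fp, tn, fn = c
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    return fpr, tpr


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on both axes and differs from b."""
    return a[0] <= b[0] and a[1] >= b[1] and a != b


def filter_and_sort(cands: List[Counts]) -> List[Point]:
    """Keep non-dominated ROC points, sorted by increasing FPR."""
    points = [roc_point(c) for c in cands]
    kept = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(set(kept))


if __name__ == "__main__":
    # Three made-up classifiers evaluated on 100 positives and 100 negatives.
    candidates = [(80, 10, 90, 20), (60, 5, 95, 40), (50, 20, 80, 50)]
    for fpr, tpr in filter_and_sort(candidates):
        print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

In this made-up example the third classifier is dropped because the first one has both a lower false positive rate and a higher true positive rate. The two remaining classifiers represent different trade-offs, and choosing between them is left to the decision maker.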

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Julie Jacques (1, 2, 3)
  • Julien Taillard (1)
  • David Delerue (1)
  • Laetitia Jourdan (2, 3)
  • Clarisse Dhaenens (2, 3)

  1. Société ALICANTE, Seclin, France
  2. INRIA Lille Nord Europe, Villeneuve d’Ascq, France
  3. LIFL, Université Lille 1, Villeneuve d’Ascq cedex, France
