MOCA-I: Discovering Rules and Guiding Decision Maker in the Context of Partial Classification in Large and Imbalanced Datasets

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7997)

Abstract

This paper models and implements, as a multi-objective optimization problem, a Pittsburgh-style classification rule mining algorithm adapted to large and imbalanced datasets such as those encountered in hospital data. We pair this algorithm with an original post-processing method, based on the ROC curve, that helps the decision maker choose the most interesting rules. After introducing the problems raised by hospital data, such as class imbalance, data volume, and inconsistency, we present MOCA-I, a Pittsburgh model adapted to this kind of problem. We implement it as a dominance-based local search, in contrast to existing multi-objective approaches based on genetic algorithms. We then introduce the post-processing method used to sort and filter the obtained classifiers. Compared with state-of-the-art classification rule mining algorithms, our approach gives results that are as good or better while using fewer parameters. It is then compared with C4.5 and C4.5-CS on hospital data with a larger set of attributes, where it gives the best results.
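
As a rough illustration of the dominance-based idea described in the abstract, the sketch below filters candidate rule sets by Pareto dominance and maps a confusion matrix to a point in ROC space. The three objectives (sensitivity, confidence, number of terms), the `Candidate` type, the helper names, and the sample values are all assumptions made for illustration; the abstract does not state which objectives MOCA-I optimizes, and this is not the authors' implementation.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    """A hypothetical rule set scored on three illustrative objectives."""
    name: str
    sensitivity: float  # to maximise (assumed objective)
    confidence: float   # to maximise (assumed objective)
    n_terms: int        # to minimise (assumed objective)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every objective and better on one."""
    no_worse = (
        a.sensitivity >= b.sensitivity
        and a.confidence >= b.confidence
        and a.n_terms <= b.n_terms
    )
    strictly_better = (
        a.sensitivity > b.sensitivity
        or a.confidence > b.confidence
        or a.n_terms < b.n_terms
    )
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


def roc_point(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Map a confusion matrix to (false positive rate, true positive rate)."""
    return fp / (fp + tn), tp / (tp + fn)


# Made-up scores, for illustration only.
candidates = [
    Candidate("A", 0.80, 0.60, 5),
    Candidate("B", 0.70, 0.55, 6),  # worse than A on all three objectives
    Candidate("C", 0.60, 0.90, 3),  # trades sensitivity for confidence
    Candidate("D", 0.80, 0.60, 7),  # ties A on two objectives, longer rules
]
front = pareto_front(candidates)
fpr, tpr = roc_point(tp=8, fp=5, fn=2, tn=85)
```

A dominance-based local search would repeatedly generate neighbours of the solutions in such a front and update the front with any neighbour that is not dominated; the ROC point gives one possible coordinate system for sorting and filtering the resulting classifiers afterwards.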

Notes

  1. International classification of diseases; http://www.who.int/classifications/icd/en/

Author information

Corresponding author

Correspondence to Laetitia Jourdan.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jacques, J., Taillard, J., Delerue, D., Jourdan, L., Dhaenens, C. (2013). MOCA-I: Discovering Rules and Guiding Decision Maker in the Context of Partial Classification in Large and Imbalanced Datasets. In: Nicosia, G., Pardalos, P. (eds) Learning and Intelligent Optimization. LION 2013. Lecture Notes in Computer Science, vol 7997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-44973-4_5

  • DOI: https://doi.org/10.1007/978-3-642-44973-4_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-44972-7

  • Online ISBN: 978-3-642-44973-4

  • eBook Packages: Computer Science, Computer Science (R0)
