Journal of Intelligent Information Systems

Volume 36, Issue 1, pp 73–98

A review and comparison of strategies for handling missing values in separate-and-conquer rule learning

Abstract

In this paper, we review possible strategies for handling missing values in separate-and-conquer rule learning algorithms and compare them experimentally on a large number of datasets. In particular, a careful study on data with controlled levels of missing values yields additional insights into the strategies' different biases with respect to attributes with missing values. Somewhat surprisingly, a strategy that implements a strong bias against the use of attributes with missing values exhibits the best average performance on 24 datasets from the UCI repository.

Keywords

Machine learning, Inductive rule learning, Missing values

Notes

Acknowledgements

We would like to thank Nada Lavrač and Dragan Gamberger for interesting discussions on their pessimistic value strategy. This research was supported by the German Science Foundation (DFG) under grant FU 580/2.

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Knowledge Engineering Group, Technische Universität Darmstadt, Darmstadt, Germany
