Handling Missing Attribute Values

Machine Learning for Data Science Handbook

Abstract

This chapter describes methods of handling missing attribute values in data mining. The methods fall into two categories: sequential and parallel. In sequential methods, missing attribute values are first replaced by known values as a preprocessing step, and knowledge is then acquired from a data set in which all attribute values are known. In parallel methods, there is no preprocessing: knowledge is acquired directly from the original, incomplete data set. The main emphasis of the chapter is on rule induction. Methods of handling missing attribute values for decision tree generation are only briefly summarized.
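As an illustration of the sequential category, the sketch below fills each missing value with the most frequent known value of the same attribute before any learning takes place. This is only one possible replacement strategy and is not necessarily the one the chapter recommends. The "?" marker, the function name impute_most_common, and the toy data are assumptions introduced here for illustration.

```python
from collections import Counter

MISSING = "?"  # assumed marker for a missing attribute value


def impute_most_common(rows, missing=MISSING):
    """Return a copy of rows with each missing value replaced by the
    most frequent known value found in the same column."""
    n_cols = len(rows[0])
    fill = []
    for j in range(n_cols):
        known = [r[j] for r in rows if r[j] != missing]
        # If a column has no known values at all, keep the marker.
        fill.append(Counter(known).most_common(1)[0][0] if known else missing)
    return [[fill[j] if v == missing else v for j, v in enumerate(r)]
            for r in rows]


# Hypothetical toy cases with attributes (temperature, headache, nausea).
cases = [
    ["high", "yes", "no"],
    ["?", "yes", "yes"],
    ["high", "?", "no"],
    ["normal", "no", "no"],
]

# Preprocessing step: every attribute value is known afterwards,
# so any standard rule induction algorithm can be applied to `completed`.
completed = impute_most_common(cases)
```

A parallel method would skip this step and pass the original cases, including the "?" entries, directly to a learning algorithm designed to work with incomplete data.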

Author information

Corresponding author

Correspondence to Jerzy W. Grzymala-Busse.

Copyright information

© 2023 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Grzymala-Busse, J.W., Grzymala-Busse, W.J. (2023). Handling Missing Attribute Values. In: Rokach, L., Maimon, O., Shmueli, E. (eds) Machine Learning for Data Science Handbook. Springer, Cham. https://doi.org/10.1007/978-3-031-24628-9_2
