Handling Missing Attribute Values

Abstract

This chapter describes methods of handling missing attribute values in data mining. The methods fall into two categories: sequential and parallel. In sequential methods, missing attribute values are first replaced by known values in a preprocessing step, and knowledge is then acquired from a data set in which all attribute values are known. In parallel methods there is no preprocessing: knowledge is acquired directly from the original data set. The main emphasis of the chapter is on rule induction; methods of handling missing attribute values in decision tree generation are only briefly summarized.
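As a rough illustration of the sequential idea, the sketch below fills each missing value with the most frequent known value of the same attribute before any knowledge is acquired. The choice of the most common value as the replacement, the "?" marker for a missing value, the function name fill_most_common, and the toy cases are all assumptions made for this example; the abstract does not specify them.

```python
from collections import Counter

MISSING = "?"  # assumed marker for a missing attribute value


def fill_most_common(rows, missing=MISSING):
    """Return a copy of rows in which each missing value is replaced by
    the most frequent known value found in the same column."""
    columns = list(zip(*rows))
    fills = []
    for col in columns:
        known = [v for v in col if v != missing]
        # If a column has no known value at all, keep the marker.
        fills.append(Counter(known).most_common(1)[0][0] if known else missing)
    return [
        tuple(fills[j] if v == missing else v for j, v in enumerate(row))
        for row in rows
    ]


# Hypothetical cases: (temperature, headache, decision)
cases = [
    ("high", "yes", "flu"),
    ("?", "yes", "flu"),
    ("high", "?", "flu"),
    ("normal", "no", "healthy"),
]

# Preprocessing step: after this, every attribute value is known and any
# ordinary rule induction or tree learner could be run on `completed`.
completed = fill_most_common(cases)
```

A parallel method, by contrast, would skip this step and hand the original cases, markers included, to a learner that interprets missing values itself.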

Copyright information

© 2005 Springer Science+Business Media, Inc.

About this chapter

Cite this chapter

Grzymala-Busse, J.W., Grzymala-Busse, W.J. (2005). Handling Missing Attribute Values. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/0-387-25465-X_3

  • DOI: https://doi.org/10.1007/0-387-25465-X_3

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-24435-8

  • Online ISBN: 978-0-387-25465-4

  • eBook Packages: Computer Science, Computer Science (R0)
