Handling Missing Attribute Values

Grzymala-Busse, Jerzy W.; Grzymala-Busse, Witold J.

doi:10.1007/0-387-25465-X_3

Chapter

Abstract

This chapter describes methods of handling missing attribute values in Data Mining. These methods are categorized as either sequential or parallel. In sequential methods, missing attribute values are first replaced by known values in a preprocessing step, and knowledge is then acquired from a data set in which all attribute values are known. In parallel methods there is no preprocessing, i.e., knowledge is acquired directly from the original data sets. The main emphasis of the chapter is on rule induction. Methods of handling missing attribute values for decision tree generation are only briefly summarized.
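The distinction between the two categories can be illustrated with a minimal Python sketch. It is not taken from the chapter: the "?" marker, the toy data set, and the function names `impute_most_common` and `covers` are all assumptions made for illustration. The sequential side uses one simple replacement strategy (the most common known value of each attribute), and the parallel side shows one possible way of reading the original data directly, by letting a missing value match any rule condition.

```python
from collections import Counter

MISSING = "?"  # assumed marker for a missing attribute value


def impute_most_common(rows, attributes):
    """Sequential style: return a copy of the data in which every missing
    value is replaced by the most common known value of its attribute."""
    filled = [dict(row) for row in rows]
    for attr in attributes:
        known = [row[attr] for row in rows if row[attr] != MISSING]
        if not known:
            continue  # no known value to impute from; leave the gaps
        most_common = Counter(known).most_common(1)[0][0]
        for row in filled:
            if row[attr] == MISSING:
                row[attr] = most_common
    return filled


def covers(conditions, row):
    """Parallel style (one illustrative interpretation): test rule
    conditions on the original row, letting a missing value match."""
    return all(row[a] in (v, MISSING) for a, v in conditions.items())


# Hypothetical toy data set with two attributes and one decision.
rows = [
    {"temperature": "high", "headache": "yes", "flu": "yes"},
    {"temperature": "?", "headache": "yes", "flu": "yes"},
    {"temperature": "high", "headache": "?", "flu": "no"},
    {"temperature": "normal", "headache": "no", "flu": "no"},
]
attributes = ["temperature", "headache"]

# Sequential: preprocess first, then learn from the completed table.
completed = impute_most_common(rows, attributes)

# Parallel: no preprocessing, conditions are checked on the original rows.
matched = [covers({"temperature": "high"}, row) for row in rows]
```

In the sequential path any standard learner can then be run on `completed`, whereas in the parallel path the learner itself has to decide what a missing value means, which is why the choice of interpretation matters there.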

Author information

Authors and Affiliations

University of Kansas, USA
Jerzy W. Grzymala-Busse
FilterLogix, USA
Witold J. Grzymala-Busse

Editor information

Editors and Affiliations

Dept. of Industrial Engineering, Tel-Aviv University, 69978, Ramat-Aviv, Israel
Oded Maimon & Lior Rokach

About this chapter

Cite this chapter

Grzymala-Busse, J.W., Grzymala-Busse, W.J. (2005). Handling Missing Attribute Values. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/0-387-25465-X_3

DOI: https://doi.org/10.1007/0-387-25465-X_3
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-24435-8
Online ISBN: 978-0-387-25465-4
eBook Packages: Computer Science (R0)