Experiments on Data with Three Interpretations of Missing Attribute Values—A Rough Set Approach

  • Jerzy W. Grzymała-Busse
  • Steven Santoso
Part of the Advances in Soft Computing book series (AINSC, volume 35)


In this paper we distinguish three types of missing attribute values: lost values (e.g., erased values), “do not care” conditions (attribute values that were irrelevant for classifying a case), and attribute-concept values (“do not care” conditions restricted to a specific concept). As is known, subset and concept approximations should be used for knowledge acquisition from incomplete data sets. We report results of experiments on seven well-known incomplete data sets using nine strategies: interpreting missing attribute values in three different ways and using both lower and upper, subset and concept approximations (note that subset lower approximations are identical to concept lower approximations). Additionally, cases with more than approximately 70% of their attribute values missing were removed from the original data sets, and all nine strategies were then applied to the reduced data sets. We conclude that any two of our nine strategies are incomparable in terms of error rates (5% significance level, two-tailed test). However, for some data sets, removing cases with an excessive number of missing attribute values improves the error rate.
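The abstract does not spell out how the three interpretations and the approximations are computed. The following is a minimal sketch, assuming the characteristic-set definitions commonly given in the rough-set literature on incomplete data: a lost value ("?") keeps a case out of every block of that attribute, a "do not care" condition ("*") puts it in every block, and an attribute-concept value ("-") puts it in the blocks of the values seen for that attribute within the case's own concept. The symbols, the function names `characteristic_sets` and `approximations`, and the toy data are illustrative assumptions, not taken from the paper.

```python
LOST, DONT_CARE, ATTR_CONCEPT = "?", "*", "-"  # assumed symbols
MISSING = {LOST, DONT_CARE, ATTR_CONCEPT}


def concept_values(rows, decisions, x, a):
    """Specified values of attribute a among cases in the same concept as x."""
    same = [y for y in range(len(rows)) if decisions[y] == decisions[x]]
    return {rows[y][a] for y in same} - MISSING


def characteristic_sets(rows, decisions):
    """Return K(x) for every case x: the intersection of K(x, a) over all a."""
    universe = set(range(len(rows)))
    k_sets = [set(universe) for _ in rows]
    for a in range(len(rows[0])):
        # Build the attribute-value blocks [(a, v)] for specified values v.
        blocks = {v: set() for v in {r[a] for r in rows} - MISSING}
        for x, row in enumerate(rows):
            v = row[a]
            if v == LOST:
                continue  # lost: case x belongs to no block of a
            if v == DONT_CARE:
                targets = blocks.keys()  # every block of a
            elif v == ATTR_CONCEPT:
                targets = concept_values(rows, decisions, x, a)
            else:
                targets = [v]
            for w in targets:
                blocks[w].add(x)
        # Intersect K(x, a) into K(x).
        for x, row in enumerate(rows):
            v = row[a]
            if v in (LOST, DONT_CARE):
                continue  # K(x, a) is the whole universe
            if v == ATTR_CONCEPT:
                vals = concept_values(rows, decisions, x, a)
                if not vals:
                    continue  # no specified value in the concept: universe
                k_xa = set().union(*(blocks[w] for w in vals))
            else:
                k_xa = blocks[v]
            k_sets[x] &= k_xa
    return k_sets


def approximations(k_sets, concept):
    """Subset and concept, lower and upper approximations of a concept."""
    def union_of(cases, keep):
        out = set()
        for x in cases:
            if keep(k_sets[x]):
                out |= k_sets[x]
        return out

    inside = lambda k: k <= concept         # K(x) is a subset of the concept
    touches = lambda k: bool(k & concept)   # K(x) intersects the concept
    everyone = range(len(k_sets))
    return {
        "subset_lower": union_of(everyone, inside),
        "concept_lower": union_of(concept, inside),
        "subset_upper": union_of(everyone, touches),
        "concept_upper": union_of(concept, touches),
    }


# Toy data (hypothetical): attributes are (temperature, headache).
rows = [("high", "yes"), ("-", "yes"), ("*", "no"), ("normal", "no"), ("high", "?")]
decisions = ["flu", "flu", "none", "none", "flu"]
k_sets = characteristic_sets(rows, decisions)
flu = approximations(k_sets, {0, 1, 4})
none = approximations(k_sets, {2, 3})
```

On this toy table the subset and concept lower approximations coincide for both concepts, in line with the remark in the abstract, while for the concept "none" the subset upper approximation is larger than the concept upper approximation. Combining the three interpretations with the lower, subset upper, and concept upper approximations gives the nine strategies.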


Rule Induction, Granular Computing, Indiscernibility Relation, Rule Induction Algorithm, Incomplete Information System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer 2006

Authors and Affiliations

  • Jerzy W. Grzymała-Busse (1, 2)
  • Steven Santoso (1)
  1. Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, USA
  2. Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
