MVC — A Preprocessing Method to Deal with Missing Values

  • Conference paper
Research and Development in Expert Systems XV

Abstract

Many analysis tasks have to deal with missing values and have developed specific, internal treatments to guess them. In this paper we present an external method, MVC (Missing Values Completion), which improves completion performance as well as declarativity and interaction with the user for this problem. These qualities make it suitable for the data cleaning step of the KDD (knowledge discovery in databases) process [6]. The core of MVC is the RAR algorithm that we proposed in [15]. This algorithm extends the concept of association rules [1] to databases with multiple missing values. It allows MVC to be an efficient preprocessing method: in our experiments with the C4.5 [13] decision tree program, MVC reduced the classification error rate by up to a factor of two, in addition to a significant gain in declarativity.
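The abstract gives no algorithmic detail, so the following Python sketch only illustrates the general idea of completing missing values with association rules as a preprocessing step. It is not the RAR algorithm of [15]. The function names (`mine_rules`, `complete`), the thresholds, and the toy data are all assumptions made for illustration. The one design choice borrowed from the abstract is that rows with missing values are not discarded: a row is ignored for a given rule only when one of the attributes that rule uses is unknown in that row.

```python
from collections import defaultdict
from itertools import combinations

MISSING = None  # assumed encoding of a missing value


def mine_rules(rows, target, min_support=2, min_confidence=0.6, max_len=2):
    """Mine rules 'antecedent -> target=value' from rows with gaps.

    A row is counted for a candidate rule only if the target and every
    antecedent attribute are known in that row; gaps elsewhere are ignored.
    """
    attrs = sorted({a for row in rows for a in row if a != target})
    rules = []
    for size in range(1, max_len + 1):
        for combo in combinations(attrs, size):
            body_counts = defaultdict(int)
            full_counts = defaultdict(int)
            for row in rows:
                if row.get(target) is MISSING:
                    continue
                if any(row.get(a) is MISSING for a in combo):
                    continue
                body = tuple((a, row[a]) for a in combo)
                body_counts[body] += 1
                full_counts[(body, row[target])] += 1
            for (body, value), n in full_counts.items():
                confidence = n / body_counts[body]
                if n >= min_support and confidence >= min_confidence:
                    rules.append((dict(body), value, confidence))
    # Prefer high confidence, then more specific antecedents.
    rules.sort(key=lambda r: (-r[2], -len(r[0])))
    return rules


def complete(rows, target, rules):
    """Return copies of rows with missing target values filled by the
    first matching rule; rows that no rule matches stay incomplete."""
    completed = []
    for row in rows:
        new_row = dict(row)
        if new_row.get(target) is MISSING:
            for antecedent, value, _confidence in rules:
                if all(new_row.get(a) == v for a, v in antecedent.items()):
                    new_row[target] = value
                    break
        completed.append(new_row)
    return completed


# Toy data (hypothetical): None marks a missing value.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": None, "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
    {"outlook": "sunny", "windy": "no", "play": None},
    {"outlook": None, "windy": "yes", "play": None},
]

rules = mine_rules(rows, target="play")
completed = complete(rows, target="play", rules=rules)
for row in completed:
    print(row)
```

Because the rules are explicit, a user can inspect or veto each one before it is applied, which is one plausible reading of the declarativity and user interaction the abstract mentions. The completed rows can then be passed to any downstream learner, such as a decision tree program.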

References

  1. R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD Conference on Management of Data, p. 207–216, Washington, USA, 1993.

  2. R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo. Fast Discovery of Association Rules. In Advances in Knowledge Discovery and Data Mining, p. 307–328, MIT Press, 1996.

  3. L. Breiman, J.H. Friedman, R.A. Olshen, C.J. Stone. Classification and Regression Trees, The Wadsworth Statistics/Probability Series, 1984.

  4. G. Celeux. Le traitement des données manquantes dans le logiciel SICLA. Technical report number 102, INRIA, France, December 1988.

  5. P. Cheeseman, J. Kelly, M. Self, J. Stutz, W. Taylor and D. Freeman. Bayesian Classification. In Proc. of American Association of Artificial Intelligence (AAAI), p. 607–611, San Mateo, USA, 1988.

  6. U.M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. From data mining to knowledge discovery: An overview. In Advances in Knowledge Discovery and Data Mining, p. 1–36, MIT Press, 1996.

  7. L. Hyafil and R. Rivest. Constructing optimal binary decision trees is NP-complete. Information Processing Letters, volume 5, p. 15–17, 1976.

  8. K. Lakshminarayan, S.A. Harp, R. Goldman and T. Samad. Imputation of missing data using machine learning techniques. In Proc. of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), MIT Press, 1996.

  9. R.J.A. Little, D.B. Rubin. Statistical Analysis with Missing Data, Wiley series in probability and mathematical statistics, John Wiley and Sons, USA, 1987.

  10. W.Z. Liu, A.P. White, S.G. Thompson and M.A. Bramer. Techniques for Dealing with Missing Values in Classification. In Second Int’l Symposium on Intelligent Data Analysis, London, 1997.

  11. J.R. Quinlan. Induction of decision trees. Machine Learning, Vol. 1, p. 81–106, 1986.

  12. J.R. Quinlan. Unknown Attribute Values in Induction. In A.M. Segre (ed.), Proc. of the Sixth Int’l Workshop on Machine Learning, p. 164–168, Morgan Kaufmann, Los Altos, USA, 1989.

  13. J.R. Quinlan. C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, USA, 1993.

  14. A. Ragel. Traitement des valeurs manquantes dans les arbres de décision. Technical report, Les cahiers du GREYC, number 2, University of Caen, France, 1997.

  15. A. Ragel and B. Crémilleux. Treatment of Missing Values for Association Rules. In Proc. of The Second Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-98), volume 1394 of Lecture Notes in Artificial Intelligence, p. 258–270, Melbourne, Australia, 1998.

  16. H. Toivonen. Sampling large databases for association rules. In Proc. of the 22nd Int’l Conference on Very Large Databases (VLDB’96), p. 134–145, Morgan Kaufmann, India, 1996. [available on the web from http://www.informatik.uni-trier.de/ley/db/conf/vldb/Toivonen96.html]

Copyright information

© 1999 Springer-Verlag London Limited

About this paper

Cite this paper

Ragel, A., Crémilleux, B. (1999). MVC — A Preprocessing Method to Deal with Missing Values. In: Miles, R., Moulton, M., Bramer, M. (eds) Research and Development in Expert Systems XV. Springer, London. https://doi.org/10.1007/978-1-4471-0835-1_11

  • DOI: https://doi.org/10.1007/978-1-4471-0835-1_11

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-85233-086-6

  • Online ISBN: 978-1-4471-0835-1

  • eBook Packages: Springer Book Archive
