Preprocessing of missing values using robust association rules

  • Arnaud Ragel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1510)


Many analysis tasks must deal with missing values and have developed specific, internal treatments to guess them. In this paper we present an external method for this problem that improves completion performance and, especially, declarativity and interaction with the user. These qualities make it suitable for the data cleaning step of the KDD process [6]. The core of this method, called MVC (Missing Values Completion), is the RAR (Robust Association Rules) algorithm that we proposed in [14]. This algorithm extends the concept of association rules [1] to databases with multiple missing values. It makes MVC an efficient preprocessing method: in our experiments with the C4.5 [12] decision tree program, MVC reduced the classification error rate by up to a factor of two, in addition to providing a significant gain in declarativity.
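
The general idea of completing missing values with association rules can be illustrated by a small sketch. This is not the RAR algorithm of [14] nor the MVC method itself, whose details are not given in this abstract. It is a simplified illustration under stated assumptions: rules have a single attribute-value pair on each side, support and confidence are computed only over the rows where both attributes are known, and a missing value is filled with the consequent of the most confident matching rule. The names `MISSING`, `mine_rules`, `complete` and the thresholds are hypothetical.

```python
from collections import Counter

MISSING = None  # hypothetical marker for a missing value


def mine_rules(rows, min_support=0.3, min_confidence=0.6):
    """Mine rules (a = va) -> (b = vb) between pairs of attributes.

    Assumption of this sketch: rows where a or b is missing are ignored
    when computing the support and confidence of rules between a and b.
    """
    attrs = list(rows[0].keys())
    rules = []
    for a in attrs:
        for b in attrs:
            if a == b:
                continue
            # Keep only the rows where both attributes are known.
            known = [(r[a], r[b]) for r in rows
                     if r[a] is not MISSING and r[b] is not MISSING]
            if not known:
                continue
            pair_counts = Counter(known)
            antecedent_counts = Counter(va for va, _ in known)
            for (va, vb), n in pair_counts.items():
                support = n / len(known)
                confidence = n / antecedent_counts[va]
                if support >= min_support and confidence >= min_confidence:
                    rules.append((a, va, b, vb, support, confidence))
    return rules


def complete(rows, rules):
    """Return copies of rows with missing values filled where a rule applies."""
    completed = []
    for r in rows:
        filled = dict(r)
        for b, value in r.items():
            if value is not MISSING:
                continue
            # Rules that conclude on b and whose antecedent matches this row.
            candidates = [rule for rule in rules
                          if rule[2] == b and r[rule[0]] == rule[1]]
            if candidates:
                # Prefer the highest confidence, then the highest support.
                best = max(candidates, key=lambda rule: (rule[5], rule[4]))
                filled[b] = best[3]
        completed.append(filled)
    return completed


# Toy data, invented for illustration only.
rows = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rain", "play": "yes"},
    {"outlook": "rain", "play": "yes"},
    {"outlook": "sunny", "play": MISSING},
    {"outlook": MISSING, "play": "yes"},
]
completed = complete(rows, mine_rules(rows))
print(completed[4], completed[5])
```

Because the completion is driven by explicit rules, a user can inspect which rule filled which value, which is one way to read the declarativity mentioned above. Values for which no rule applies are left missing rather than guessed.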


Association rules, Missing values, Preprocessing, Decision trees


  1. R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD Conference on Management of Data, Washington, D.C., p. 207–216, May 1993.
  2. R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A.I. Verkamo. Fast Discovery of Association Rules. In Advances in Knowledge Discovery and Data Mining, Chapter 12, AAAI/MIT Press, 1996.
  3. L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone. Classification and Regression Trees. Wadsworth Int’l Group, Belmont, CA, The Wadsworth Statistics/Probability Series, 1984.
  4. G. Celeux. Le traitement des données manquantes dans le logiciel SICLA. Technical report number 102, INRIA, France, December 1988.
  5. P. Cheeseman, J. Kelly, M. Self, J. Stutz, W. Taylor, and D. Freeman. Bayesian Classification. In Proc. of American Association of Artificial Intelligence (AAAI), p. 607–611, San Mateo, CA, 1988.
  6. U.M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. From data mining to knowledge discovery: An overview. In Advances in Knowledge Discovery and Data Mining, p. 1–36, AAAI/MIT Press, 1996.
  7. K. Lakshminarayan, S.A. Harp, R. Goldman, and T. Samad. Imputation of missing data using machine learning techniques. In Proc. of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), AAAI/MIT Press, 1996.
  8. R.J.A. Little and D.B. Rubin. Statistical Analysis with Missing Data. John Wiley and Sons, N.Y., 1987.
  9. W.Z. Liu, A.P. White, S.G. Thompson, and M.A. Bramer. Techniques for Dealing with Missing Values in Classification. In Second Int’l Symposium on Intelligent Data Analysis, London, 1997.
  10. J.R. Quinlan. Induction of decision trees. Machine Learning, 1, p. 81–106, 1986.
  11. J.R. Quinlan. Unknown Attribute Values in Induction. In A.M. Segre (ed.), Proc. of the Sixth Int’l Workshop on Machine Learning, Morgan Kaufmann, Los Altos, CA, p. 164–168, 1989.
  12. J.R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1993.
  13. A. Ragel. Traitement des valeurs manquantes dans les arbres de décision. Technical report, Les cahiers du GREYC, University of Caen, France, 1997.
  14. A. Ragel and B. Crémilleux. Treatment of Missing Values for Association Rules. In Proc. of the Second Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-98), p. 258–270, Melbourne, Australia, 1998.
  15. H. Toivonen. Sampling large databases for association rules. In Proc. of the 22nd Int’l Conference on Very Large Databases (VLDB’96), p. 134–145, India, 1996.

Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Arnaud Ragel, GREYC-CNRS UPRESA 6072, Université de Caen, Caen cedex, France
