A New Method for Estimation of Missing Data Based on Sampling Methods for Data Mining

  • Rima Houari
  • Ahcéne Bounceur
  • Tahar Kechadi
  • Tari Abdelkamel
  • Reinhardt Euler
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 225)

Abstract

Today we collect large amounts of data, more than we can handle. The accumulated data are often raw and far from being of good quality: they contain Missing Values and noise.

The presence of Missing Values in data is a major disadvantage for most Datamining algorithms. Intuitively, the pertinent information is embedded in many attributes, and its extraction is only possible if the original data are cleaned and pre-treated.

In this paper we propose a new data preprocessing technique that estimates Missing Values in order to obtain representative Samples of good quality and to ensure that the extracted information is safer and more reliable.

Keywords

Datamining, Copulas, Missing Value, Multidimensional Sampling, Sampling

Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Rima Houari (1)
  • Ahcéne Bounceur (2)
  • Tahar Kechadi (3)
  • Tari Abdelkamel (1)
  • Reinhardt Euler (2)
  1. University of Abderrahmane Mira Bejaia, Bejaia, Algeria
  2. Lab-STICC Laboratory, European University of Brittany - University of Brest, Brest, France
  3. University College Dublin, Dublin, Ireland
