Abstract
Data is a vital source of information for organizations, especially in the information age. In practice, databases are often imperfect and contain missing values, and analyses based on such databases may yield biased or misleading results. Imputing missing data is therefore regarded as one of the major steps in data mining. The present research used different data mining methods to construct imputation models according to the type of missing data. When the missing data is continuous, regression models and neural networks are used to build the imputation models. For categorical missing data, the logistic regression model, neural network, C5.0 and CART are employed. The results showed that the regression model provided the best estimates for continuous missing data, whereas for categorical missing data the C5.0 model performed best.
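As a rough illustration of regression-based imputation for a continuous variable, the sketch below fits an ordinary least-squares model on the complete records and predicts the missing entries. It is not the authors' implementation: the function name `regression_impute`, the toy data, and the use of NumPy are all assumptions made for this example.

```python
import numpy as np

def regression_impute(X, y):
    """Fill NaN entries of y with predictions from a least-squares fit.

    Hypothetical helper for illustration; X holds fully observed
    predictors and y is the continuous variable with missing values.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)

    # Design matrix with an intercept column.
    A = np.column_stack([np.ones(len(X)), X])

    # Fit only on the rows where the target is observed.
    coef, *_ = np.linalg.lstsq(A[~missing], y[~missing], rcond=None)

    # Replace only the missing entries; observed values stay untouched.
    filled = y.copy()
    filled[missing] = A[missing] @ coef
    return filled

# Toy data (made up): the observed points lie on y = 2*x + 1.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, np.nan, 9.0])
filled = regression_impute(X, y)
print(filled)
```

For a categorical variable the same pattern applies with a classifier (for example logistic regression or a decision tree) in place of the least-squares fit, trained on the complete records and used to predict the missing category.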
Cite this article
Yeh, RL., Liu, C., Shia, BC. et al. Imputing manufacturing material in data mining. J Intell Manuf 19, 109–118 (2008). https://doi.org/10.1007/s10845-007-0067-z
DOI: https://doi.org/10.1007/s10845-007-0067-z