kDMI: A Novel Method for Missing Values Imputation Using Two Levels of Horizontal Partitioning in a Data set
Imputation of missing values is an important data mining task for improving the quality of data mining results. Imputation based on similar records is generally more accurate than imputation based on all records of a data set. Therefore, in this paper we present a novel algorithm called kDMI that employs two levels of horizontal partitioning of a data set (based on a decision tree and the k-NN algorithm) in order to find the records that are very similar to a record with one or more missing values. Additionally, it uses a novel approach to automatically find the value of k for each record. We evaluate the performance of kDMI against three high-quality existing methods on two real data sets in terms of four evaluation criteria. Our initial experimental results, including 95% confidence interval analysis and statistical t-test analysis, indicate the superiority of kDMI over the existing methods.
Keywords: Data pre-processing, data cleansing, missing value imputation, EM algorithm, Decision Trees
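The two-level partitioning idea described in the abstract can be illustrated with a minimal Python sketch. It is not the kDMI algorithm itself. Several parts are assumptions made for illustration: the function names `knn_in_partition` and `impute` are hypothetical, the first-level partition is supplied as a hand-written rule `leaf_of` standing in for the leaves of a learned decision tree, k is fixed rather than found automatically per record, and the final value is the plain mean of the neighbours rather than an EM-based estimate.

```python
import math

def knn_in_partition(record, partition, missing_idx, k):
    """Return the k records of `partition` closest to `record`.

    Distance is Euclidean over the attributes other than `missing_idx`.
    """
    observed = [i for i in range(len(record)) if i != missing_idx]

    def dist(other):
        return math.sqrt(sum((record[i] - other[i]) ** 2 for i in observed))

    return sorted(partition, key=dist)[:k]

def impute(record, complete_records, leaf_of, missing_idx, k=3):
    """Fill record[missing_idx] using two levels of horizontal partitioning."""
    # Level 1: keep only the complete records in the same partition.
    # `leaf_of` is a stand-in for a decision-tree leaf assignment (assumption)
    # and must not depend on the missing attribute.
    leaf = leaf_of(record)
    partition = [r for r in complete_records if leaf_of(r) == leaf]

    # Level 2: within that partition, keep the k most similar records.
    neighbours = knn_in_partition(record, partition, missing_idx, k)
    if not neighbours:
        raise ValueError("no complete records share a partition with the record")

    # Simplification (assumption): impute with the neighbours' mean.
    value = sum(r[missing_idx] for r in neighbours) / len(neighbours)
    filled = list(record)
    filled[missing_idx] = value
    return filled

# Toy data: three numerical attributes; the third one is to be imputed.
complete = [
    (1.0, 10.0, 100.0),
    (1.2, 11.0, 110.0),
    (0.9, 30.0, 300.0),
    (5.0, 10.5, 500.0),
    (5.5, 12.0, 520.0),
]

# Hypothetical first-level rule: split on the first attribute.
def leaf_of(r):
    return "low" if r[0] < 3.0 else "high"

record = (1.1, 10.4, None)
filled = impute(record, complete, leaf_of, missing_idx=2, k=2)
print(filled)  # [1.1, 10.4, 105.0]
```

In this toy run the first level discards the two records in the "high" partition, and the second level discards the remaining record whose second attribute is far from the target record, so only the two closest records contribute to the imputed value.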