Incomplete Data Classification Based on Multiple Views

  • Ming Sun
  • Hongzhi Wang (corresponding author)
  • Fanshan Meng
  • Jianzhong Li
  • Hong Gao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9932)

Abstract

Missing values have a negative impact on big data analysis. However, in the absence of extra knowledge, exact imputation can hardly be achieved for many data sets. We therefore have to tolerate missing values and perform data mining directly on incomplete data sets. To achieve high-quality data mining on incomplete data, we propose a classification approach based on multiple views. We use various complete views of the data set to generate base classifiers and then combine their results. Since the number of base classifiers affects both the effectiveness and the efficiency of classification, we aim to find a proper view set. We prove that the view set selection problem is NP-hard and develop an approximation algorithm with approximation ratio \(\ln|S|+1\), where \(S\) is the feature set of the original data set. Extensive experimental results demonstrate the efficiency and effectiveness of the proposed approaches.
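The abstract does not spell out the selection algorithm, but its \(\ln|S|+1\) ratio matches the classic guarantee of greedy set cover, whose solution is at most \(H(|S|) \le \ln|S|+1\) times the optimum. The sketch below therefore assumes, purely for illustration, that view selection is modeled as covering the feature set \(S\) with as few complete views as possible. The candidate-view generation (one view per observed-feature pattern), the 1-nearest-neighbor base classifier, and majority voting are all hypothetical choices, not the authors' method; `None` marks a missing value.

```python
from collections import Counter
import math
from typing import Hashable, Optional, Sequence

Row = Sequence[Optional[float]]  # None marks a missing value


def candidate_views(rows: Sequence[Row]) -> list[frozenset[int]]:
    """Hypothetical candidate generation: one view per distinct observed-feature pattern."""
    patterns = (frozenset(i for i, v in enumerate(r) if v is not None) for r in rows)
    return list(dict.fromkeys(p for p in patterns if p))  # dedupe, keep first-seen order


def greedy_view_selection(features: set[int], views: list[frozenset[int]]) -> list[frozenset[int]]:
    """Greedy set cover (assumed model): repeatedly take the view covering most uncovered features."""
    uncovered, chosen = set(features), []
    while uncovered and views:
        best = max(views, key=lambda v: len(uncovered & v))
        gained = uncovered & best
        if not gained:  # remaining features occur in no candidate view
            break
        chosen.append(best)
        uncovered -= gained
    return chosen


def complete_on(row: Row, view: frozenset[int]) -> bool:
    """True if the row has no missing value on any feature of the view."""
    return all(row[f] is not None for f in view)


def nn_predict(rows, labels, view, x) -> Optional[Hashable]:
    """Hypothetical base classifier: 1-NN on `view`, using only rows complete on it."""
    best_label, best_dist = None, math.inf
    for row, y in zip(rows, labels):
        if complete_on(row, view):
            d = sum((row[f] - x[f]) ** 2 for f in view)  # squared Euclidean distance
            if d < best_dist:
                best_label, best_dist = y, d
    return best_label


def classify(rows, labels, views, x) -> Optional[Hashable]:
    """Majority vote over base classifiers whose view is fully observed in x."""
    votes = Counter()
    for view in views:
        if complete_on(x, view):
            pred = nn_predict(rows, labels, view, x)
            if pred is not None:
                votes[pred] += 1
    return votes.most_common(1)[0][0] if votes else None


# Toy usage with made-up data.
train = [
    (1.0, 2.0, None, 0.0),
    (1.1, None, 3.0, 0.1),
    (5.0, 6.0, None, 5.0),
    (None, 6.1, 7.0, 5.2),
]
labels = ["a", "a", "b", "b"]
views = greedy_view_selection({0, 1, 2, 3}, candidate_views(train))
print(views)  # [frozenset({0, 1, 3}), frozenset({0, 2, 3})]
print(classify(train, labels, views, (1.05, 2.1, 3.1, 0.05)))  # a
print(classify(train, labels, views, (5.1, 6.0, None, 5.1)))   # b
```

In this toy run, two views suffice to cover all four features, and a test record simply skips any view on which it is itself incomplete, so no value is ever imputed.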

Notes

Acknowledgement

This paper was partially supported by the National Sci-Tech Support Plan 2015BAH10F01, NSFC grants U1509216, 61472099 and 61133002, and the Scientific Research Foundation for the Returned Overseas Chinese Scholars of Heilongjiang Province LC2016026.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Ming Sun, Hongzhi Wang, Fanshan Meng, Jianzhong Li, Hong Gao
    • Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China