Neural Computing and Applications, Volume 16, Issue 2, pp 167–172

Handling of incomplete data sets using ICA and SOM in data mining

  • Hongyi Peng
  • Siming Zhu
Original Article

Abstract

Based on independent component analysis (ICA) and self-organizing maps (SOM), this paper proposes the ISOM-DH model for handling incomplete data in data mining. When the data are dependent and non-Gaussian, the model can make full use of the information in the given data to estimate the missing values, and it can visualize the resulting high-dimensional data. Compared with the mixture of principal component analyzers (MPCA), the mean method, and the standard SOM-based fuzzy map model, the ISOM-DH model can be applied to more cases, which demonstrates its advantage. The correctness and reasonableness of the ISOM-DH model are also validated by the experiment carried out in this paper.
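
For orientation, the mean method named above as a baseline is commonly understood as replacing each missing entry with the mean of the observed values in the same attribute. The following is a minimal sketch of that baseline only, not of the ISOM-DH model; the function name `mean_impute`, the use of `None` to mark missing entries, and the toy data are assumptions made for illustration.

```python
from typing import List, Optional

Row = List[Optional[float]]


def mean_impute(rows: List[Row]) -> List[List[float]]:
    """Fill each missing entry (None) with the mean of the observed
    values in its column. Hypothetical helper for illustration."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        if not observed:
            # A column with no observed values has no defined mean.
            raise ValueError(f"column {j} has no observed values")
        means.append(sum(observed) / len(observed))
    # Build a new table; the input rows are left unchanged.
    return [
        [means[j] if r[j] is None else r[j] for j in range(n_cols)]
        for r in rows
    ]


# Toy data set with two missing entries.
data = [
    [1.0, 2.0, None],
    [3.0, None, 6.0],
    [5.0, 4.0, 12.0],
]
filled = mean_impute(data)
```

Because every missing entry in an attribute receives the same value, this baseline ignores any dependence between attributes, which is the kind of information the abstract says the proposed model is meant to exploit.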

Keywords

Incomplete data; ICA (independent component analysis); SOM (self-organizing maps); Dependence; Non-Gaussian distribution


Copyright information

© Springer-Verlag London Limited 2006

Authors and Affiliations

  1. Department of Applied Mathematics, Sun Yat-sen University, Guangzhou, China
