Using SOM-Based Data Binning to Support Supervised Variable Selection

  • Sampsa Laine
  • Timo Similä
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3316)

Abstract

We propose a robust and understandable algorithm for supervised variable selection. The user defines a problem by manually selecting the variables Y that are used to train a Self-Organizing Map (SOM), which best describes the problem of interest. This makes the problem definition illustrative even in the multivariate case. The user also defines another set X, which contains variables that may be related to the problem. Our algorithm browses subsets of X and returns the one that contains the most information about the user's problem. We measure information by mapping small areas of the studied subset onto the SOM lattice and return the variable set that provides, on average, the most compact mapping. By analysing public domain data sets and comparing against other variable selection methods, we illustrate the main benefit of our method: understandability to the common user.
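The procedure described above can be sketched in code: train a SOM on the user-selected variables Y, partition each candidate subset of X into small bins, project each bin's samples onto the SOM lattice, and prefer the subset whose bins map most compactly on average. The snippet below is a minimal illustration of that idea only, not the authors' implementation; the minisom package, K-means binning, the spread-of-BMU-coordinates compactness score, and the exhaustive subset search are all illustrative assumptions.

```python
# Hypothetical sketch of SOM-based binning for supervised variable selection.
# Assumes the third-party "minisom" package for the SOM and scikit-learn's
# K-means for binning the candidate subset; both are illustrative choices.
import itertools
import numpy as np
from minisom import MiniSom
from sklearn.cluster import KMeans

def mapping_compactness(som, X_subset, Y, n_bins=20):
    """Bin the candidate subset of X, project each bin's Y-rows onto the
    trained SOM, and score how tightly each bin maps onto the lattice
    (smaller average spread = more compact = more informative)."""
    bins = KMeans(n_clusters=n_bins, n_init=10).fit_predict(X_subset)
    spreads = []
    for b in range(n_bins):
        rows = Y[bins == b]
        if len(rows) < 2:
            continue
        # Best-matching units (lattice coordinates) of the bin's samples.
        bmus = np.array([som.winner(y) for y in rows])
        spreads.append(bmus.std(axis=0).mean())
    return np.mean(spreads) if spreads else np.inf

def select_variables(X, Y, max_size=3, som_shape=(10, 10)):
    """Browse subsets of X up to max_size columns and return the one whose
    bins give, on average, the most compact mapping on the Y-trained SOM."""
    som = MiniSom(som_shape[0], som_shape[1], Y.shape[1],
                  sigma=1.0, learning_rate=0.5, random_seed=0)
    som.train_random(Y, 5000)
    best_subset, best_score = None, np.inf
    for k in range(1, max_size + 1):
        for subset in itertools.combinations(range(X.shape[1]), k):
            score = mapping_compactness(som, X[:, subset], Y)
            if score < best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score
```

Given data arrays X (candidate variables) and Y (problem-defining variables) with matching rows, select_variables(X, Y) would return the column indices of the most informative subset under this score.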



Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Sampsa Laine (1)
  • Timo Similä (1)
  1. Laboratory of Computer and Information Science, Helsinki University of Technology, Finland
