
On the Selection of Dimension Reduction Techniques for Scientific Applications

  • Ya Ju Fan
  • Chandrika Kamath
Chapter
Part of the Annals of Information Systems book series (AOIS, volume 17)

Abstract

Many dimension reduction methods have been proposed to discover the intrinsic, lower-dimensional structure of a high-dimensional dataset. However, identifying the critical features in datasets with a large number of features remains a challenge. In this chapter, through a series of carefully designed experiments on real-world datasets, we investigate the performance of different dimension reduction techniques, ranging from feature subset selection to methods that transform the features into a lower-dimensional space. We also discuss methods that estimate the intrinsic dimensionality of a dataset in order to understand the reduced dimension. Using several evaluation strategies, we show how these different methods can provide useful insights into the data. These comparisons enable us to provide guidance to users on the selection of a technique for their dataset.
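To make the distinction between the families of techniques concrete, the following is a minimal sketch, not taken from the chapter, contrasting a linear transform (PCA), a nonlinear embedding (locally linear embedding), and a crude intrinsic-dimensionality proxy. The dataset, parameter values, and the 95%-variance threshold are illustrative assumptions, not choices made by the authors.

```python
# Illustrative sketch only: dataset, n_neighbors, and the 95% variance
# threshold are assumptions, not the chapter's experimental setup.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding

X, _ = load_digits(return_X_y=True)  # 1797 samples, 64 features

# Linear transform: project the data onto the top 2 principal components.
X_pca = PCA(n_components=2).fit_transform(X)

# Nonlinear embedding: locally linear embedding into 2 dimensions.
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10)
X_lle = lle.fit_transform(X)

# Crude intrinsic-dimension proxy: number of principal components
# needed to explain 95% of the total variance.
ratios = PCA().fit(X).explained_variance_ratio_
intrinsic_dim = int(np.searchsorted(np.cumsum(ratios), 0.95) + 1)

print(X_pca.shape, X_lle.shape, intrinsic_dim)
```

The variance-based estimate above is only one of several intrinsic-dimensionality heuristics; the chapter compares such estimates across datasets to interpret the choice of reduced dimension.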

Keywords

Principal Component Analysis · Feature Selection Method · Locally Linear Embedding · Feature Subset Selection · Wind Power Generation


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, USA
