Abstract
Modern data analysis often faces high-dimensional data. Nevertheless, most neural network data analysis tools are not adapted to high-dimensional spaces, because they rely on conventional concepts (such as the Euclidean distance) that scale poorly with dimension. This paper shows some limitations of such concepts and suggests research directions, such as the use of alternative distance definitions and of non-linear dimension reduction.
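One way to see why the Euclidean distance scales poorly with dimension is to measure how the spread of distances shrinks relative to their size. The following is a minimal sketch, not taken from the paper: it assumes points drawn uniformly from the unit hypercube, and the function name `relative_contrast` and its parameters are hypothetical choices made for illustration.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for the Euclidean distances from one
    random query point to n_points random points in the unit hypercube.

    Assumption (not from the paper): data are i.i.d. uniform in [0, 1]^dim.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # Print the contrast for increasing dimension.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

Under these assumptions the printed contrast should fall sharply as the dimension grows: the nearest and farthest points end up at almost the same distance from the query, so a nearest-neighbour ranking based on the Euclidean distance carries little information.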
MV is a Senior research associate at the Belgian FNRS. GS is funded by the Belgian FRIA. The work of DF and VW is supported by the Interuniversity Attraction Pole (IAP), initiated by the Belgian Federal State, Ministry of Sciences, Technologies and Culture. The scientific responsibility rests with the authors.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Verleysen, M., Francois, D., Simon, G., Wertz, V. (2003). On the effects of dimensionality on data analysis with neural networks. In: Mira, J., Álvarez, J.R. (eds) Artificial Neural Nets Problem Solving Methods. IWANN 2003. Lecture Notes in Computer Science, vol 2687. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44869-1_14
DOI: https://doi.org/10.1007/3-540-44869-1_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40211-4
Online ISBN: 978-3-540-44869-3
eBook Packages: Springer Book Archive