
Feature Selection for Unsupervised Learning via Comparison of Distance Matrices

  • Conference paper
Computer Aided Systems Theory - EUROCAST 2013 (EUROCAST 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8111)


Abstract

Feature selection for unsupervised learning is generally harder than for supervised learning, because the former lacks the class information available in the latter, and thus an obvious way to measure the quality of a feature subset. In this paper, we propose a new method based on representing data sets by their distance matrices, and judging feature combinations by how well the distance matrix computed from only these features resembles the distance matrix of the full data set. Using artificial data for which the relevant features were known, we observed that the results depend on the data dimensionality, the fraction of relevant features, the overlap between clusters in the relevant feature subspaces, and the measure used to compare distance matrices. Our method consistently achieved detection rates of relevant features above 80% for a wide variety of experimental configurations.
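
To make the idea concrete, the following is a minimal sketch, not the method from the paper itself: the helper names (subset_score, select_features), the synthetic data, the exhaustive search over subsets, and the use of Pearson correlation between pairwise-distance vectors as the similarity measure are all illustrative assumptions; the paper examines several ways of measuring how closely a subset's distance matrix matches that of the full data set.

import numpy as np
from itertools import combinations
from scipy.spatial.distance import pdist

def subset_score(X, features):
    # Compare the pairwise distances of the full data set with the pairwise
    # distances computed from only the candidate feature subset. The Pearson
    # correlation used here is just one possible similarity measure.
    d_full = pdist(X)              # condensed distance matrix, all features
    d_sub = pdist(X[:, features])  # condensed distance matrix, subset only
    return np.corrcoef(d_full, d_sub)[0, 1]

def select_features(X, k):
    # Exhaustively score every feature subset of size k and keep the best one
    # (illustrative only; exhaustive search is infeasible for many features).
    best_subset, best_score = None, -np.inf
    for subset in combinations(range(X.shape[1]), k):
        score = subset_score(X, list(subset))
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score

# Toy example: two clusters in 3 relevant features, plus 7 noise features.
rng = np.random.default_rng(0)
clusters = np.vstack([rng.normal(c, 0.3, size=(50, 3)) for c in (0.0, 2.0)])
noise = rng.normal(0.0, 1.0, size=(100, 7))
X = np.hstack([clusters, noise])
print(select_features(X, 3))  # ideally recovers the subset (0, 1, 2)

In a realistic setting, the exhaustive enumeration of subsets would be replaced by a search heuristic, and the correlation by whichever matrix-similarity measure works best for the data at hand.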

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dreiseitl, S. (2013). Feature Selection for Unsupervised Learning via Comparison of Distance Matrices. In: Moreno-Díaz, R., Pichler, F., Quesada-Arencibia, A. (eds) Computer Aided Systems Theory - EUROCAST 2013. EUROCAST 2013. Lecture Notes in Computer Science, vol 8111. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53856-8_26

  • DOI: https://doi.org/10.1007/978-3-642-53856-8_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-53855-1

  • Online ISBN: 978-3-642-53856-8

  • eBook Packages: Computer Science, Computer Science (R0)
