Using k-Nearest Neighbor and Feature Selection as an Improvement to Hierarchical Clustering

  • Conference paper
Methods and Applications of Artificial Intelligence (SETN 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3025)

Abstract

Clustering of data is a difficult problem related to a variety of fields and applications. The challenge grows as the dimensionality of the input space increases and the features are measured on different scales. Hierarchical clustering methods are more flexible than their partitioning counterparts, as they do not require the number of clusters as input. Still, plain hierarchical clustering does not provide a satisfactory framework for extracting meaningful results in such cases. Major drawbacks have to be tackled, such as the curse of dimensionality and the propagation of initial errors, as well as complexity and data set size issues. In this paper we propose an unsupervised extension to hierarchical clustering by means of feature selection, in order to overcome the first drawback and thus increase the robustness of the whole algorithm. The result of applying this clustering to a portion of the data set in question is then refined and extended to the whole data set through a classification step, using the k-nearest neighbor classification technique, in order to tackle the latter two problems. The performance of the proposed methodology is demonstrated through its application to a variety of well-known, publicly available data sets.
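The abstract describes a two-step scheme: hierarchical clustering with unsupervised feature selection applied to a portion of the data, followed by a k-nearest neighbor classification step that extends the resulting cluster labels to the remaining data. The Python sketch below illustrates the general idea only and is not the authors' algorithm: a plain variance filter stands in for the paper's feature-selection scheme, scikit-learn supplies the clustering and classification components, and the sample fraction, distance threshold and k are arbitrary illustrative values.

# Minimal sketch of the cluster-then-classify idea (assumptions noted above).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = load_iris().data                      # a small, well-known public data set

# Normalize the features so that differing scales do not dominate the distances.
X = StandardScaler().fit_transform(X)

# Unsupervised feature selection (placeholder): keep the highest-variance
# features; the paper uses its own soft feature-selection scheme instead.
n_keep = 3
keep = np.argsort(X.var(axis=0))[-n_keep:]
X_sel = X[:, keep]

# Hierarchical clustering on a random sample only, keeping complexity manageable.
# No number of clusters is given; a distance threshold stops the agglomeration.
sample = rng.choice(len(X_sel), size=len(X_sel) // 2, replace=False)
clusterer = AgglomerativeClustering(n_clusters=None, distance_threshold=3.0,
                                    linkage="average")
sample_labels = clusterer.fit_predict(X_sel[sample])

# Extend the clustering to the whole data set through k-NN classification.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_sel[sample], sample_labels)
all_labels = knn.predict(X_sel)
print("clusters found on the sample:", clusterer.n_clusters_)
print("labels for the full data set:", all_labels[:10], "...")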




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mylonas, P., Wallace, M., Kollias, S. (2004). Using k-Nearest Neighbor and Feature Selection as an Improvement to Hierarchical Clustering. In: Vouros, G.A., Panayiotopoulos, T. (eds) Methods and Applications of Artificial Intelligence. SETN 2004. Lecture Notes in Computer Science, vol 3025. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24674-9_21

  • DOI: https://doi.org/10.1007/978-3-540-24674-9_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21937-8

  • Online ISBN: 978-3-540-24674-9

  • eBook Packages: Springer Book Archive
