Localized Graph-Based Feature Selection for Clustering

  • Zhihong Zhang
  • Edwin R. Hancock
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7324)


In many data analysis tasks, one is confronted with very high-dimensional data. Feature selection is essentially a combinatorial optimization problem and is therefore computationally expensive. To keep the problem tractable, traditional feature selection methods frequently assume either that the features influence the class variable independently or that they interact only in pairs. Moreover, they select a single feature subset common to all the clusters present in the data, neglecting the fact that different features may have different discriminating power for different classes. To tackle these problems, we propose a localized graph-based feature selection algorithm consisting of three steps: i) using the label information, we first construct a graph for each class in the dataset, in which each node corresponds to a feature and each edge is weighted by the mutual information (MI) between the two features it connects; ii) we then apply dominant set clustering to each graph to select a highly coherent set of features; iii) we further refine the selected features using a new measure called multidimensional interaction information (MII). The advantage of MII is that it goes beyond pairwise interactions and can account for third- or higher-order feature interactions. Because dominant set clustering extracts the most informative features in the leading dominant set as a preprocessing step, it limits the search space for these higher-order interactions. Finally, we use a variational Bayesian EM (VBEM) algorithm to learn a Gaussian mixture model on the selected feature subset for clustering. Experimental results on a number of standard datasets demonstrate the effectiveness of our localized feature selection method.
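Steps i) and ii) can be illustrated with a minimal sketch. It assumes discrete (or already discretized) features, a plug-in MI estimate from empirical counts, and dominant set extraction by the replicator dynamics of Pavan and Pelillo, started from the barycentre of the simplex. The function names, the iteration limit and the support cutoff are hypothetical illustration choices, not the authors' implementation.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Plug-in estimate of I(A;B) in nats for two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = 0.0
    for (x, y), c in pab.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (pa[x] * pb[y]))
    return mi


def mi_graph(samples):
    """Feature graph: A[i][j] = I(f_i; f_j), zero diagonal (no self-loops)."""
    d = len(samples[0])
    cols = [[row[k] for row in samples] for k in range(d)]
    A = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1, d):
            A[i][j] = A[j][i] = mutual_information(cols[i], cols[j])
    return A


def dominant_set(A, iters=1000, tol=1e-10, cutoff=1e-4):
    """Replicator dynamics x_i <- x_i (Ax)_i / (x^T A x) from the barycentre.

    Features whose final weight exceeds the (hypothetical) cutoff form the
    dominant set; the weight vector is returned as well.
    """
    d = len(A)
    x = [1.0 / d] * d
    for _ in range(iters):
        Ax = [sum(A[i][j] * x[j] for j in range(d)) for i in range(d)]
        denom = sum(x[i] * Ax[i] for i in range(d))
        if denom == 0.0:  # graph with no edges: nothing to concentrate on
            break
        new_x = [x[i] * Ax[i] / denom for i in range(d)]
        converged = sum(abs(new_x[i] - x[i]) for i in range(d)) < tol
        x = new_x
        if converged:
            break
    return [i for i in range(d) if x[i] > cutoff], x


def per_class_dominant_sets(samples, labels):
    """Localized selection: one MI graph and one dominant set per class label."""
    result = {}
    for c in sorted(set(labels)):
        rows = [s for s, lab in zip(samples, labels) if lab == c]
        result[c] = dominant_set(mi_graph(rows))[0]
    return result
```

On a toy class where features 0 to 2 are copies of one bit and feature 3 is independent of them, the MI graph is a triangle on {0, 1, 2}. The dynamics move all weight onto that triangle, so the dominant set is [0, 1, 2]. A second class with a different coupling yields a different subset, which is the point of selecting features per class.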
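The abstract does not give the formula for MII, so the sketch below uses a related, standard quantity as a stand-in: McGill's three-way interaction information with respect to the class label C, I(X;Y;C) = I(X,Y;C) - I(X;C) - I(Y;C). Under this sign convention it is positive when two features are jointly more informative about C than their pairwise terms suggest. A purely pairwise criterion misses exactly this kind of interaction. `interaction_information` and `rank_pairs` are hypothetical helpers, and this is not the authors' MII.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(seq):
    """Empirical Shannon entropy (nats) of a sequence of hashable symbols."""
    n = len(seq)
    return -sum((c / n) * math.log(c / n) for c in Counter(seq).values())


def mutual_information(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B), estimated from counts."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def interaction_information(x, y, c):
    """Three-way interaction I(X;Y;C) = I(X,Y;C) - I(X;C) - I(Y;C).

    Positive values indicate synergy (the pair says more about C jointly);
    negative values indicate redundancy.
    """
    xy = list(zip(x, y))  # treat the feature pair as one joint variable
    return mutual_information(xy, c) - mutual_information(x, c) - mutual_information(y, c)


def rank_pairs(columns, labels):
    """Hypothetical refinement aid: rank feature pairs by joint relevance I(Xi,Xj;C)."""
    scores = []
    for i, j in combinations(range(len(columns)), 2):
        joint = list(zip(columns[i], columns[j]))
        scores.append(((i, j), mutual_information(joint, labels)))
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

Take XOR labels C = X xor Y with independent uniform bits. Both I(X;C) and I(Y;C) are zero while I(X,Y;C) = ln 2, so the interaction term is ln 2. When X, Y and C are identical copies it is -ln 2, signalling redundancy. Restricting such higher-order scores to the features in a dominant set keeps the number of candidate pairs or triples small.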


Keywords (machine-generated, not supplied by the authors): Feature Vector; Feature Selection; Mutual Information; Gaussian Mixture Model; Feature Subset


Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Zhihong Zhang (1)
  • Edwin R. Hancock (1)
  1. Department of Computer Science, University of York, York, UK
