Supervised Selection of Dynamic Features, with an Application to Telecommunication Data Preparation

  • Sylvain Ferrandiz
  • Marc Boullé
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4065)


In the field of data mining, data preparation is increasingly becoming a bottleneck: collecting and storing data keeps getting cheaper, while modelling costs remain unchanged. As a result, feature selection is now routinely performed. In the data preparation step, selection often relies on feature ranking. In the supervised classification context, ranking is based on the information that an explanatory feature brings about the categorical target attribute.
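
As a simple illustration of this kind of ranking (a generic sketch, not the criterion proposed in the paper), a static explanatory feature can be scored by the mutual information it shares with the categorical target; the data and library choice below are assumptions made for illustration.

```python
# Generic illustration of supervised feature ranking (not the paper's method):
# each explanatory feature is scored by the mutual information it shares with
# the categorical target, then features are ranked by decreasing score.
# Assumes scikit-learn is available; the data below is synthetic.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                    # five candidate features
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)     # target driven by features 0 and 2

scores = mutual_info_classif(X, y, random_state=0)
for idx in np.argsort(scores)[::-1]:              # most informative first
    print(f"feature {idx}: estimated mutual information = {scores[idx]:.3f}")
```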

With the increasing presence in databases of features measured over time, i.e. dynamic features, new supervised ranking methods have to be designed. In this paper, we propose a new method for evaluating dynamic features, derived from a probabilistic criterion. The criterion is non-parametric and automatically handles the problem of overfitting the data, so the resulting evaluation produces reliable results. Furthermore, the criterion is built on a simple and understandable approach, which makes it possible to provide a meaningful visualization of the evaluation in addition to the computed score. The advantages of the new method are illustrated on a telecommunication dataset.
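
To make the general idea concrete, the hedged sketch below scores one dynamic feature (one time series per instance) by partitioning the series into Voronoi cells around a few prototype series and measuring how well the cells separate the target classes. The prototype choice, the Euclidean distance between series, and the purity score are illustrative assumptions only; they are not the probabilistic criterion introduced in the paper.

```python
# Hedged sketch of evaluating a dynamic feature against a categorical target:
# the series are partitioned into Voronoi cells around prototype series and
# the evaluation is the class purity of the cells. The prototypes, the
# Euclidean distance and the purity score are illustrative assumptions, not
# the probabilistic criterion introduced in the paper.
import numpy as np

def voronoi_purity(series, target, prototype_idx):
    """series: (n, T) values of one dynamic feature; target: (n,) class labels."""
    prototypes = series[prototype_idx]                        # (k, T) cell centres
    # distance of every series to every prototype (Euclidean on raw values)
    dist = np.linalg.norm(series[:, None, :] - prototypes[None, :, :], axis=2)
    cell = dist.argmin(axis=1)                                # Voronoi cell of each series
    # fraction of instances whose class matches the majority class of their cell
    correct = sum(np.bincount(target[cell == c]).max() for c in np.unique(cell))
    return correct / target.size

rng = np.random.default_rng(0)
n, T = 200, 50
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, T)) + y[:, None] * np.linspace(0.0, 1.0, T)  # class-dependent drift
prototypes = rng.choice(n, size=5, replace=False)
print(f"cell purity: {voronoi_purity(X, y, prototypes):.3f}")
```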


Keywords: Feature Selection, Dynamic Feature, Variable Neighborhood Search, Voronoi Cell, Target Attribute





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Sylvain Ferrandiz (1, 2)
  • Marc Boullé (1)
  1. France Télécom R&D, Lannion Cedex, France
  2. Université de Caen, GREYC, Caen Cedex, France
