Feature Construction and δ-Free Sets in 0/1 Samples

  • Nazha Selmaoui
  • Claire Leschi
  • Dominique Gay
  • Jean-François Boulicaut
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4265)


Given the recent breakthrough in constraint-based mining of local patterns, we investigate its impact on feature construction for classification tasks. We discuss preliminary results on the use of so-called δ-free sets. Our hypothesis is that their minimality helps to collect important features. Once these sets are computed, we propose to select as new features those that are essential with respect to class separation and generalization. Our experiments have given encouraging results.
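As an illustration of the central notion, the following is a minimal sketch of a δ-freeness test on 0/1 data, assuming the usual definition from the free-sets literature: an itemset X is δ-free when no rule Y ⇒ a with Y ∪ {a} ⊆ X holds with at most δ exceptions. Because removing items from Y can only add exceptions, it suffices to check the rules X \ {a} ⇒ a. The function names, the toy data `ROWS`, and the brute-force enumeration are assumptions made for illustration; they are not the authors' implementation, and the selection step based on class separation is not shown.

```python
from itertools import combinations

# Toy 0/1 sample: each row is the set of attributes set to 1 (hypothetical data).
ROWS = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"a"}, {"c"}]


def support(itemset, rows):
    """Count the rows that contain every item of `itemset`."""
    return sum(1 for row in rows if itemset <= row)


def is_delta_free(itemset, rows, delta):
    """Return True if no rule (itemset - {a}) => a has at most `delta` exceptions.

    The number of exceptions of the rule is
    support(itemset - {a}) - support(itemset).
    """
    itemset = frozenset(itemset)
    full = support(itemset, rows)
    return all(support(itemset - {a}, rows) - full > delta for a in itemset)


def delta_free_sets(rows, delta, min_support=1, max_size=3):
    """Brute-force enumeration of frequent delta-free sets up to `max_size`.

    This is only meant for tiny examples; real miners prune the search
    space level by level instead of testing every combination.
    """
    items = sorted(set().union(*rows))
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if (support(candidate, rows) >= min_support
                    and is_delta_free(candidate, rows, delta)):
                found.append(candidate)
    return found


if __name__ == "__main__":
    for free_set in delta_free_sets(ROWS, delta=0):
        print(sorted(free_set), support(free_set, ROWS))
```

In this toy sample, {a, b} is not 0-free because every row containing b also contains a, so the rule b ⇒ a holds without exception. Raising δ tolerates more exceptions per rule and therefore leaves fewer free sets.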


Keywords (added by machine, not by the authors): Frequent Itemset; Class Separation; Feature Construction; Interestingness Measure; Viral Meningitis





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Nazha Selmaoui (1)
  • Claire Leschi (2)
  • Dominique Gay (1)
  • Jean-François Boulicaut (2)

  1. ERIM, University of New Caledonia
  2. INSA Lyon, LIRIS CNRS UMR 5205
