Unsupervised feature construction for improving data representation and semantics


The attribute-based format is the main data representation used by machine learning algorithms. When the attributes do not properly describe the initial data, performance degrades. Some algorithms address this problem by internally changing the representation space, but the newly constructed features rarely have any meaning. We seek to construct, in an unsupervised way, new attributes that are more appropriate for describing a given dataset and, at the same time, comprehensible to a human user. We propose two algorithms that construct the new attributes as conjunctions of the initial primitive attributes or their negations. The generated feature sets have reduced correlations between features and succeed in capturing some of the hidden relations between individuals in a dataset. For example, a feature like \(sky \wedge \neg building \wedge panorama\) would be true for non-urban images and is more informative than simple features expressing the presence or the absence of an object. The notion of Pareto optimality is used to evaluate feature sets and to obtain a balance between total correlation and the complexity of the resulting feature set. Statistical hypothesis testing is employed to automatically determine the values of the parameters used for constructing a data-dependent feature set. We experimentally show that our approaches construct informative feature sets for multiple datasets.
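The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the correlation measure or the complexity measure, so the sketch assumes Pearson correlation between boolean columns and an arbitrary integer complexity score. All function names, the toy image annotations, and the candidate scores are hypothetical.

```python
from itertools import combinations


def conjunction(rows, literals):
    """Evaluate a conjunction of literals on each row.

    `literals` is a list of (attribute_name, expected_value) pairs;
    expected_value=False stands for a negated attribute.
    """
    return [all(row[name] == expected for name, expected in literals)
            for row in rows]


def pearson(x, y):
    """Pearson correlation of two boolean columns (assumed measure).

    Returns 0.0 when either column is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def total_abs_correlation(columns):
    """Sum of |correlation| over all pairs of feature columns."""
    return sum(abs(pearson(a, b)) for a, b in combinations(columns, 2))


def pareto_front(candidates):
    """Names of candidates not dominated on (correlation, complexity).

    Each candidate is (name, correlation, complexity); lower is better
    on both criteria.
    """
    front = []
    for name, corr, size in candidates:
        dominated = any(
            c2 <= corr and s2 <= size and (c2 < corr or s2 < size)
            for _, c2, s2 in candidates
        )
        if not dominated:
            front.append(name)
    return front


# Toy image annotations (hypothetical data).
images = [
    {"sky": True, "building": False, "panorama": True},
    {"sky": True, "building": True, "panorama": False},
    {"sky": True, "building": True, "panorama": True},
    {"sky": False, "building": True, "panorama": False},
]

# The feature sky AND (NOT building) AND panorama from the abstract.
non_urban = conjunction(
    images, [("sky", True), ("building", False), ("panorama", True)]
)

# Total correlation of the primitive attribute columns.
primitive_columns = [[img[a] for img in images]
                     for a in ("sky", "building", "panorama")]
primitive_correlation = total_abs_correlation(primitive_columns)

# Hypothetical feature sets scored as (name, total correlation, complexity).
candidates = [("A", 0.9, 3), ("B", 0.4, 5), ("C", 0.95, 4), ("D", 0.4, 6)]
front = pareto_front(candidates)
```

In this toy run only the first image satisfies the conjunction, and the candidate sets A and B are the ones that no other set beats on both criteria at once. A feature set on the Pareto front is a defensible trade-off between low correlation and low complexity; choosing among the sets on the front is a separate decision.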





Author information



Corresponding author

Correspondence to Marian-Andrei Rizoiu.


About this article

Cite this article

Rizoiu, MA., Velcin, J. & Lallich, S. Unsupervised feature construction for improving data representation and semantics. J Intell Inf Syst 40, 501–527 (2013). https://doi.org/10.1007/s10844-013-0235-x

Keywords


  • Unsupervised feature construction
  • Feature evaluation
  • Nonparametric statistics
  • Data mining
  • Clustering
  • Representations
  • Algorithms for data and knowledge management
  • Heuristic methods
  • Pattern analysis