Information Systems Frontiers

Volume 21, Issue 1, pp 125–142

Generalized Feature Embedding for Supervised, Unsupervised, and Online Learning Tasks

  • Eric Golinko
  • Xingquan Zhu


Feature embedding is an emerging research area that transforms features from the original space into a new space to support effective learning. Many feature embedding algorithms exist, but they often suffer from several major drawbacks: (1) they handle only a single feature type, or require users to explicitly separate features into different feature views and supply this information for feature embedding learning; (2) they are designed for either supervised or unsupervised learning tasks, but not both; and (3) embeddings for new out-of-training samples must be obtained through a retraining phase, which makes them unsuitable for online learning tasks. In this paper, we propose a generalized feature embedding algorithm, GEL, for supervised, unsupervised, and online learning tasks. GEL learns feature embeddings from data of any feature type, including data with mixed feature types. For supervised learning tasks with class label information, GEL uses a Class Partitioned Instance Representation (CPIR) process to arrange instances by their labels into a dense binary representation, via row and feature vectors, for feature embedding learning. If class labels are unavailable, CPIR naturally degenerates and treats all instances as a single class. Based on the CPIR representation, GEL uses eigenvector decomposition to map the proximity matrix into a low-dimensional space. The low-dimensional representations of new out-of-training samples are derived through a direct conversion, without a retraining phase. The learned numerical embedding features can be used directly to represent instances for effective learning. Experiments and comparisons on 28 datasets with categorical, numerical, and ordinal features demonstrate that the embedding features learned by GEL effectively represent the original instances for clustering, classification, and online learning.
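The abstract describes GEL only at a high level, so the following is a minimal Python sketch of the general pipeline it outlines (binary encoding, a label-aware proximity matrix, eigendecomposition, and direct projection of new samples), not the published GEL/CPIR procedure. Assumptions, all hypothetical: every feature is categorical (numeric features would first have to be discretized); the "proximity matrix" is taken to be a feature-by-feature co-occurrence matrix in which each class contributes with weight 1/(class size), as a stand-in for CPIR; and eigenvectors come from a hand-written cyclic Jacobi solver so the example needs no third-party packages. The helper names `build_vocab`, `binarize`, `proximity`, `jacobi_eigh`, `fit_embedding`, and `embed` are illustrative only. Omitting the labels collapses all rows into one class, mirroring the unsupervised case, and a new sample is embedded by encoding it and multiplying by the stored eigenvectors, with no retraining.

```python
# Hypothetical sketch, NOT the published GEL/CPIR algorithm: the class-weighted
# co-occurrence matrix in proximity() is an assumed stand-in for the CPIR proximity matrix.
import math
from collections import defaultdict


def build_vocab(rows):
    """Map each (column index, value) pair seen in training to a binary feature index."""
    vocab = {}
    for row in rows:
        for j, value in enumerate(row):
            vocab.setdefault((j, value), len(vocab))
    return vocab


def binarize(row, vocab):
    """One-hot encode a categorical row; values unseen in training are dropped."""
    vec = [0.0] * len(vocab)
    for j, value in enumerate(row):
        idx = vocab.get((j, value))
        if idx is not None:
            vec[idx] = 1.0
    return vec


def proximity(binary_rows, labels=None):
    """Feature co-occurrence where each class is weighted by 1/(class size); no labels = one class."""
    m = len(binary_rows[0])
    groups = defaultdict(list)
    for b, y in zip(binary_rows, labels or [None] * len(binary_rows)):
        groups[y].append(b)
    P = [[0.0] * m for _ in range(m)]
    for members in groups.values():
        w = 1.0 / len(members)
        for b in members:
            active = [i for i, v in enumerate(b) if v]
            for i in active:
                for k in active:
                    P[i][k] += w
    return P


def jacobi_eigh(A, sweeps=50, tol=1e-18):
    """Cyclic Jacobi eigendecomposition of a symmetric matrix; eigenvectors are columns of V."""
    n = len(A)
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that entry (p, q) becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    A[k][p], A[k][q] = c * A[k][p] - s * A[k][q], s * A[k][p] + c * A[k][q]
                for k in range(n):  # A <- J^T A (rows p, q)
                    A[p][k], A[q][k] = c * A[p][k] - s * A[q][k], s * A[p][k] + c * A[q][k]
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    V[k][p], V[k][q] = c * V[k][p] - s * V[k][q], s * V[k][p] + c * V[k][q]
    return [A[i][i] for i in range(n)], V


def fit_embedding(rows, labels=None, k=2):
    """Keep the top-k eigenvectors of the proximity matrix as an m x k projection W."""
    vocab = build_vocab(rows)
    B = [binarize(r, vocab) for r in rows]
    values, V = jacobi_eigh(proximity(B, labels))
    top = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return vocab, [[V[i][j] for j in top] for i in range(len(values))]


def embed(row, vocab, W):
    """Embed a (possibly new) row by direct projection onto W; nothing is retrained."""
    b = binarize(row, vocab)
    return [sum(b[i] * W[i][j] for i in range(len(b))) for j in range(len(W[0]))]


if __name__ == "__main__":
    train = [("red", "small"), ("red", "large"), ("red", "small"),
             ("blue", "large"), ("blue", "small")]
    vocab, W = fit_embedding(train, ["a", "a", "a", "b", "b"], k=2)  # class-weighted
    print(embed(("red", "small"), vocab, W))
    print(embed(("blue", "medium"), vocab, W))  # unseen value "medium" is dropped
    print(fit_embedding(train, k=2)[1])  # unsupervised: all rows form one class
```

Eigenvector signs are arbitrary, so the printed numbers depend on the solver. In practice a library routine such as NumPy's `numpy.linalg.eigh` would replace the hand-written Jacobi loop; it is spelled out here only to keep the sketch dependency-free.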


Keywords: Representation learning; Feature embedding; Dimension reduction; Supervised learning; Clustering; Online learning


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer, Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, USA