Multi-view heterogeneous fusion and embedding for categorical attributes on mixed data

  • Qiude Li
  • Qingyu Xiong
  • Shengfen Ji
  • Min Gao
  • Yang Yu
  • Chao Wu
Methodologies and Application


Categorical attributes are ubiquitous in real-world data. However, such attributes lack a well-defined distance metric and cannot be manipulated directly with algebraic operations, so many data mining algorithms cannot work on them directly. Learning an appropriate metric or an effective numerical embedding is vital yet challenging for categorical attributes with multi-view heterogeneous data characteristics. This paper proposes a novel multi-view heterogeneous fusion model (MVHF), which first captures basic coupling information for each view and then fuses this heterogeneous information from the different views by multi-kernel metric learning, in order to measure the intrinsic distances for categorical attributes of this type. Based on these measured distances, we then use a manifold learning method to learn a high-quality numerical embedding for each categorical value. Experiments on 33 mixed data sets demonstrate that MVHF-enabled classification significantly improves performance compared with state-of-the-art distance metrics and embedding competitors.
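The pipeline summarised above (per-view coupling information, fusion into one distance, then an embedding per categorical value) can be illustrated with a minimal sketch. This is not the authors' MVHF implementation: the function names `coupling_distance` and `classical_mds`, the toy data, and the fixed equal fusion weights are all assumptions made for illustration. In particular, the sketch replaces multi-kernel metric learning with a plain weighted average and uses classical multidimensional scaling as a simple stand-in for the manifold learning step.

```python
import numpy as np

def coupling_distance(target, context):
    """Distance between values of `target`, based on how each value
    co-occurs with the values of one `context` attribute (one "view")."""
    t_vals = sorted(set(target))
    c_vals = sorted(set(context))
    # Row i estimates P(context = c | target = t_i) by counting.
    probs = np.zeros((len(t_vals), len(c_vals)))
    for t, c in zip(target, context):
        probs[t_vals.index(t), c_vals.index(c)] += 1.0
    probs /= probs.sum(axis=1, keepdims=True)
    # Total variation distance between conditional distributions, in [0, 1].
    dist = np.abs(probs[:, None, :] - probs[None, :, :]).sum(axis=2) / 2.0
    return t_vals, dist

def classical_mds(dist, dim=2):
    """Embed points so Euclidean distances approximate `dist`."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:dim]
    # Clip tiny negative eigenvalues caused by rounding or non-Euclidean input.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Hypothetical toy data: embed the values of `colour` using two other attributes.
colour = ["red", "red", "blue", "blue", "green", "green"]
size = ["S", "S", "L", "L", "S", "L"]
shape = ["round", "round", "square", "square", "round", "square"]

values, d_size = coupling_distance(colour, size)
_, d_shape = coupling_distance(colour, shape)

# Assumption: fixed equal weights; MVHF would learn how to fuse the views.
fused = 0.5 * d_size + 0.5 * d_shape
embedding = classical_mds(fused, dim=2)

for value, vector in zip(values, embedding):
    print(value, np.round(vector, 3))
```

Each row of `embedding` is a numerical vector for one categorical value, so a distance-based classifier could consume it in place of the raw category. How the views are weighted is the part this sketch leaves fixed, and it is the part the paper addresses with metric learning.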


Categorical attributes · Coupling learning · Heterogeneous fusion · Metric learning · Embedding learning



We thank the anonymous reviewers for their valuable comments and suggestions. The work was supported by the Key Research Program of Chongqing Science & Technology Commission (Grant Nos. CSTC2017jcyjBX0025 and CSTC2019jscx-zdztzx0043), the Science and Technology Major Special Project of Guangxi (Grant No. GKAA17129002), the National Natural Science Foundation of China (Grant No. 61771077), the National Key R&D Program of China (Grant No. 2018YFF0214706), and the Graduate Scientific Research and Innovation Foundation of Chongqing (Grant Nos. CYB19072 and CYS19028).

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Key Laboratory of Dependable Service Computing in Cyber Physical Society, Chongqing University, Ministry of Education, Chongqing, China
  2. School of Big Data and Software Engineering, Chongqing University, Chongqing, China
  3. School of Biology and Engineering, Guizhou Medical University, Guiyang, China
  4. Foreign Language Teaching Center, Guizhou Institute of Technology, Guiyang, China
