Abstract
Symbolic data are typically composed of categorical variables that represent discrete entities in many real-world applications. Mining symbolic data is more difficult than mining numerical data because symbolic data lack inherent geometric properties. In this paper, we use two kinds of kernel learning methods to build a kernel estimation model and a nonlinear classification algorithm for symbolic data. Using the kernel smoothing method, we construct a squared-error consistent probability estimator for symbolic data, and then propose a new data transformation model that embeds symbolic data into Euclidean space. Based on this model, the inner product and distance measure between symbolic data objects are reformulated, allowing a new Support Vector Machine (SVM), called SVM-S, to be defined for nonlinear classification of symbolic data using the Mercer kernel learning method. Experimental results show that, with the proposed model and measures, SVM can be much more effective for symbolic data classification.
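The paper's specific estimator (its Eq. (4)) is not reproduced in this excerpt, so the following is only a minimal sketch of the kind of kernel-smoothed probability estimate the abstract describes, using the classic categorical kernel of Aitchison and Aitken (1976): each category's empirical frequency is shrunk toward the uniform distribution by a bandwidth \(\lambda\). The function name and interface are illustrative, not the authors' API.

```python
import numpy as np

def aa_kernel_estimate(values, categories, lam):
    """Kernel-smoothed probability estimate for one categorical attribute.

    Uses the Aitchison-Aitken kernel: K(x, c) = 1 - lam if x == c,
    else lam / (m - 1).  Averaging the kernel over the sample shrinks
    the empirical frequency of each category toward the uniform
    distribution; lam = 0 recovers the raw frequencies.
    """
    values = np.asarray(values)
    m = len(categories)  # size of the attribute's domain
    probs = {}
    for c in categories:
        freq = np.mean(values == c)  # empirical frequency of category c
        # smoothed estimate: (1/n) * sum_i K(x_i, c)
        probs[c] = (1 - lam) * freq + lam * (1 - freq) / (m - 1)
    return probs
```

By construction the smoothed estimates still sum to one over the attribute's domain, so they remain a valid probability distribution for any admissible bandwidth.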
Acknowledgements
X. Yan, L. Chen and G. Guo’s work was supported by the National Natural Science Foundation of China under Grant Nos. U1805263 and 61976053. X. Yan’s work was also supported by the National Natural Science Foundation of China under Grant No. 61772004 and the Guiding Foundation of Fujian Province of China under Grant No. 2020H0011.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendices
A Proof of Theorem 1
Since \({[I\left( \cdot \right) ]}^{2} = I\left( \cdot \right) \) and \(\sum _{o \in O_{d}} p(o) = 1\), the expectation of \(\hat{p}\left( o_{dl} \big | \lambda _{d} \right) \) can be obtained from Eq. (4):
Thus, \(\text {Bias}\left( \hat{p}\left( o_{dl} \big | \lambda _{d} \right) \right) \) and \(\text {Var}\left( \hat{p}\left( o_{dl} \big | \lambda _{d} \right) \right) \) can be computed as:
and
By combining the above two equalities, the theorem is proved.
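The combination step is the standard squared-error decomposition, stated here for completeness in the paper's notation:

\[
\text {MSE}\left( \hat{p}\left( o_{dl} \big | \lambda _{d} \right) \right)
  = \left[ \text {Bias}\left( \hat{p}\left( o_{dl} \big | \lambda _{d} \right) \right) \right] ^{2}
  + \text {Var}\left( \hat{p}\left( o_{dl} \big | \lambda _{d} \right) \right) ,
\]

so squared-error consistency follows once both the bias and the variance tend to zero as the sample size grows.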
B Proof of Theorem 2
For each \(o_{dl}\) in Eq. (6), we have that
Based on the facts that \(E\left[f(o_{dl}) \right]=p( o_{dl}) \) and \({[I(\cdot )]}^{2} = I(\cdot )\), the above equality can be simplified as
Therefore, \(\mathcal {L}\left( \lambda _{d} \right) \) can be computed as
Setting \(\frac{\partial \mathcal {L}\left( \lambda _{d} \right) }{\partial \lambda _{d}} = 0\), we obtain the optimal estimate of \(\lambda _{d}\) given by Eq. (7).
Cite this article
Yan, X., Chen, L. & Guo, G. Kernel-based data transformation model for nonlinear classification of symbolic data. Soft Comput 26, 1249–1259 (2022). https://doi.org/10.1007/s00500-021-06600-9