Kernel-based data transformation model for nonlinear classification of symbolic data


Abstract

Symbolic data typically consist of categorical variables that represent discrete entities in many real-world applications. Mining symbolic data is more difficult than mining numerical data because symbolic data lack inherent geometric properties. In this paper, we use two kinds of kernel learning methods to build a kernel estimation model and a nonlinear classification algorithm for symbolic data. Using the kernel smoothing method, we first construct a squared-error consistent probability estimator for symbolic data and then propose a new data transformation model that embeds symbolic data into Euclidean space. Based on this model, the inner product and the distance measure between symbolic data objects are reformulated, allowing a new support vector machine (SVM), called SVM-S, to be defined for nonlinear classification of symbolic data using the Mercer kernel learning method. Experimental results show that, with the proposed model and measures, SVM becomes much more effective for symbolic data classification.
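For intuition, the following is a minimal sketch (in Python, with illustrative names; not the authors' implementation) of the smoothed category-probability estimator analyzed in Appendix A: the empirical frequency of each category is shrunk toward the uniform value \(1/|O_{d}|\) by a per-attribute bandwidth \(\lambda _{d}\).

```python
from collections import Counter

def smoothed_probabilities(values, domain, lam):
    """Kernel-smoothed category probabilities for one symbolic attribute.

    Shrinks each empirical frequency f(o) toward the uniform value 1/|O_d|:
        p_hat(o) = lam / |O_d| + (1 - lam) * f(o)
    (the estimator whose bias and variance are derived in Appendix A).
    """
    counts = Counter(values)
    n = len(values)
    return {o: lam / len(domain) + (1 - lam) * counts.get(o, 0) / n
            for o in domain}

# Example: a ternary attribute observed on 8 objects, bandwidth 0.2
print(smoothed_probabilities(list("AABABCAA"), domain=["A", "B", "C"], lam=0.2))
```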


Acknowledgements

X. Yan, L. Chen and G. Guo's work was supported by the National Natural Science Foundation of China under Grant Nos. U1805263 and 61976053. X. Yan's work was also supported by the National Natural Science Foundation of China under Grant No. 61772004 and by the Guiding Foundation of Fujian Province of China under Grant No. 2020H0011.

Author information

Corresponding author

Correspondence to Lifei Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A Proof of Theorem 1

Since \([I\left( \cdot \right) ]^{2} = I\left( \cdot \right) \) and \(\sum _{o \in O_{d}} p(o) = 1\), the expectation of \(\hat{p}\left( o_{dl} \mid \lambda _{d} \right) \) can be obtained from Eq. (4):

$$\begin{aligned} E\left( \hat{p}\left( o_{dl} \mid \lambda _{d} \right) \right)&= E\left[ \ell \left( X_{d},o_{dl},\lambda _{d} \right) \right] \\&= \sum _{o \in O_{d}}\left[ \frac{\lambda _{d}}{|O_{d}|} + \left( 1 - \lambda _{d} \right) I\left( o = o_{dl} \right) \right] p(o)\\&= \frac{\lambda _{d}}{|O_{d}|} + \left( 1 - \lambda _{d} \right) p\left( o_{dl} \right) \text { .} \end{aligned}$$

So, \(\text {Bias}\left( \hat{p}\left( o_{dl} \mid \lambda _{d} \right) \right) \) and \(\text {Var}\left( \hat{p}\left( o_{dl} \mid \lambda _{d} \right) \right) \) can be computed as:

$$\begin{aligned} \left[ \text {Bias}\left( \hat{p}\left( o_{dl} \mid \lambda _{d} \right) \right) \right] ^{2}&= \left[ \frac{\lambda _{d}}{|O_{d}|} - \lambda _{d}p\left( o_{dl} \right) \right] ^{2}\\&= \lambda _{d}^{2}\left[ |O_{d}|^{-1} - p\left( o_{dl} \right) \right] ^{2}, \end{aligned}$$

and

$$\begin{aligned} \text {Var}\left( \hat{p}\left( o_{dl} \mid \lambda _{d} \right) \right)&= \frac{1}{N}\text {Var}\left[ \ell \left( X_{d},o_{dl},\lambda _{d} \right) \right] \\&= \frac{1}{N}\left[ E\left( \ell ^{2}\left( X_{d},o_{dl},\lambda _{d} \right) \right) - \left( E\left( \ell \left( X_{d},o_{dl},\lambda _{d} \right) \right) \right) ^{2} \right] \\&= \frac{1}{N}\left\{ \sum _{o \in O_{d}}\left[ \frac{\lambda _{d}}{|O_{d}|} + \left( 1 - \lambda _{d} \right) I\left( o = o_{dl} \right) \right] ^{2} p(o) - \left[ \frac{\lambda _{d}}{|O_{d}|} + \left( 1 - \lambda _{d} \right) p\left( o_{dl} \right) \right] ^{2} \right\} \\&= \frac{1}{N}\left[ \left( 1 - \lambda _{d} \right) ^{2}p\left( o_{dl} \right) - \left( 1 - \lambda _{d} \right) ^{2}p^{2}\left( o_{dl} \right) \right] \\&= \frac{\left( 1 - \lambda _{d} \right) ^{2}}{N}\left[ p\left( o_{dl} \right) - p^{2}\left( o_{dl} \right) \right] . \end{aligned}$$

By combining the above two equalities, the theorem is proved.
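The bias and variance expressions above can also be checked empirically. The sketch below (with arbitrary illustrative values for \(p\), \(N\) and \(\lambda _{d}\); not taken from the paper) simulates \(\hat{p}\left( o_{dl} \mid \lambda _{d} \right) \) as the sample mean of \(\ell \left( X_{d},o_{dl},\lambda _{d} \right) = \lambda _{d}/|O_{d}| + \left( 1 - \lambda _{d} \right) I\left( X_{d} = o_{dl} \right) \) over many datasets of size \(N\) and compares the Monte Carlo squared bias and variance with the closed forms of Theorem 1.

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])   # true category probabilities (arbitrary example)
K, N, lam, trials = len(p), 50, 0.3, 200_000
target = 0                       # index of the category o_dl under study

# p_hat = sample mean of ell(X, o_dl, lam) = lam/K + (1 - lam) * I(X == o_dl)
samples = rng.choice(K, size=(trials, N), p=p)
p_hat = lam / K + (1 - lam) * (samples == target).mean(axis=1)

bias2_mc = (p_hat.mean() - p[target]) ** 2
var_mc = p_hat.var()

bias2_theory = lam**2 * (1 / K - p[target]) ** 2
var_theory = (1 - lam) ** 2 / N * (p[target] - p[target] ** 2)

print(bias2_mc, bias2_theory)   # should agree closely
print(var_mc, var_theory)
```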

B Proof of Theorem 2

For each \(o_{dl}\) in Eq. (6), we have that

$$\begin{aligned}&E\left[ \left( \left( 1 - \lambda _{d} \right) f\left( o_{dl} \right) + \frac{\lambda _{d}}{|O_{d}|} - p\left( o_{dl} \right) \right) ^{2} \right] \\&\quad = \left( 1 - \lambda _{d} \right) ^{2}E\left[ f^{2}\left( o_{dl} \right) \right] + 2\left[ \frac{\lambda _{d} - \lambda _{d}^{2}}{|O_{d}|} + \left( \lambda _{d} - 1\right) p\left( o_{dl}\right) \right] E\left[ f\left( o_{dl} \right) \right] \\&\qquad + \left[ p\left( o_{dl}\right) \right] ^{2} - \frac{2\lambda _{d}}{|O_{d}|}p\left( o_{dl}\right) + \frac{\lambda _{d}^{2}}{|O_{d}|^{2}}. \end{aligned}$$

Based on the facts that \(E\left[ f(o_{dl}) \right] = p\left( o_{dl}\right) \) and \([I(\cdot )]^{2} = I(\cdot )\), the above equality can be simplified to

$$\begin{aligned}&\left( 1 - \lambda _{d} \right) ^{2}\left( E\left[ f^{2}\left( o_{dl} \right) \right] - \left( E\left[ f\left( o_{dl} \right) \right] \right) ^{2} \right) + \left( 1 - \lambda _{d} \right) ^{2}p^{2}\left( o_{dl}\right) \\&\qquad + 2\left[ \frac{\lambda _{d} - \lambda _{d}^{2}}{|O_{d}|} + \left( \lambda _{d} - 1\right) p\left( o_{dl}\right) \right] p\left( o_{dl}\right) + p^{2}\left( o_{dl}\right) - \frac{2\lambda _{d}}{|O_{d}|}p\left( o_{dl}\right) + \frac{\lambda _{d}^{2}}{|O_{d}|^{2}}\\&\quad = \left( 1 - \lambda _{d} \right) ^{2}\frac{p\left( o_{dl}\right) \left( 1 - p\left( o_{dl}\right) \right) }{N} + \lambda _{d}^{2}\left[ p\left( o_{dl}\right) \right] ^{2} - \frac{2\lambda _{d}^{2}}{|O_{d}|}p\left( o_{dl}\right) + \frac{\lambda _{d}^{2}}{|O_{d}|^{2}}\\&\quad = \left[ \lambda _{d}^{2} - \frac{\left( 1 - \lambda _{d} \right) ^{2}}{N} \right] p^{2}\left( o_{dl}\right) + \left[ \frac{\left( 1 - \lambda _{d} \right) ^{2}}{N} - \frac{2\lambda _{d}^{2}}{|O_{d}|} \right] p\left( o_{dl}\right) + \frac{\lambda _{d}^{2}}{|O_{d}|^{2}}. \end{aligned}$$

Therefore, \(\mathcal {L}\left( \lambda _{d} \right) \) can be computed as

$$\begin{aligned} \mathcal {L}\left( \lambda _{d} \right)&= \left[ \lambda _{d}^{2} - \frac{\left( 1 - \lambda _{d} \right) ^{2}}{N} \right] \sum _{o_{dl} \in O_{d}}\left[ p\left( o_{dl}\right) \right] ^{2} + \frac{\left( 1 - \lambda _{d} \right) ^{2}}{N} - \frac{\lambda _{d}^{2}}{|O_{d}|}\\&= \left( 1 - \frac{1}{|O_{d}|} \right) \lambda _{d}^{2} + \left[ \frac{\left( 1 - \lambda _{d} \right) ^{2}}{N} - \lambda _{d}^{2} \right] \sigma _{d}^{2}\text { .} \end{aligned}$$

Setting \(\frac{\partial \mathcal {L}\left( \lambda _{d} \right) }{\partial \lambda _{d}} = 0\) yields the optimal estimate of \(\lambda _{d}\) given in Eq. (7).
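Eq. (7) itself is not reproduced in this excerpt, so as an illustrative check the sketch below (arbitrary example values) minimizes the displayed \(\mathcal {L}\left( \lambda _{d} \right) \) numerically on a grid and compares the result with the closed form obtained by setting the derivative to zero, \(\lambda _{d}^{*} = (1 - S)\,/\,[S(N-1) + 1 - N/|O_{d}|]\), where \(S = \sum _{o_{dl} \in O_{d}} p^{2}\left( o_{dl}\right) \) (so that \(\sigma _{d}^{2} = 1 - S\), which makes the two displayed forms of \(\mathcal {L}\) agree); this closed form should coincide with Eq. (7).

```python
import numpy as np

p = np.array([0.6, 0.25, 0.15])   # arbitrary example category probabilities
N, K = 40, len(p)
S = np.sum(p ** 2)                 # sum over o_dl of p(o_dl)^2, i.e. 1 - sigma_d^2

def loss(lam):
    # L(lambda_d) as displayed above (first form, written in terms of S)
    return (lam**2 - (1 - lam)**2 / N) * S + (1 - lam)**2 / N - lam**2 / K

# Closed form obtained by setting dL/d(lambda_d) = 0
lam_closed = (1 - S) / (S * (N - 1) + 1 - N / K)

# Numerical check on a fine grid over [0, 1]
grid = np.linspace(0.0, 1.0, 100_001)
lam_grid = grid[np.argmin(loss(grid))]

print(lam_closed, lam_grid)   # should agree up to the grid resolution (~1e-5)
```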

Cite this article

Yan, X., Chen, L. & Guo, G. Kernel-based data transformation model for nonlinear classification of symbolic data. Soft Comput 26, 1249–1259 (2022). https://doi.org/10.1007/s00500-021-06600-9
