Abstract
This paper introduces a semi-supervised learning technique for model-based clustering. We focus on matrices of ordered categorical response data, such as those obtained from surveys with Likert-scale responses. The model structure is the proportional odds model, which is widely used for analyzing such data. The technique is designed for datasets that contain both labeled and unlabeled observations from multiple clusters. Model fitting uses the expectation-maximization (EM) algorithm, which incorporates the known cluster memberships of the labeled observations in order to cluster the unlabeled ones.
To evaluate the proposed model, we conducted a simulation study covering eight scenarios, each with a different combination and proportion of known and unknown cluster memberships. The fitted models accurately estimate the parameters in most of the designed scenarios, indicating that the technique is effective for clustering partially labeled data with ordered categorical response variables.
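As a rough illustration of how labeled cluster memberships can enter an EM fit, the sketch below shows one E-step for a row-clustered proportional odds mixture. It is not the authors' implementation. It assumes the simplified form logit P(Y_ij ≤ k | row i in cluster r) = μ_k − α_r with no column effects, and it assumes unlabeled rows are coded as -1. The names `category_probs` and `e_step` are hypothetical. Labeled rows have their responsibilities fixed at their known cluster, while unlabeled rows receive the usual posterior probabilities.

```python
import numpy as np

def category_probs(mu, alpha):
    """P(Y = k | cluster r) under the assumed model logit P(Y <= k | r) = mu_k - alpha_r.

    mu:    increasing cut-points, shape (q-1,)
    alpha: cluster effects, shape (R,)
    Returns an array of shape (R, q) whose rows sum to 1.
    """
    eta = mu[None, :] - alpha[:, None]            # (R, q-1) linear predictors
    cum = 1.0 / (1.0 + np.exp(-eta))              # cumulative probabilities
    R = len(alpha)
    cum = np.hstack([np.zeros((R, 1)), cum, np.ones((R, 1))])
    return np.diff(cum, axis=1)                   # differences give category probabilities

def e_step(Y, labels, mu, alpha, pi):
    """One semi-supervised E-step.

    Y:      (n, m) integer responses coded 0..q-1
    labels: (n,) known cluster index, or -1 for an unlabeled row (assumed coding)
    pi:     (R,) mixing proportions
    Returns responsibilities z of shape (n, R).
    """
    log_theta = np.log(category_probs(mu, alpha))  # (R, q)
    # log_theta[:, Y] has shape (R, n, m); sum over columns for each row
    loglik = log_theta[:, Y].sum(axis=2).T         # (n, R)
    log_post = np.log(pi)[None, :] + loglik
    log_post -= log_post.max(axis=1, keepdims=True)  # stabilise before exponentiating
    z = np.exp(log_post)
    z /= z.sum(axis=1, keepdims=True)
    # Labeled rows: fix the responsibility at the known cluster
    idx = np.flatnonzero(labels >= 0)
    z[idx] = 0.0
    z[idx, labels[idx]] = 1.0
    return z

# Toy example: q = 4 categories, R = 2 clusters, third row is labeled
mu = np.array([-1.0, 0.0, 1.0])
alpha = np.array([-2.0, 2.0])
pi = np.array([0.5, 0.5])
Y = np.array([[0, 0, 1], [3, 3, 2], [0, 1, 0]])
labels = np.array([-1, -1, 1])

z = e_step(Y, labels, mu, alpha, pi)
pi_new = z.mean(axis=0)   # closed-form M-step update for the mixing proportions
```

In this sketch only the mixing proportions have a closed-form update. The cut-points and cluster effects would have to be updated by numerically maximising the expected complete-data log-likelihood, which is omitted here.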
Acknowledgements
This work was supported by the MBIE Data Science SSIF Fund under contract RTVU1914.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Cui, Y., McMillan, L., Liu, I. (2024). Semi-supervised Model-Based Clustering for Ordinal Data. In: Benavides-Prado, D., Erfani, S., Fournier-Viger, P., Boo, Y.L., Koh, Y.S. (eds) Data Science and Machine Learning. AusDM 2023. Communications in Computer and Information Science, vol 1943. Springer, Singapore. https://doi.org/10.1007/978-981-99-8696-5_3
DOI: https://doi.org/10.1007/978-981-99-8696-5_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8695-8
Online ISBN: 978-981-99-8696-5
eBook Packages: Computer Science, Computer Science (R0)