Semi-supervised Model-Based Clustering for Ordinal Data

Cui, Ying; McMillan, Louise; Liu, Ivy

doi:10.1007/978-981-99-8696-5_3

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1943)

Included in the following conference series:

Australasian Conference on Data Science and Machine Learning

Abstract

This paper introduces a semi-supervised learning technique for model-based clustering. We focus on applying it to matrices of ordered categorical response data, such as those obtained from surveys with Likert-scale responses. As the model structure we use the proportional odds model, which is widely used for analyzing such data. The technique is designed for datasets that contain both labeled and unlabeled observations from multiple clusters. Model fitting uses the expectation-maximization (EM) algorithm, which incorporates the labeled cluster memberships in order to cluster the unlabeled observations.
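As a rough illustration of how labeled memberships can enter an EM fit, the sketch below shows one possible E-step for row clustering under a proportional odds model. It is not the authors' implementation, which the abstract does not specify. The parameterization logit P(Y <= k) = mu_k - alpha_r, the 0-based response coding, the use of -1 to mark unlabeled rows, and the function names `prop_odds_probs` and `e_step` are all assumptions made for this example.

```python
import numpy as np

def prop_odds_probs(mu, alpha):
    """Category probabilities assuming logit P(Y <= k) = mu_k - alpha_r.

    mu: (q-1,) strictly increasing cut points; alpha: (R,) cluster effects.
    Returns an (R, q) array whose rows sum to 1.
    """
    eta = mu[None, :] - alpha[:, None]                # (R, q-1) linear predictors
    cum = 1.0 / (1.0 + np.exp(-eta))                  # cumulative probabilities
    zeros = np.zeros((len(alpha), 1))
    ones = np.ones((len(alpha), 1))
    cum = np.hstack([zeros, cum, ones])               # pad with P = 0 and P = 1
    return np.diff(cum, axis=1)                       # per-category probabilities

def e_step(Y, labels, mu, alpha, pi):
    """Posterior cluster memberships, with labeled rows fixed to their label.

    Y: (n, p) integer responses coded 0..q-1 (assumed coding).
    labels: (n,) cluster index, or -1 for an unlabeled row (assumed convention).
    pi: (R,) mixing proportions.
    """
    theta = prop_odds_probs(mu, alpha)                # (R, q)
    # Row log-likelihood under each cluster, summing over the p columns.
    loglik = np.log(theta)[:, Y].sum(axis=2).T        # (n, R)
    logpost = np.log(pi)[None, :] + loglik
    logpost -= logpost.max(axis=1, keepdims=True)     # stabilize before exp
    z = np.exp(logpost)
    z /= z.sum(axis=1, keepdims=True)
    known = labels >= 0
    z[known] = np.eye(len(pi))[labels[known]]         # clamp labeled rows
    return z
```

In a full fit, the M-step would then update `pi`, `mu` and `alpha` from these memberships. Because the labeled rows carry weight 1 on their known cluster, they anchor the parameter estimates that are used to cluster the unlabeled rows.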

To evaluate the proposed model, we conducted a simulation study that tested it under eight different scenarios, each with a different combination and proportion of known and unknown cluster memberships. The fitted models accurately estimate the parameters in most of the designed scenarios, indicating that our technique is effective in clustering partially labeled data with ordered categorical response variables.

Acknowledgements

This work was supported by the MBIE Data Science SSIF Fund under contract RTVU1914.

Author information

Authors and Affiliations

School of Mathematics and Statistics, Victoria University of Wellington, Wellington, New Zealand
Ying Cui, Louise McMillan & Ivy Liu
Centre for Data Science and Artificial Intelligence, Victoria University of Wellington, Wellington, New Zealand
Ying Cui, Louise McMillan & Ivy Liu

Corresponding author

Correspondence to Ying Cui.

Editor information

Editors and Affiliations

The University of Auckland, Auckland, New Zealand
Diana Benavides-Prado
The University of Melbourne, Carlton, VIC, Australia
Sarah Erfani
Shenzhen University, Shenzhen, China
Philippe Fournier-Viger
RMIT University, Melbourne, VIC, Australia
Yee Ling Boo
The University of Auckland, Auckland, New Zealand
Yun Sing Koh

About this paper

Cite this paper

Cui, Y., McMillan, L., Liu, I. (2024). Semi-supervised Model-Based Clustering for Ordinal Data. In: Benavides-Prado, D., Erfani, S., Fournier-Viger, P., Boo, Y.L., Koh, Y.S. (eds) Data Science and Machine Learning. AusDM 2023. Communications in Computer and Information Science, vol 1943. Springer, Singapore. https://doi.org/10.1007/978-981-99-8696-5_3

DOI: https://doi.org/10.1007/978-981-99-8696-5_3
Published: 05 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8695-8
Online ISBN: 978-981-99-8696-5
eBook Packages: Computer Science, Computer Science (R0)