Semi-supervised Model-Based Clustering for Ordinal Data

  • Conference paper
Data Science and Machine Learning (AusDM 2023)

Abstract

This paper introduces a semi-supervised learning technique for model-based clustering. We focus on applying it to matrices of ordered categorical responses, such as those obtained from surveys with Likert-scale items. As the model structure we use the proportional odds model, which is widely used for analyzing such data. The proposed technique is designed for datasets that contain both labeled and unlabeled observations from multiple clusters. The model is fitted with the expectation-maximization (EM) algorithm, which incorporates the known cluster memberships of the labeled observations in order to cluster the unlabeled ones.
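To make the fitting procedure concrete, the block below writes out one standard form of a proportional odds mixture for clustering the n rows of an n × p response matrix with entries y_ij in {1, ..., q}. It assumes R row clusters with mixing proportions π_r, a set 𝓛 of labeled rows with known cluster c_i, a set 𝓤 of unlabeled rows, and expit(x) = 1/(1 + e^(-x)). This is an assumed, simplified parametrization (row-cluster effects α_r only, no column effects); the abstract does not state which terms the authors' model includes.

```latex
\begin{aligned}
\operatorname{logit} P(y_{ij} \le k \mid i \in r) &= \mu_k - \alpha_r,
  \qquad k = 1, \dots, q-1,\quad \mu_1 < \dots < \mu_{q-1},\quad \alpha_1 = 0,\\
\theta_{rk} = P(y_{ij} = k \mid i \in r) &= \operatorname{expit}(\mu_k - \alpha_r) - \operatorname{expit}(\mu_{k-1} - \alpha_r),
  \qquad \mu_0 = -\infty,\ \mu_q = +\infty,\\
\ell(\pi, \mu, \alpha) &= \sum_{i \in \mathcal{L}} \log\Big(\pi_{c_i} \prod_{j=1}^{p} \theta_{c_i y_{ij}}\Big)
  + \sum_{i \in \mathcal{U}} \log \sum_{r=1}^{R} \pi_r \prod_{j=1}^{p} \theta_{r y_{ij}},\\
\text{E-step:}\quad \hat z_{ir} &= \mathbb{1}\{c_i = r\}\ (i \in \mathcal{L}),
  \qquad \hat z_{ir} = \frac{\pi_r \prod_{j} \theta_{r y_{ij}}}{\sum_{s=1}^{R} \pi_s \prod_{j} \theta_{s y_{ij}}}\ (i \in \mathcal{U}),\\
\text{M-step:}\quad \hat\pi_r &= \frac{1}{n} \sum_{i=1}^{n} \hat z_{ir},
  \qquad (\hat\mu, \hat\alpha) = \arg\max_{\mu, \alpha} \sum_{r=1}^{R} \sum_{k=1}^{q} N_{rk} \log \theta_{rk},
  \quad N_{rk} = \sum_{i=1}^{n} \hat z_{ir} \sum_{j=1}^{p} \mathbb{1}\{y_{ij} = k\}.
\end{aligned}
```

Holding the labeled rows at one-hot memberships turns the usual mixture EM into its semi-supervised form: only the memberships of the unlabeled rows are re-estimated, while every row contributes to the parameter updates.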

To evaluate the performance of the proposed model, we conducted a simulation study in which we tested it under eight scenarios, each with a different combination and proportion of known and unknown cluster memberships. The fitted models accurately estimated the parameters in most of the scenarios, indicating that the technique is effective for clustering partially labeled data with ordered categorical response variables.
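The Python sketch below mimics one such scenario on a small scale. It simulates a two-cluster response matrix from a proportional odds model, hides roughly 80% of the cluster labels, and fits a semi-supervised EM. Everything here is an assumption for illustration, not the authors' implementation: the row-cluster-only model logit P(y ≤ k | r) = μ_k − α_r with α fixed at 0 for the first cluster, the hypothetical helper names (`category_probs`, `unpack`, `m_step`, `semi_supervised_em`), the simulation settings, and a generalized M-step that takes finite-difference gradient-ascent steps with backtracking instead of fully maximizing. Only NumPy is required.

```python
import numpy as np

# Hypothetical sketch: row clusters r = 0..R-1, categories k = 0..q-1, and the assumed
# model logit P(y <= k | r) = mu_k - alpha_r with alpha_0 = 0 (not the paper's code).

def category_probs(mu, alpha):
    """(R, q) array of P(y = k | cluster r) for cut-points mu and cluster effects alpha."""
    with np.errstate(all="ignore"):
        cum = 1.0 / (1.0 + np.exp(-(mu[None, :] - alpha[:, None])))   # P(y <= k | r)
        ones = np.ones((len(alpha), 1))
        probs = np.diff(np.hstack([0.0 * ones, cum, ones]), axis=1)
    return np.clip(probs, 1e-12, None)

def unpack(theta, q, R):
    """Unconstrained theta -> increasing cut-points mu (q-1,) and effects alpha (R,)."""
    mu = theta[0] + np.concatenate([[0.0], np.cumsum(np.exp(theta[1:q - 1]))])
    return mu, np.concatenate([[0.0], theta[q - 1:]])

def m_step(theta, counts, q, R, iters=50, h=1e-5):
    """Generalized M-step: increase sum_{r,k} counts[r,k] * log P(y = k | r)."""
    def Q(t):
        with np.errstate(all="ignore"):
            return np.sum(counts * np.log(category_probs(*unpack(t, q, R))))
    for _ in range(iters):
        # central finite differences keep the sketch short; analytic gradients are faster
        grad = np.array([(Q(theta + h * e) - Q(theta - h * e)) / (2 * h)
                         for e in np.eye(len(theta))])
        step, base = 4.0 / counts.sum(), Q(theta)
        while step > 1e-12 and not Q(theta + step * grad) >= base:   # also rejects NaN
            step /= 2
        if step <= 1e-12:
            break
        theta = theta + step * grad
    return theta

def semi_supervised_em(y, labels, q, R, n_iter=30):
    """y: (n, p) ints in 0..q-1; labels: known cluster index per row, or -1 if unlabeled."""
    C = (y[:, :, None] == np.arange(q)).sum(axis=1)        # (n, q) category counts per row
    known = np.flatnonzero(labels >= 0)
    theta, pi, history = np.zeros(q + R - 2), np.full(R, 1.0 / R), []
    for _ in range(n_iter):
        mu, alpha = unpack(theta, q, R)
        log_w = np.log(pi) + C @ np.log(category_probs(mu, alpha)).T   # (n, R)
        # E-step: posterior memberships for unlabeled rows, one-hot for labeled rows
        m = log_w.max(axis=1)
        z = np.exp(log_w - m[:, None])
        z /= z.sum(axis=1, keepdims=True)
        z[known] = np.eye(R)[labels[known]]
        # observed-data log-likelihood at the current parameters
        lse = m + np.log(np.exp(log_w - m[:, None]).sum(axis=1))
        history.append(lse[labels < 0].sum() + log_w[known, labels[known]].sum())
        # M-step: mixing proportions in closed form, then (mu, alpha)
        pi = z.mean(axis=0)
        theta = m_step(theta, z.T @ C, q, R)
    return unpack(theta, q, R), pi, z, history

# Simulated scenario (hypothetical settings): 2 clusters, 200 rows, 10 items, 4 categories
rng = np.random.default_rng(1)
q, R, n, p = 4, 2, 200, 10
true_mu, true_alpha = np.array([-1.0, 0.0, 1.0]), np.array([0.0, 3.0])
clusters = rng.integers(0, R, size=n)
probs = category_probs(true_mu, true_alpha)
y = np.array([rng.choice(q, size=p, p=probs[c]) for c in clusters])
labels = np.where(rng.random(n) < 0.2, clusters, -1)      # about 20% of rows labeled

(mu_hat, alpha_hat), pi_hat, z, history = semi_supervised_em(y, labels, q, R)
acc = np.mean(z[labels < 0].argmax(axis=1) == clusters[labels < 0])
print(np.round(mu_hat, 2), np.round(alpha_hat, 2), np.round(pi_hat, 2), round(float(acc), 3))
```

Because the labeled rows keep fixed one-hot memberships, they also pin down which fitted cluster is which, so the label-switching ambiguity of a fully unsupervised mixture largely goes away. Each M-step only has to increase the expected complete-data log-likelihood (a generalized EM), which is enough for the observed-data log-likelihood in `history` to be non-decreasing; replacing the finite-difference gradient with an analytic one or a quasi-Newton optimizer would be the natural step for larger problems.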

Acknowledgements

This work was supported by the MBIE Data Science SSIF Fund under contract RTVU1914.

Author information

Correspondence to Ying Cui.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Cui, Y., McMillan, L., Liu, I. (2024). Semi-supervised Model-Based Clustering for Ordinal Data. In: Benavides-Prado, D., Erfani, S., Fournier-Viger, P., Boo, Y.L., Koh, Y.S. (eds) Data Science and Machine Learning. AusDM 2023. Communications in Computer and Information Science, vol 1943. Springer, Singapore. https://doi.org/10.1007/978-981-99-8696-5_3

  • DOI: https://doi.org/10.1007/978-981-99-8696-5_3

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8695-8

  • Online ISBN: 978-981-99-8696-5

  • eBook Packages: Computer Science, Computer Science (R0)