Simultaneous Supervised and Unsupervised Classification Modeling for Assessing Cluster Analysis and Improving Results Interpretability

Fordellone, Mario; Vichi, Maurizio

doi:10.1007/978-3-030-21140-0_3

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization

Included in the following conference series:

Scientific Meeting of the Classification and Data Analysis Group of the Italian Statistical Society

Abstract

In unsupervised classification, the unknown number of clusters and the lack of inferential tools for assessing and interpreting the final partition are important limitations that can undermine the reliability of the results. In this work, we propose combining unsupervised classification with supervised methods to improve the assessment and interpretation of the obtained partition. In particular, the approach combines the k-means (KM) clustering method with logistic regression (LR) modeling, yielding an algorithm that evaluates the partition identified by KM, assesses the correct number of clusters, and verifies the selection of the most important variables. An application to real data illustrates the utility of the proposed approach.
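
The general idea of checking a KM partition with an LR model can be illustrated with a minimal two-step sketch. It is not the authors' simultaneous algorithm, whose details are not given in this abstract: it clusters first, then fits a multinomial logistic regression to the cluster labels and reports held-out accuracy for a few candidate numbers of clusters. The function names, the synthetic data, the farthest-point initialisation, the 70/30 split, and the learning-rate settings are all assumptions made for illustration.

```python
import math
import random


def make_blobs(n_per_blob=30, seed=0):
    """Hypothetical synthetic data: three well-separated 2-D Gaussian blobs."""
    rng = random.Random(seed)
    X, truth = [], []
    for b, (cx, cy) in enumerate([(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]):
        for _ in range(n_per_blob):
            X.append([rng.gauss(cx, 0.1), rng.gauss(cy, 0.1)])
            truth.append(b)
    return X, truth


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(X, k, n_iter=20):
    """Lloyd's k-means with a deterministic farthest-point start (an assumption)."""
    centers = [X[0]]
    while len(centers) < k:
        centers.append(max(X, key=lambda x: min(sq_dist(x, c) for c in centers)))
    labels = [0] * len(X)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(x, centers[c])) for x in X]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def predict_proba(W, x):
    """Softmax class probabilities for one observation."""
    scores = [sum(w * v for w, v in zip(wc, x + [1.0])) for wc in W]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_softmax(X, y, k, lr=0.5, epochs=300):
    """Multinomial logistic regression fitted by batch gradient descent."""
    p = len(X[0])
    W = [[0.0] * (p + 1) for _ in range(k)]  # last entry is the intercept
    for _ in range(epochs):
        grad = [[0.0] * (p + 1) for _ in range(k)]
        for x, label in zip(X, y):
            probs = predict_proba(W, x)
            for c in range(k):
                err = probs[c] - (1.0 if c == label else 0.0)
                for j, v in enumerate(x + [1.0]):
                    grad[c][j] += err * v
        for c in range(k):
            for j in range(p + 1):
                W[c][j] -= lr * grad[c][j] / len(X)
    return W


def holdout_accuracy(X, labels, k, seed=0):
    """Fit LR on 70% of the points and score how well it recovers the KM labels on the rest."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(0.7 * len(idx))
    train, test = idx[:cut], idx[cut:]
    W = fit_softmax([X[i] for i in train], [labels[i] for i in train], k)
    hits = 0
    for i in test:
        probs = predict_proba(W, X[i])
        if probs.index(max(probs)) == labels[i]:
            hits += 1
    return hits / len(test)


X, truth = make_blobs()
for k in (2, 3, 4):
    labels = kmeans(X, k)
    print(k, round(holdout_accuracy(X, labels, k), 3))
```

Held-out accuracy is only one possible diagnostic. Because k-means cells are separated by linear boundaries, a linear classifier can often reproduce the labels well for several values of k, so in practice the fitted coefficients and other inferential output of the LR model would also need to be inspected.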

Author information

Authors and Affiliations

Department of Statistical Sciences, Sapienza University of Rome, Rome, Italy
Mario Fordellone & Maurizio Vichi

Corresponding author

Correspondence to Mario Fordellone.

Editor information

Editors and Affiliations

Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Milan, Italy
Francesca Greselin
Department of Statistical Sciences, Università Cattolica del Sacro Cuore, Milan, Italy
Laura Deldossi
Department of Economic and Social Sciences, Università Cattolica del Sacro Cuore, Piacenza, Italy
Luca Bagnato
Department of Statistical Sciences, Sapienza University of Rome, Rome, Italy
Maurizio Vichi

About this paper

Cite this paper

Fordellone, M., Vichi, M. (2019). Simultaneous Supervised and Unsupervised Classification Modeling for Assessing Cluster Analysis and Improving Results Interpretability. In: Greselin, F., Deldossi, L., Bagnato, L., Vichi, M. (eds) Statistical Learning of Complex Data. CLADAG 2017. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-030-21140-0_3

DOI: https://doi.org/10.1007/978-3-030-21140-0_3
Published: 07 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21139-4
Online ISBN: 978-3-030-21140-0
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)
