Abstract
Unsupervised machine learning methods are important analytical tools for analyzing and interpreting high-dimensional data. Without relying on predefined outcome labels, they identify latent patterns and hidden structure in such data and can help simplify complex datasets. This article provides an overview of key unsupervised techniques, including K-means clustering, hierarchical clustering, principal component analysis, and factor analysis. With a deeper understanding of these tools, researchers can incorporate unsupervised machine learning into health sciences research to identify novel risk factors, improve prevention strategies, and facilitate the delivery of personalized therapies and targeted patient care.
Level of evidence: I
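The following minimal, dependency-free Python sketch makes two of the techniques named above concrete. It is illustrative only and is not taken from the article. `kmeans` follows Lloyd-style batch K-means, alternating an assignment step and a centroid-update step, with initial centroids sampled at random from the data. `first_principal_component` finds the leading principal axis by power iteration on the sample covariance matrix. This is a simple stand-in for the singular value decomposition a numerical library would typically use. The function names, the parameters (`k`, `n_iter`, `seed`), and the six toy points are all hypothetical.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Hypothetical sketch of Lloyd-style batch K-means."""
    rng = random.Random(seed)
    # Initial centroids: k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable: a (possibly local) optimum
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


def first_principal_component(points, n_iter=200):
    """Hypothetical sketch: leading PCA axis via power iteration."""
    n, d = len(points), len(points[0])
    means = [sum(col) / n for col in zip(*points)]
    centered = [[x - m for x, m in zip(p, means)] for p in points]
    # Sample covariance matrix (d x d), dividing by n - 1.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # assumed starting vector; must not be orthogonal to the axis
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v: the variance captured along this axis.
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, variance


points = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0),   # hypothetical group A
          (5.0, 5.2), (5.1, 4.9), (4.9, 5.0)]   # hypothetical group B
labels, centroids = kmeans(points, k=2)
axis, var_pc1 = first_principal_component(points)
print("cluster labels:", labels)
print("first principal axis:", [round(c, 3) for c in axis])
print("variance along that axis:", round(var_pc1, 3))
```

On these toy points, the two groups of three receive different cluster labels. The first principal axis has two roughly equal positive components, so it lies close to the diagonal, which is the direction along which the groups are separated. K-means converges only to a local optimum that depends on the starting centroids, so in practice it is usually run from several initializations.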
Funding
There is no funding source.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Eckhardt, C.M., Madjarova, S.J., Williams, R.J. et al. Unsupervised machine learning methods and emerging applications in healthcare. Knee Surg Sports Traumatol Arthrosc 31, 376–381 (2023). https://doi.org/10.1007/s00167-022-07233-7
DOI: https://doi.org/10.1007/s00167-022-07233-7