Abstract
Archetypal analysis expresses observations in terms of a limited number of archetypes, defined as convex combinations of the observed units. The archetypes are found by minimizing a suitable loss function according to the ordinary least squares approach. The technique usually requires a data matrix without missing values, which limits its applicability whenever at least one data entry is unavailable. To overcome this limitation, extensions of archetypal analysis to missing data are developed. In line with recent advances in this domain, this is done by introducing a weighting system that assigns zero weights to the missing entries, which are imputed in order to determine the archetypes. The problem is thus approached by means of weighted least squares. The effectiveness of the proposals, also in comparison with existing techniques, is explored through applications to simulated and real data sets.
Data availability
The data set analyzed in the paper is available at https://www.fao.org/faostat/en/#home and is given in the supplementary material.
Notes
Matlab routines are available upon request from the corresponding author.
References
Asakawa M, Okano M (2013) Japanese consumer’s food selection criteria and gender-based differences. Behaviormetrika 40:41–55
Bro R, De Jong S (1997) A fast non-negativity-constrained least squares algorithm. J Chemom 11:393–401
Chi JT, Chi EC, Baraniuk RG (2016) k-POD: a method for k-means clustering of missing data. Am Stat 70:91–99
Cutler A, Breiman L (1994) Archetypal analysis. Technometrics 36:338–347
De Leeuw J, Heiser WJ (1980) Multidimensional scaling with restrictions on the configuration. In: Krishnaiah PR (ed) Multivariate analysis V. North-Holland, Amsterdam, pp 501–522
Dixon JK (1979) Pattern recognition with partly missing data. IEEE Trans Syst Man Cybern 9:617–621
Epifanio I (2013) h-plots for displaying nonmetric dissimilarity matrices. Stat Anal Data Min 6:136–143
Epifanio I, Ibáñez MV, Simó A (2018) Archetypal shapes based on landmarks and extension to handle missing data. Adv Data Anal Classif 12:705–735
Epifanio I, Ibáñez MV, Simó A (2020) Archetypal analysis with missing data: see all samples by looking at a few based on extreme profiles. Am Stat 74:169–183
Eugster MJA, Leisch F (2009) From spider-man to hero - archetypal analysis in R. J Stat Softw 30:1–23
Gill PE, Murray W, Wright MH (1981) Practical optimization. Academic Press, London
Gillis N (2020) Nonnegative matrix factorization. SIAM, Philadelphia
Heiser WJ (1987) Correspondence analysis with least absolute residuals. Comput Stat Data Anal 5:337–356
Hubert L, Arabie P (1985) Comparing partitions. J Classif 2:193–218
Kaufman L, Rousseeuw P (1990) Finding groups in data: an introduction to cluster analysis. Wiley, Hoboken
Kiers HAL (1997) Weighted least squares fitting using ordinary least squares algorithms. Psychometrika 62:251–266
Lawson CL, Hanson RJ (1995) Solving least squares problems. Classics in applied mathematics, vol 15. SIAM, Philadelphia
Lindsay AC, Sitthisongkram S, Greaney ML, Wallington SF, Ruengdej P (2017) Non-responsive feeding practices, unhealthy eating behaviors, and risk of child overweight and obesity in Southeast Asia: a systematic review. Int J Environ Res Public Health 14:436
Little R, Rubin D (2002) Statistical analysis with missing data. Wiley, Hoboken
Mørup M, Hansen LK (2012) Archetypal analysis for machine learning and data mining. Neurocomputing 80:54–63
Nakayama A (2005) A multidimensional scaling model for three-way data analysis. Behaviormetrika 32:95–110
Reddy S, Anitha M (2015) Culture and its influence on nutrition and oral health. Biomed Pharmacol J 8:613–620
Steinschneider S, Lall U (2015) Daily precipitation and tropical moisture exports across the Eastern United States: an application of archetypal analysis to identify spatiotemporal structure. J Clim 28:8585–8602
Suleman A (2015) A convex semi-nonnegative matrix factorisation approach to fuzzy c-means clustering. Fuzzy Sets Syst 270:90–110
Tsuchida J, Yadohisa H (2016) Asymmetric multidimensional scaling of n-mode M-way categorical data using a log-linear model. Behaviormetrika 43:103–138
Vinué G, Epifanio I, Alemany S (2015) Archetypoids: a new approach to define representative archetypal data. Comput Stat Data Anal 87:102–115
Wohlrabe K, Gralka S (2020) Using archetypoid analysis to classify institutions and faculties of Economics. Scientometrics 123:159–179
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Communicated by Alfonso Iodice D’Enza.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Appendix 1
Let f(M | X, W) be the loss function to be minimized, where X and W denote the data and weight matrices, respectively, and M is the model description for X. Assuming that the minimum of f(M | X, W) cannot be found in closed form, f(M | X, W) can be minimized by finding a monotonically decreasing sequence of loss function values, which converges because the loss function is bounded below by zero.
Majorization tools (see, e.g., de Leeuw and Heiser 1980) can be used to decrease the loss function value at every iteration. The idea of majorization is to find, at each iteration t and given the current estimate M(t), a majorizing function g(M | M(t), X, W) such that

g(M | M(t), X, W) ≥ f(M | X, W) for all M, with g(M(t) | M(t), X, W) = f(M(t) | X, W). (41)

Thus, the majorizing function is never below the loss function and coincides with it at the current estimate.
The minimization of g is easier than that of f. If M(t+1) is the minimizer of g, from (41) it is clear that the following inequalities hold:

f(M(t+1) | X, W) ≤ g(M(t+1) | M(t), X, W) ≤ g(M(t) | M(t), X, W) = f(M(t) | X, W). (42)

Thus, by minimizing g repeatedly, we decrease the value of f (see also Fig. 9). Following Heiser (1987), the majorizing function of f is chosen as

g(M | M(t), X, W) = f(M(t) | X, W) – d2 || X(t) – M(t) ||2 + d2 || X(t) – M ||2, (43)

where X(t) = M(t) + d–2 W2 * (X – M(t)), W2 = W * W (with * denoting the elementwise product), and d2 = max(W2). If the elements of W are either 1 or 0, it is clear that W2 = W and d2 = 1, and therefore (43) can be simplified as

g(M | M(t), X, W) = f(M(t) | X, W) – || X(t) – M(t) ||2 + f(M | X(t)). (44)

In (44), f(M | X(t)) = || X(t) – M ||2, where X(t) takes the form

X(t) = V * M(t) + W * X, (45)

with V = 1n × p – W.
The key result of Kiers (1997) is expressed in (44) and (45): the weighted least squares problem can be solved by repeatedly solving an ordinary least squares problem on the completed data matrix X(t).
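The majorization property underlying (44) and (45) can be checked numerically. The sketch below, which assumes binary weights and random data (all variable names are illustrative, not from the paper's Matlab routines), builds g at a current estimate M(t) and verifies that g never falls below f and that the two coincide at M(t):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))                   # data matrix
W = (rng.random((6, 4)) > 0.3).astype(float)  # binary weights: 1 = observed

def f(M):
    """Weighted least squares loss: squared residuals on observed entries."""
    return np.sum(W * (X - M) ** 2)

Mt = rng.normal(size=(6, 4))   # current estimate M(t)
Xt = (1 - W) * Mt + W * X      # completed data matrix X(t), Eq. (45)

def g(M):
    """Majorizing function of f at M(t), Eq. (44)."""
    return f(Mt) - np.sum((Xt - Mt) ** 2) + np.sum((Xt - M) ** 2)

# g coincides with f at the current estimate and never falls below f:
# g(M) - f(M) equals the sum of (Mt - M)^2 over the missing entries.
assert abs(g(Mt) - f(Mt)) < 1e-9
for _ in range(100):
    M = rng.normal(size=(6, 4))
    assert g(M) >= f(M) - 1e-9
```

The final comment states why the inequality holds for binary weights: the gap between g and f is a sum of squares over the missing entries, which vanishes exactly at M = M(t).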
The iterative procedure can be summarized as follows.
Step 0 (Initialization).
Initialize M as M(t) with t = 0 and compute f(M(t) | X, W). Note that f(M(t) | X, W) is the sum of squared residuals for the non-missing entries.
Step 1 (Computation of the data matrix).
Compute X(t) = V * M(t) + W * X.
Step 2 (OLS estimation).
Update M as M(t+1) that minimizes || X(t) – M ||2, i.e., the OLS problem using the data in X(t).
Step 3 (Convergence).
Compute f(M(t+1) | X, W). Letting ε be the pre-specified convergence tolerance (e.g., 10^-6), if f(M(t) | X, W) – f(M(t+1) | X, W) > ε f(M(t) | X, W), set t = t + 1 and go to Step 1; otherwise, consider the algorithm converged.
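Steps 0–3 can be sketched in code. The following is a minimal illustration, not the authors' Matlab implementation: as a stand-in for the archetypal analysis update, the OLS subproblem of Step 2 is solved by a best rank-2 approximation (Eckart–Young via truncated SVD), and all function names are hypothetical.

```python
import numpy as np

def wls_fit_missing(X, W, ols_step, tol=1e-6, max_iter=500):
    """Weighted least squares by iterative majorization (Kiers, 1997).

    X: data matrix with missing entries set to 0; W: binary weight matrix
    (1 = observed, 0 = missing); ols_step: solver of the OLS subproblem,
    i.e., the minimizer of || X(t) - M ||^2 over the model class.
    """
    # Step 0: initialize, imputing missing entries with the observed mean
    M = ols_step(np.where(W == 1, X, X[W == 1].mean()))
    f_old = np.sum(W * (X - M) ** 2)  # loss on non-missing entries only
    for _ in range(max_iter):
        Xt = (1 - W) * M + W * X          # Step 1: completed data, Eq. (45)
        M = ols_step(Xt)                  # Step 2: OLS update on X(t)
        f_new = np.sum(W * (X - M) ** 2)  # Step 3: convergence check
        if f_old - f_new <= tol * f_old:
            break
        f_old = f_new
    return M, f_new

# Stand-in OLS subproblem: best rank-2 approximation (Eckart-Young);
# in the paper this step is an archetypal analysis update instead.
def rank2_ols(Y):
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :2] * s[:2]) @ Vt[:2]

rng = np.random.default_rng(0)
X_full = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 8))  # exactly rank 2
W = (rng.random((30, 8)) > 0.2).astype(float)                # ~20% missing
M, loss = wls_fit_missing(W * X_full, W, rank2_ols)          # zero out missing
```

Because the majorization guarantees (42), the sequence of loss values produced by the loop is monotonically non-increasing and bounded below by zero, so the stopping rule in Step 3 is always eventually met.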
About this article
Cite this article
Giordani, P., Kiers, H.A.L. Weighted least squares for archetypal analysis with missing data. Behaviormetrika 51, 441–475 (2024). https://doi.org/10.1007/s41237-023-00220-3