
Weighted least squares for archetypal analysis with missing data

Original Paper · Behaviormetrika

Abstract

Archetypal analysis expresses observations in terms of a limited number of archetypes, defined as convex combinations of observed units. The archetypes are found by minimizing a suitable loss function according to the ordinary least squares approach. The technique usually requires a data matrix without missing values, which limits its applicability whenever at least one data entry is unavailable. To overcome this limitation, extensions of archetypal analysis to missing data are developed. In line with recent advances in this domain, this is done by introducing a weighting system that assigns null weights to the missing entries, which are imputed in order to determine the archetypes; the problem is then approached by means of weighted least squares. The effectiveness of the proposals, also in comparison with existing techniques, is explored by applications to simulated and real data sets.



Data availability

The data set analyzed in the paper is available at https://www.fao.org/faostat/en/#home and is given in the supplementary material.

Notes

  1. Matlab routines are available upon request from the corresponding author.

References

  • Asakawa M, Okano M (2013) Japanese consumer's food selection criteria and gender-based differences. Behaviormetrika 40:41–55

  • Bro R, De Jong S (1997) A fast non-negativity-constrained least squares algorithm. J Chemom 11:393–401

  • Chi JT, Chi EC, Baraniuk RG (2016) k-POD: a method for k-means clustering of missing data. Am Stat 70:91–99

  • Cutler A, Breiman L (1994) Archetypal analysis. Technometrics 36:338–347

  • De Leeuw J, Heiser WJ (1980) Multidimensional scaling with restrictions on the configuration. In: Krishnaiah PR (ed) Multivariate analysis V. North-Holland, Amsterdam, pp 501–522

  • Dixon JK (1979) Pattern recognition with partly missing data. IEEE Trans Syst Man Cybern 9:617–621

  • Epifanio I (2013) h-plots for displaying nonmetric dissimilarity matrices. Stat Anal Data Min 6:136–143

  • Epifanio I, Ibáñez MV, Simó A (2018) Archetypal shapes based on landmarks and extension to handle missing data. Adv Data Anal Classif 12:705–735

  • Epifanio I, Ibáñez MV, Simó A (2020) Archetypal analysis with missing data: see all samples by looking at a few based on extreme profiles. Am Stat 74:169–183

  • Eugster MJA, Leisch F (2009) From Spider-Man to hero – archetypal analysis in R. J Stat Softw 30:1–23

  • Gill PE, Murray W, Wright MH (1981) Practical optimization. Academic Press, London

  • Gillis N (2020) Nonnegative matrix factorization. SIAM, Philadelphia

  • Heiser WJ (1987) Correspondence analysis with least absolute residuals. Comput Stat Data Anal 5:337–356

  • Hubert L, Arabie P (1985) Comparing partitions. J Classif 2:193–218

  • Kaufman L, Rousseeuw P (1990) Finding groups in data: an introduction to cluster analysis. Wiley, Hoboken

  • Kiers HAL (1997) Weighted least squares fitting using ordinary least squares algorithms. Psychometrika 62:251–266

  • Lawson CL, Hanson RJ (1995) Solving least squares problems. Classics in applied mathematics, vol 15. SIAM, Philadelphia

  • Lindsay AC, Sitthisongkram S, Greaney ML, Wallington SF, Ruengdej P (2017) Non-responsive feeding practices, unhealthy eating behaviors, and risk of child overweight and obesity in Southeast Asia: a systematic review. Int J Environ Res Public Health 14:436

  • Little R, Rubin D (2002) Statistical analysis with missing data. Wiley, Hoboken

  • Mørup M, Hansen LK (2012) Archetypal analysis for machine learning and data mining. Neurocomputing 80:54–63

  • Nakayama A (2005) A multidimensional scaling model for three-way data analysis. Behaviormetrika 32:95–110

  • Reddy S, Anitha M (2015) Culture and its influence on nutrition and oral health. Biomed Pharmacol J 8:613–620

  • Steinschneider S, Lall U (2015) Daily precipitation and tropical moisture exports across the Eastern United States: an application of archetypal analysis to identify spatiotemporal structure. J Clim 28:8585–8602

  • Suleman A (2015) A convex semi-nonnegative matrix factorisation approach to fuzzy c-means clustering. Fuzzy Sets Syst 270:90–110

  • Tsuchida J, Yadohisa H (2016) Asymmetric multidimensional scaling of n-mode M-way categorical data using a log-linear model. Behaviormetrika 43:103–138

  • Vinué G, Epifanio I, Alemany S (2015) Archetypoids: a new approach to define representative archetypal data. Comput Stat Data Anal 87:102–115

  • Wohlrabe K, Gralka S (2020) Using archetypoid analysis to classify institutions and faculties of Economics. Scientometrics 123:159–179


Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Corresponding author

Correspondence to Paolo Giordani.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Communicated by Alfonso Iodice D’Enza.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary file 1 (PDF 697 KB)

Appendix 1


Let f(M | X, W) be the loss function to be minimized, where X and W denote the data and weight matrices, respectively, and M is the model description for X. Assuming that the minimum of f(M | X, W) cannot be found in closed form, f(M | X, W) can be minimized by constructing a monotonically decreasing sequence of loss function values, which converges because the loss function is bounded below by zero.

Majorization tools (see, e.g., de Leeuw and Heiser 1980) can be used to decrease the loss function value at every iteration. The idea of majorization is to find, at each iteration t and given the current estimate M(t), a majorizing function g(M | M(t), X, W) such that

$$\left\{ \begin{array}{ll} g(\mathbf{M} \mid \mathbf{M}^{(t)}, \mathbf{X}, \mathbf{W}) \ge f(\mathbf{M} \mid \mathbf{X}, \mathbf{W}) & \forall\, \mathbf{M}, \\ g(\mathbf{M} \mid \mathbf{M}^{(t)}, \mathbf{X}, \mathbf{W}) = f(\mathbf{M} \mid \mathbf{X}, \mathbf{W}) & \text{if } \mathbf{M} = \mathbf{M}^{(t)}. \end{array} \right.$$
(41)

Therefore, the majorizing function never lies below the loss function, and the two functions coincide at the current estimate.

The minimization of g is easier than that of f. If M(t+1) is the minimizer of g, from (41) it is clear that the following inequalities hold:

$$f(\mathbf{M}^{(t)} \mid \mathbf{X}, \mathbf{W}) = g(\mathbf{M}^{(t)} \mid \mathbf{M}^{(t)}, \mathbf{X}, \mathbf{W}) \ge g(\mathbf{M}^{(t+1)} \mid \mathbf{M}^{(t)}, \mathbf{X}, \mathbf{W}) \ge f(\mathbf{M}^{(t+1)} \mid \mathbf{X}, \mathbf{W}).$$
(42)

Thus, by minimizing g repeatedly, we decrease the value of f (see also Fig. 9).

Fig. 9: Illustration of the majorization principle

Following Heiser (1987), the majorizing function of f is chosen as

$$g(\mathbf{M} \mid \mathbf{M}^{(t)}, \mathbf{X}, \mathbf{W}) = a + d^2 \, \| \mathbf{M}^{(t)} - d^{-2} \mathbf{W}_2 * \mathbf{M}^{(t)} + d^{-2} \mathbf{W}_2 * \mathbf{X} - \mathbf{M} \|^2 = a + d^2 f(\mathbf{M} \mid \mathbf{X}^{(t)}),$$
(43)

where W2 = W * W and d2 is the maximum element of W2. If the elements of W are either 1 or 0, it is clear that W2 = W and d2 = 1, and therefore (43) simplifies to

$$g(\mathbf{M} \mid \mathbf{M}^{(t)}, \mathbf{X}, \mathbf{W}) = a + \| \mathbf{M}^{(t)} - \mathbf{W} * \mathbf{M}^{(t)} + \mathbf{W} * \mathbf{X} - \mathbf{M} \|^2 = a + f(\mathbf{M} \mid \mathbf{X}^{(t)}).$$
(44)

In (44), f(M | X(t)) = || X(t) − M ||2, where X(t) takes the form

$$\mathbf{X}^{(t)} = \mathbf{M}^{(t)} - \mathbf{W} * \mathbf{M}^{(t)} + \mathbf{W} * \mathbf{X} = \mathbf{V} * \mathbf{M}^{(t)} + \mathbf{W} * \mathbf{X},$$
(45)

with V = 1n×p − W, where 1n×p denotes the n × p matrix of ones.
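In other words, X(t) carries the observed entries of X and fills the missing cells with the current model values. This construction can be sketched in a few lines of numpy (the matrices below are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))      # data matrix (n x p)
W = np.ones((5, 3))              # binary weights: 0 marks a missing entry
W[1, 2] = W[4, 0] = 0.0
M_t = rng.normal(size=(5, 3))    # current model estimate M^(t)

V = np.ones_like(W) - W          # V = 1_{n x p} - W
X_t = V * M_t + W * X            # Eq. (45): completed data matrix X^(t)

assert np.allclose(X_t[W == 1], X[W == 1])    # observed entries keep the data
assert np.allclose(X_t[W == 0], M_t[W == 0])  # missing entries imputed from M^(t)
```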

The key result of Kiers (1997) rests on (44) and (45): the weighted least squares problem can be solved by repeatedly applying an ordinary least squares algorithm to the completed data matrix X(t).
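For binary weights, the majorization property (41) can also be checked numerically: V and W have disjoint supports, so || X(t) − M ||2 splits into || V * (M(t) − M) ||2 + || W * (X − M) ||2 ≥ f(M | X, W), with equality at M = M(t) (here a = 0). A quick sketch with random matrices, purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 4
X = rng.normal(size=(n, p))
W = (rng.random((n, p)) > 0.3).astype(float)  # binary weights, 0 = missing
M_t = rng.normal(size=(n, p))                 # current estimate M^(t)
X_t = (1.0 - W) * M_t + W * X                 # completed data, Eq. (45)

f = lambda M: np.sum((W * (X - M)) ** 2)      # weighted loss f(M | X, W)
g = lambda M: np.sum((X_t - M) ** 2)          # majorizer g(M | M^(t), X, W), a = 0

# g touches f at the current estimate ...
assert np.isclose(g(M_t), f(M_t))
# ... and never lies below it, cf. Eq. (41).
for _ in range(100):
    M = rng.normal(size=(n, p))
    assert g(M) >= f(M) - 1e-12
```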

The iterative procedure can be summarized as follows.

Step 0 (Initialization).

Initialize M as M(t) with t = 0 and compute f(M(t) | X, W). Note that f(M(t) | X, W) is the sum of squared residuals for the non-missing entries.

Step 1 (Computation of the data matrix).

Compute X(t) = V * M(t) + W * X.

Step 2 (OLS estimation).

Update M as M(t+1), the minimizer of || X(t) − M ||2, i.e., solve the OLS problem using the data in X(t).

Step 3 (Convergence).

Compute f(M(t+1) | X, W). Letting ε be the pre-specified convergence tolerance (e.g., ε = 10^(−6)), if f(M(t) | X, W) − f(M(t+1) | X, W) > ε f(M(t) | X, W), set t = t + 1 and go to Step 1; otherwise, consider the algorithm converged.
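The four steps can be sketched as follows. In archetypal analysis M carries convex-combination constraints; the sketch below instead uses a generic low-rank model, with the OLS subproblem of Step 2 solved by a truncated SVD, purely to illustrate the imputation loop (function and variable names are illustrative, not the authors' Matlab routines):

```python
import numpy as np

def wls_via_ols(X, W, rank=2, eps=1e-6, max_iter=500, seed=0):
    """Minimize f(M | X, W) = ||W * (X - M)||^2 over rank-`rank` matrices M
    by alternating imputation (Step 1) and an OLS fit (Step 2)."""
    rng = np.random.default_rng(seed)
    V = 1.0 - W                                   # V = 1_{n x p} - W
    M = rng.normal(size=X.shape)                  # Step 0: initialize M^(0)
    f_old = np.sum((W * (X - M)) ** 2)            # loss on non-missing entries
    for _ in range(max_iter):
        X_t = V * M + W * X                       # Step 1: completed data X^(t)
        U, s, Vt = np.linalg.svd(X_t, full_matrices=False)
        M = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # Step 2: OLS fit to X^(t)
        f_new = np.sum((W * (X - M)) ** 2)        # Step 3: loss after the update
        if f_old - f_new <= eps * f_old:          # relative-decrease criterion
            break
        f_old = f_new
    return M, f_new

# Toy run: a rank-2 matrix with about 15% of the entries masked out.
rng = np.random.default_rng(3)
X_true = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 8))
W = (rng.random(X_true.shape) > 0.15).astype(float)
M_hat, loss = wls_via_ols(X_true * W, W, rank=2)
```

By the majorization argument above, each pass through Steps 1 and 2 cannot increase the loss, so the loop produces a monotonically decreasing, convergent sequence of loss values.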

About this article


Cite this article

Giordani, P., Kiers, H.A.L. Weighted least squares for archetypal analysis with missing data. Behaviormetrika 51, 441–475 (2024). https://doi.org/10.1007/s41237-023-00220-3
