
Weighted least squares for archetypal analysis with missing data

Original Paper · Behaviormetrika

Abstract

Archetypal analysis expresses observations in terms of a limited number of archetypes, defined as convex combinations of observed units. The archetypes are found by minimizing a suitable loss function according to the ordinary least squares approach. The technique usually requires a data matrix without missing values, which limits its applicability whenever at least one data entry is unavailable. To overcome this limitation, extensions of archetypal analysis to missing data are developed. In line with recent advances in this domain, this is done by introducing a weighting system that assigns null weights to the missing entries, which are imputed in order to determine the archetypes; the problem is then approached by means of weighted least squares. The effectiveness of the proposals, also in comparison with existing techniques, is explored by applications to simulated and real data sets.



Data availability

The data set analyzed in the paper is available at https://www.fao.org/faostat/en/#home and is given in the supplementary material.

Notes

  1. Matlab routines are available upon request from the corresponding author.

References

  • Asakawa M, Okano M (2013) Japanese consumer's food selection criteria and gender-based differences. Behaviormetrika 40:41–55

  • Bro R, De Jong S (1997) A fast non-negativity-constrained least squares algorithm. J Chemom 11:393–401

  • Chi JT, Chi EC, Baraniuk RG (2016) k-POD: a method for k-means clustering of missing data. Am Stat 70:91–99

  • Cutler A, Breiman L (1994) Archetypal analysis. Technometrics 36:338–347

  • De Leeuw J, Heiser WJ (1980) Multidimensional scaling with restrictions on the configuration. In: Krishnaiah PR (ed) Multivariate analysis V. North-Holland, Amsterdam, pp 501–522

  • Dixon JK (1979) Pattern recognition with partly missing data. IEEE Trans Syst Man Cybern 9:617–621

  • Epifanio I (2013) h-plots for displaying nonmetric dissimilarity matrices. Stat Anal Data Min 6:136–143

  • Epifanio I, Ibáñez MV, Simó A (2018) Archetypal shapes based on landmarks and extension to handle missing data. Adv Data Anal Classif 12:705–735

  • Epifanio I, Ibáñez MV, Simó A (2020) Archetypal analysis with missing data: see all samples by looking at a few based on extreme profiles. Am Stat 74:169–183

  • Eugster MJA, Leisch F (2009) From Spider-Man to hero – archetypal analysis in R. J Stat Softw 30:1–23

  • Gill PE, Murray W, Wright MH (1981) Practical optimization. Academic Press, London

  • Gillis N (2020) Nonnegative matrix factorization. SIAM, Philadelphia

  • Heiser WJ (1987) Correspondence analysis with least absolute residuals. Comput Stat Data Anal 5:337–356

  • Hubert L, Arabie P (1985) Comparing partitions. J Classif 2:193–218

  • Kaufman L, Rousseeuw P (1990) Finding groups in data: an introduction to cluster analysis. Wiley, Hoboken

  • Kiers HAL (1997) Weighted least squares fitting using ordinary least squares algorithms. Psychometrika 62:251–266

  • Lawson CL, Hanson RJ (1995) Solving least squares problems. Classics in applied mathematics, vol 15. SIAM, Philadelphia

  • Lindsay AC, Sitthisongkram S, Greaney ML, Wallington SF, Ruengdej P (2017) Non-responsive feeding practices, unhealthy eating behaviors, and risk of child overweight and obesity in Southeast Asia: a systematic review. Int J Environ Res Public Health 14:436

  • Little R, Rubin D (2002) Statistical analysis with missing data. Wiley, Hoboken

  • Mørup M, Hansen LK (2012) Archetypal analysis for machine learning and data mining. Neurocomputing 80:54–63

  • Nakayama A (2005) A multidimensional scaling model for three-way data analysis. Behaviormetrika 32:95–110

  • Reddy S, Anitha M (2015) Culture and its influence on nutrition and oral health. Biomed Pharmacol J 8:613–620

  • Steinschneider S, Lall U (2015) Daily precipitation and tropical moisture exports across the Eastern United States: an application of archetypal analysis to identify spatiotemporal structure. J Clim 28:8585–8602

  • Suleman A (2015) A convex semi-nonnegative matrix factorisation approach to fuzzy c-means clustering. Fuzzy Sets Syst 270:90–110

  • Tsuchida J, Yadohisa H (2016) Asymmetric multidimensional scaling of n-mode M-way categorical data using a log-linear model. Behaviormetrika 43:103–138

  • Vinué G, Epifanio I, Alemany S (2015) Archetypoids: a new approach to define representative archetypal data. Comput Stat Data Anal 87:102–115

  • Wohlrabe K, Gralka S (2020) Using archetypoid analysis to classify institutions and faculties of Economics. Scientometrics 123:159–179


Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Corresponding author

Correspondence to Paolo Giordani.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Communicated by Alfonso Iodice D’Enza.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary file 1 (PDF 697 KB)

Appendix 1


Let f(M | X, W) be the loss function to be minimized, where X and W denote the data and weight matrices, respectively, and M is the model description for X. Assuming that the minimum of f(M | X, W) cannot be found in closed form, f(M | X, W) can be minimized by constructing a monotonically decreasing sequence of loss function values, which converges because the loss function is bounded below by zero.

Majorization tools (see, e.g., de Leeuw and Heiser 1980) can be used to decrease the loss function value at every iteration. The idea of majorization is to find, at each iteration t and given the current estimate M(t), a majorizing function g(M | M(t), X, W) such that

$$\left\{ \begin{array}{ll} g(\mathbf{M} \mid \mathbf{M}^{(t)}, \mathbf{X}, \mathbf{W}) \ge f(\mathbf{M} \mid \mathbf{X}, \mathbf{W}) & \forall\, \mathbf{M}, \\ g(\mathbf{M} \mid \mathbf{M}^{(t)}, \mathbf{X}, \mathbf{W}) = f(\mathbf{M} \mid \mathbf{X}, \mathbf{W}) & \text{if } \mathbf{M} = \mathbf{M}^{(t)}. \end{array} \right.$$
(41)

Therefore, the majorizing function never lies below the loss function, and the two functions coincide at the current estimate.

The minimization of g is easier than that of f. If M(t+1) is the minimizer of g, from (41) it is clear that the following inequalities hold:

$$f(\mathbf{M}^{(t)} \mid \mathbf{X}, \mathbf{W}) = g(\mathbf{M}^{(t)} \mid \mathbf{M}^{(t)}, \mathbf{X}, \mathbf{W}) \ge g(\mathbf{M}^{(t+1)} \mid \mathbf{M}^{(t)}, \mathbf{X}, \mathbf{W}) \ge f(\mathbf{M}^{(t+1)} \mid \mathbf{X}, \mathbf{W}).$$
(42)

Thus, by minimizing g repeatedly, we decrease the value of f (see also Fig. 9).

Fig. 9: Illustration of the majorization principle

Following Heiser (1987), the majorizing function of f is chosen as

$$g(\mathbf{M} \mid \mathbf{M}^{(t)}, \mathbf{X}, \mathbf{W}) = a + d^2 \, \| \mathbf{M}^{(t)} - d^{-2} \mathbf{W}_2 * \mathbf{M}^{(t)} + d^{-2} \mathbf{W}_2 * \mathbf{X} - \mathbf{M} \|^2 = a + d^2 f(\mathbf{M} \mid \mathbf{X}^{(t)}),$$
(43)

where W2 = W * W and d2 is the maximum element of W2. If the elements of W are either 1 or 0, it is clear that W2 = W and d2 = 1, and therefore (43) simplifies to

$$g(\mathbf{M} \mid \mathbf{M}^{(t)}, \mathbf{X}, \mathbf{W}) = a + \| \mathbf{M}^{(t)} - \mathbf{W} * \mathbf{M}^{(t)} + \mathbf{W} * \mathbf{X} - \mathbf{M} \|^2 = a + f(\mathbf{M} \mid \mathbf{X}^{(t)}).$$
(44)

In (44), f(M | X(t)) = || X(t) − M ||2, where X(t) takes the form

$$\mathbf{X}^{(t)} = \mathbf{M}^{(t)} - \mathbf{W} * \mathbf{M}^{(t)} + \mathbf{W} * \mathbf{X} = \mathbf{V} * \mathbf{M}^{(t)} + \mathbf{W} * \mathbf{X},$$
(45)

with V = 1n×p − W, where 1n×p denotes the n × p matrix of ones.
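In other words, X(t) carries the observed entries of X and fills the missing cells with the current model values. This construction can be sketched in a few lines of numpy (the matrices below are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))      # data matrix (n x p)
W = np.ones((5, 3))              # binary weights: 0 marks a missing entry
W[1, 2] = W[4, 0] = 0.0
M_t = rng.normal(size=(5, 3))    # current model estimate M^(t)

V = np.ones_like(W) - W          # V = 1_{n x p} - W
X_t = V * M_t + W * X            # Eq. (45): completed data matrix X^(t)

assert np.allclose(X_t[W == 1], X[W == 1])    # observed entries keep the data
assert np.allclose(X_t[W == 0], M_t[W == 0])  # missing entries imputed from M^(t)
```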

The key result of Kiers (1997) rests on (44) and (45): the weighted least squares problem can be solved by repeatedly applying an ordinary least squares algorithm to the completed data matrix X(t).
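For binary weights, the majorization property (41) can also be checked numerically: V and W have disjoint supports, so || X(t) − M ||2 splits into || V * (M(t) − M) ||2 + || W * (X − M) ||2 ≥ f(M | X, W), with equality at M = M(t) (here a = 0). A quick sketch with random matrices, purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 4
X = rng.normal(size=(n, p))
W = (rng.random((n, p)) > 0.3).astype(float)  # binary weights, 0 = missing
M_t = rng.normal(size=(n, p))                 # current estimate M^(t)
X_t = (1.0 - W) * M_t + W * X                 # completed data, Eq. (45)

f = lambda M: np.sum((W * (X - M)) ** 2)      # weighted loss f(M | X, W)
g = lambda M: np.sum((X_t - M) ** 2)          # majorizer g(M | M^(t), X, W), a = 0

# g touches f at the current estimate ...
assert np.isclose(g(M_t), f(M_t))
# ... and never lies below it, cf. Eq. (41).
for _ in range(100):
    M = rng.normal(size=(n, p))
    assert g(M) >= f(M) - 1e-12
```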

The iterative procedure can be summarized as follows.

Step 0 (Initialization).

Initialize M as M(t) with t = 0 and compute f(M(t) | X, W). Note that f(M(t) | X, W) is the sum of squared residuals for the non-missing entries.

Step 1 (Computation of the data matrix).

Compute X(t) = V * M(t) + W * X.

Step 2 (OLS estimation).

Update M as M(t+1), the minimizer of || X(t) − M ||2, i.e., solve the OLS problem using the data in X(t).

Step 3 (Convergence).

Compute f(M(t+1) | X, W). Letting ε be the pre-specified convergence tolerance (e.g., ε = 10^(−6)), if f(M(t) | X, W) − f(M(t+1) | X, W) > ε f(M(t) | X, W), set t = t + 1 and go to Step 1; otherwise, consider the algorithm converged.
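The four steps can be sketched as follows. In archetypal analysis M carries convex-combination constraints; the sketch below instead uses a generic low-rank model, with the OLS subproblem of Step 2 solved by a truncated SVD, purely to illustrate the imputation loop (function and variable names are illustrative, not the authors' Matlab routines):

```python
import numpy as np

def wls_via_ols(X, W, rank=2, eps=1e-6, max_iter=500, seed=0):
    """Minimize f(M | X, W) = ||W * (X - M)||^2 over rank-`rank` matrices M
    by alternating imputation (Step 1) and an OLS fit (Step 2)."""
    rng = np.random.default_rng(seed)
    V = 1.0 - W                                   # V = 1_{n x p} - W
    M = rng.normal(size=X.shape)                  # Step 0: initialize M^(0)
    f_old = np.sum((W * (X - M)) ** 2)            # loss on non-missing entries
    for _ in range(max_iter):
        X_t = V * M + W * X                       # Step 1: completed data X^(t)
        U, s, Vt = np.linalg.svd(X_t, full_matrices=False)
        M = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # Step 2: OLS fit to X^(t)
        f_new = np.sum((W * (X - M)) ** 2)        # Step 3: loss after the update
        if f_old - f_new <= eps * f_old:          # relative-decrease criterion
            break
        f_old = f_new
    return M, f_new

# Toy run: a rank-2 matrix with about 15% of the entries masked out.
rng = np.random.default_rng(3)
X_true = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 8))
W = (rng.random(X_true.shape) > 0.15).astype(float)
M_hat, loss = wls_via_ols(X_true * W, W, rank=2)
```

By the majorization argument above, each pass through Steps 1 and 2 cannot increase the loss, so the loop produces a monotonically decreasing, convergent sequence of loss values.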

About this article


Cite this article

Giordani, P., Kiers, H.A.L. Weighted least squares for archetypal analysis with missing data. Behaviormetrika 51, 441–475 (2024). https://doi.org/10.1007/s41237-023-00220-3
