
A Spectral Method for Identifiable Grade of Membership Analysis with Binary Responses


Abstract

Grade of membership (GoM) models are popular individual-level mixture models for multivariate categorical data. GoM allows each subject to have mixed memberships in multiple extreme latent profiles. Therefore, GoM models have a richer modeling capacity than latent class models that restrict each subject to belong to a single profile. The flexibility of GoM comes at the cost of more challenging identifiability and estimation problems. In this work, we propose a singular value decomposition (SVD)-based spectral approach to GoM analysis with multivariate binary responses. Our approach hinges on the observation that the expectation of the data matrix has a low-rank decomposition under a GoM model. For identifiability, we develop sufficient and almost necessary conditions for a notion of expectation identifiability. For estimation, we extract only a few leading singular vectors of the observed data matrix and exploit the simplex geometry of these vectors to estimate the mixed membership scores and other parameters. We also establish the consistency of our estimator in the double-asymptotic regime where both the number of subjects and the number of items grow to infinity. Our spectral method has a huge computational advantage over Bayesian or likelihood-based methods and is scalable to large-scale and high-dimensional data. Extensive simulation studies demonstrate the superior efficiency and accuracy of our method. We also illustrate our method by applying it to a personality test dataset.
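
To make the estimation pipeline sketched above concrete, here is a minimal, self-contained numpy illustration: it simulates binary responses from a GoM model with planted pure subjects, extracts the top-K singular vectors of the data matrix, locates the simplex vertices among the rows of the left singular-vector matrix with a generic successive-projection heuristic, and recovers the membership scores and item parameters through the closed-form relations established in the paper. This is only a hedged sketch for intuition; the simulation settings, the vertex-hunting routine, and the naive column matching are illustrative choices and not the authors' MATLAB implementation (linked under Code Availability).

```python
import numpy as np

rng = np.random.default_rng(0)

# simulate a GoM model with one planted pure subject per extreme profile
N, J, K = 2000, 300, 3
Theta = rng.uniform(0.1, 0.9, size=(J, K))        # item parameter matrix, J x K
Pi = rng.dirichlet(np.ones(K), size=N)             # membership scores, N x K
Pi[:K] = np.eye(K)                                 # pure (anchor) subjects
R = rng.binomial(1, Pi @ Theta.T).astype(float)    # observed binary data, N x J

# rank-K SVD of the observed data matrix
U, s, Vt = np.linalg.svd(R, full_matrices=False)
U, s, Vt = U[:, :K], s[:K], Vt[:K]

def vertex_hunt(X, K):
    """Successive-projection heuristic: greedily pick K rows spanning the simplex vertices."""
    idx, Y = [], X.copy()
    for _ in range(K):
        i = int(np.argmax(np.linalg.norm(Y, axis=1)))   # farthest remaining row
        idx.append(i)
        v = Y[i] / np.linalg.norm(Y[i])
        Y = Y - np.outer(Y @ v, v)                      # project out that direction
    return np.array(idx)

S_hat = vertex_hunt(U, K)                               # estimated pure-subject indices

# recover Pi and Theta from the simplex geometry of the singular vectors
Pi_hat = U @ np.linalg.inv(U[S_hat])                    # Pi = U * U_S^{-1}
Theta_hat = Vt.T @ np.diag(s) @ U[S_hat].T              # Theta = V * Sigma * U_S^T
Pi_hat = np.clip(Pi_hat, 0, None)
Pi_hat /= Pi_hat.sum(axis=1, keepdims=True)             # simple projection back to the simplex
Theta_hat = np.clip(Theta_hat, 0, 1)

# naive column matching (adequate when recovery is good), then report average errors
perm = [int(np.argmin(((Theta_hat - Theta[:, [k]]) ** 2).sum(0))) for k in range(K)]
print("mean abs error in Pi   :", np.abs(Pi_hat[:, perm] - Pi).mean())
print("mean abs error in Theta:", np.abs(Theta_hat[:, perm] - Theta).mean())
```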


References

  • Airoldi, E. M., Blei, D., Erosheva, E. A., & Fienberg, S. E. (2014). Handbook of mixed membership models and their applications. Boca Raton: CRC Press.

  • Airoldi, E. M., Blei, D. M., Fienberg, S. E., & Xing, E. P. (2008). Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9, 1981–2014.

  • Akaike, H. (1998). Information theory and an extension of the maximum likelihood principle. In Selected papers of Hirotugu Akaike (pp. 199–213). New York: Springer.

  • Araújo, M. C. U., Saldanha, T. C. B., Galvao, R. K. H., Yoneyama, T., Chame, H. C., & Visani, V. (2001). The successive projections algorithm for variable selection in spectroscopic multicomponent analysis. Chemometrics and Intelligent Laboratory Systems, 57(2), 65–73.

  • Berry, M. W., Browne, M., Langville, A. N., Pauca, V. P., & Plemmons, R. J. (2007). Algorithms and applications for approximate nonnegative matrix factorization. Computational Statistics & Data Analysis, 52(1), 155–173.

  • Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993–1022.

  • Borsboom, D., Rhemtulla, M., Cramer, A. O., van der Maas, H. L., Scheffer, M., & Dolan, C. V. (2016). Kinds versus continua: A review of psychometric approaches to uncover the structure of psychiatric constructs. Psychological Medicine, 46(8), 1567–1579.

  • Chen, Y., Chi, Y., Fan, J., & Ma, C. (2021). Spectral methods for data science: A statistical perspective. Foundations and Trends® in Machine Learning, 14(5), 566–806.

  • Chen, Y., Li, X., & Zhang, S. (2019). Joint maximum likelihood estimation for high-dimensional exploratory item factor analysis. Psychometrika, 84, 124–146.

  • Chen, Y., Li, X., & Zhang, S. (2020). Structured latent factor analysis for large-scale data: Identifiability, estimability, and their implications. Journal of the American Statistical Association, 115(532), 1756–1770.

  • Chen, Y., Ying, Z., & Zhang, H. (2021). Unfolding-model-based visualization: Theory, method and applications. Journal of Machine Learning Research, 22, 11.

  • Dobriban, E., & Owen, A. B. (2019). Deterministic parallel analysis: An improved method for selecting factors and principal components. Journal of the Royal Statistical Society Series B: Statistical Methodology, 81(1), 163–183.

  • Donoho, D., & Stodden, V. (2003). When does non-negative matrix factorization give a correct decomposition into parts? Advances in Neural Information Processing Systems, 16.

  • Embretson, S. E., & Reise, S. P. (2013). Item response theory. New York: Psychology Press.

  • Erosheva, E. A. (2002). Grade of membership and latent structure models with application to disability survey data. PhD thesis, Carnegie Mellon University.

  • Erosheva, E. A. (2005). Comparing latent structures of the grade of membership, Rasch, and latent class models. Psychometrika, 70(4), 619–628.

  • Erosheva, E. A., Fienberg, S. E., & Joutard, C. (2007). Describing disability through individual-level mixture models for multivariate binary data. Annals of Applied Statistics, 1(2), 346.

  • Freyaldenhoven, S., Ke, S., Li, D., & Olea, J. L. M. (2023). On the testability of the anchor words assumption in topic models. Working paper, Cornell University.

  • Gillis, N., & Vavasis, S. A. (2013). Fast and robust recursive algorithms for separable nonnegative matrix factorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(4), 698–714.

  • Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61(2), 215–231.

  • Gormley, I. C., & Murphy, T. B. (2009). A grade of membership model for rank data. Bayesian Analysis, 4(2), 265–295.

  • Gu, Y., Erosheva, E. E., Xu, G., & Dunson, D. B. (2023). Dimension-grouped mixed membership models for multivariate categorical data. Journal of Machine Learning Research, 24(88), 1–49.

  • Hagenaars, J. A., & McCutcheon, A. L. (2002). Applied latent class analysis. Cambridge: Cambridge University Press.

  • Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30, 179–185.

  • Hoyer, P. O. (2004). Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research, 5(9), 1457–1469.

  • Jin, J., Ke, Z. T., & Luo, S. (2023). Mixed membership estimation for social networks. Journal of Econometrics. https://doi.org/10.1016/j.jeconom.2022.12.003

  • Ke, Z. T., & Jin, J. (2023). Special invited paper: The score normalization, especially for heterogeneous network and text data. Stat, 12(1), e545.

  • Ke, Z. T., & Wang, M. (2022). Using SVD for topic modeling. Journal of the American Statistical Association, 2022, 1–16.

  • Klopp, O., Panov, M., Sigalla, S., & Tsybakov, A. (2023). Assigning topics to documents by successive projections. Annals of Statistics (to appear).

  • Koopmans, T. C., & Reiersol, O. (1950). The identification of structural characteristics. The Annals of Mathematical Statistics, 21(2), 165–181.

  • Manrique-Vallier, D., & Reiter, J. P. (2012). Estimating identification disclosure risk using mixed membership models. Journal of the American Statistical Association, 107(500), 1385–1394.

  • Mao, X., Sarkar, P., & Chakrabarti, D. (2021). Estimating mixed memberships with sharp eigenvector deviations. Journal of the American Statistical Association, 116(536), 1928–1940.

  • Neyman, J., & Scott, E. L. (1948). Consistent estimates based on partially consistent observations. Econometrica: Journal of the Econometric Society, 16, 1–32.

  • Pokropek, A. (2016). Grade of membership response time model for detecting guessing behaviors. Journal of Educational and Behavioral Statistics, 41(3), 300–325.

  • Robitzsch, A., & Robitzsch, M. A. (2022). Package ‘sirt’: Supplementary item response theory models.

  • Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464.

  • Shang, Z., Erosheva, E. A., & Xu, G. (2021). Partial-mastery cognitive diagnosis models. Annals of Applied Statistics, 15(3), 1529–1555.

  • Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & Van Der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4), 583–639.

  • Woodbury, M. A., Clive, J., & Garson, A., Jr. (1978). Mathematical typology: A grade of membership technique for obtaining disease definition. Computers and Biomedical Research, 11(3), 277–298.

  • Zhang, H., Chen, Y., & Li, X. (2020). A note on exploratory item factor analysis by singular value decomposition. Psychometrika, 85, 358–372.


Acknowledgements

This work is partially supported by National Science Foundation Grant DMS-2210796. The authors thank the editor Prof. Matthias von Davier, an associate editor, and three anonymous reviewers for their helpful and constructive comments.

Author information

Corresponding author

Correspondence to Yuqi Gu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Code Availability

The MATLAB code implementing the proposed method is available at this link: https://github.com/lscientific/spectral_GoM.


Appendices

Appendix A: Proofs of the Identifiability Results

Proof of Proposition 1

Taking the rows indexed by \({\textbf{S}}\) on both sides of the SVD (6) and using the fact that \(\varvec{\Pi }_{{\textbf{S}},:}=\textbf{I}_K\), we obtain

$$\begin{aligned} \textbf{U}_{{\textbf{S}},:}\varvec{\Sigma }\textbf{V}^{\top }=[\textbf{R}_0]_{{\textbf{S}},:} =\varvec{\Pi }_{{\textbf{S}},:} \varvec{\Theta }^{\top }=\varvec{\Theta }^{\top }. \end{aligned}$$
(15)

This gives an expression for \(\varvec{\Theta }\):

$$\begin{aligned} \varvec{\Theta }= \textbf{V}\varvec{\Sigma } \textbf{U}^{\top }_{{\textbf{S}},:}. \end{aligned}$$
(16)

Further note that

$$\begin{aligned} \textbf{U}=\textbf{R}_0\textbf{V}\varvec{\Sigma }^{-1}=\varvec{\Pi }\varvec{\Theta }^{\top }\textbf{V}\varvec{\Sigma }^{-1}. \end{aligned}$$
(17)

Plugging (15) into (17) and noting that \(\textbf{V}\) has orthonormal columns, we have

$$\begin{aligned} \textbf{U}=\varvec{\Pi }\textbf{U}_{{\textbf{S}},:}\varvec{\Sigma }\textbf{V}^{\top }\textbf{V}\varvec{\Sigma }^{-1}=\varvec{\Pi }\textbf{U}_{{\textbf{S}},:}. \end{aligned}$$
(18)

Equation (18) also tells us that \(\textbf{U}_{{\textbf{S}},:}\) must be full rank since both \(\textbf{U}\) and \(\varvec{\Pi }\) have rank K. Therefore, we have an expression of \(\varvec{\Pi }\):

$$\begin{aligned} \varvec{\Pi }=\textbf{U}(\textbf{U}_{{\textbf{S}},:})^{-1}. \end{aligned}$$

On the other hand, based on the decomposition \(\textbf{U}\varvec{\Sigma }\textbf{V}^{\top } = \varvec{\Pi } \varvec{\Theta }^{\top }\), we can left-multiply both sides by \((\varvec{\Pi }^{\top }\varvec{\Pi })^{-1}\varvec{\Pi }^{\top }\) to obtain

$$\begin{aligned}&(\varvec{\Pi }^{\top }\varvec{\Pi })^{-1}\varvec{\Pi }^{\top } \textbf{U}\varvec{\Sigma }\textbf{V}^{\top } =\varvec{\Theta }^{\top }\nonumber \\&\quad \Longrightarrow \quad \varvec{\Theta }=\textbf{V}\varvec{\Sigma }\textbf{U}^{\top }\varvec{\Pi }(\varvec{\Pi }^{\top }\varvec{\Pi })^{-1}. \end{aligned}$$
(19)

We next show that (16) and (19) are equivalent. Since \(\textbf{U}^{\top }\textbf{U}=\textbf{I}_K\), (18) leads to

$$\begin{aligned} \textbf{U}_{{\textbf{S}},:}^{\top }\varvec{\Pi }^{\top }\varvec{\Pi }\textbf{U}_{{\textbf{S}},:}=\textbf{I}_K, \end{aligned}$$

which yields

$$\begin{aligned} (\varvec{\Pi }^{\top }\varvec{\Pi })^{-1}=\textbf{U}_{{\textbf{S}},:}\textbf{U}^{\top }_{{\textbf{S}},:}. \end{aligned}$$

Plugging this equation into (19), we have

$$\begin{aligned} \varvec{\Theta }&= \textbf{V}\varvec{\Sigma }\textbf{U}^{\top }\varvec{\Pi }\textbf{U}_{{\textbf{S}},:}\textbf{U}^{\top }_{{\textbf{S}},:}\\&= \textbf{V}\varvec{\Sigma }\textbf{U}^{\top }\textbf{U}(\textbf{U}_{{\textbf{S}},:})^{-1}\textbf{U}_{{\textbf{S}},:} \textbf{U}^{\top }_{{\textbf{S}},:} \\&= \textbf{V}\varvec{\Sigma }\textbf{U}^{\top }_{{\textbf{S}},:} \end{aligned}$$

This shows the equivalence of (16) and (19) and completes the proof of the proposition. \(\square \)
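
As a quick sanity check of the two equivalent expressions (16) and (19), the following numpy snippet builds a random \((\varvec{\Pi }, \varvec{\Theta })\) with planted pure subjects, forms the noiseless matrix \(\textbf{R}_0=\varvec{\Pi }\varvec{\Theta }^{\top }\), and verifies the reconstruction formulas numerically. The dimensions and random draws are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
N, J, K = 500, 40, 3

Theta = rng.uniform(0.1, 0.9, size=(J, K))
Pi = rng.dirichlet(np.ones(K), size=N)
S = np.arange(K)                      # indices of the pure subjects
Pi[S] = np.eye(K)                     # so that Pi_{S,:} = I_K

R0 = Pi @ Theta.T                     # rank-K expectation matrix
U, s, Vt = np.linalg.svd(R0, full_matrices=False)
U, Sigma, V = U[:, :K], np.diag(s[:K]), Vt[:K].T

Theta_16 = V @ Sigma @ U[S].T                                   # formula (16)
Pi_rec = U @ np.linalg.inv(U[S])                                # Pi = U * U_S^{-1}
Theta_19 = V @ Sigma @ U.T @ Pi @ np.linalg.inv(Pi.T @ Pi)      # formula (19)

print(np.allclose(Theta_16, Theta))   # True
print(np.allclose(Pi_rec, Pi))        # True
print(np.allclose(Theta_19, Theta))   # True
```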

Proof of Theorem 1

Without loss of generality, assume that the first extreme latent profile does not have a pure subject. Since the number of subjects is finite, there exists some \(\delta >0\) such that \(\pi _{i1}\le 1-\delta \) for all \(i=1,\dots , N\). For each \(0<\epsilon <\delta \), define a \(K\times K\) matrix

$$\begin{aligned} \textbf{M}_{\epsilon } = \begin{bmatrix} 1+(K-1)\epsilon ^2 &{} -\epsilon ^2\textbf{1}_{K-1}^{\top } \\ \textbf{0}_{K-1} &{} \epsilon \textbf{1}_{K-1}\textbf{1}_{K-1}^{\top } +(1-(K-1)\epsilon )\textbf{I}_{K-1} \end{bmatrix}. \end{aligned}$$

We will show that \(\widetilde{\varvec{\Pi }}_{\epsilon }=\varvec{\Pi }\textbf{M}_{\epsilon }\) and \(\widetilde{\varvec{\Theta }}_{\epsilon }=\varvec{\Theta }\textbf{M}_{\epsilon }^{-1}\) form a valid parameter set. That is, each element of \(\widetilde{\varvec{\Pi }}_{\epsilon }\) and \(\widetilde{\varvec{\Theta }}_{\epsilon }\) lies in [0, 1] and the rows of \(\widetilde{\varvec{\Pi }}_{\epsilon }\) sum to one. Since \(\textbf{M}_0=\textbf{I}_K\) and by the continuity of the matrix determinant, \(\textbf{M}_{\epsilon }\) is full rank when \(\epsilon \) is small enough. Also notice that \(\textbf{M}_{\epsilon }\textbf{1}_K=\textbf{1}_K\). Therefore, \(\widetilde{\varvec{\Pi }}_{\epsilon }\textbf{1}_K=\varvec{\Pi }\textbf{M}_{\epsilon }\textbf{1}_K=\varvec{\Pi }\textbf{1}_K=\textbf{1}_N\). For each \(i=1,\dots , N\), \(\widetilde{\pi }_{i1}=\pi _{i1}(1+(K-1)\epsilon ^2)\ge 0\). For any fixed \(k=2,\dots ,K\), \((\textbf{M}_{\epsilon })_{kk} = 1 - (K-2)\epsilon \) and \((\textbf{M}_{\epsilon })_{mk}=\epsilon \) for \(m\ne k\) with \(m\ge 2\), while \((\textbf{M}_{\epsilon })_{1k}=-\epsilon ^2\). Thus when \(\epsilon \le 1/(K-1)\), we have \((\textbf{M}_{\epsilon })_{mk}\ge \epsilon \) for any \(m=2,\dots , K\). Therefore, the following inequalities hold for each \(i=1,\dots , N\) and \(k=2,\dots , K\):

$$\begin{aligned} \widetilde{\pi }_{ik}&=-\epsilon ^2\pi _{i1}+\sum _{m=2}^K \pi _{im}(\textbf{M}_{\epsilon })_{mk} {\ge } -\epsilon ^2\pi _{i1} +\epsilon \sum _{m=2}^K\pi _{im}\\&\ge -\epsilon ^2(1-\delta )+\epsilon (1-\pi _{i1}) \ge -\epsilon ^2(1-\delta ) + \epsilon \delta \ge \epsilon \delta ^2 >0. \end{aligned}$$

Here we also used \(\sum _{k=1}^K\pi _{ik}=1\), \(\pi _{i1}\le 1-\delta \), and \(\epsilon <\delta \).

Further notice that

$$\begin{aligned} \textbf{M}_{\epsilon }-\textbf{I}_K= \begin{bmatrix} \epsilon ^2(K-1) &{} -\epsilon ^2\textbf{1}_{K-1}^{\top } \\ \textbf{0}_{K-1} &{} \epsilon \textbf{1}_{K-1}\textbf{1}_{K-1}^{\top } -\epsilon (K-1)\textbf{I}_{K-1} \end{bmatrix}, \end{aligned}$$

which leads to \(\Vert \textbf{M}_{\epsilon }-\textbf{I}_K\Vert _F {\mathop {\longrightarrow }\limits ^{\epsilon \rightarrow 0}}0\). Here \(\Vert \textbf{A}\Vert _F=\sqrt{\sum _{i=1}^m\sum _{j=1}^n a_{ij}^2}\) is the Frobenius norm of any matrix \(\textbf{A}=(a_{ij})\in {\mathbb {R}}^{m\times n}\). By the continuity of matrix inverse and Frobenius norm,

$$\begin{aligned} \Vert \textbf{M}_{\epsilon }^{-1}-\textbf{I}_K\Vert _F {\mathop {\longrightarrow }\limits ^{\epsilon \rightarrow 0}}0. \end{aligned}$$

Therefore,

$$\begin{aligned} \Vert \widetilde{\varvec{\Theta }}-\varvec{\Theta }\Vert _F = \Vert \varvec{\Theta }(\textbf{I}_K -\textbf{M}_{\epsilon }^{-1})\Vert _F\le \Vert \varvec{\Theta }\Vert _2\Vert \textbf{I}_K -\textbf{M}_{\epsilon }^{-1}\Vert _F {\mathop {\longrightarrow }\limits ^{\epsilon \rightarrow 0}}0. \end{aligned}$$

Since all the elements of \(\varvec{\Theta }\) are strictly in (0, 1), the elements of \(\widetilde{\varvec{\Theta }}_{\epsilon }\) must be in [0, 1] when \(\epsilon \) is small enough. Also note that \(\textbf{M}_{\epsilon }\) is not a permutation matrix when \(\epsilon >0\); thus, the GoM parameters are not identifiable even up to a permutation of the extreme latent profiles. This completes the proof of the theorem. \(\square \)
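
The construction above can also be checked numerically. The snippet below (an illustrative sketch with made-up sizes, a membership matrix satisfying \(\pi _{i1}\le 1-\delta \), and \(\epsilon <\delta \), \(\epsilon \le 1/(K-1)\)) verifies that \(\varvec{\Pi }\textbf{M}_{\epsilon }\) and \(\varvec{\Theta }\textbf{M}_{\epsilon }^{-1}\) remain valid GoM parameters while producing exactly the same expectation matrix \(\textbf{R}_0\).

```python
import numpy as np

rng = np.random.default_rng(2)
N, J, K = 200, 30, 3
delta, eps = 0.2, 0.05                      # eps < delta and eps <= 1/(K-1)

# membership matrix with no pure subject for profile 1: pi_{i1} <= 1 - delta
p1 = rng.uniform(0, 1 - delta, size=N)
rest = rng.dirichlet(np.ones(K - 1), size=N) * (1 - p1)[:, None]
Pi = np.column_stack([p1, rest])
Theta = rng.uniform(0.2, 0.8, size=(J, K))

# the matrix M_eps from the proof
M = np.zeros((K, K))
M[0, 0] = 1 + (K - 1) * eps ** 2
M[0, 1:] = -eps ** 2
M[1:, 1:] = eps * np.ones((K - 1, K - 1)) + (1 - (K - 1) * eps) * np.eye(K - 1)

Pi_t = Pi @ M                               # alternative membership scores
Theta_t = Theta @ np.linalg.inv(M)          # alternative item parameters

print(np.allclose(Pi_t.sum(axis=1), 1))                           # rows still sum to one
print((Pi_t >= 0).all(), ((Theta_t >= 0) & (Theta_t <= 1)).all())  # valid parameter ranges
print(np.allclose(Pi_t @ Theta_t.T, Pi @ Theta.T))                 # identical R_0
```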

Proof of Theorem 2

Suppose \(\textrm{rank}(\varvec{\Theta })=r\le K\). Now, consider the SVD \(\textbf{R}_0=\textbf{U}\varvec{\Sigma }\textbf{V}^{\top }\) with \(\textbf{U}\in {\mathbb {R}}^{N\times r}, \textbf{V}\in {\mathbb {R}}^{J\times r}, \varvec{\Sigma }\in {\mathbb {R}}^{r\times r}\). For simplicity, we continue to use the same notations \(\textbf{U},\varvec{\Sigma },\textbf{V}\) here even though the matrix dimensions have changed. By Assumption 1, we can without loss of generality reorder the subjects and the profile labels so that \(\varvec{\Pi }_{1:K,:}=\textbf{I}_K\). According to Proposition 1,

$$\begin{aligned} \textbf{U}=\varvec{\Pi }\textbf{U}_{1:K,:}. \end{aligned}$$
(20)

Since \(\textrm{rank}(\varvec{\Pi })=K\) and \(\textrm{rank}(\textbf{U})=r\), we must have \(\textrm{rank}(\textbf{U}_{1:K,:})=\textrm{rank}(\textbf{U})=r\).

Suppose another set of parameters \((\widetilde{\varvec{\Pi }},\widetilde{\varvec{\Theta }})\) yields the same \(\textbf{R}_0\) and we denote its corresponding pure subject index vector as \(\widetilde{{\textbf{S}}}\) so that \(\widetilde{\varvec{\Pi }}_{\widetilde{{\textbf{S}}},:}=\textbf{I}_K\). Similarly, we have

$$\begin{aligned} \textbf{U}=\widetilde{\varvec{\Pi }}\textbf{U}_{\widetilde{{\textbf{S}}},:}. \end{aligned}$$
(21)

Taking the \(\widetilde{{\textbf{S}}}\) rows of both sides of (20) and the first K rows of both sides of (21) yields

$$\begin{aligned} \varvec{\Pi }_{\widetilde{{\textbf{S}}},:}\textbf{U}_{1:K,:} =\textbf{U}_{\widetilde{{\textbf{S}}},:},\ \textbf{U}_{1:K,:} =\widetilde{\varvec{\Pi }}_{1:K,:}\textbf{U}_{\widetilde{{\textbf{S}}},:}. \end{aligned}$$

The above equations show that each row of \(\textbf{U}_{\widetilde{{\textbf{S}}},:}\) lies in the convex hull of the rows of \(\textbf{U}_{1:K,:}\), and each row of \(\textbf{U}_{1:K,:}\) lies in the convex hull of the rows of \(\textbf{U}_{\widetilde{{\textbf{S}}},:}\). Therefore, there must exist a permutation matrix \(\textbf{P}\) such that \(\textbf{U}_{\widetilde{{\textbf{S}}},:}=\textbf{P}\textbf{U}_{1:K,:}\). Combining this fact with (20) and (21) leads to

$$\begin{aligned} (\varvec{\Pi }- \widetilde{\varvec{\Pi }}\textbf{P}) \textbf{U}_{1:K,:} = 0. \end{aligned}$$
(22)

Proof of part (a). For part (a), \(r=K\) and \(\textbf{U}_{1:K,:}\) is full rank according to (20). In this case, (22) directly leads to \(\varvec{\Pi }=\widetilde{\varvec{\Pi }}\textbf{P}\) and thus \(\widetilde{\varvec{\Theta }}=\varvec{\Theta }\textbf{P}^{\top }\).

Now consider the general case \(r<K\). By permuting the rows and columns of \(\varvec{\Theta }\), we can write

$$\begin{aligned} \varvec{\Theta }= \begin{bmatrix} {\textbf{C}} &{} {\textbf{C}}\textbf{W}_1 \\ \textbf{W}_2^{\top }{\textbf{C}} &{} \textbf{W}_2^{\top }{\textbf{C}}\textbf{W}_1 \end{bmatrix}, \end{aligned}$$
(23)

where \({\textbf{C}}\in {\mathbb {R}}^{r\times r}\) is full rank, \(\textbf{W}_1\in {\mathbb {R}}^{r\times (K-r)}\) and \(\textbf{W}_2\in {\mathbb {R}}^{r\times (J-r)}\). Now comparing the block columns of (23) and \(\varvec{\Theta }=\textbf{V}\varvec{\Sigma }(\textbf{U}_{1:K,:})^{\top }\) gives

$$\begin{aligned} \begin{bmatrix} \textbf{I}_r \\ \textbf{W}_2^{\top } \end{bmatrix}{\textbf{C}}&= \textbf{V}\varvec{\Sigma }(\textbf{U}_{1:r,:})^{\top },\nonumber \\ \begin{bmatrix} \textbf{I}_r \\ \textbf{W}_2^{\top } \end{bmatrix}{\textbf{C}}\textbf{W}_1&= \textbf{V}\varvec{\Sigma }(\textbf{U}_{(r+1):K,:})^{\top }. \end{aligned}$$
(24)

Since \({\textbf{C}}\) is full rank, \(\textbf{U}_{1:r,:}\) has to also be full rank and (24) can be translated into

$$\begin{aligned} \textbf{U}_{(r+1):K,:}=\textbf{W}_1^{\top } \textbf{U}_{1:r,:}. \end{aligned}$$

Therefore,

$$\begin{aligned} \textbf{U}_{1:K,:}=\begin{bmatrix} \textbf{I}_r \\ \textbf{W}_1^{\top } \end{bmatrix}\textbf{U}_{1:r,:}. \end{aligned}$$
(25)

By plugging (25) into (22) and again using the fact that \(\textbf{U}_{1:r,:}\) is full rank, we have

$$\begin{aligned} (\varvec{\Pi }-\widetilde{\varvec{\Pi }}\textbf{P}) \begin{bmatrix} \textbf{I}_r \\ \textbf{W}_1^{\top } \end{bmatrix} = \textbf{0}. \end{aligned}$$
(26)

Proof of part (b). Denote \(\textbf{A}:=\varvec{\Pi }-\widetilde{\varvec{\Pi }}\textbf{P}\). If \(r=K-1\), \(\textbf{W}_1 = (W_{1,1}, \dots , W_{1,K-1})\) is a \((K-1)\)-dimensional vector and (26) gives us

$$\begin{aligned} \textbf{A}_{:,j} + W_{1,j}\textbf{A}_{:,K}=0, \quad \forall j=1,\dots , K-1. \end{aligned}$$
(27)

Let \(\textbf{1}_r\) denote the r-dimensional column vector with all entries equal to one. Right-multiplying both sides of (26) by \(\textbf{1}_r\) yields

$$\begin{aligned} \textbf{A}\begin{bmatrix} \textbf{1}_r \\ \textbf{W}_1^{\top }\textbf{1}_r \end{bmatrix} = 0. \end{aligned}$$

Also, note that both \(\varvec{\Pi }\) and \(\widetilde{\varvec{\Pi }}\textbf{P}\) have row sums of 1. Hence,

$$\begin{aligned}&\sum _{j=1}^{K}\textbf{A}_{:,j} =\textbf{0}_N,\\&\sum _{j=1}^{K-1}\textbf{A}_{:,j} + \textbf{W}_1^{\top } \textbf{1}_r\textbf{A}_{:,K} =\textbf{0}_N. \end{aligned}$$

Taking the difference of the two equations above gives \((1-\textbf{W}_1^{\top }\textbf{1}_r) \textbf{A}_{:,K} = \textbf{0}_N\). If \(\textbf{W}_1^{\top }\textbf{1}_r\ne 1\), then \(\textbf{A}_{:,K}\) has to be \(\textbf{0}_N\), which implies \(\textbf{A}_{:,j}=\textbf{0}_N\) for all \(j=1,\dots , K-1\) according to (27). Therefore, \(\textbf{A}=\varvec{\Pi }-\widetilde{\varvec{\Pi }}\textbf{P}=\textbf{0}\), which leads to \(\widetilde{\varvec{\Theta }}=\varvec{\Theta }\textbf{P}^{\top }\).

Note that using (25) leads to

$$\begin{aligned} \varvec{\Theta }^{\top }=\textbf{U}_{1:K,:}\varvec{\Sigma }\textbf{V}^{\top }=\begin{bmatrix} \textbf{I}_{K-1} \\ \textbf{W}_1^{\top } \end{bmatrix}\textbf{U}_{1:(K-1),:}\varvec{\Sigma }\textbf{V}^{\top } = \begin{bmatrix} \textbf{I}_{K-1} \\ \textbf{W}_1^{\top } \end{bmatrix}(\varvec{\Theta }_{:,1:(K-1)})^{\top }. \end{aligned}$$

Hence \(\varvec{\Theta }_{:,K}=\varvec{\Theta }_{:,1:(K-1)}\textbf{W}_1\). Therefore, the condition \(\textbf{W}_1^{\top }\textbf{1}_r\ne 1\) is equivalent to the K-th column of \(\varvec{\Theta }\) not being an affine combination of the other columns.

Proof of part (c). Now consider the case of either \(r=K-1\) with \(\textbf{W}_1^{\top }\textbf{1}_r=1\), or \(r<K-1\). Assume subject m is completely mixed so that \(\pi _{m,k}>0,\forall k=1,\dots , K\). Define

$$\begin{aligned} \widetilde{\varvec{\pi }}_i^{\top }= {\left\{ \begin{array}{ll} \varvec{\pi }_i^{\top } &{} \quad \text {if } i\ne m\\ \varvec{\pi }_m^{\top } + \epsilon \varvec{\beta }^{\top } [-\textbf{W}_1^{\top } , ~\textbf{I}_{K-r}] &{} \quad \text {if } i=m \end{array}\right. }, \end{aligned}$$

where \(\epsilon >0\) is small enough so that all entries of \(\widetilde{\varvec{\pi }}_m\) lie in (0, 1), and \(\varvec{\beta }\in {\mathbb {R}}^{K-r}\) is such that \(\varvec{\beta }^{\top }(\textbf{1}_{K-r}-\textbf{W}_1^{\top } \textbf{1}_{r})=0\). Note that such \(\varvec{\beta } \ne \textbf{0}\) always exists under the assumption in part (c), because if \(r=K-1\) with \(\textbf{W}_1^{\top }\textbf{1}_r=1\), then \(\varvec{\beta }^{\top }(\textbf{1}_{K-r}-\textbf{W}_1^{\top } \textbf{1}_{r}) = \beta (1-1)=0\) holds for any \(\beta \in {\mathbb {R}}\); if \(r<K-1\), then \(K-r\ge 2\) and \(\varvec{\beta }\) has dimension at least two, so the inner product equation \(\varvec{\beta }^{\top }(\textbf{1}_{K-r}-\textbf{W}_1^{\top } \textbf{1}_{r})=0\) must have a nonzero solution \(\varvec{\beta }\). The constructed \(\widetilde{\varvec{\Pi }}\) has row sums of 1 by the construction of \(\varvec{\beta }\). Furthermore, \(\widetilde{\varvec{\Pi }}\textbf{U}_{1:K,:}\) and \(\varvec{\Pi }\textbf{U}_{1:K,:}\) can differ only in the m-th row, and

$$\begin{aligned} \widetilde{\varvec{\pi }}_m^{\top } \textbf{U}_{1:K,:}=\varvec{\pi }^{\top }_m\textbf{U}_{1:K,:} + \epsilon \varvec{\beta }^{\top }[-\textbf{W}_1^{\top }, ~ \textbf{I}_{K-r}] \begin{bmatrix} \textbf{I}_r \\ \textbf{W}_1^{\top } \end{bmatrix}\textbf{U}_{1:r,:}=\varvec{\pi }_m^{\top }\textbf{U}_{1:K,:}. \end{aligned}$$

Hence, \(\widetilde{\varvec{\Pi }}\textbf{U}_{1:K,:}=\varvec{\Pi }\textbf{U}_{1:K,:}\). This gives us

$$\begin{aligned} \varvec{\Pi }\varvec{\Theta }^{\top } = \varvec{\Pi }\textbf{U}_{1:K,:}\varvec{\Sigma }\textbf{V}^{\top } = \widetilde{\varvec{\Pi }} \textbf{U}_{1:K,:}\varvec{\Sigma }\textbf{V}^{\top }=\widetilde{\varvec{\Pi }}\varvec{\Theta }^{\top }. \end{aligned}$$

We can see that \((\varvec{\Pi },\varvec{\Theta })\) and \((\widetilde{\varvec{\Pi }}, \varvec{\Theta })\) yield the same model but \(\varvec{\Pi }\ne \widetilde{\varvec{\Pi }}\). This completes the proof for part (c). \(\square \)
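
To illustrate part (c) concretely, the following sketch takes \(K=3\) and an item parameter matrix whose third column is an affine combination of the first two, so that \(r=K-1\) and \(\textbf{W}_1^{\top }\textbf{1}_r=1\). Perturbing a completely mixed subject in the direction \(\varvec{\beta }^{\top }[-\textbf{W}_1^{\top },~\textbf{I}_{K-r}]\) yields a different valid membership matrix with the same \(\textbf{R}_0\). All numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
N, J, K = 100, 20, 3

# Theta whose last column is an affine combination of the first two:
# rank(Theta) = K - 1 and W_1 = (0.5, 0.5)^T, so W_1^T 1 = 1 (the case of part (c))
Theta = rng.uniform(0.2, 0.8, size=(J, K - 1))
Theta = np.column_stack([Theta, 0.5 * Theta[:, 0] + 0.5 * Theta[:, 1]])

Pi = rng.dirichlet(np.ones(K), size=N)
Pi[:K] = np.eye(K)                          # pure subjects, so Assumption 1 holds
m = K
Pi[m] = np.array([0.3, 0.3, 0.4])           # a completely mixed subject

# perturbation from the proof: eps * beta^T [-W_1^T, I_{K-r}] = eps * beta * (-0.5, -0.5, 1)
eps, beta = 0.05, 1.0
Pi_t = Pi.copy()
Pi_t[m] += eps * beta * np.array([-0.5, -0.5, 1.0])

print(np.allclose(Pi_t.sum(axis=1), 1), (Pi_t >= 0).all())   # Pi_t is a valid membership matrix
print(np.allclose(Pi_t @ Theta.T, Pi @ Theta.T))              # same R_0 ...
print(np.abs(Pi_t - Pi).max() > 0)                            # ... although Pi_t differs from Pi
```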

Appendix B: Proof of the Consistency Theorem 3

For any matrix \(\textbf{A}\) with SVD \(\textbf{A}=\textbf{U}_{\textbf{A}}\varvec{\Sigma }_{\textbf{A}}\textbf{V}_{\textbf{A}}^{\top }\), define

$$\begin{aligned} \text {sgn}(\textbf{A}) := \textbf{U}_{\textbf{A}}\textbf{V}_{\textbf{A}}^{\top }. \end{aligned}$$

According to Remark 4.1 in Chen et al. (2021a), for any two matrices \(\textbf{A}, \textbf{B}\in {\mathbb {R}}^{n\times r}, r\le n\):

$$\begin{aligned} \text {sgn}(\textbf{A}^{\top }\textbf{B}) = \arg \min _{\textbf{O}\in {\mathcal {O}}^{r\times r}} \Vert \textbf{A}\textbf{O}- \textbf{B}\Vert , \end{aligned}$$

where \({\mathcal {O}}^{r\times r}\) is the set of all orthonormal matrices of size \(r\times r\). The 2-to-\(\infty \) norm of matrix \(\textbf{A}\) is defined as the maximum row \(l_2\) norm, i.e., \(\left\Vert {\textbf{A}}\right\Vert _{2,\infty }=\max _i \Vert \textbf{e}_i^{\top } \textbf{A}\Vert \). Define

$$\begin{aligned} r=\frac{\max \{N,J\}}{\min \{N,J\}}. \end{aligned}$$

Under Condition 2, we have \(\kappa (\textbf{R}_0) = \frac{\sigma _1(\varvec{\Pi }\varvec{\Theta }^{\top })}{\sigma _K(\varvec{\Pi }\varvec{\Theta }^{\top })} \le \frac{\sigma _1(\varvec{\Pi })\sigma _1(\varvec{\Theta })}{\sigma _K(\varvec{\Pi })\sigma _K(\varvec{\Theta })} = \kappa (\varvec{\Pi })\kappa (\varvec{\Theta })\lesssim 1\) and \(\sigma _K(\textbf{R}_0) \ge \sigma _K(\varvec{\Pi }) \sigma _K(\varvec{\Theta }) \succsim \sqrt{NJ}\).
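
Before stating Lemma 1, here is a small numpy sketch of the two quantities just defined: the \(\text {sgn}(\cdot )\) operator as the orthogonal-Procrustes alignment and the 2-to-\(\infty \) norm as the largest row \(l_2\) norm. The brute-force comparison against randomly sampled orthogonal matrices is only a heuristic sanity check (in the Frobenius norm), not a proof.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 50, 3
A = np.linalg.qr(rng.standard_normal((n, r)))[0]   # matrix with orthonormal columns
B = np.linalg.qr(rng.standard_normal((n, r)))[0]

def sgn(M):
    """sgn(M) := U_M V_M^T from the SVD of M."""
    UM, _, VMt = np.linalg.svd(M, full_matrices=False)
    return UM @ VMt

def two_to_inf(M):
    """2-to-infinity norm: the maximum row l2 norm."""
    return np.linalg.norm(M, axis=1).max()

O_star = sgn(A.T @ B)                               # claimed best orthogonal alignment of A to B

def rand_orth(r, rng):
    return np.linalg.qr(rng.standard_normal((r, r)))[0]

best_random = min(np.linalg.norm(A @ rand_orth(r, rng) - B, 'fro') for _ in range(2000))
print(np.linalg.norm(A @ O_star - B, 'fro') <= best_random + 1e-12)  # sgn is at least as good
print(two_to_inf(A) <= 1.0 + 1e-12)   # rows of an orthonormal-column matrix have norm <= 1
```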

Lemma 1

Under Condition 2, if \(N/J^2\rightarrow 0\) and \(J/N^2\rightarrow 0\), then with probability at least \(1-O((N+J)^{-5})\), one has

$$\begin{aligned} \Vert \widehat{\textbf{U}} -\textbf{U}\cdot \text {sgn}(\textbf{U}^{\top }\widehat{\textbf{U}}) \Vert _{2,\infty }&\lesssim \frac{\sqrt{r} +\sqrt{\log (N+J)}}{\sqrt{NJ}} \end{aligned}$$
(28)
$$\begin{aligned} \Vert \widehat{\textbf{U}}\widehat{\varvec{\Sigma }}\widehat{\textbf{V}}^{\top } -\textbf{U}\varvec{\Sigma } \textbf{V}^{\top }\Vert _{\infty }&\lesssim \sqrt{\frac{r\log (N+J)}{\min \{N,J\}}}. \end{aligned}$$
(29)

Here the infinity norm \(\Vert \textbf{A}\Vert _{\infty }\) for any matrix \(\textbf{A}\) is defined as the maximum absolute entry value. We write the RHS of (28) as \(\varepsilon \) and the RHS of (29) as \(\eta \).

Proof of Lemma 1

We will use Theorem 4.4 in Chen et al. (2021a) to prove the lemma and verify that the conditions of that theorem are satisfied.

Define the incoherence parameter \(\mu := \max \left\{ \frac{N\left\Vert {\textbf{U}}\right\Vert _{2,\infty }^2}{K}, \frac{J\left\Vert {\textbf{V}}\right\Vert _{2,\infty }^2}{K}\right\} \). Note that

$$\begin{aligned} \Vert \textbf{U}\Vert _{2,\infty } \le \Vert \textbf{U}_{{\textbf{S}},:}\Vert _{2,\infty } \le \Vert \textbf{U}_{{\textbf{S}},:}\Vert = \frac{1}{\sigma _K(\varvec{\Pi })} \lesssim \frac{1}{\sqrt{N}}, \end{aligned}$$

since every row of \(\textbf{U}\) is a convex combination of the rows of \(\textbf{U}_{{\textbf{S}},:}\). On the other hand,

$$\begin{aligned} \Vert \textbf{V}\Vert _{2,\infty }&= \Vert \varvec{\Theta }\textbf{U}^{-\top }_{{\textbf{S}},:} \varvec{\Sigma }^{-1}\Vert _{2,\infty } \le \Vert \varvec{\Theta }\Vert _{2,\infty } \Vert \textbf{U}^{-\top }_{{\textbf{S}},:} \varvec{\Sigma }^{-1}\Vert \\&\le \Vert \varvec{\Theta }\Vert _{2,\infty } \Vert \textbf{U}^{-1}_{{\textbf{S}},:}\Vert \cdot \frac{1}{\sigma _K(\varvec{\Pi }\varvec{\Theta }^{\top })} = \frac{\Vert \varvec{\Theta }\Vert _{2,\infty }\sigma _1(\varvec{\Pi })}{\sigma _K(\varvec{\Pi }\varvec{\Theta }^{\top })} \\&\le \frac{\Vert \varvec{\Theta }\Vert _{2,\infty }\kappa (\varvec{\Pi })}{\sigma _K(\varvec{\Theta })} \le \frac{\sqrt{K} \kappa (\varvec{\Pi })}{\sigma _K(\varvec{\Theta })} \lesssim \frac{1}{\sqrt{J}}. \end{aligned}$$

Therefore, \(\mu \lesssim 1\).

On the other hand, we will show that \(\sqrt{\log (N+J)/\min \{N, J\}}\lesssim 1\). By the symmetry of N and J, we assume \(J\le N\) without loss of generality. Thus,

$$\begin{aligned} \sqrt{\frac{\log (N+J)}{\min \{N, J\}}} =\sqrt{\frac{\log (N+J)}{J}} \lesssim \sqrt{\frac{\log (J^2+J)}{J}} \rightarrow 0. \end{aligned}$$

Therefore, Assumption 4.2 in Chen et al. (2021a) holds and (28) and (29) can be directly obtained from Theorem 4.4 in Chen et al. (2021a). \(\square \)

Lemma 2

Let Conditions 1 and 2 hold. Then, there exists a permutation matrix \(\textbf{P}\) such that with probability at least \(1-O((N+J)^{-5})\),

$$\begin{aligned} \Vert \widehat{\textbf{U}}_{\widehat{\textbf{S}},:} - \textbf{P}\textbf{U}_{{\textbf{S}},:} \cdot \text {sgn}(\textbf{U}^{\top }\widehat{\textbf{U}})\Vert \lesssim \varepsilon . \end{aligned}$$
(30)

Proof of Lemma 2

Using Proposition 1, we will apply Theorem 4 in Klopp et al. (2023) with \(\widetilde{\textbf{G}}=\widehat{\textbf{U}}\), \(\textbf{G}=\textbf{U}\cdot \text {sgn}(\textbf{U}^{\top }\widehat{\textbf{U}})\), \(\textbf{W} =\varvec{\Pi }\), \(\textbf{Q}=\textbf{U}_{{\textbf{S}},:}\cdot \text {sgn}(\textbf{U}^{\top } \widehat{\textbf{U}})\), and \(\textbf{N} =\widehat{\textbf{U}}-\textbf{U}\cdot \text {sgn}(\textbf{U}^{\top }\widehat{\textbf{U}})\). According to Lemma 1, \(\Vert \textbf{e}_i^{\top }\textbf{N}\Vert \le \varepsilon \) and \(\varepsilon \lesssim \frac{\sqrt{r} +\sqrt{\log (N+J)}}{\sqrt{NJ}}\). On the other hand, \(\sigma _K(\textbf{Q})=\sigma _K(\textbf{U}_{{\textbf{S}},:}) =\frac{1}{\sigma _1(\varvec{\Pi })}\ge \frac{1}{\sqrt{N}}\) since \(\textbf{U}=\varvec{\Pi }\textbf{U}_{{\textbf{S}},:}\) and \(\sigma _1(\varvec{\Pi }) \le \left\Vert {\varvec{\Pi }}\right\Vert _F \le \sqrt{N} \max _i\left\Vert {\textbf{e}_i^{\top }\varvec{\Pi }}\right\Vert _{2} \le \sqrt{N}\). Therefore, when N and J are large enough, \(\varepsilon \le C_*\frac{\sigma _K (\textbf{U}_{{\textbf{S}},:})}{K\sqrt{K}}\) holds for the small constant \(C_*>0\) required by that theorem. Then we can use Theorem 4 in Klopp et al. (2023) to get

$$\begin{aligned} \Vert \widehat{\textbf{U}}_{\widehat{{\textbf{S}}},:} - \textbf{P}\textbf{U}_{{\textbf{S}},:}\cdot \text {sgn} (\textbf{U}^{\top }\widehat{\textbf{U}})\Vert&\le C_0\sqrt{K}\kappa (\textbf{U}_{{\textbf{S}},:}\cdot \text {sgn}(\textbf{U}^{\top }\widehat{\textbf{U}}))\varepsilon \\&= C_0\sqrt{K}\kappa (\textbf{U}_{{\textbf{S}},:})\varepsilon {\mathop {=}\limits ^{(i)}} C_0\sqrt{K}\kappa (\varvec{\Pi })\varepsilon \\&\lesssim \varepsilon ~~ \text { with probability at least}\ 1-O((N+J)^{-5}). \end{aligned}$$

Here (i) is because \(\textbf{U}=\varvec{\Pi }\textbf{U}_{{\textbf{S}},:}\). \(\square \)

Proof of Theorem 3

We first show that \(\widehat{\textbf{U}}_{\widehat{{\textbf{S}}},:}\) is nondegenerate. By Weyl's inequality and Lemma 2, with probability at least \(1-O((N+J)^{-5})\), we have

$$\begin{aligned} \sigma _K(\widehat{\textbf{U}}_{\widehat{{\textbf{S}}},:})&\ge \sigma _K(\textbf{P}\textbf{U}_{{\textbf{S}},:}\textbf{O}) - \Vert \widehat{\textbf{U}}_{\widehat{{\textbf{S}}},:} -\textbf{P}\textbf{U}_{{\textbf{S}},:}\cdot \text {sgn}(\textbf{U}^{\top }\widehat{\textbf{U}})\Vert \\&\ge \sigma _K(\textbf{U}_{{\textbf{S}},:}) -\Vert \widehat{\textbf{U}}_{\widehat{{\textbf{S}}},:} -\textbf{P}\textbf{U}_{{\textbf{S}},:}\cdot \text {sgn}(\textbf{U}^{\top }\widehat{\textbf{U}})\Vert _F\\&\succsim \frac{1}{\sigma _1(\varvec{\Pi })} - \varepsilon \\&\succsim \frac{1}{\sqrt{N}} - \frac{\sqrt{r} +\sqrt{\log (N+J)}}{\sqrt{NJ}}\\&\succsim \frac{1}{\sqrt{N}} \end{aligned}$$

when N and J are large enough and \(\frac{N}{J^2}\) converges to zero. Therefore, \(\widehat{\textbf{U}}_{\widehat{{\textbf{S}}},:}\) is invertible.

For the estimation of \(\varvec{\Pi }\),

$$\begin{aligned} \Vert \widetilde{\varvec{\Pi }}-\varvec{\Pi }\textbf{P}\Vert _F&= \Vert \widehat{\textbf{U}} \widehat{\textbf{U}}_{\widehat{{\textbf{S}}},:}^{-1} - \textbf{U}\textbf{U}^{-1}_{{\textbf{S}},:}\textbf{P}\Vert _F\\&\le \underbrace{\Vert \widehat{\textbf{U}} (\widehat{\textbf{U}}_{\widehat{{\textbf{S}}},:}^{-1} -\text {sgn}(\textbf{U}^{\top }\widehat{\textbf{U}})^{\top } \textbf{U}^{-1}_{{\textbf{S}},:}\textbf{P})\Vert _F}_{I_1} + \underbrace{\Vert (\widehat{\textbf{U}} - \textbf{U}\text {sgn} (\textbf{U}^{\top }\widehat{\textbf{U}}))[\textbf{P}^{-1}\textbf{U}_{{\textbf{S}},:} \text {sgn}(\textbf{U}^{\top }\widehat{\textbf{U}})]^{-1}\Vert _F}_{I_2}\\&=: I_1 + I_2. \end{aligned}$$

We will look at \(I_1\) and \(I_2\) separately.

$$\begin{aligned} I_1&= \Vert \widehat{\textbf{U}}(\widehat{\textbf{U}}_{\widehat{{\textbf{S}}},:}^{-1} -\text {sgn}(\textbf{U}^{\top }\widehat{\textbf{U}})^{\top }\textbf{U}^{-1}_{{\textbf{S}},:} \textbf{P}^{\top })\Vert _F \\&\le \Vert \widehat{\textbf{U}}_{\widehat{{\textbf{S}}},:}^{-1} - \text {sgn} (\textbf{U}^{\top }\widehat{\textbf{U}})^{\top }\textbf{U}^{-1}_{{\textbf{S}},:}\textbf{P}^{\top }\Vert _F\\&\le \Vert \widehat{\textbf{U}}_{\widehat{{\textbf{S}}},:}^{-1}\Vert \Vert \text {sgn} (\textbf{U}^{\top }\widehat{\textbf{U}})^{\top }\textbf{U}^{-1}_{{\textbf{S}},:} \textbf{P}^{\top }\Vert \Vert \widehat{\textbf{U}}_{\widehat{{\textbf{S}}},:} - \textbf{P}\textbf{U}_{{\textbf{S}},:}\text {sgn}(\textbf{U}^{\top }\widehat{\textbf{U}})\Vert _F\\&\lesssim \sqrt{N} \cdot \sigma _1(\varvec{\Pi }) \cdot \varepsilon \\&\lesssim \sqrt{N}\cdot \frac{\sqrt{r}+\sqrt{\log (N+J)}}{\sqrt{J}} ~~ \text {with probability at least}\ 1-O((N+J)^{-5}); \end{aligned}$$

and

$$\begin{aligned} I_2&= \Vert (\widehat{\textbf{U}} - \textbf{U}\text {sgn}(\textbf{U}^{\top } \widehat{\textbf{U}}))[\textbf{P}^{-1}\textbf{U}_{{\textbf{S}},:}\text {sgn} (\textbf{U}^{\top }\widehat{\textbf{U}})]^{-1}\Vert _F\\&\le \Vert \widehat{\textbf{U}} - \textbf{U}\text {sgn}(\textbf{U}^{\top }\widehat{\textbf{U}}) \Vert _F \Vert \textbf{U}_{{\textbf{S}},:}^{-1}\Vert \\&\le \sqrt{N} \cdot \varepsilon \cdot \sigma _1(\varvec{\Pi }) \\&\lesssim \sqrt{N}\cdot \frac{\sqrt{r}+\sqrt{\log (N+J)}}{\sqrt{J}} ~~ \text {with probability at least}\ 1-O((N+J)^{-5}). \end{aligned}$$

Therefore, with probability at least \(1-O((N+J)^{-5})\),

$$\begin{aligned} \frac{1}{\sqrt{NK}} \Vert \widetilde{\varvec{\Pi }}-\varvec{\Pi }\textbf{P}\Vert _F&\lesssim \frac{\sqrt{r}+\sqrt{\log (N+J)}}{\sqrt{J}}\\&= {\left\{ \begin{array}{ll} \frac{\sqrt{N}}{J} + \frac{\sqrt{\log (N+J)}}{\sqrt{J}} &{} \text {if } N > J, \\ \frac{1}{\sqrt{N}} + \frac{\sqrt{\log (N+J)}}{\sqrt{J}} &{} \text {if } N \le J. \end{array}\right. } \end{aligned}$$

Therefore, \(\frac{1}{\sqrt{NK}} \Vert \widetilde{\varvec{\Pi }}-\varvec{\Pi }\textbf{P}\Vert _F\) converges to zero in probability as \(N, J\rightarrow \infty \) and \(\frac{N}{J^2}\rightarrow 0\).

For the estimation of \(\varvec{\Theta }\),

$$\begin{aligned}&\Vert \widetilde{\varvec{\Theta }}\textbf{P}-\varvec{\Theta }\Vert _F \\&\quad = \Vert \textbf{P}^{\top }\widehat{\textbf{U}}_{\widehat{\textbf{S}},:} \widehat{\varvec{\Sigma }}\widehat{\textbf{V}}^{\top } - \textbf{U}_{{\textbf{S}},:}\varvec{\Sigma }\textbf{V}^{\top }\Vert _F\\&\quad \le \Vert (\textbf{P}^{\top }\widehat{\textbf{U}}_{\widehat{\textbf{S}},:} -\textbf{U}_{{\textbf{S}},:}\text {sgn}(\textbf{U}^{\top }\widehat{\textbf{U}})) \widehat{\varvec{\Sigma }}\widehat{\textbf{V}}^{\top }\Vert _F + \Vert (\textbf{U}_{{\textbf{S}},:} \text {sgn}(\textbf{U}^{\top }\widehat{\textbf{U}}) - \widehat{\textbf{U}}_{{\textbf{S}},:}) \widehat{\varvec{\Sigma }}\widehat{\textbf{V}}^{\top }\Vert _F \\&\qquad + \Vert \widehat{\textbf{U}}_{{\textbf{S}},:} \widehat{\varvec{\Sigma }} \widehat{\textbf{V}}^{\top } - \textbf{U}_{{\textbf{S}},:}\varvec{\Sigma }\textbf{V}^{\top }\Vert _F\\&\quad \le \Vert \textbf{P}^{\top }\widehat{\textbf{U}}_{\widehat{\textbf{S}},:} - \textbf{U}_{{\textbf{S}},:}\text {sgn}(\textbf{U}^{\top }\widehat{\textbf{U}})\Vert _F \cdot \sigma _1(\textbf{R})\cdot \Vert \widehat{\textbf{V}}\Vert + \Vert \textbf{U}_{{\textbf{S}},:} \text {sgn}(\textbf{U}^{\top }\widehat{\textbf{U}}) - \widehat{\textbf{U}}_{{\textbf{S}},:} \Vert \cdot \sigma _1(\textbf{R})\cdot \Vert \widehat{\textbf{V}}\Vert \\&\qquad + \sqrt{KJ}\Vert \widehat{\textbf{U}}_{{\textbf{S}},:} \widehat{\varvec{\Sigma }} \widehat{\textbf{V}}^{\top } - \textbf{U}_{{\textbf{S}},:}\varvec{\Sigma }\textbf{V}^{\top }\Vert _{\infty }\\ {\mathop {\lesssim }\limits ^{(ii)}}&\varepsilon \cdot \sigma _1(\textbf{R}_0) +\varepsilon \cdot (\sigma _1(\textbf{R}) - \sigma _1(\textbf{R}_0)) + \sqrt{KJ} \cdot \eta ~~ \text {with probability at least}\ 1-O((N+J)^{-5}), \end{aligned}$$

where (ii) results from Lemmas 1 and 2. By Weyl's inequality, \(|\sigma _1(\textbf{R}) - \sigma _1(\textbf{R}_0)| \le \left\Vert {\textbf{R}-\textbf{R}_0}\right\Vert \), where \(\textbf{R}-\textbf{R}_0\) is a matrix of independent mean-zero Bernoulli errors. According to Eq. (3.9) in Chen et al. (2021a), with probability at least \(1-(N+J)^{-8}\),

$$\begin{aligned} \left\Vert {\textbf{R}-\textbf{R}_0}\right\Vert \lesssim \sqrt{N+J} + \sqrt{\log (N+J)}. \end{aligned}$$

Furthermore, \(\sigma _1(\textbf{R}_0)\ge \sigma _K(\textbf{R}_0)\succsim \sqrt{NJ}\) by Condition 2; thus, we know that \(\sigma _1(\textbf{R}_0) \succsim |\sigma _1(\textbf{R}) - \sigma _1(\textbf{R}_0)|\) with probability at least \(1-(N+J)^{-8}\). Therefore, with probability at least \(1-O((N+J)^{-5})\),

$$\begin{aligned} \Vert \widehat{\varvec{\Theta }}\textbf{P}-\varvec{\Theta }\Vert _F&\lesssim \varepsilon \cdot \sigma _1(\textbf{R}_0) + \sqrt{KJ}\cdot \eta \\&\lesssim \frac{\sqrt{r} + \sqrt{\log (N+J)}}{\sqrt{NJ}} \cdot \sqrt{N} \cdot \sqrt{J} + \sqrt{J} \sqrt{\frac{r\log (N+J)}{\min \{N,J\}}}\\&= \sqrt{r} + \sqrt{\log (N+J)} + \sqrt{J} \sqrt{\frac{r\log (N+J)}{\min \{N,J\}}}. \end{aligned}$$

Thus,

$$\begin{aligned} \frac{1}{\sqrt{JK}} \Vert \widehat{\varvec{\Theta }}\textbf{P}-\varvec{\Theta }\Vert _F&\lesssim \frac{\sqrt{r} + \sqrt{\log (N+J)}}{\sqrt{J}} + \sqrt{\frac{r\log (N+J)}{\min \{N,J\}}} \\&= {\left\{ \begin{array}{ll} \frac{\sqrt{N}}{J}+ \frac{\sqrt{\log (N+J)}}{\sqrt{J}} + \frac{\sqrt{N\log (N+J)}}{J} &{} \text {if } N > J \\ \frac{1}{\sqrt{N}} + \frac{\sqrt{\log (N+J)}}{\sqrt{J}} + \frac{\sqrt{J\log (N+J)}}{N} &{} \text {if } N \le J \end{array}\right. }. \end{aligned}$$

Therefore, \(\frac{1}{\sqrt{JK}} \Vert \widehat{\varvec{\Theta }}\textbf{P}-\varvec{\Theta }\Vert _F\) converges to zero in probability as \(N, J\rightarrow \infty \) and \(\frac{N}{J^2}, \frac{J}{N^2}\rightarrow 0\). \(\square \)
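
The scaled error rates above can be eyeballed in a small simulation. The sketch below uses the oracle pure-subject set \({\textbf{S}}\) (so that no vertex hunting or permutation alignment is needed) and reports \(\frac{1}{\sqrt{NK}}\Vert \widetilde{\varvec{\Pi }}-\varvec{\Pi }\Vert _F\) and \(\frac{1}{\sqrt{JK}}\Vert \widetilde{\varvec{\Theta }}-\varvec{\Theta }\Vert _F\) as \(N\) and \(J\) grow proportionally; the errors should shrink, consistent with the theorem. Sizes and seeds are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

def scaled_errors(N, J, K=3):
    Theta = rng.uniform(0.1, 0.9, size=(J, K))
    Pi = rng.dirichlet(np.ones(K), size=N)
    Pi[:K] = np.eye(K)                          # oracle pure subjects S = {1, ..., K}
    R = rng.binomial(1, Pi @ Theta.T).astype(float)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    U, s, Vt = U[:, :K], s[:K], Vt[:K]
    Pi_hat = U @ np.linalg.inv(U[:K])           # Proposition 1 formulas with the oracle S
    Theta_hat = Vt.T @ np.diag(s) @ U[:K].T
    err_pi = np.linalg.norm(Pi_hat - Pi) / np.sqrt(N * K)
    err_theta = np.linalg.norm(Theta_hat - Theta) / np.sqrt(J * K)
    return err_pi, err_theta

for N, J in [(500, 100), (2000, 400), (8000, 1600)]:
    print(N, J, scaled_errors(N, J))
```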



Cite this article

Chen, L., Gu, Y. A Spectral Method for Identifiable Grade of Membership Analysis with Binary Responses. Psychometrika (2024). https://doi.org/10.1007/s11336-024-09951-y

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11336-024-09951-y

Keywords

Navigation