Identifiability of latent-variable and structural-equation models: from linear to nonlinear

  • INVITED ARTICLE: FOURTH AKAIKE MEMORIAL LECTURE
Annals of the Institute of Statistical Mathematics

Abstract

An old problem in multivariate statistics is that linear Gaussian models are often unidentifiable. In factor analysis, the factors are only defined up to an orthogonal rotation, while in linear regression, the direction of effect cannot be identified. For such linear models, non-Gaussianity of the (latent) variables has been shown to provide identifiability. In the case of factor analysis, this leads to independent component analysis; in the case of the direction of effect, non-Gaussian versions of structural equation modeling solve the problem. More recently, we have shown how even general nonparametric nonlinear versions of such models can be estimated. Non-Gaussianity is not enough in this case, but if we observe time series, or if the distributions are suitably modulated by observed auxiliary variables, the models become identifiable. This paper reviews the identifiability theory for the linear and nonlinear cases, considering both factor analytic and structural equation models.
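As a minimal numerical sketch of the rotation indeterminacy mentioned in the abstract (an illustration only; the loading matrix A, rotation R, and noise covariance Psi below are arbitrary choices), the Gaussian factor-analysis covariance \(\textbf{A}\textbf{A}^T+\Psi \) is exactly the same for \(\textbf{A}\) and for any rotated loading matrix \(\textbf{A}\textbf{R}\), so the Gaussian likelihood cannot distinguish the two:

import numpy as np

rng = np.random.default_rng(0)
p, k = 6, 2                                    # observed dimension, number of factors

A = rng.normal(size=(p, k))                    # illustrative loading matrix
Psi = np.diag(rng.uniform(0.5, 1.0, size=p))   # diagonal noise covariance

# Any orthogonal rotation R of the factors...
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# ...leaves the model covariance A A^T + Psi unchanged, so the Gaussian
# likelihood cannot distinguish the loading matrix A from A R.
cov_A = A @ A.T + Psi
cov_AR = (A @ R) @ (A @ R).T + Psi
print(np.allclose(cov_A, cov_AR))              # True

With non-Gaussian independent factors, by contrast, statistics beyond the covariance do distinguish the two parametrizations; this is the starting point of the identifiability results reviewed below.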

Notes

  1. Dimension reduction in itself does not necessarily suffer from such lack of uniqueness if all we want to find is the right subspace. In the case of dimension reduction, factor analysis is actually closely related to principal component analysis (PCA). However, this whole paper is about the case where dimension reduction is not performed; these two are very different problems.

  2. Structural equation models (SEM) are sometimes referred to as structural causal models (SCM) or functional causal models (FCM) in recent machine learning literature.

  3. Alternatively, we could carry out the same proof in the Fourier domain, i.e., using characteristic functions \(\hat{p}\). Then, \(p(\textbf{x})\) and \(p_i(s_i)\) would be replaced by the characteristic functions, and the Jacobian disappears from the first equations. Thus, we would replace the assumption of smooth pdfs by the assumption of continuous second derivatives of the characteristic functions of the \(p_i\), denoted by \(\hat{p}_i\). Such an assumption is related to the moment structure of the components: it is only slightly more restrictive than assuming finite variances for the components. The whole proof remains valid for characteristic functions with minimal changes, giving a more general, if slightly more complicated, proof.
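For instance (to spell out the identity referred to in Note 3), in the basic linear ICA model \(\textbf{x}=\textbf{A}\textbf{s}\) with independent components \(s_i\), the density identity and its Fourier-domain counterpart read

$$\begin{aligned} p(\textbf{x})=|\det \textbf{A}|^{-1}\prod _i p_i\big ((\textbf{A}^{-1}\textbf{x})_i\big ), \qquad \hat{p}_{\textbf{x}}(\textbf{u})=\mathbb {E}\big [\exp (\textrm{i}\,\textbf{u}^T\textbf{x})\big ]=\prod _i \hat{p}_i(\textbf{a}_i^T\textbf{u}), \end{aligned}$$

where \(\textbf{a}_i\) denotes the i-th column of \(\textbf{A}\); the Jacobian factor \(|\det \textbf{A}|^{-1}\) indeed disappears in the characteristic-function version.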

References

  • Alain, G., Bengio, Y. (2018). Understanding intermediate layers using linear classifier probes. arXiv preprint arXiv:1610.01644.

  • Belouchrani, A., Meraim, K. A., Cardoso, J. F., Moulines, E. (1997). A blind source separation technique based on second order statistics. IEEE Transactions on Signal Processing, 45(2), 434–444.

  • Bollen, K.A. (1989). Structural Equations with Latent Variables. Wiley.

  • Brookes, M., Woolrich, M., Luckhoo, H., Price, D., Hale, J., Stephenson, M., Barnes, G., Smith, S., Morris, P. (2011). Investigating the electrophysiological basis of resting state networks using magnetoencephalography. Proceedings of the National Academy of Sciences (USA), 108, 16783–16788.

  • Buchholz, S., Besserve, M., Schölkopf, B. (2022). Function classes for identifiable nonlinear independent component analysis. arXiv preprint arXiv:2208.06406.

  • Cardoso, J. F. (2001). The three easy routes to independent component analysis: contrasts and geometry. Proceedings of the International Workshop on Independent Component Analysis and Blind Signal Separation (ICA2001), San Diego.

  • Cardoso, J. F., Laheld, B. H. (1996). Equivariant adaptive source separation. IEEE Transactions on Signal Processing, 44(12), 3017–3030.

  • Chen, T., Kornblith, S., Norouzi, M., Hinton, G. (2020). A simple framework for contrastive learning of visual representations. arXiv preprint arXiv:2002.05709.

  • Comon, P. (1994). Independent component analysis–a new concept? Signal Processing, 36, 287–314.

  • Donoho, D. L., Grimes, C. (2003). Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Sciences, 100(10), 5591–5596.

  • Donoho, D. L., Stodden, V. (2004). When does non-negative matrix factorization give a correct decomposition into parts? Advances in Neural Information Processing 16 (Proceedings of NIPS2003). MIT Press.

  • Eriksson, J., Koivunen, V. (2004). Identifiability, separability, and uniqueness of linear ICA models. IEEE Signal Processing Letters, 11(7), 601–604.

  • Flanders, H. (1966). Liouville’s theorem on conformal mapping. Journal of Mathematics and Mechanics, 15(1), 157–161.

  • Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 2672–2680.

  • Gresele, L., Fissore, G., Javaloy, A., Schölkopf, B., Hyvärinen, A. (2020a). Relative gradient optimization of the jacobian term in unsupervised deep learning. Advances in Neural Information Processing Systems (NeurIPS2020).

  • Gresele, L., Rubenstein, P. K., Mehrjou, A., Locatello, F., Schölkopf, B. (2020b). The Incomplete Rosetta Stone Problem: Identifiability Results for Multi-View Nonlinear ICA. Uncertainty in Artificial Intelligence, 217–227. Proceedings of Machine Learning Research.

  • Gresele, L., Von Kügelgen, J., Stimper, V., Schölkopf, B., Besserve, M. (2021). Independent mechanism analysis, a new concept? Advances in Neural Information Processing Systems, 34, 28233–28248.

  • Hälvä, H., Hyvärinen, A. (2020). Hidden Markov nonlinear ICA: Unsupervised learning from nonstationary time series. Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI2020), Toronto.

  • Hälvä, H., Corff, S. L., Lehéricy, L., So, J., Zhu, Y., Gassiat, E., Hyvärinen, A. (2021). Disentangling identifiable features from noisy data with structured nonlinear ICA. Advances in Neural Information Processing Systems (NeurIPS2021).

  • Harman, H. H. (1967). Modern Factor Analysis. 2nd ed. University of Chicago Press.

  • Harmeling, S., Ziehe, A., Kawanabe, M., Müller, K. R. (2003). Kernel-based nonlinear blind source separation. Neural Computation, 15(5), 1089–1124.

  • Horan, D., Richardson, E., Weiss, Y. (2021). When is unsupervised disentanglement possible? Advances in Neural Information Processing Systems, 34, 5150–5161.

  • Hoyer, P. O. (2004). Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research, 5, 1457–1469.

  • Hoyer, P. O., Janzing, D., Mooij, J., Peters, J., Schölkopf, B. (2009). Nonlinear causal discovery with additive noise models. Advances in Neural Information Processing Systems, 21, 689–696. MIT Press.

  • Hoyer, P. O., Shimizu, S., Kerminen, A. J., Palviainen, M. (2008). Estimation of causal effects using linear non-Gaussian causal models with hidden variables. International Journal of Approximate Reasoning, 49, 362–378.

  • Huang, C. W., Krueger, D., Lacoste, A., Courville, A. (2018). Neural autoregressive flows. International Conference on Machine Learning, 2078–2087. Proceedings of Machine Learning Research.

  • Hyttinen, A., Barin-Pacela, V., Hyvärinen, A. (2022). Binary independent component analysis: A non-stationarity-based approach. Proceedings of the 38th Conference on Uncertainty in Artificial Intelligence (UAI2022), 874–884, Eindhoven.

  • Hyvärinen, A. (1997). One-unit contrast functions for independent component analysis: A statistical analysis. Neural Networks for Signal Processing VII (Proceedings of the IEEE Workshop on Neural Networks for Signal Processing), 388–397, Amelia Island.

  • Hyvärinen, A. (1999). Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks, 10(3), 626–634.

  • Hyvärinen, A., Morioka, H. (2016). Unsupervised feature extraction by time-contrastive learning and nonlinear ICA. Advances in Neural Information Processing Systems (NIPS2016), Barcelona.

  • Hyvärinen, A., Morioka, H. (2017). Nonlinear ICA of temporally dependent stationary sources. Proceedings of the Artificial Intelligence and Statistics (AISTATS2017), Fort Lauderdale.

  • Hyvärinen, A., Oja, E. (2000). Independent component analysis: Algorithms and applications. Neural Networks, 13(4–5), 411–430.

  • Hyvärinen, A., Pajunen, P. (1999). Nonlinear independent component analysis: Existence and uniqueness results. Neural Networks, 12(3), 429–439.

  • Hyvärinen, A., Smith S.M. (2013). Pairwise likelihood ratios for estimation of non-Gaussian structural equation models. Journal of Machine Learning Research, 14, 111–152.

  • Hyvärinen, A., Karhunen, J., Oja, E. (2001). Independent Component Analysis. Wiley Interscience.

  • Hyvärinen, A., Hurri, J., Hoyer, P. O. (2009). Natural Image Statistics. Springer-Verlag.

  • Hyvärinen, A., Ramkumar, P., Parkkonen, L., Hari, R. (2010). Independent component analysis of short-time Fourier transforms for spontaneous EEG/MEG analysis. NeuroImage, 49(1), 257–271.

  • Hyvärinen, A., Sasaki, H., Turner, R. (2019). Nonlinear ICA using auxiliary variables and generalized contrastive learning. Proceedings of the Artificial Intelligence and Statistics (AISTATS2019), Okinawa.

  • Hyvärinen, A., Khemakhem, I., Morioka, H. (2023). Nonlinear independent component analysis for principled disentanglement in unsupervised deep learning. arXiv preprint arXiv:2303.16535.

  • Immer, A., Schultheiss, C., Vogt, J. E., Schölkopf, B., Bühlmann, P., Marx, A. (2022). On the identifiability and estimation of causal location-scale noise models. arXiv preprint arXiv:2210.09054.

  • Jakobsen, M. E., Shah, R. D., Bühlmann, P., Peters, J. (2022). Structure learning for directed trees. Journal of Machine Learning Research, 23, 159.

  • Jutten, C., Hérault, J. (1991). Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture. Signal Processing, 24, 1–10.

  • Khemakhem, I., Kingma, D. P., Monti, R. P., Hyvärinen, A. (2020a). Variational autoencoders and nonlinear ICA: A unifying framework. Proceedings of the Artificial Intelligence and Statistics (AISTATS2020).

  • Khemakhem, I., Monti, R. P., Kingma, D. P., Hyvärinen, A. (2020b). ICE-BeeM: Identifiable conditional energy-based deep models based on nonlinear ICA. Advances in Neural Information Processing Systems (NeurIPS2020).

  • Khemakhem, I., Monti, R. P., Leech, R., Hyvärinen, A. (2021). Causal autoregressive flows. Proceedings of the Artificial Intelligence and Statistics (AISTATS2021).

  • Kingma, D. P., Welling, M. (2014). Auto-encoding variational Bayes. Proceedings of the International Conference on Learning Representations (ICLR2014), Banff.

  • Kivva, B., Rajendran, G., Ravikumar, P., Aragam, B. (2022). Identifiability of deep generative models under mixture priors without auxiliary information. arXiv preprint arXiv:2206.10044.

  • Klindt, D., Schott, L., Sharma, Y., Ustyuzhaninov, I., Brendel, W., Bethge, M., Paiton, D. (2020). Towards nonlinear disentanglement in natural data with temporal sparse coding. arXiv preprint arXiv:2007.10930.

  • Kumar, A., Poole, B. (2020). On implicit regularization in \( \beta \)-VAEs. International Conference on Machine Learning, 5480–5490. Proceedings of Machine Learning Research.

  • Lacerda, G., Spirtes, P., Ramsey, J., Hoyer, P. O. (2008). Discovering cyclic causal models by independent components analysis. Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (UAI2008), Helsinki.

  • Lachapelle, S., Rodriguez, P., Sharma, Y., Everett, K. E., Le Priol, R., Lacoste, A., Lacoste-Julien, S. (2022). Disentanglement via mechanism sparsity regularization: A new principle for nonlinear ICA. Conference on Causal Learning and Reasoning, 428–484. Proceedings of Machine Learning Research.

  • Lee, D. D., Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401, 788–791.

  • Locatello, F., Bauer, S., Lucic, M., Raetsch, G., Gelly, S., Schölkopf, B., Bachem, O. (2019). Challenging common assumptions in the unsupervised learning of disentangled representations. International Conference on Machine Learning, 4114–4124. Proceedings of Machine Learning Research.

  • Matsuoka, K., Ohya, M., Kawamoto, M. (1995). A neural net for blind separation of nonstationary signals. Neural Networks, 8(3), 411–419.

  • Mikolov, T., Chen, K., Corrado, G., Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781.

  • Monti, R. P., Hyvärinen, A. (2018). A unified probabilistic model for learning latent factors and their connectivities from high-dimensional data. Proceedings of the 34th Conference on Uncertainty in Artificial Intelligence (UAI2018), Monterey.

  • Monti, R. P., Zhang, K., Hyvärinen, A. (2019). Causal discovery with general non-linear relationships using non-linear ICA. Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence (UAI2019), Tel Aviv.

  • Moran, G. E., Sridhar, D., Wang, Y., Blei, D. M. (2021). Identifiable variational autoencoders via sparse decoding. arXiv preprint arXiv:2110.10804.

  • Morioka, H., Hälvä, H., Hyvärinen, A. (2021). Independent innovation analysis for nonlinear vector autoregressive process. Proceedings of the Artificial Intelligence and Statistics (AISTATS2021).

  • Morioka, H., Hyvärinen, A. (2023). Connectivity-contrastive learning: Combining causal discovery and representation learning for multimodal data. Proceedings of the Artificial Intelligence and Statistics (AISTATS2023), Valencia, Spain.

  • Nevanlinna, R. (1960). On differentiable mappings. Analytic functions, 3–9.

  • Olshausen, B. A., Field, D.J. (1997). Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37, 3311–3325.

  • Paatero, P., Tapper, U. (1994). Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5, 111–126.

  • Pearl, J. (2009). Causality. Cambridge University Press.

  • Peters, J., Bühlmann, P. (2014). Identifiability of Gaussian structural equation models with equal error variances. Biometrika, 101(1), 219–228.

  • Peters, J., Janzing, D., Schölkopf, B. (2017). Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press.

  • Peters, J., Mooij, J. M., Janzing, D., Schölkopf, B. (2014). Causal discovery with continuous additive noise models. Journal of Machine Learning Research, 15, 2009–2053.

  • Pham, D. T., Cardoso, J. F. (2001). Blind separation of instantaneous mixtures of nonstationary sources. IEEE Transactions Signal Processing, 49(9), 1837–1848.

  • Pham, D. T., Garrat, P. (1997). Blind separation of mixture of independent sources through a quasi-maximum likelihood approach. IEEE Transactions on Signal Processing, 45(7), 1712–1725.

  • Plumbley, M.D. (2003). Algorithms for non-negative independent component analysis. IEEE Transactions on Neural Networks, 14(3), 534–543.

  • Rezende, D. J., Mohamed, S. (2015). Variational inference with normalizing flows. arXiv preprint arXiv:1505.05770.

  • Sasaki, H., Takenouchi, T., Monti, R. P., Hyvärinen, A. (2020). Robust contrastive learning and nonlinear ICA in the presence of outliers. Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI2020), Toronto.

  • Schell, A., Oberhauser, H. (2023). Nonlinear independent component analysis for discrete-time and continuous-time signals. Annals of Statistics, In press.

  • Shimizu, S. (2014). LiNGAM: Non-Gaussian methods for estimating causal structures. Behaviormetrika, 41(1), 65–98.

  • Shimizu, S., Hoyer, P. O., Hyvärinen, A., Kerminen, A. (2006). A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7, 2003–2030.

  • Shimizu, S., Inazumi, T., Sogawa, Y., Hyvärinen, A., Kawahara, Y., Washio, T., Hoyer, P. O., Bollen, K. (2011). DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model. Journal of Machine Learning Research, 12, 1225–1248.

  • Spirtes, P., Glymour, C., Scheines, R., Heckerman, D., Meek, C., Richardson, T. (2000). Causation, Prediction and Search, MIT Press.

  • Spirtes, P., Zhang, K. (2016). Causal discovery and inference: concepts and recent methodological advances. Applied Informatics.

  • Sprekeler, H., Zito, T., Wiskott, L. (2014). An extension of slow feature analysis for nonlinear blind source separation. Journal of Machine Learning Research, 15(1), 921–947.

  • Strobl, E. V., Lasko, T. A. (2022). Identifying patient-specific root causes with the heteroscedastic noise model. arXiv preprint arXiv:2205.13085.

  • Tashiro, T., Shimizu, S., Hyvärinen, A., Washio, T. (2014). ParceLiNGAM: a causal ordering method robust against latent confounders. Neural Computation, 26, 57–83.

  • Tichavsky, P., Koldovsky, Z., Oja, E. (2006). Performance analysis of the FastICA algorithm and Cramér–Rao bounds for linear independent component analysis. IEEE Transactions on Signal Processing, 54(4), 1189–1203.

  • Tong, L., Liu, R. W., Soon, V. C., Huang, Y. F. (1991). Indeterminacy and identifiability of blind identification. IEEE Transactions on Circuits and Systems, 38, 499–509.

  • Wei, Y., Shi, Y., Liu, X., Ji, Z., Gao, Y., Wu, Z., Zuo, W. (2021). Orthogonal Jacobian regularization for unsupervised disentanglement in image generation. Proceedings of the IEEE/CVF International Conference on Computer Vision, 6721–6730.

  • Wiatowski, T., Bölcskei, H. (2017). A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction. arXiv preprint arXiv:1512.06293.

  • Willetts, M., Paige, B. (2021). I Don’t Need u: Identifiable Non-Linear ICA Without Side Information. arXiv preprint arXiv:2106.05238.

  • Zhang, K., Chan, L. (2008). Minimal nonlinear distortion principle for nonlinear independent component analysis. Journal of Machine Learning Research, 9, 2455–2487.

  • Zhang, K., Hyvärinen, A. (2009). On the identifiability of the post-nonlinear causal model. Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI2009), 647–655, Montréal.

  • Zhang, K., Hyvärinen, A. (2010). Source separation and higher-order causal analysis of MEG and EEG. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI2010), Catalina Island.

  • Zhu, Y., Parviainen, T., Heinilä, E., Parkkonen, L., Hyvärinen, A. (2023). Unsupervised representation learning of spontaneous MEG data with nonlinear ICA. NeuroImage. In press.

  • Zimmermann, R. S., Sharma, Y., Schneider, S., Bethge, M., Brendel, W. (2021). Contrastive learning inverts the data generating process. International Conference on Machine Learning, 12979–12990. Proceedings of Machine Learning Research.

Author information

Corresponding author

Correspondence to Aapo Hyvärinen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The Related Articles are https://doi.org/10.1007/s10463-023-00885-3; https://doi.org/10.1007/s10463-023-00886-2; https://doi.org/10.1007/s10463-023-00887-1.

A. Liouville’s theorem and nonlinear ICA

A possible line of research for making nonlinear ICA identifiable is to impose suitable conditions on the Jacobian of the mixing transformation. The conditions are typically related to orthogonality of the Jacobian, also called local isometry (Gresele et al. 2021; Zimmermann et al. 2021; Buchholz et al. 2022). In this appendix, we consider invertible mappings of a real space onto itself, i.e., mappings that do not change the dimension, as is typical in ICA theory. We show that locally isometric functions \(\mathbb {R}^n \rightarrow \mathbb {R}^n\) are necessarily affine, and thus do not provide a meaningful basis for nonlinear ICA theory. We do this by proving a variant of Liouville's theorem on conformal mappings. While the result can be considered well known, we provide a very simple proof of this variant, which is difficult to find in the literature.

A.1 Definitions

We start with some definitions:

Definition 1

Let \(\textbf{f}\) be a differentiable mapping from an open subset U of \(\mathbb {R}^n\) to an open subset V of \(\mathbb {R}^n\). The mapping is called conformal if

$$\begin{aligned} \textbf{J}\textbf{f}(\textbf{x})^T \textbf{J}\textbf{f}(\textbf{x})=c(\textbf{x}) \textbf{I} \end{aligned}$$
(27)

where \(\textbf{J}\textbf{f}\) is the Jacobian matrix (of partial derivatives) of \(\textbf{f}\), and \(c(\textbf{x})\) takes positive scalar values.

The geometrical meaning of the definition is that the mapping preserves angles, but the scaling can be changed by \(c(\textbf{x})\). An important special case is obtained when the Jacobian is orthogonal:
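For a concrete numerical check of Def. 1 (an illustration only; the specific map is an arbitrary choice, not taken from the paper), consider \(\textbf{f}(x_1,x_2)=(e^{x_1}\cos x_2,\, e^{x_1}\sin x_2)\), whose Jacobian is \(e^{x_1}\) times a rotation matrix, so that \(\textbf{J}\textbf{f}(\textbf{x})^T \textbf{J}\textbf{f}(\textbf{x})=e^{2x_1}\textbf{I}\): the map is conformal with \(c(\textbf{x})=e^{2x_1}\), but not locally isometric. The following NumPy sketch verifies this by finite differences:

import numpy as np

def f(x):
    # Illustrative conformal map of the plane (the complex exponential):
    # f(x1, x2) = (exp(x1) cos x2, exp(x1) sin x2)
    return np.array([np.exp(x[0]) * np.cos(x[1]),
                     np.exp(x[0]) * np.sin(x[1])])

def jacobian(f, x, eps=1e-6):
    # Central-difference approximation of the Jacobian matrix Jf(x).
    n = len(x)
    J = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        J[:, i] = (f(x + e) - f(x - e)) / (2 * eps)
    return J

x = np.array([0.3, 1.1])
J = jacobian(f, x)
print(np.round(J.T @ J, 6))   # approximately exp(2 * 0.3) times the identity
print(np.exp(2 * x[0]))       # the conformal factor c(x) at this point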

Definition 2

If in Def. 1, \(c(\textbf{x})\equiv 1\), the mapping is called locally isometric.

In a geometrical interpretation, the mapping is required to preserve both angles and the scaling. This is related to the definition used by Donoho and Grimes (2003) and, based on them, by Horan et al. (2021). However, those authors consider the case where the dimension is reduced by the mapping \(\textbf{f}\), and thus our results are rather different. In fact, local isometry is typically defined in the context of manifolds of lower dimension than the ambient space, and therefore our definition is a special case of the more conventional one. The orthogonality of the Jacobian has also been used, more heuristically, as a regularizer (Wei et al. 2021; Kumar and Poole 2020).

Merely for simplicity, we further introduce the following terminology:

Definition 3

A function \(\textbf{f}\) is called orthogonally affine if it is of the form

$$\begin{aligned} \textbf{f}(\textbf{x})=\textbf{U}\textbf{x}+\textbf{b}\end{aligned}$$
(28)

for some constant vector \(\textbf{b}\) and an orthogonal matrix \(\textbf{U}\).

Such a function is also called “rigid motion” in some contexts.
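As an equally minimal check in the other direction (again with arbitrary illustrative values of \(\textbf{U}\) and \(\textbf{b}\)), an orthogonally affine map has the constant Jacobian \(\textbf{U}\), so it is locally isometric in the sense of Def. 2 and preserves Euclidean distances:

import numpy as np

rng = np.random.default_rng(1)

# Illustrative orthogonal matrix U (via QR) and offset b.
U, _ = np.linalg.qr(rng.normal(size=(3, 3)))
b = rng.normal(size=3)

def g(x):
    # Orthogonally affine map ("rigid motion"): g(x) = U x + b.
    return U @ x + b

# Its Jacobian is the constant matrix U, so Jg^T Jg = U^T U = I (Def. 2).
print(np.allclose(U.T @ U, np.eye(3)))   # True

# Equivalently, g preserves Euclidean distances.
x, y = rng.normal(size=3), rng.normal(size=3)
print(np.isclose(np.linalg.norm(g(x) - g(y)), np.linalg.norm(x - y)))   # True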

A.2 Locally isometric functions are orthogonally affine

Our analysis is based on a well-known theorem on conformal mappings proven by Liouville in 1850. To keep the presentation short, we simply state our variant of the theorem, and refer the reader to the literature for the original result (Flanders 1966; Nevanlinna 1960). Our variant is as follows:

Theorem 2

Assume \(\textbf{f}\) is locally isometric (Def. 2) and of class \(\mathcal {C}^2\) (i.e., it has continuous second-order derivatives), in a real space of dimension \(n\ge 2\). Then, \(\textbf{f}\) is orthogonally affine (Def. 3).

Proof

This proof is closely related to the first half of the proof of Liouville’s theorem by Flanders (1966); see Nevanlinna (1960) for another well-known proof.

We drop the argument \(\textbf{x}\) for notational simplicity, and denote by \(\textbf{w}\cdot \textbf{v}\) the dot product. Denote by \(\textbf{J}_i\) the i-th column of the Jacobian of \(\textbf{f}\), i.e., the vector of partial derivatives of the components of \(\textbf{f}\) with respect to \(x_i\). Consider the ij-th element of the matrix equation \(\textbf{J}\textbf{f}^T \textbf{J}\textbf{f}=\textbf{I}\) defining local isometry by orthogonality of the Jacobian:

$$\begin{aligned} \textbf{J}_i \cdot \textbf{J}_j=\delta _{i=j} \end{aligned}$$
(29)

Take the derivative with respect to \(x_k\) of both sides:

$$\begin{aligned} \textbf{J}_i\cdot \textbf{J}_j^k + \textbf{J}_j\cdot \textbf{J}_i^k= 0 \end{aligned}$$
(30)

where \(\textbf{J}_i^k\) is the vector of the partial derivatives with respect to \(x_k\) of the entries of \(\textbf{J}_i\). Thus, we get the following skew-symmetry condition:

$$\begin{aligned} \textbf{J}_i\cdot \textbf{J}_j^k = - \textbf{J}_j\cdot \textbf{J}_i^k .\end{aligned}$$
(31)

On the other hand, since the order of differentiation can be exchanged, we have the following symmetry condition:

$$\begin{aligned} \textbf{J}_i\cdot \textbf{J}_j^k = \textbf{J}_i\cdot \textbf{J}_k^j .\end{aligned}$$
(32)

Obviously, a matrix which is both symmetric and skew-symmetric is necessarily zero. Here, we actually have a three-index tensor, but we show next that the same principle applies; this result is sometimes known as the Braid Lemma. Take any indices i, j, k, and apply (31) and (32) in alternation, three times. We get

$$\begin{aligned} \textbf{J}_i\cdot \textbf{J}_j^k = - \textbf{J}_j\cdot \textbf{J}_i^k = - \textbf{J}_j\cdot \textbf{J}_k^i = \textbf{J}_k\cdot \textbf{J}_j^i =\textbf{J}_k\cdot \textbf{J}_i^j = - \textbf{J}_i\cdot \textbf{J}_k^j = - \textbf{J}_i\cdot \textbf{J}_j^k .\end{aligned}$$
(33)

The equality of the first and last terms shows that they must be zero. This holds for all i, and since the \(\textbf{J}_i\) form an orthogonal basis, it follows that \(\textbf{J}_j^k\) is zero for all j, k. Thus, all second-order partial derivatives vanish at every point, and \(\textbf{f}\) must be affine. By the definition of local isometry, it must further be orthogonally affine. \(\square \)

Our theorem extends Liouville's theory in the sense that it applies to \(n\ge 2\), while Liouville's theorem assumes \(n\ge 3\). On the other hand, it is also a special case, since Liouville assumes only conformality while we assume local isometry. Given Liouville's theorem, it would be easy to prove a corollary giving the desired result for \(n\ge 3\), but we prefer a self-contained statement that is both a special case and an extension, proved without any help from such advanced theory. Indeed, a very simple proof was possible in our case, whereas only rather involved proofs seem to be available for Liouville's theorem itself.
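To see concretely why the restriction to local isometry matters in the case \(n=2\), recall that planar conformal maps are essentially the holomorphic (or antiholomorphic) functions with nonvanishing derivative, so non-affine conformal maps abound in two dimensions. A standard example is the complex square,

$$\begin{aligned} \textbf{f}(x_1,x_2)=(x_1^2-x_2^2,\; 2x_1x_2), \qquad \textbf{J}\textbf{f}(\textbf{x})=\begin{pmatrix} 2x_1 & -2x_2 \\ 2x_2 & 2x_1 \end{pmatrix}, \qquad \textbf{J}\textbf{f}(\textbf{x})^T\textbf{J}\textbf{f}(\textbf{x})=4\Vert \textbf{x}\Vert ^2\,\textbf{I}, \end{aligned}$$

which is conformal away from the origin but clearly not affine; local isometry, in contrast, forces \(c(\textbf{x})\equiv 1\) and thus, by Theorem 2, affinity even in the plane.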

The implication is that a nonlinear ICA model (or any deep latent variable model) where the (dimension-preserving) mixing is assumed locally isometric trivially reduces to a linear ICA model. (Note that we did not use any statistical properties of any components here.) The assumption of local isometry is far too strong from this viewpoint.

About this article

Cite this article

Hyvärinen, A., Khemakhem, I. & Monti, R. Identifiability of latent-variable and structural-equation models: from linear to nonlinear. Ann Inst Stat Math 76, 1–33 (2024). https://doi.org/10.1007/s10463-023-00884-4
