Generative modeling via tree tensor network states

Research in the Mathematical Sciences

Abstract

In this paper, we present a density estimation framework based on tree tensor-network states. The proposed method consists of determining the tree topology with the Chow-Liu algorithm and obtaining a linear system of equations that defines the tensor-network components via sketching techniques. Novel choices of sketch functions are developed in order to consider graphical models that contain loops. For a wide class of d-dimensional density functions admitting the proposed ansatz, fast \(O(d)\) sample complexity guarantees are provided and further corroborated by numerical experiments.

References

  1. Bhattacharyya, Arnab, Gayen, Sutanu, Price, Eric, Vinodchandran, N.V.: Near-optimal learning of tree-structured distributions by Chow-Liu. In: Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pp. 147–160 (2021)

  2. Bradley, Tai-Danae., Stoudenmire, E Miles, Terilla, John: Modeling sequences with quantum states: a look under the hood. Mach. Learn. Sci. Technol. 1(3), 035008 (2020)

  3. Bresler, Guy, Karzand, Mina: Learning a tree-structured ising model in order to make predictions. Ann. Statist. 48(2), 713–737 (2020)

  4. Candes, Emmanuel J., Plan, Yaniv: Matrix completion with noise. Proc. IEEE 98(6), 925–936 (2010)

  5. Chen, Yuxin, Chi, Yuejie, Fan, Jianqing, Ma, Cong: Spectral methods for data science: a statistical perspective. Foundations and Trends in Machine Learning 14(5), 566–806 (2021). https://doi.org/10.1561/2200000079

  6. Cheng, Song, Wang, Lei, Xiang, Tao, Zhang, Pan: Tree tensor networks for generative modeling. Phys. Rev. B 99(15), 155131 (2019)

  7. Chow, C.K., Liu, C.N.: Approximating discrete probability distributions with dependence trees. IEEE Trans. Inform. Theory 14(3), 462–467 (1968)

  8. Dolgov, Sergey, Anaya-Izquierdo, Karim, Fox, Colin, Scheichl, Robert: Approximation and sampling of multivariate probability distributions in the tensor train decomposition. Statist. Comput. 30(3), 603–625 (2020)

  9. Gandy, Silvia, Recht, Benjamin, Yamada, Isao: Tensor completion and low-n-rank tensor recovery via convex optimization. Inverse Probl. 27(2), 025010 (2011)

  10. Glasser, Ivan, Sweke, Ryan, Pancotti, Nicola, Eisert, Jens, Cirac, Ignacio: Expressive power of tensor-network factorizations for probabilistic modeling. Adv. Neural Inform. Process. Syst. 32 (2019)

  11. Gomez, Abigail McClain, Yelin, Susanne F., Najafi, Khadijeh: Born machines for periodic and open XY quantum spin chains (2021). arXiv preprint arXiv:2112.05326

  12. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. Adv. Neural Inform. Process. Syst. 27, 139 (2014)

  13. Han, Zhao-Yu., Wang, Jun, Fan, Heng, Wang, Lei, Zhang, Pan: Unsupervised generative modeling using matrix product states. Phys. Rev. X 8(3), 031012 (2018)

  14. Hinton, Geoffrey E.: Training products of experts by minimizing contrastive divergence. Neural Comput. 14(8), 1771–1800 (2002)

  15. Hur, Yoonhaeng, Hoskins, Jeremy G., Lindsey, Michael, Stoudenmire, E. Miles, Khoo, Yuehaw: Generative modeling via tensor train sketching (2022). arXiv preprint arXiv:2202.11788

  16. Khoo, Yuehaw, Lu, Jianfeng, Ying, Lexing: Efficient construction of tensor ring representations from sampling (2017). arXiv preprint arXiv:1711.00954

  17. Kingma, Diederik P., Welling, Max: Auto-encoding variational Bayes (2013). arXiv preprint arXiv:1312.6114

  18. LeCun, Yann, Chopra, Sumit, Hadsell, Raia, Ranzato, M., Huang, F.: A tutorial on energy-based learning. Predict Struct. data. 1, 10 (2006)

  19. Lin, Lin, Lu, Jianfeng, Ying, Lexing: Fast construction of hierarchical matrix representation from matrix-vector multiplication. J. Comput. Phys. 230(10), 4071–4087 (2011)

  20. McClean, Jarrod R., Boixo, Sergio, Smelyanskiy, Vadim N., Babbush, Ryan, Neven, Hartmut: Barren plateaus in quantum neural network training landscapes. Nat. Commun. 9(1), 1–6 (2018)

  21. Nakatani, Naoki, Chan, Garnet Kin-Lic.: Efficient tree tensor network states (TTNS) for quantum chemistry: generalizations of the density matrix renormalization group algorithm. J. Chem. Phys. 138(13), 134113 (2013)

  22. Oseledets, Ivan V.: Tensor-train decomposition. SIAM J. Sci. Comput. 33(5), 2295–2317 (2011)

  23. Rezende, Danilo, Mohamed, Shakir: Variational inference with normalizing flows. In: International Conference on Machine Learning, pp. 1530–1538. PMLR (2015)

  24. Richard, Emile, Montanari, Andrea: A statistical model for tensor pca. Adv. Neural Inform. Process. Syst. 27 (2014)

  25. Shi, Y.-Y., Duan, L.-M., Vidal, Guifre: Classical simulation of quantum many-body systems with a tree tensor network. Phys. Rev. A 74(2), 022320 (2006)

  26. Silverman, Bernard W.: Density Estimation for Statistics and Data Analysis. Routledge (2018)

  27. Song, Yang, Ermon, Stefano: Generative modeling by estimating gradients of the data distribution. Adv. Neural Inform. Process. Syst. 32 (2019)

  28. Tabak, Esteban G., Vanden-Eijnden, Eric: Density estimation by dual ascent of the log-likelihood. Commun. Math. Sci. 8(1), 217–233 (2010)

  29. Tropp, Joel A., et al.: An introduction to matrix concentration inequalities. Foundat. Trends. Mach. Learning 8(1–2), 1–230 (2015)

  30. Verstraete, Frank, Wolf, Michael M., Perez-Garcia, David, Cirac, J. Ignacio: Criticality, the area law, and the computational power of projected entangled pair states. Phys. Rev. Lett. 96(22), 220601 (2006)

  31. Wendland, Holger: Numerical Linear Algebra: An Introduction, Cambridge Texts in Applied Mathematics. Cambridge University Press, Cambridge (2017)

  32. Woodruff, David P., et al.: Sketching as a tool for numerical linear algebra. Found. Trends. Theoret. Comput. Sci. 10(1–2), 1–157 (2014)


Acknowledgements

The datasets generated during and/or analyzed during the current study are available from the corresponding author upon reasonable request. The authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this paper.

Author information

Corresponding author

Correspondence to Xun Tang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A Proof of Theorem 7

Proof

(Of Theorem 7) For simplicity, for the remainder of the proof, we fix a structure for all of the high-order tensors we use. For \(\Phi _{w \rightarrow k}\), one reshapes it to the unfolding matrix \(\Phi _{w \rightarrow k}(x_{\mathcal {L}(w) \cup w}; \alpha _{(w, k)})\). For \(\Psi _{w \rightarrow k}\), one reshapes it to the unfolding matrix \(\Psi _{w \rightarrow k}(\alpha _{(w, k)}; x_{\mathcal {R}(w)})\). For \(\Phi _{\mathcal {C}(k) \rightarrow k}\), one reshapes it to the unfolding matrix \(\Phi _{\mathcal {C}(k) \rightarrow k}( x_{\mathcal {L}(k)}; \alpha _{(k, \mathcal {C}(k))})\). For \(G_{k}\), one reshapes it to the unfolding matrix \(G_{k}(\alpha _{(k, \mathcal {C}(k))}; x_{k}, \alpha _{(k, \mathcal {P}(k))})\). For \(p\), we will explicitly write out the unfolding matrix structure to avoid ambiguity.

According to Condition 1, for any edge \(w \rightarrow k\), one has

$$\begin{aligned} p(x_{\mathcal {L}(w)\cup w}; x_{\mathcal {R}(w)}) = \Phi _{w \rightarrow k} \Psi _{w \rightarrow k}. \end{aligned}$$
(46)

Then, define \(\Phi _{w \rightarrow k}^{\dag } :[r_{(w, k)}] \times \left[ \prod _{i \in \mathcal {L}(w) \cup w} n_i\right] \rightarrow \mathbb {R}\) and \(\Psi _{w \rightarrow k}^{\dag } :\left[ \prod _{i \in \mathcal {R}(w)} n_i\right] \times [r_{(w, k)}] \rightarrow \mathbb {R}\) so that \(\Phi _{w \rightarrow k}^{\dag } (\alpha _{(w, k)}; x_{\mathcal {L}(w) \cup w})\) denotes the pseudoinverse of \(\Phi _{w \rightarrow k}\), and \(\Psi _{w \rightarrow k}^{\dag }(x_{\mathcal {R}(w)}; \alpha _{(w, k)})\) denotes the pseudoinverse of \(\Psi _{w \rightarrow k}(\alpha _{(w, k)}; x_{\mathcal {R}(w)})\). Then,

$$\begin{aligned} \Phi _{w \rightarrow k}^{\dag }\Phi _{w \rightarrow k} = \Psi _{w \rightarrow k}\Psi _{w \rightarrow k}^{\dag } = \mathbb {I}_{r_{(w, k)}} \end{aligned}$$

First, we prove the uniqueness of each equation of (5) in the sense of least squares. Note that an exact solution is guaranteed when \(k\) is a leaf, and so one only needs to consider the case where \(k\) is non-leaf. By assumption, \(\Phi _{w \rightarrow k}\) has full column rank \( r_{w \rightarrow k}\). In particular, the Kronecker product structure ensures that \(\Phi _{\mathcal {C}(k) \rightarrow k}\) has full column rank \( \prod _{w \in \mathcal {C}(k)} r_{w \rightarrow k}\). Therefore, a unique solution to (5) exists in the sense of least squares.
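As a quick numerical illustration of this rank argument (a minimal NumPy sketch with random stand-ins for the unfolding matrices, not part of the proof):

```python
import numpy as np

# The Kronecker product of full-column-rank matrices again has full column
# rank, which is what makes the least-squares solution of (5) unique.
rng = np.random.default_rng(0)
Phi_1 = rng.standard_normal((6, 2))   # stand-in for Phi_{w1 -> k}, full column rank 2
Phi_2 = rng.standard_normal((5, 3))   # stand-in for Phi_{w2 -> k}, full column rank 3
Phi_C = np.kron(Phi_1, Phi_2)         # stand-in for Phi_{C(k) -> k}

assert np.linalg.matrix_rank(Phi_1) == 2
assert np.linalg.matrix_rank(Phi_2) == 3
assert np.linalg.matrix_rank(Phi_C) == 2 * 3   # full column rank of the 30 x 6 matrix
```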

Moreover, when \(k\) is non-leaf and non-root, the pseudoinverse \(\Phi _{\mathcal {C}(k) \rightarrow k}^{\dag }(\alpha _{(k, \mathcal {C}(k))}; x_{\mathcal {L}(k)})\) leads to the following explicit construction of \(G_k\):

$$\begin{aligned} G_k = \Phi _{\mathcal {C}(k) \rightarrow k}^{\dag }\Phi _{k \rightarrow \mathcal {P}(k)}, \end{aligned}$$
(47)

and likewise when \(k\) is root, one has

$$\begin{aligned} G_k = \Phi _{\mathcal {C}(k) \rightarrow k}^{\dag } p(x_{\mathcal {L}(k)}; x_{k}). \end{aligned}$$
(48)

To verify that (5) holds exactly for the construction of \(G_{k}\) in (47), we argue that it suffices to check that

$$\begin{aligned} \Phi _{\mathcal {C}(k) \rightarrow k}\Phi _{\mathcal {C}(k) \rightarrow k}^{\dag } p(x_{\mathcal {L}(k)}; x_{k \cup \mathcal {R}(k)}) = p(x_{\mathcal {L}(k)}; x_{k \cup \mathcal {R}(k)}), \end{aligned}$$
(49)

for which we give a brief explanation. When \(k\) is the root, (48) implies that (49) coincides with (5) for when \(k\) is root. When \(k\) is non-root and non-leaf, one can multiply both sides of (49) by \(\Psi _{k \rightarrow \mathcal {P}(k)}^{\dag }\) and sum over \(x_{\mathcal {R}(k)}\). According to (47), the obtained equation coincides with (5) for when \(k\) is non-root and non-leaf.

It remains to show that (49) holds. For an edge \(w \rightarrow k \in E\), define a term \(Q_{w \rightarrow k}\) as follows

$$\begin{aligned} Q_{w \rightarrow k}(x_{\mathcal {L}(w) \cup w}; y_{\mathcal {L}(w) \cup w}):= \sum _{\alpha _{(w,k)}}\Phi _{w \rightarrow k}(x_{\mathcal {L}(w) \cup w}; \alpha _{(w, k)})\Phi _{w \rightarrow k}^{\dag }(\alpha _{(w, k)}; y_{\mathcal {L}(w) \cup w}). \end{aligned}$$

Then, for a generic tensor \(f :[n_{1}] \times \ldots \times [n_{d}] \rightarrow \mathbb {R}\), one can define a projection operator \(P_{w \rightarrow k}\) as follows

$$\begin{aligned} (P_{w \rightarrow k}f)(x_{1}, \ldots , x_{d}) = \sum _{y_{\mathcal {L}(w) \cup w}}Q_{w \rightarrow k}(x_{\mathcal {L}(w) \cup w}, y_{\mathcal {L}(w) \cup w})f(y_{\mathcal {L}(w) \cup w}, x_{\mathcal {R}(w)}). \end{aligned}$$

By commutativity of the sum operations involved, one has

$$\begin{aligned} \Phi _{\mathcal {C}(k) \rightarrow k}\Phi _{\mathcal {C}(k) \rightarrow k}^{\dag } f&= \sum _{y_{\mathcal {L}(k)}}\left( \prod _{w \in \mathcal {C}(k)}{Q_{w \rightarrow k}(x_{\mathcal {L}(w) \cup w}; y_{\mathcal {L}(w) \cup w})}\right) f(x_{1}, \ldots , x_{d})\\&= \sum _{\begin{array}{c} y_{\mathcal {L}(w) \cup w} \\ w \in \mathcal {C}(k) \end{array}}\left( \prod _{w \in \mathcal {C}(k)}{Q_{w \rightarrow k}(x_{\mathcal {L}(w) \cup w}; y_{\mathcal {L}(w) \cup w})}\right) f(x_{1}, \ldots , x_{d}) \\&= \left( \prod _{w \in \mathcal {C}(k)}P_{w \rightarrow k}\right) f \end{aligned}$$

Thus, (49) holds if one can show that \(P_{w \rightarrow k}p = p\) for any \(w \in \mathcal {C}(k)\), but this fact is straightforward:

$$\begin{aligned}{} & {} P_{w \rightarrow k}p =\Phi _{w \rightarrow k}\Phi _{w \rightarrow k}^{\dag } p(x_{\mathcal {L}(w)\cup w}; x_{\mathcal {R}(w)})\\ {}{} & {} = \Phi _{w \rightarrow k}\Phi _{w \rightarrow k}^{\dag } \Phi _{w \rightarrow k}\Psi _{w \rightarrow k} =\Phi _{w \rightarrow k}\Psi _{w \rightarrow k} = p, \end{aligned}$$

and thus (5) exactly holds for the constructed \(\{G_i\}_{i =1}^{d}\).

Lastly, we prove that the solution \(\{G_i\}_{i =1}^{d}\) forms a TTNS tensor core of \(p\). To show this result, it will be much more convenient to use the notion of subgraph TTNS function in Definition 11. We remark that the construction in Definition 11 is only arithmetic and has no dependency on this theorem. For every node \(k \in [d]\), define a subset \(\mathcal {S}_{k}:= \mathcal {L}(k) \cup \{k\}\). Then, for non-root \(k\), we prove that \(\Phi _{k \rightarrow \mathcal {P}(k)}\) is the subgraph TTNS function over \(\{G_{i}\}_{i = 1}^{d}\) and \(T_{\mathcal {S}_{k}}\), i.e. one wishes to show

$$\begin{aligned} \Phi _{k \rightarrow \mathcal {P}(k)}(x_{\mathcal {L}(k) \cup k}, \alpha _{k \rightarrow \mathcal {P}(k)}) = \sum _{\begin{array}{c} \alpha _{e} \\ e \not = (k, \mathcal {P}(k)) \end{array}} \prod _{i \in \mathcal {L}(k) \cup k} G_{i}\left( x_{i}, \alpha _{(i, \mathcal {N}(i))}\right) . \end{aligned}$$
(50)

We prove (50) by induction. Notice that (5) proves (50) when \(k\) is a leaf node. Then, suppose that \(k\) is non-leaf and suppose by induction that \(\Phi _{w \rightarrow k}\) satisfies (50) for all \(w \in \mathcal {C}(k)\). One can then rewrite (5) by plugging into \(\Phi _{\mathcal {C}(k) \rightarrow k}\) the form of each \(\Phi _{w \rightarrow k}\) given by (50). The resulting equation is exactly (50) for \(\Phi _{k \rightarrow \mathcal {P}(k)}\). By induction over the nodes in topological order, (50) holds for every non-root \(k\).

By the same logic, now consider (5) when \(k\) is the root. One plugs into \(\Phi _{\mathcal {C}(k) \rightarrow k}\) the form of each \(\Phi _{w \rightarrow k}\) given by (50). The resulting equation is exactly (3) in Definition 5, and thus \(\{G_i\}_{i =1}^{d}\) does form a TTNS tensor core of \(p\). \(\square \)

Appendix B Proof of Theorem 9

Proof

(of Theorem 9) For any non-root k, note that \(Z^{\star }_k\) is assumed to be of rank \(r_{(k, \mathcal {P}(k))}\) by (ii) in Condition 3. Let \(Q^{\star }_k\) be as in (16). In other words, \(Q^{\star }_k(\gamma _{(k, \mathcal {P}(k))}; \alpha _{(k, \mathcal {P}(k))})\) is formed by the rank-\(r_{(k, \mathcal {P}(k))}\) truncated SVD of \(Z^{\star }_k\) in the SystemForming step of Algorithm 1. Thus \(Q^{\star }_k(\gamma _{(k, \mathcal {P}(k))}; \alpha _{(k, \mathcal {P}(k))})\) is of rank \(r_{(k, \mathcal {P}(k))}\), which means it has full column rank. We define

$$\begin{aligned} \Phi ^{\star }_{k \rightarrow \mathcal {P}(k)}(x_{\mathcal {L}(k) \cup k}, \alpha _{(k, \mathcal {P}(k))}) := \sum _{\gamma _{(k, \mathcal {P}(k))}} \bar{\Phi }^{\star }_k(x_{\mathcal {L}(k) \cup k}, \gamma _{(k, \mathcal {P}(k))}) Q^{\star }_k(\gamma _{(k, \mathcal {P}(k))}, \alpha _{(k, \mathcal {P}(k))}). \end{aligned}$$
(51)

Due to (i) in Condition 4 and \(Q_{k}\) having full rank, one can conclude that \(\Phi ^{\star }_{k \rightarrow \mathcal {P}(k)}(x_{\mathcal {L}(k) \cup k}; \alpha _{(k, \mathcal {P}(k))})\) and \(\Phi ^{\Delta }_{k \rightarrow \mathcal {P}(k)}(x_{\mathcal {L}(k) \cup k}; \alpha _{(k, \mathcal {P}(k))})\) have the same column space. Thus, there exist \(\Psi ^{\star }_{(w, k)}\)’s such that \(\{\Phi ^{\star }_{(w, k)}, \Psi ^{\star }_{(w, k)}\}_{(w, k) \in E}\) forms a collection of the low-rank decomposition of \(p^{\star }\) in the sense of Condition 1. We make the following claim, which also justifies the \(\star \) superscript in (51):

Claim: \(\{\Phi ^{\star }_{(w, k)}\}_{(w, k) \in E}\) as defined in (51) satisfies Condition 2

The proof of the claim is somewhat technical, so we defer it until after showing how the claim implies the theorem.

Assume the claim is correct and \(\{\Phi ^{\star }_{(w, k)}\}_{(w, k) \in E}\) satisfies Condition 2. As a consequence, if one defines \(\{A^{\star }_i, B^{\star }_i\}\) by

$$\begin{aligned} \{A^{\star }_i, B^{\star }_i\}_{i = 1}^{d} \leftarrow {SystemForming}(\{Z^{\star }_{w \rightarrow k}\}_{w \rightarrow k \in E}, \{Z^{\star }_i\}_{i = 1}^{d}),\end{aligned}$$

then one can alternatively define \(\{A^{\star }_i, B^{\star }_i\}_{i = 1}^{d}\) by (22) with

$$\begin{aligned}\{\Phi _{(w, k)}^\Delta \}_{(w, k) \in E} \leftarrow \{\Phi _{(w, k)}^\star \}_{(w, k) \in E}.\end{aligned}$$

Thus, with the alternative definition in (22), it follows that (23) is a (possibly over-determined) linear system formed by a linear projection of the linear system in (5), where the chosen gauge is \(\{\Phi ^{\star }_{(w, k)}\}_{(w,k) \in E}\).

Due to Theorem 7, (5) is an over-determined linear system with a unique and exact solution. Theorem 7 guarantees an exact solution \(\{G^{\star }_i\}_{i = 1}^{d}\) to (5), which is then necessarily an exact solution to (23). If the solution to (23) is unique, then the solution to (23) is a solution to (5), which by Theorem 7 forms a TTNS tensor core of \(p^\star \). Therefore, it suffices to check uniqueness. The uniqueness of the solution to (23) when \(k\) is a leaf is trivial. When \(k\) is non-leaf, note that one can apply (iii) in Condition 3 with \(\{\Phi _{(w, k)}^\star \}_{(w, k) \in E}\) as the chosen gauge, which guarantees that \(A^{\star }_k(\beta _{(k, \mathcal {C}(k))}, \alpha _{(k, \mathcal {C}(k))})\) has full column rank for every non-leaf k. In other words, one is guaranteed the uniqueness of the solution to (23), as desired. For the assertion on the consistency of \(\hat{G}_{k}\), note that \(\lim _{N \rightarrow \infty } \hat{G}_{k} = G^{\star }_{k}\) follows from the fact that \(\lim _{N \rightarrow \infty } \hat{A}_{k} = A^{\star }_{k}\) and \(\lim _{N \rightarrow \infty } \hat{B}_{k} = B^{\star }_{k}\).

We now prove that \(\{\Phi ^{\star }_{(w, k)}\}_{(w, k) \in E}\) satisfies Condition 2. For a clear exposition, we adopt the unfolding 3-tensor structure developed in Sects. E.1–E.2. We remark that the 3-tensor construction is only arithmetic and does not depend on the validity of this theorem.

For \(Z^{\star }_k\), we reshape it as \(Z^{\star }_{k}(\beta _{(k,\mathcal {C}(k))}; x_{k}; \gamma _{(k, \mathcal {P}(k))})\). For \(U^{\star }_{k}\), we reshape it as \(U^{\star }_{k}(\beta _{(k,\mathcal {C}(k))}; x_{k}; \alpha _{(k, \mathcal {P}(k))})\). For \(Q^{\star }_{k}\), we reshape it as \(Q^{\star }_{k}(\gamma _{(w, k)}; 1;\alpha _{(w, k)} )\). For \(S_{k}, T_{k}\), we reshape as \(S_{k}(\beta _{(k,\mathcal {C}(k))};1; x_{\mathcal {L}(k)})\) and \( T_{k}(x_{\mathcal {R}(k)}; 1; \gamma _{(k, \mathcal {P}(k))})\). For \(\Phi ^{\star }_{k \rightarrow \mathcal {P}(k)}\), we reshape it as \(\Phi ^{\star }_{k \rightarrow \mathcal {P}(k)}(x_{\mathcal {L}(k)}; x_{k}; \alpha _{(k, \mathcal {P}(k))})\). For \(\bar{\Phi }^{\star }_k(x_{\mathcal {L}(k) \cup k}, \gamma _{(k, \mathcal {P}(k))})\), we reshape it as \(\bar{\Phi }^{\star }_k(x_{\mathcal {L}(k)}; x_{k}; \gamma _{(k, \mathcal {P}(k))})\). For \(p^{\star }\), we reshape it as \(p^{\star }(x_{\mathcal {L}(k)}; x_{k}; x_{\mathcal {R}(k)})\).

Then, by the construction of \(Q^{\star }_{k}\) in the SystemForming step of Algorithm 1, one has \(U_{k}^{\star } = Z_k^{\star } \circ Q^{\star }_{k}\). By (51), it follows \(\Phi ^{\star }_{k \rightarrow \mathcal {P}(k)}:= \bar{\Phi }^{\star }_k \circ Q^{\star }_{k}\). Condition 2 is satisfied if \( U_{k}^{\star } = S_{k} \circ \Phi ^{\star }_{k \rightarrow \mathcal {P}(k)}.\) With such a choice of unfolding 3-tensor, one obtains a simple proof as follows

$$\begin{aligned} U_{k}^\star = Z_k^{\star } \circ Q^{\star }_{k} = S_k \circ p^\star \circ T_k \circ Q^{\star }_{k} = S_{k} \circ \bar{\Phi }^{\star }_k \circ Q^{\star }_{k} = S_{k} \circ \Phi ^{\star }_{k \rightarrow \mathcal {P}(k)}, \end{aligned}$$

where the first equality comes from \(Z_k^{\star } =S_k \circ p^\star \circ T_k\), the second equality comes from \(\bar{\Phi }^{\star }_k = p^\star \circ T_k\), and the third equality comes from \(\Phi ^{\star }_{k \rightarrow \mathcal {P}(k)}= \bar{\Phi }^{\star }_k \circ Q^{\star }_{k}\). Thus the claim holds and we are done.

\(\square \)

Appendix C Proof of Lemma 13

Lemma 17

Suppose p satisfies the Markov property given a rooted tree ([d], E). For any subsets \(\mathcal {S}_1 \subset \mathcal {L}(k) \cup k\) and \(\mathcal {S}_2 \subset \mathcal {R}(k)\),

  1. (i)

    \(\mathcal {M}_{\mathcal {S}_1 \cup \mathcal {S}_2} p(x_{\mathcal {S}_1}; x_{\mathcal {S}_2})\) and \(\mathcal {M}_{\mathcal {S}_1 \cup \mathcal {P}(k)} p(x_{\mathcal {S}_1}; x_{\mathcal {P}(k)})\) have the same column space if \(\mathcal {P}(k) \in \mathcal {S}_2\),

  2. (ii)

    \(\mathcal {M}_{\mathcal {S}_1 \cup \mathcal {S}_2} p(x_{\mathcal {S}_1}; x_{\mathcal {S}_2})\) and \(\mathcal {M}_{k \cup \mathcal {S}_2} p(x_k; x_{\mathcal {S}_2})\) have the same row space if \(k \in \mathcal {S}_1\).

Proof

Define a conditional probability tensor as follows:

$$\begin{aligned} \mathcal {M}_{\mathcal {S}_1 | \mathcal {S}_2} p(x_{\mathcal {S}_1}, x_{\mathcal {S}_2}) := \mathbb {P}_{X \sim p}\left[ X_{\mathcal {S}_1} = x_{\mathcal {S}_1} | X_{\mathcal {S}_2} = x_{\mathcal {S}_2}\right] . \end{aligned}$$

Due to the conditional independence property for graphical models, one can write

$$\begin{aligned} \mathcal {M}_{\mathcal {S}_1 \cup \mathcal {S}_2} p(x_{\mathcal {S}_1}, x_{\mathcal {S}_2})&= \mathcal {M}_{\mathcal {S}_1 | \mathcal {P}(k)} p(x_{\mathcal {S}_1}, x_{\mathcal {P}(k)}) \mathcal {M}_{\mathcal {P}(k)} p(x_{\mathcal {P}(k)}) \mathcal {M}_{\mathcal {S}_2 | \mathcal {P}(k)} p(x_{\mathcal {S}_2 \backslash \mathcal {P}(k)}, x_{\mathcal {P}(k)}) \\&= \mathcal {M}_{\mathcal {S}_1 \cup \mathcal {P}(k)} p(x_{\mathcal {S}_1}, x_{\mathcal {P}(k)}) \mathcal {M}_{\mathcal {S}_2 | \mathcal {P}(k)} p(x_{\mathcal {S}_2 \backslash \mathcal {P}(k)}, x_{\mathcal {P}(k)}) \end{aligned}$$

Thus, the column space of \(\mathcal {M}_{\mathcal {S}_1 \cup \mathcal {S}_2} p(x_{\mathcal {S}_1}; x_{\mathcal {S}_2})\) depends solely on \(\mathcal {M}_{\mathcal {S}_1 \cup \mathcal {P}(k)} p(x_{\mathcal {S}_1}; x_{\mathcal {P}(k)})\). Therefore, (i) holds.

Similarly,

$$\begin{aligned} \mathcal {M}_{\mathcal {S}_1 \cup \mathcal {S}_2} p(x_{\mathcal {S}_1}, x_{\mathcal {S}_2}) = \mathcal {M}_{\mathcal {S}_1 | k} p(x_{\mathcal {S}_1 \backslash k} , x_{k}) \mathcal {M}_{k \cup \mathcal {S}_2 } p(x_{k}, x_{\mathcal {S}_2}), \end{aligned}$$

which shows that the row space of \(\mathcal {M}_{\mathcal {S}_1 \cup \mathcal {S}_2} p(x_{\mathcal {S}_1}; x_{\mathcal {S}_2})\) depends solely on \(\mathcal {M}_{k \cup \mathcal {S}_2 } p(x_k;x_{\mathcal {S}_2})\). Therefore, (ii) holds. \(\square \)
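The following NumPy sketch illustrates part (i) of Lemma 17 on a hypothetical four-node chain rooted at node 4, i.e. \(p(x_1,x_2,x_3,x_4) = p(x_1)p(x_2|x_1)p(x_3|x_2)p(x_4|x_3)\) (the model and sizes below are illustrative stand-ins, not from the paper): with \(k = 2\), \(\mathcal {S}_1 = \{1,2\}\), and \(\mathcal {S}_2 = \{3,4\}\), the unfoldings \(p(x_{\{1,2\}}; x_{\{3,4\}})\) and \(p(x_{\{1,2\}}; x_3)\) share the same column space.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
p1 = rng.dirichlet(np.ones(n))                 # p(x1)
P21 = rng.dirichlet(np.ones(n), size=n)        # P21[a, b] = p(x2 = b | x1 = a)
P32 = rng.dirichlet(np.ones(n), size=n)        # p(x3 | x2)
P43 = rng.dirichlet(np.ones(n), size=n)        # p(x4 | x3)
p = np.einsum('a,ab,bc,cd->abcd', p1, P21, P32, P43)

M_full = p.reshape(n * n, n * n)               # rows x_{1,2}, columns x_{3,4}
M_marg = p.sum(axis=3).reshape(n * n, n)       # rows x_{1,2}, columns x_3 = x_{P(k)}

def col_proj(M, tol=1e-12):
    # orthogonal projector onto the column space of M
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    r = int((s > tol * s[0]).sum())
    return U[:, :r] @ U[:, :r].T

assert np.allclose(col_proj(M_full), col_proj(M_marg), atol=1e-8)
```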

Proof

(of Lemma 13) We will verify that Condition 3 holds. The Markov sketch function is quite special, and we often refer to the concept of natural identification. To make this concept rigorous, we say that two matrices \(A(x;y)\) and \(A'(z;w)\) have a natural identification if \(A(x;y) = A'(z;w)\) entry-wise as matrices. In particular, if one has a natural identification \(A(x;y) = A'(z;w)\), then \(A\) and \(A'\) share the same column space, row space, and rank.

By the property of the right sketch function in the Markov sketch function, one has the natural identification \(\bar{\Phi }^\star _k(x_{\mathcal {L}(k) \cup k}; \gamma _{(k, \mathcal {P}(k))}) = \mathcal {M}_{\mathcal {L}(k) \cup k \cup \mathcal {P}(k)}p^\star (x_{\mathcal {L}(k) \cup k}; x_{\mathcal {P}(k)})\). Lemma 17 then shows that the column space of \(\bar{\Phi }^\star _k(x_{\mathcal {L}(k) \cup k}; \gamma _{(k, \mathcal {P}(k))})\) equals that of \(p^\star (x_{\mathcal {L}(k) \cup k}; x_{\mathcal {R}(k)})\). By Condition 1, the column space of \(p^\star (x_{\mathcal {L}(k) \cup k}; x_{\mathcal {R}(k)})\) equals that of any \(\Phi ^{\Delta }_{(w, k)}(x_{\mathcal {L}(k) \cup k}; \alpha _{(k, \mathcal {P}(k))})\), and so (i) holds.

Similarly, due to the Markov sketch function, one has the natural identification

$$\begin{aligned} Z_{k}^\star (\beta _{(k,\mathcal {C}(k))}, x_{k}; \gamma _{(k, \mathcal {P}(k))}) = \mathcal {M}_{\mathcal {C}(k) \cup k \cup \mathcal {P}(k)}p(x_{\mathcal {C}(k) \cup k}; x_{\mathcal {P}(k)}). \end{aligned}$$

By Lemma 17 and the natural identification of \(Z_{k}^\star \), for every non-leaf and non-root k, it follows that \(Z_{k}^\star \) has the same row space as that of

$$\begin{aligned} \bar{\Phi }^\star _k(x_{\mathcal {L}(k) \cup k}; \gamma _{(k, \mathcal {P}(k))}) = \mathcal {M}_{\mathcal {L}(k) \cup k \cup \mathcal {P}(k)}p^\star (x_{\mathcal {L}(k) \cup k}; x_{\mathcal {P}(k)}). \end{aligned}$$

Hence, the rank of \(Z_{k}^\star \) equals the rank of \(\bar{\Phi }^\star _k\). By Lemma 17, the column space of \(\bar{\Phi }^\star _k\) equals the column space of \(p^\star (x_{\mathcal {L}(k) \cup k}; x_{\mathcal {R}(k)})\). Thus, the rank of \(Z_{k}^\star \) equals \(r_{(k, \mathcal {P}(k))}\) and (ii) holds.

Because (i) and (ii) hold, the proof of Theorem 9 actually shows that there exists a gauge \(\{\Phi ^{\star }_{(w, k)}\}_{(w, k) \in E}\) which satisfies Condition 2. To verify (iii), it suffices to check (iii) for the gauge \(\{\Phi ^{\star }_{(w, k)}\}_{(w, k) \in E}\) because \(A^{\star }_{k}\) having full column rank leads to any \(A^{\Delta }_k\) having full column rank.

Moreover, it suffices to show that each \(A^{\star }_{w \rightarrow k}(\beta _{(w, k)}; \alpha _{(w, k)})\) has full column rank of \(r_{w \rightarrow k}\). If this holds, then it follows that \(A^{\star }_k = \bigotimes _{w \in \mathcal {C}(k)} A^{\star }_{w \rightarrow k}\) has full column rank of \(\prod _{w \in \mathcal {C}(k)} r_{w \rightarrow k}\). By the SVD step in SystemForming, recall that the column space of \(Q_{w}^\star (\gamma _{(w,k)}; \alpha _{(w,k)})\) is the same as the column space of \(\left( Z^\star _{w}\right) ^\top (\gamma _{(w, k)}; \beta _{(w,\mathcal {C}(w))}, x_{w})\). By the natural identification of

$$\begin{aligned} \left( Z^\star _{w}\right) ^\top (\gamma _{(w, k)};x_{w}, \beta _{(w,\mathcal {C}(w))}) = \mathcal {M}_{k \cup w \cup \mathcal {C}(w)}p(x_{k}; x_{w \cup \mathcal {C}(w)}),\end{aligned}$$

we know that the column space of \(Q_{w}^\star (\gamma _{(w,k)}; \alpha _{(w,k)})\) is the same as that of \(\mathcal {M}_{k \cup w \cup \mathcal {C}(w)}p(x_{k}; x_{w \cup \mathcal {C}(w)})\). By Lemma 17, it then follows that the column space of \(Q_{w}^\star (\gamma _{(w,k)}; \alpha _{(w,k)})\) coincides with that of \(\mathcal {M}_{k \cup w}p(x_k; x_w)\).

Moreover, \(Z^{\star }_{w \rightarrow k}\) has the natural identification \(Z^{\star }_{w \rightarrow k}(\beta _{(w,k)}; \gamma _{(w, k)}) = \mathcal {M}_{w \cup k}p(x_w; x_k)\), and so the column space of \(Z^{\star }_{w \rightarrow k}(\beta _{(w,k)}; \gamma _{(w, k)})\) coincides with that of \( \mathcal {M}_{w \cup k}p(x_w; x_k)\).

By (17), one has

$$\begin{aligned} A^{\star }_{w \rightarrow k}(\beta _{(w,k)}; \alpha _{(w, k)}) = Z^{\star }_{w \rightarrow k}(\beta _{(w,k)}; \gamma _{(w, k)})Q^{\star }_{w}(\gamma _{(w, k)}; \alpha _{(w, k)}), \end{aligned}$$

and so the column space of \(A^{\star }_{w \rightarrow k}\) coincides with that of

$$\begin{aligned}\mathcal {M}_{w \cup k}p(x_w; x_k)\mathcal {M}_{k \cup w}p(x_k; x_w) = \mathcal {M}_{w \cup k}p(x_w; x_k)\left( \mathcal {M}_{w \cup k}p(x_w; x_k)\right) ^\top .\end{aligned}$$

Thus, the rank of \(A^{\star }_{w \rightarrow k}\) coincides with that of \(\mathcal {M}_{w \cup k}p(x_w; x_k)\left( \mathcal {M}_{w \cup k}p(x_w; x_k)\right) ^\top \), which in turn coincides with the rank of \(\mathcal {M}_{w \cup k}p(x_w; x_k)\). By applying Lemma 17, the rank of \(\mathcal {M}_{w \cup k}p(x_w; x_k)\) equals \(r_{(w,k)}\), and so (iii) holds.

\(\square \)

Appendix D Proof of Theorem 14

After applying the left and right sketching, one has the following form for \(Z^\star _{k}\):

$$\begin{aligned} Z^\star _{k}(x_{k}, \beta _{(k, \mathcal {N}(k))}) = \sum _{\begin{array}{c} \beta _{e}\\ k \not \in e \end{array}} \sum _{\begin{array}{c} x_{i} \\ i \not = k \end{array}}p^\star (x_{1}, \ldots , x_{d})\prod _{i \not = k}s_{i}(x_{i}, \beta _{(i, \mathcal {N}(i))}). \end{aligned}$$
(52)

Let \(\mathcal {S}_{k}:= [d] - \{k\}\), and let \(T_{\mathcal {S}_{k}}\) be the subgraph of \(T\) with vertex set being \(\mathcal {S}_{k}\). Using the definition of subgraph TTNS function in Definition 11, define a tensor

$$\begin{aligned}H_{k} :\prod _{i \in [d], i \not = k}[n_{i}] \times \prod _{w \in \mathcal {N}(k)}[\beta _{(w, k)}] \rightarrow \mathbb {R}\end{aligned}$$

as the subgraph TTNS function over \(\{s_{i}\}_{i \not = k}\) and \(T_{\mathcal {S}_{k}}\), i.e.

$$\begin{aligned} H_{k}(x_{[d] - \{k\}}, \beta _{(k, \mathcal {N}(k))}) = \sum _{\begin{array}{c} \alpha _{e} \\ k \not \in e \end{array}} \prod _{i \not = k} s_{i}\left( x_{i}, \beta _{(i, \mathcal {N}(i))}\right) . \end{aligned}$$
(53)

Then (52) is equivalent to the following equation:

$$\begin{aligned} Z^\star _{k}(x_{k}, \beta _{(k, \mathcal {N}(k))}) = \sum _{\begin{array}{c} x_{w} \\ w \not = k \end{array}}p^\star (x_{1}, \ldots , x_{d})H_{k}(x_{[d] - \{k\}}, \beta _{(k, \mathcal {N}(k))}). \end{aligned}$$
(54)

From (53), one sees that \(H_{k}\) is multi-linear in \(\{s_{i}\}_{i \not = k}\). We can thus apply the binomial theorem to derive a structural form for \(H_{k}\) as a sum of terms indexed by subsets. To do so, let \(\mathcal {S}\) be an arbitrary subset of \(\mathcal {S}_{k}\), and define a tensor

$$\begin{aligned}H_{k; \mathcal {S}} :\prod _{i \in [d], i \not = k}[n_{i}] \times \prod _{w \in \mathcal {N}(k)}[\beta _{(w, k)}] \rightarrow \mathbb {R}\end{aligned}$$

as the subgraph TTNS function over \(T_{\mathcal {S}_{k}}\) and \(\{\Delta _{i}\}_{i \in \mathcal {S}_{k}} \cup \{O_{j}\}_{j \in \mathcal {S}_{k} - \mathcal {S}}\), i.e.

$$\begin{aligned} H_{k; \mathcal {S}}(x_{[d] - \{k\}}, \beta _{(k, \mathcal {N}(k))}) = \sum _{\begin{array}{c} \alpha _{e} \\ k \not \in e \end{array}} \prod _{i \in \mathcal {S}} \Delta _{i}\left( x_{i}, \beta _{(i, \mathcal {N}(i))}\right) \prod _{j \in \mathcal {S}_{k} - \mathcal {S}} O_{j}\left( x_{j}, \beta _{(j, \mathcal {N}(j))}\right) . \end{aligned}$$

We now use the fact that \(O_{j}\left( x_{j}, \beta _{(j, \mathcal {N}(j))}\right) = 1\) in Condition 6, and so

$$\begin{aligned} H_{k; \mathcal {S}}(x_{[d] - \{k\}}, \beta _{(k, \mathcal {N}(k))}) = \sum _{\begin{array}{c} \alpha _{e} \\ k \not \in e \end{array}} \prod _{i \in \mathcal {S}} \Delta _{i}\left( x_{i}, \beta _{(i, \mathcal {N}(i))}\right) = \sum _{\beta _{e}, k \not \in e}\Delta _{\mathcal {S}}(x_{\mathcal {S}}, \beta _{\partial \mathcal {S}}), \end{aligned}$$
(55)

where the second equality follows from the Definition of \(\Delta _{\mathcal {S}}\) in (37).

By applying the binomial theorem over the fact that \(s_{i} = \epsilon \Delta _{i} + O_{i}\), one sees that \(H_{k}\) is a sum of \(2^{d - 1}\) terms, each of which corresponds to one \(H_{k; \mathcal {S}}\), i.e.

$$\begin{aligned} H_{k}(x_{[d] - \{k\}}, \beta _{(k, \mathcal {N}(k))}) = \sum _{l = 0}^{d-1} \epsilon ^{l} \sum _{\mathcal {S} \subset [d] - \{k\}, |\mathcal {S}| = l}H_{k; \mathcal {S}}(x_{[d] - \{k\}}, \beta _{(k, \mathcal {N}(k))}). \end{aligned}$$
(56)
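As a concrete instance of (56) (a small worked case, purely for illustration): if \(d = 3\) and \(k\) is such that \(\mathcal {S}_{k} = \{i, j\}\), then expanding \(s_{i} = \epsilon \Delta _{i} + O_{i}\) and \(s_{j} = \epsilon \Delta _{j} + O_{j}\) in (53) gives the four terms

$$\begin{aligned} H_{k} = H_{k; \emptyset } + \epsilon \left( H_{k; \{i\}} + H_{k; \{j\}}\right) + \epsilon ^{2} H_{k; \{i, j\}}. \end{aligned}$$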

Define \(Z^{\star }_{k; \mathcal {S}}\) as the following tensor:

$$\begin{aligned} Z^\star _{k; \mathcal {S}}(x_{k}, \beta _{(k, \mathcal {N}(k))}) := \sum _{\begin{array}{c} x_{w} \\ w \not = k \end{array}}p^\star (x_{1}, \ldots , x_{d})H_{k; \mathcal {S}}(x_{[d] - \{k\}}, \beta _{(k, \mathcal {N}(k))}). \end{aligned}$$
(57)

The proof that \(Z^{\star }_{k; \mathcal {S}}\) satisfies (39) is a simple result of exchanging summation order:

$$\begin{aligned}&\sum _{\begin{array}{c} x_{w} \\ w \not = k \end{array}}p^\star (x_{1}, \ldots , x_{d})H_{k; \mathcal {S}}(x_{[d] - \{k\}}, \beta _{(k, \mathcal {N}(k))})\\ =&\sum _{\begin{array}{c} x_{w} \\ w \in \mathcal {S} \end{array}} \left( \sum _{\begin{array}{c} x_{w} \\ w \in \mathcal {S}_{k} - \mathcal {S} \end{array}}p^{\star }(x_{1}, \ldots , x_{d})\right) \left( \sum _{\beta _{e}, k \not \in e}\Delta _{\mathcal {S}}(x_{\mathcal {S}}, \beta _{\partial \mathcal {S}})\right) \\ =&\sum _{\beta _{e}, k \not \in e}\left( \sum _{x_{\mathcal {S}}}\mathcal {M}_{\mathcal {S} \cup \{k\}} p^{\star }(x_{k},x_{\mathcal {S}})\Delta _{\mathcal {S}}(x_{\mathcal {S}}, \beta _{\partial \mathcal {S}})\right) . \end{aligned}$$

Due to the linear relationship between \(H_k\) and \(Z^{\star }_{k}\) in (54), it follows that the structural form of \(H_{k}\) in (56) leads to the structural form for \(Z^{\star }_{k}\) in (38), as desired.

Appendix E Sample complexity bound of TTNS-Sketch

This section gives an upper bound for the sample complexity of TTNS-Sketch when the sketch functions satisfy recursive sketching in the sense of Condition 4. This setting is considered because one can use the alternative definition of \(A_{k}\) in Proposition 28, which simplifies the analysis. As an application, we obtain a sample complexity bound for the simple case where \(p^{\star }\) is a graphical model over a tree \(T\) and the sketch function is the Markov sketch function.

We give a summary of the organization of this section. In Sect. E.1, we introduce notations and conventions which are important for the sample complexity analysis. In Sect. E.2, we prove that small perturbations of the cores lead to small perturbations of the obtained TTNS ansatz. In Sect. E.3, we prove that a small error in the estimator \(\hat{Z}_k\) leads to a small perturbation of the cores, leading to an upper bound for the sample complexity of TTNS-Sketch in Theorem 34. In Sect. E.4, we give proofs of all the lemmas and corollaries. In Sect. E.5, we remark on how the derived results can be extended to sample complexity bounds in total variation distance.

1.1 E.1 Preliminaries

In what follows, for a given vector v, let \(\Vert v\Vert \) and \(\Vert v\Vert _\infty \) denote its Euclidean norm and its supremum norm, respectively. For any matrix M, denote its spectral norm, Frobenius norm, and the r-th singular value by \(\Vert M\Vert \), \(\Vert M\Vert _F\), and \(\sigma _r(M)\), respectively. Also, for a generic tensor \(p\), let \(\Vert p\Vert _\infty \) denote the largest absolute value of the entries of p. Lastly, the orthogonal group in dimension r is denoted by \({\text {O}}(r)\).

An important mathematical structure in this section is the 3-tensor. Similar to the unfolding matrix in Definition 6, the 3-tensors we use typically come from viewing high-dimensional tensors as 3-tensors by grouping variables into joint variables:

Definition 18

(Unfolding 3-tensor Notation) For a generic \(d\)-dimensional tensor \(p :[n_1] \times \cdots \times [n_d] \rightarrow \mathbb {R}\) and for three disjoint subsets \(\mathcal {U}, \mathcal {V}, \mathcal {W}\) with \(\mathcal {U} \cup \mathcal {V} \cup \mathcal {W} = [d]\), we define the corresponding unfolding 3-tensor by \(p(x_{\mathcal {U}}; x_{\mathcal {V}}; x_{\mathcal {W}})\). The 3-tensor \(p(x_{\mathcal {U}}; x_{\mathcal {V}}; x_{\mathcal {W}})\) is of size \(\left[ \prod _{i \in \mathcal {U}}n_{i}\right] \times \left[ \prod _{j \in \mathcal {V}}n_{j}\right] \times \left[ \prod _{k \in \mathcal {W}}n_{k}\right] \rightarrow \mathbb {R}\).
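As a concrete illustration of this unfolding (a minimal NumPy sketch; the dimensions and groupings below are arbitrary choices, and we assume the variables within each group keep their natural ordering):

```python
import numpy as np

# Unfold a 5-dimensional tensor into a 3-tensor with groups
# U = {1, 2}, V = {3}, W = {4, 5} (0-based indices in the code).
n = (2, 3, 4, 5, 6)
p = np.arange(np.prod(n), dtype=float).reshape(n)

U, V, W = [0, 1], [2], [3, 4]
p3 = p.transpose(U + V + W).reshape(
    int(np.prod([n[i] for i in U])),   # joint variable x_U, size 2 * 3 = 6
    int(np.prod([n[i] for i in V])),   # joint variable x_V, size 4
    int(np.prod([n[i] for i in W])),   # joint variable x_W, size 5 * 6 = 30
)
print(p3.shape)   # (6, 4, 30)
```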

It is helpful to introduce a slice of a 3-tensor. In our convention, we only need to consider taking a slice at the second component:

Definition 19

(Middle index slice of 3-tensor) For any 3-tensor \(G :[r_{1}] \times [n_{1}] \times [r_{2}] \rightarrow \mathbb {R}\), we use \(G(\cdot , x, \cdot ) :[r_{1}] \times [r_{2}] \rightarrow \mathbb {R}\) to denote an \(r_{1} \times r_{2}\) matrix obtained by fixing the second slot of \(G\) to be \(x\).

In Definitions 20–22, we introduce a new norm and two operations for 3-tensors.

Definition 20

( norm for 3-tensors) Define the norm by

(58)

Definition 21

(contraction operator for 3-tensors) Let \(G :[r_{1}] \times [n_{1}] \times [r_{2}] \rightarrow \mathbb {R}, G' :[r_{3}] \times [n_{2}] \times [r_{4}] \rightarrow \mathbb {R}\) be two 3-tensors. Under the assumption \(r_{2} = r_{3}\), define the 3-tensor \(G \circ G' :[r_{1}] \times [n_{1} \times n_{2}] \times [r_{4}] \rightarrow \mathbb {R}\) by

$$\begin{aligned} G \circ G'(\alpha ; (x, y); \gamma ) = \sum _{\beta \in [r_{2}]} G(\alpha , x, \beta ) G'(\beta , y, \gamma ). \end{aligned}$$
(59)

Definition 22

(tensor product operator for 3-tensors) Let \(G :[r_{1}] \times [n_{1}] \times [r_{2}] \rightarrow \mathbb {R}, G' :[r_{3}] \times [n_{2}] \times [r_{4}] \rightarrow \mathbb {R}\) be two 3-tensors. Define the 3-tensor \(G \otimes G' :[r_{1} \times r_{3}] \times [n_{1} \times n_{2}] \times [r_{2} \times r_{4}] \rightarrow \mathbb {R}\) by

$$\begin{aligned} G \otimes G'((\alpha , \beta ); (x, y); (\gamma , \theta )) = G(\alpha , x, \gamma ) G'(\beta , y, \theta ). \end{aligned}$$
(60)
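A minimal NumPy/einsum sketch of the two operations (illustrative code, not from the paper), with a numerical check of the associativity of \(\circ \):

```python
import numpy as np

def contract(G, Gp):
    # Definition 21: (G o G')(alpha; (x, y); gamma) = sum_beta G(alpha, x, beta) G'(beta, y, gamma)
    r1, n1, r2 = G.shape
    r3, n2, r4 = Gp.shape
    assert r2 == r3
    return np.einsum('axb,byc->axyc', G, Gp).reshape(r1, n1 * n2, r4)

def tensor_prod(G, Gp):
    # Definition 22: (G (x) G')((alpha, beta); (x, y); (gamma, theta)) = G(alpha, x, gamma) G'(beta, y, theta)
    r1, n1, r2 = G.shape
    r3, n2, r4 = Gp.shape
    return np.einsum('axg,byt->abxygt', G, Gp).reshape(r1 * r3, n1 * n2, r2 * r4)

# numerical check that (G1 o G2) o G3 == G1 o (G2 o G3)
rng = np.random.default_rng(2)
G1 = rng.standard_normal((2, 3, 4))
G2 = rng.standard_normal((4, 5, 6))
G3 = rng.standard_normal((6, 2, 3))
assert np.allclose(contract(contract(G1, G2), G3), contract(G1, contract(G2, G3)))
```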

We summarize the simple properties of the defined operations in Lemma 23, which will be useful for our derivations:

Lemma 23

The following results hold:

  1. (i)

    Associativity of \(\circ \) holds:

    $$\begin{aligned} (G \circ G') \circ G'' = G \circ (G' \circ G''). \end{aligned}$$
    (61)
  2. (ii)

    Associativity of \(\otimes \) holds:

    $$\begin{aligned} (G \otimes G') \otimes G'' = G \otimes (G' \otimes G''). \end{aligned}$$
    (62)
  3. (iii)

    Inequality of \(\circ \) under norm:

    (63)
  4. (iv)

    Equality of \(\otimes \) under norm:

    (64)
  5. (v)

For a 3-tensor \(G(\alpha , x, \beta ) :[r_{1}] \times [n] \times [r_{2}] \rightarrow \mathbb {R}\), denote by \(G(\alpha , x; \beta ):[r_{1}n] \times [r_{2}] \rightarrow \mathbb {R}\) the unfolding matrix obtained by grouping the first and second index of \(G\). One has

    (65)

As a consequence of associativity, given any collection of 3-tensors \(\{G_{i}\}_{i=1}^{d}\), one can define \(G_{1} \otimes G_{2} \otimes \ldots \otimes G_{d}\). Moreover, if the collection is such that the size of the third index of \(G_{i}\) coincides with the size of the first index of \(G_{i+1}\), then one can naturally define the 3-tensor \(G_{1} \circ G_{2} \circ \ldots \circ G_{d}\).

1.2 E.2 3-tensor structure for TTNS

For cleaner analysis, one often gives unfolding matrices a 3-tensor structure:

Definition 24

(3-tensor structure for unfolding matrix) Consider a generic \(D\)-dimensional tensor \(f :[n_1] \times \cdots \times [n_D] \rightarrow \mathbb {R}\). Moreover, suppose one picks a disjoint union \(\mathcal {U} \cup \mathcal {V} = [D]\) and forms an unfolding matrix \(f(x_{\mathcal {U}}; x_{\mathcal {V}})\) in the sense of Definition 6. Define \(f(x_{\mathcal {U}}; 1; x_{\mathcal {V}})\) as the 3-tensor of size \(\left[ \prod _{i \in \mathcal {U}}n_{i}\right] \times \left\{ 1\right\} \times \left[ \prod _{j \in \mathcal {V}}n_{j}\right] \rightarrow \mathbb {R}\). One likewise defines 3-tensor structure of \(f(1; x_{\mathcal {U}}; x_{\mathcal {V}})\), whose first index is of size \(1\), and \(f( x_{\mathcal {U}}; x_{\mathcal {V}}; 1)\), whose third index is of size \(1\).

A tensor core from a TTNS ansatz has a default 3-tensor view, whereby the indices are grouped according to tree topology:

Definition 25

(3-tensor structure for TTNS tensor cores) Suppose a tensor \(p\) is defined by a collection of tensor cores \(\{G_i\}_{i = 1}^{d}\) in the sense of Definition 5.

If \(k\) is neither a root node nor a leaf node, then \(G_{k}\) is viewed with the 3-tensor unfolding structure

$$\begin{aligned} G_k(\alpha _{(k, \mathcal {C}(k))}; x_k; \alpha _{(k, \mathcal {P}(k))}) :\left[ \prod _{w \in \mathcal {C}(k)} r_{(w, k)}\right] \times [n_{k}] \times [r_{(k, \mathcal {P}(k))}] \rightarrow \mathbb {R}. \end{aligned}$$

If \(k\) is a leaf node, then \(G_{k}\) is viewed with the 3-tensor unfolding structure

$$\begin{aligned} G_k(1; x_k; \alpha _{(k, \mathcal {P}(k))}) :\left\{ 1\right\} \times [n_{k}] \times [r_{(k, \mathcal {P}(k))}] \rightarrow \mathbb {R}. \end{aligned}$$

If \(k\) is the root node, then \(G_{k}\) is viewed with the 3-tensor unfolding structure

$$\begin{aligned} G_k(\alpha _{(k, \mathcal {C}(k))}; x_k; 1) :\left[ \prod _{w \in \mathcal {C}(k)} r_{(w, k)}\right] \times [n_{k}] \times \left\{ 1\right\} \rightarrow \mathbb {R}. \end{aligned}$$
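For a path graph, the TTNS of Definition 5 reduces to a tensor train, and the full tensor is recovered by chaining the cores with the \(\circ \) operator of Definition 21. A hypothetical three-node example in NumPy (random cores, purely illustrative, assuming the path is rooted at node 3):

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2, n3 = 2, 3, 4
G1 = rng.standard_normal((1, n1, 2))   # leaf core     G_1(1; x_1; alpha_{(1,2)})
G2 = rng.standard_normal((2, n2, 3))   # interior core G_2(alpha_{(1,2)}; x_2; alpha_{(2,3)})
G3 = rng.standard_normal((3, n3, 1))   # root core     G_3(alpha_{(2,3)}; x_3; 1)

# p(x1, x2, x3) = sum_{a, b} G_1(1, x1, a) G_2(a, x2, b) G_3(b, x3, 1)
p = np.einsum('uxa,ayb,bzv->xyz', G1, G2, G3)
print(p.shape)   # (2, 3, 4)
```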

With this design of norms, one can prove the following result by simple algebra:

Lemma 26

Suppose a generic tensor \(p:[n_1] \times \cdots \times [n_d] \rightarrow \mathbb {R}\) is defined by a collection of tensor cores \(\{G_i\}_{i = 1}^{d}\) in the sense of Definition 5. Moreover, suppose the tensor cores are viewed by 3-tensor structures as in Definition 25. Then

Using Lemma 26, one can bound global errors by errors in tensor cores:

Lemma 27

In Lemma 26, let \(\Delta G_k\) be a perturbation of \(G_k\). Define a tensor \(p':[n_1] \times \cdots \times [n_d] \rightarrow \mathbb {R}\) by tensor cores \(\{G_k + \Delta G_k\}_{k = 1}^{d}\) in the sense of Definition 5, with the tree topology \(T\) and the internal rank \(\{r_{e}\}_{e \in E}\) the same as that of \(p\). Suppose for all \(k \in [d]\), and set \(\Delta p:= p' - p\). Then,

If \(\max _{k \in [d]} \delta _k \le \epsilon / (3 d)\) for some fixed \(\epsilon \in (0, 1)\),

(66)

1.3 E.3 Derivation for sample complexity of TTNS-Sketch

We first give a lemma which bounds the perturbation of solutions of a linear equation \(AX = B\), where in particular X, B are two 3-tensors viewed under an unfolding matrix structure. This result will be the main building block of our subsequent error analysis:

Lemma 28

Consider a matrix \(A^{\star }(\beta , \alpha ) \in \mathbb {R}^{l \times r}\) with \(\textrm{rank}(A^{\star }) = r \le l\) and a 3-tensor \(B^{\star }(\beta , x, \gamma )\in \mathbb {R}^{l \times n \times m}\) with unfolding matrix structure \(B^{\star }(\beta ; (x, \gamma )) \in \mathbb {R}^{l\times (nm)}\). Let \(X^{\star }(\alpha , x, \gamma ) \in \mathbb {R}^{r \times n \times m}\) be the 3-tensor with an unfolding matrix view \(X^{\star }(\alpha ; (x, \gamma )) \in \mathbb {R}^{r\times (nm)}\) which uniquely solves the linear equation \(A^{\star }X = B^{\star }\) in the sense of least squares:

$$\begin{aligned} \sum _{\alpha }A^{\star }(\beta , \alpha )X(\alpha , (x, \gamma )) = B^{\star }(\beta , (x, \gamma )). \end{aligned}$$

Moreover, let \(\Delta B^{\star }\in \mathbb {R}^{l \times n \times m}\) be a perturbation of \(B^{\star }\), and let \(\Delta A^{\star }\in \mathbb {R}^{l \times r}\) be a perturbation of \(A^{\star }\) with \(\Vert \left( A^{\star }\right) ^{\dag }\Vert \Vert \Delta A^{\star }\Vert < 1\), so that \(\textrm{rank}(A^{\star } + \Delta A^{\star }) = r\). Then, let \(\Delta X^{\star }\) be a 3-tensor so that \(X^{\star } + \Delta X^{\star }\) uniquely solves the linear equation \((A^{\star }+\Delta A^{\star })X = (B^{\star }+\Delta B^{\star })\) in the sense of least squares:

$$\begin{aligned} \sum _{\alpha } (A^{\star } + \Delta A^{\star })(\beta , \alpha )X(\alpha , (x, \gamma )) = (B^{\star } + \Delta B^{\star })(\beta , (x, \gamma )) \end{aligned}$$

Under the unfolding matrix structure, suppose the column space of \(B^{\star }\) is contained in that of \(A^{\star }\), i.e. \(X^{\star }\) solves \(A^{\star }X = B^{\star }\) exactly. Then one has

(67)

In particular, if for some constant \(\chi ,\) and \(\Delta A^{\star }\) satisfies \(\Vert \left( A^{\star }\right) ^{\dag }\Vert \Vert \Delta A^{\star }\Vert \le 1 / 2,\) then

(68)

For our use case of Lemma 28, the coefficient matrix is viewed as a Kronecker product of smaller matrices, e.g. \(A^{\star }_{k}\) is formed by \(\{A^{\star }_{w \rightarrow k}\}_{w \in \mathcal {C}(k)}\). One can bound \(\Vert \Delta A^{\star }\Vert \) in this case, as the following lemma shows:

Lemma 29

Consider a collection of matrices \(\{E_i,C_i\}_{i \in [n]}\) such that \(E_{i}, C_{i}\) are of the same shape. Moreover, let \(\Vert C_i\Vert \le 1, \Vert E_i\Vert \le \delta _i\). Then

$$\begin{aligned} \left\| \bigotimes _{i = 1}^{n} (C_i + E_i) - \bigotimes _{i = 1}^{n} C_i\right\| \le \left( \sum _{i = 1}^{n} \delta _i\right) \exp \left( \sum _{i = 1}^{n} \delta _i\right) . \end{aligned}$$
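A quick numerical sanity check of Lemma 29 (random matrices as stand-ins; the specific shapes are arbitrary):

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(4)
Cs, Es = [], []
for shape in [(3, 2), (4, 3), (2, 2)]:
    C = rng.standard_normal(shape)
    C /= np.linalg.norm(C, 2)                 # enforce ||C_i|| <= 1 (spectral norm)
    Cs.append(C)
    Es.append(0.05 * rng.standard_normal(shape))
deltas = [np.linalg.norm(E, 2) for E in Es]   # take delta_i = ||E_i||

kron_all = lambda mats: reduce(np.kron, mats)
lhs = np.linalg.norm(kron_all([C + E for C, E in zip(Cs, Es)]) - kron_all(Cs), 2)
rhs = sum(deltas) * np.exp(sum(deltas))
assert lhs <= rhs
```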

Lemmas 28 and 29 lead to the proof strategy for obtaining the sample complexity. With the particular perturbation \(\Delta p^{\star }:= \hat{p}- p^{\star }\), the terms \(\hat{A}_{k}\) and \(\hat{B}_{k}\) from Algorithm 1 satisfy \(B_{k}^{\star } + \Delta B^{\star }_{k} = \hat{B}_{k}\) and \(A_{k}^{\star } + \Delta A^{\star }_{k} = \hat{A}_{k}\). The least-squares solution \(G^{\star }_k + \Delta G^{\star }_k\) to the perturbed equation is thus the actual output \(\hat{G}_{k}\) from Algorithm 1.

However, due to the SVD step that is involved in obtaining \(\hat{A}_{k}\) and \(\hat{B}_{k}\), one can only bound the sample estimation error in terms of the following alternative metric:

Definition 30

(\(\textrm{dist}(\cdot , \cdot )\) operator for matrices) For any matrices \(B, B^{\star } \in \mathbb {R}^{n \times m}\), define

$$\begin{aligned} \textrm{dist}(B, B^{\star }) := \min _{R \in {\text {O}}(m)} \Vert B - B^{\star } R\Vert . \end{aligned}$$

In other words, the finite-sample estimates \(\hat{A}_{k}, \hat{B}_{k}\) could be closer to a rotation of \(A^{\star }_{k}, B^{\star }_{k}\), which we will denote by \(A^{\circ }_{k}, B^{\circ }_{k}\). An error bound of the type \(\textrm{dist}(B, B^{\star })\) exists through the Wedin theorem, and thus the magnitude of \(B^{\circ }_{k} - \hat{B}_{k}\) can be bounded, and \(A^{\circ }_{k} - \hat{A}_{k}\) is bounded via (28). In Corollaries 31–32, we write out relaxed versions of the Wedin theorem and the matrix Bernstein inequality, which we will use to analyze error terms of the type \(\Vert \Delta Z_{k}^{\star }\Vert , \Vert \Delta B_{k}^{\circ }\Vert \), respectively. As a summary of all previous results, in Theorem 33 we give a rather technical proof bounding the error on the rotated cores by the sample estimation error in sketching. In Theorem 34, we derive the sample complexity of TTNS-Sketch.

Corollary 31

(Corollary to Wedin theorem, cf. Theorem 2.9 in [5]) Let \(Z^{\star } \in \mathbb {R}^{n \times m}\) be a matrix of rank \(r\) and \(\Delta Z^{\star } \in \mathbb {R}^{n \times m}\) be its perturbation with \(Z:= Z^{\star } + \Delta Z^{\star }\). Moreover, let \(B^{\star }, B \in \mathbb {R}^{n \times r}\) respectively be the first \(r\) left singular vectors of \(Z^{\star }, Z\). If \( \Vert \Delta Z^{\star }\Vert \le (1 - 1 / \sqrt{2}) \sigma _{r}(Z^{\star })\), then

$$\begin{aligned} \textrm{dist}(B, B^{\star }) \le \frac{2 \Vert \Delta Z^{\star }\Vert }{\sigma _{r}(Z^{\star })} \end{aligned}$$

Corollary 32

(Corollary to Matrix Bernstein inequality, cf. Corollary 6.2.1 in [29]) Let \(Z^{\star } \in \mathbb {R}^{n \times m}\) be a matrix, and let \(\{ Z^{(i)} \in \mathbb {R}^{n \times m}\}_{i =1}^{N}\) be a sequence of i.i.d. matrices with \(\mathbb {E}\left[ Z^{(i)}\right] = Z^{\star }\). Denote \(\hat{Z} = \frac{1}{N}\sum _{i = 1}^{N}Z^{(i)}\) and \(\Delta Z^{\star } = \hat{Z} - Z^{\star }\). Let the distribution of \(Z^{(i)}\) be such that there exists a constant \(L\) with \(||Z^{(i)}|| \le L\).

Let \(\gamma := \max {\left( \left\| \mathbb {E}\left[ Z^{(i)}\left( Z^{(i)}\right) ^{\top }\right] \right\| ,\left\| \mathbb {E}\left[ \left( Z^{(i)}\right) ^{\top }Z^{(i)}\right] \right\| \right) }\), and then

$$\begin{aligned} \mathbb {P}\left[ \Vert \Delta Z^{\star }\Vert \ge t \right] \le (m + n) \exp {\left( \frac{-Nt^2/2}{\gamma + 2Lt/3} \right) }. \end{aligned}$$

Using Jensen’s inequality, one has \(\gamma \le L^2\), and

$$\begin{aligned} \mathbb {P}\left[ \Vert \Delta Z^{\star }\Vert \ge t \right] \le (m + n) \exp {\left( \frac{-Nt^2/2}{L^2 + 2Lt/3} \right) }. \end{aligned}$$
(69)

Theorem 33

(Error bound over TTNS tensor cores) Let \(p^\star :[n_1] \times \cdots \times [n_d] \rightarrow \mathbb {R}\) be a density function satisfying the TTNS assumption in Condition 1. Fix a sketch function \(\{T_i, S_i\}_{i=1}^{d}\) which satisfies the recursive sketching assumption in Condition 4. Let \(\{A_i^\star , B_i^\star , G_i^\star , Z_{i}^{\star }\}_{i=1}^{d}\) be as in Theorem 9. Moreover, let \(\{\hat{A}_{i}, \hat{B}_{i}, \hat{G}_{i}, \hat{Z}_{i}\}_{i =1}^{d}\) be as in Algorithm 1 with \(\hat{p}\) as input. Suppose further that for some fixed \(\delta \in (0, 1)\), one has

$$\begin{aligned} \Vert Z^{\star }_{k} - \hat{Z}_{k} \Vert \le \zeta _{k} \delta , \end{aligned}$$
(70)

where \(\zeta _{k}\) is defined by a series of constants as follows:

$$\begin{aligned} \zeta _{k} := \left( 6 \frac{c_{\mathcal {C}}}{c_{k;Z}} \right) ^{-1}\xi ,\quad \xi := 1 \wedge \min _{i \in [d]} \left( 2 c_{i;A} \left( c_{i; S} + c_{i;G}\right) \right) ^{-1}, \end{aligned}$$
(71)

and the constants are defined as follows:

  • \(c_{\mathcal {C}} = \max _{i \in [d]}|\mathcal {C}(i)|\),

  • \(c_{k;Z} = 1\) when \(k = \text {root}\), and \(c_{k;Z} = \sigma _{r_{(k, \mathcal {P}(k))}}( Z_{k}^{\star }(\beta _{(k,\mathcal {C}(k))}, x_{k}; \gamma _{(k, \mathcal {P}(k))} ))\) otherwise.

  • ,

  • \(c_{k;A} = 1\) when \(k = \text {leaf}\), and \(c_{k;A} = \Vert \left( A^{\star }_{k}\right) ^{\dag }\Vert \) otherwise,

  • \(c_{k;S} = 1\) when \(k = \text {leaf}\), and \(c_{k; S} = \prod _{w \in \mathcal {C}(k)} ||s_{w}(\beta _{(w,\mathcal {P}(w))}; \beta _{(w, \mathcal {C}(w))},x_{w})||\) otherwise.

Then, there exists a TTNS tensor core \(\{G_{i}^{\circ }\}_{i=1}^{d}\) for \(p^{\star }\) in the sense of Definition 5, such that , and the following holds:

(72)

We defer the proof of Theorem 33 to the end of this subsection. As a direct application, one obtains the sample complexity of TTNS-Sketch:

Theorem 34

(Sample Complexity of TTNS-Sketch) Assume the setting and notation of Theorem 33. Let \(\hat{p}_{\textrm{TS}}\) denote the TTNS tensor formed by the TTNS tensor core \(\{\hat{G}_{i}\}_{i=1}^{d}\). In particular, \(\{\hat{G}_{i}\}_{i=1}^{d}\) is the output of Algorithm 1 with the empirical distribution \(\hat{p}\) formed by \(N\) i.i.d. samples \((y_1^{(i)}, \ldots , y_d^{(i)})_{i =1}^{N}\). Let \(Z_{k}^{(i)}\) be the \(i\)-th sample estimate of \(Z_{k}^{\star }\), i.e.

$$\begin{aligned} Z_{k}^{(i)}(\beta _{(k,\mathcal {C}(k))}, x_{k}, \gamma _{(k, \mathcal {P}(k))}):= S_{k}(\beta _{(k,\mathcal {C}(k))}, y^{(i)}_{\mathcal {L}(k)}) \textbf{1}(y^{(i)}_k = x_{k}) T_{k}(y^{(i)}_{\mathcal {R}(k)}, \gamma _{(k, \mathcal {P}(k))}), \end{aligned}$$

and set \(L_{k}\) as an upper bound of \( \Vert Z_{k}^{(i)}(\beta _{(k,\mathcal {C}(k))}, x_{k}; \gamma _{(k, \mathcal {P}(k))})\Vert \). Define \(L = \max _{k \in [d]}L_{k}\).

For \(\eta \in (0, 1)\) and \(\epsilon \in (0, 1)\), suppose

$$\begin{aligned} N \ge \frac{18L^2d^2 + 4L\epsilon \zeta d}{\zeta ^2 \epsilon ^2}\log {\left( \frac{(ln + m)d}{\eta }\right) }, \end{aligned}$$

and the constants are defined as follows:

  • \(\zeta = \max _{k \in [d]}\zeta _{k}\), with \(\zeta _k\) as in Theorem 33.

  • \(l = \max _{k \in [d]} l_k\), where \(l_k = \prod _{w \in \mathcal {C}(k)}l_{(w, k)}\)

  • \(m = \max _{k \in [d]} m_k\), where \(m_{k} = m_{(k, \mathcal {P}(k))}\).

  • \(n = \max _{k \in [d]} n_k\).

Then with probability at least \(1 - \eta \) one has

(73)

Proof

(of Theorem 34)

Suppose that the inequality (70) holds with \(\delta = \frac{\epsilon }{3d}\). In the setting of Theorem 33, note that \(p^{\star }\) is formed by \(\{G_{i}^{\circ }\}_{i=1}^{d}\), and \(\hat{p}_{\textrm{TS}}\) is formed by \(\{G_{i}^{\circ } + \Delta G_i^\circ \}_{i=1}^{d}\) with . Moreover, one has . Applying (66) in Lemma 27, one thus has

By a simple union bound argument, it suffices to find a sample size such that (70) is guaranteed for each individual \(k \in [d]\) with \(\delta = \frac{\epsilon }{3d}\) and with probability \(1 - \frac{\eta }{d}\). We apply (69) in Corollary 32, where \(Z^{\star }_k\) is a matrix of size \(\mathbb {R}^{l_{k}n_{k} \times m_{k}}\). With the choice of \((l,n,m,L)\) as set in the theorem statement, one has

$$\begin{aligned} \mathbb {P}\left[ \Vert \Delta Z^{\star }_{k}\Vert \ge t \right] \le (ln + m) \exp {\left( \frac{-Nt^2/2}{L^2 + 2Lt/3} \right) }. \end{aligned}$$

It then suffices for one to find a lower bound for \(N\) so that for \(t = \zeta \frac{\epsilon }{3d}\) one has

$$\begin{aligned} (ln + m) \exp {\left( \frac{-Nt^2/2}{L^2 + 2Lt/3} \right) } \le \eta /d. \end{aligned}$$

By simple algebra, it suffices to lower bound \(N\) by the following quantity:

$$\begin{aligned} N \ge \frac{2L^2 + 4Lt/3}{t^2}\log {\left( \frac{(ln + m)d}{\eta }\right) } = \frac{18L^2d^2 + 4L\epsilon \zeta d}{\zeta ^2 \epsilon ^2}\log {\left( \frac{(ln + m)d}{\eta }\right) }. \end{aligned}$$
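A quick symbolic check of this last simplification, with \(t = \zeta \epsilon /(3d)\) (a sympy sketch, not part of the proof):

```python
import sympy as sp

L, zeta, eps, d = sp.symbols('L zeta epsilon d', positive=True)
t = zeta * eps / (3 * d)
lhs = (2 * L**2 + 4 * L * t / 3) / t**2
rhs = (18 * L**2 * d**2 + 4 * L * eps * zeta * d) / (zeta**2 * eps**2)
assert sp.simplify(lhs - rhs) == 0   # the two expressions for the bound agree
```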

\(\square \)

As a corollary, for a Markov sketch function, note that each \(Z^{(i)}_{k}\) is a tensor with a single entry equal to one and all other entries equal to zero. Under this setting, note that \(\Vert Z^{(i)}_{k}\Vert \le \Vert Z^{(i)}_{k}\Vert _{F} = 1\), and hence one can set \(L = 1\). Let \(\Delta (T)\) denote the maximal degree of a tree \(T\). One has \(l \le n^{\Delta (T) - 1}\) and \(m = n \le ln\). Thus one obtains a sample complexity for TTNS-Sketch under Markov sketching:

Corollary 35

(Sample Complexity of TTNS-Sketch for Markov Sketch function) Suppose that \(p^{\star }\) is a graphical model over a tree \(T\), with the sketching function being the Markov sketch function specified in Lemma 13. Suppose

$$\begin{aligned} N \ge \frac{18d^2 + 4\epsilon \zeta d}{\zeta ^2 \epsilon ^2}\log {\left( \frac{2n^{\Delta (T)}d}{\eta }\right) }. \end{aligned}$$

Then, with probability at least \(1 - \eta \), one has

(74)
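To make the one-hot structure and the role of Corollary 32 concrete, here is a toy Monte Carlo sketch for a single sketched moment (the joint distribution below is a hypothetical stand-in): under Markov sketching, the empirical \(\hat{Z}\) is just a normalized histogram of one-hot matrices, and its spectral-norm error shrinks as \(N\) grows.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
Z_star = rng.dirichlet(np.ones(n * n)).reshape(n, n)   # stand-in for the true joint distribution

for N in (100, 1_000, 10_000, 100_000):
    idx = rng.choice(n * n, size=N, p=Z_star.ravel())    # i.i.d. samples; each contributes a one-hot matrix
    Z_hat = np.bincount(idx, minlength=n * n).reshape(n, n) / N
    print(N, np.linalg.norm(Z_hat - Z_star, 2))          # ||Delta Z*|| decays roughly like 1/sqrt(N)
```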

In the remainder of this subsection, we give the proof of Theorem 33, which is a culmination of all previous statements, whose proofs are of secondary interest and are included in Sect. E.4. For some intuition on Theorem 33, the factors in \(\zeta _{k}\) are set such that \(\xi \) can bound the sample estimation error of the sketched-down core determining equation in Algorithm 1. One then uses Lemma 28 to derive (72). As a sanity check of the defined constants, note that \(\xi _{i}:= \left( 2 c_{i;A} \left( c_{i; S} + c_{i;G}\right) \right) ^{-1}\) can be thought of as a homogeneous constant. That is, for any non-zero scaling constants \(\{q_{i}\}_{i = 1}^{d}\), changing the sketch cores from \(\{s_{i}\}_{i = 1}^{d}\) to \(\{q_{i}s_{i}\}_{i = 1}^{d}\) won’t affect \(\xi _{i}\), because the resulting multiplicative change to \(\{c_{i; A}, c_{i; S}, c_{i; G}\}\) is cancelled out in \(\xi _{i}\). One can think of \(\xi = 1 \wedge \min _{i} \xi _{i}\) in Theorem 33 as serving the role of a condition number. Moreover, because \((Z^\star _{k} - \hat{Z}_{k}) \propto c_{k; Z}\) by definition, it follows that the condition in (70) is not affected if a scaling constant is applied to the sketch cores.

Proof

(of Theorem 33) Following the short-hand in Algorithm 3, for the joint variables we write \(\beta _{k} \leftarrow \beta _{(k,\mathcal {C}(k))}, \gamma _{k} \leftarrow \gamma _{\mathcal {P}(k)}, \alpha _{k} \leftarrow \alpha _{(k,\mathcal {P}(k))}\), and for the bond dimensions we write \(r_{k} \leftarrow r_{(\mathcal {P}(k), k)}, l_{k} \leftarrow \prod _{w \in \mathcal {C}(k)}l_{(w, k)}, m_{k} \leftarrow m_{(k, \mathcal {P}(k))}\). Moreover, if \(k\) is leaf, then we understand \(\beta _{k}\) as a joint variable taking value in \(\{1\}\), and \(l_{k} = 1\). Likewise, if \(k\) is root, then we understand \(\alpha _{k}, \gamma _{k}\) respectively as a joint variable taking value in \(\{1\}\), and \(r_{k} = m_{k} = 1\). In this notation, when \(k\) is leaf or root, the joint variables sketch \(Z_{k}\) is conveniently written as \(Z_{k}(\beta _{k}, x_{k}; \gamma _{k} )\).

For this proof, we will fix a canonical unfolding matrix structure for the tensors used. For \(Z_{k}(\beta _{k}, x_{k}, \gamma _{k})\) being one of \(\{Z^{\star }_{k}, \hat{Z}_{k}, \Delta Z^{\star }_{k}\}\), we reshape it as \(Z_{k}(\beta _{k}, x_{k}; \gamma _{k})\). For \(B_{k}(\beta _{k}, x_{k}, \alpha _{k})\) being one of \(\{B^{\star }_{k}, B^{\circ }_{k}, \hat{B}_{k}, \Delta B^{\star }_{k}, \Delta B^{\circ }_{k}\}\), we reshape it as \(B_{k}(\beta _{k}, x_{k}; \alpha _{k})\). For \(A_{k}(\beta _{k}, \alpha _{(k, \mathcal {C}(k))})\) being one of \(\{A^{\star }_{k}, A^{\circ }_{k}, \hat{A}_{k}, \Delta A^{\star }_{k}, \Delta A^{\circ }_{k}\}\), we reshape it as \(A_{k}(\beta _{k}; \alpha _{(k, \mathcal {C}(k))})\). For \(s_{k}(\beta _{(k,\mathcal {P}(k))}, \beta _{k},x_{k} )\), we reshape it as \(s_{k}(\beta _{(k,\mathcal {P}(k))}; \beta _{k},x_{k} )\). For \(G_{k}(\alpha _{(k, \mathcal {C}(k))}, x_k, \alpha _{k})\) being one of \(\{G^{\star }_{k}, G^{\circ }_{k}, \hat{G}_{k}, \Delta G^{\star }_{k}, \Delta G^{\circ }_{k}\}\), we reshape it as \(G_{k}(\alpha _{(k, \mathcal {C}(k))}; x_k, \alpha _{k})\).

Fix a non-root \(k\) and recall that \(B_k^{\star }\) and \(\hat{B}_k\) are the first \(r_{k}\) left singular vectors of \( Z^{\star }_{k}\) and \(\hat{Z}_{k}\), respectively. One applies Corollary 31: if \(\Vert \Delta Z_{k}^{\star }\Vert \le (1 - 1 / \sqrt{2}) \sigma _{r_{k}}(Z^{\star }_{k})\), then one can find \(R_{k}\in {\text {O}}(r_{k})\) such that, defining \(B^{\circ }_{k}:= B^{\star }_k R_{k}\), one has

$$\begin{aligned} \hat{B}_k = B_k^\circ + \Delta B^{\circ }_{k}, \quad \Vert \Delta B^{\circ }_{k}\Vert \le \frac{2 \Vert \Delta Z_{k}^{\star }\Vert }{\sigma _{r_{k}}(Z^{\star }_{k})} \end{aligned}$$

and by (v) in Lemma 23, one has

(75)

Meanwhile, if k is the root, there is no SVD step, and the perturbation \(\Delta B_k^{\star }\) is simply \(\Delta Z_{k}^{\star }\). For consistency, when \(k\) is the root we set \(B_k^\circ = B_{k}^{\star }\), so that the corresponding perturbation \(\Delta B^{\circ }_{k}\) is just \(\Delta Z_{k}^{\star }\).

In summary, \(B_k^\circ \) is a rotation of \(B_k^{\star }\), and \(\hat{B}_k\) differs from \(B_k^\circ \) by a perturbation \(\Delta B^{\circ }_{k}\) for which one has an error bound. As a “rotated” version of \(A_k^{\star }\), define

$$\begin{aligned} A^{\circ }_{k}(\beta _{(k,\mathcal {C}(k))}, \alpha _{(k, \mathcal {C}(k))}) := \prod _{w \in \mathcal {C}(k)} \sum _{(\beta _{w}, x_{w})} s_{w}(\beta _{(w,k)}, \beta _{w},x_{w}) B^{\circ }_{w}(\beta _{w}, x_{w},\alpha _{(w,k)}) \end{aligned}$$
(76)

Viewed in the unfolding matrix structure fixed at the beginning of the proof, one can write \(A_k^\circ = \bigotimes _{w \in \mathcal {C}(k)} s_w B_w^\circ \). Likewise, one has \(\hat{A}_k= \bigotimes _{w \in \mathcal {C}(k)} s_w \hat{B}_w = \bigotimes _{w \in \mathcal {C}(k)} \left( s_w B_w^\circ + s_{w} \Delta B^{\circ }_{w}\right) \).

Now, with the chosen unfolding matrix structure, consider the following “rotated” versions of (21):

$$\begin{aligned} \begin{aligned} G_{k}^\circ&= B_{k}^\circ (\beta _{k}; x_{k}, \alpha _{k}) \quad \text {if} ~ k ~ \text {is a leaf}, \\ \textstyle A_{k}^\circ G_{k}^\circ&= B_{k}^\circ (\beta _{k}; x_{k}, \alpha _{k}) \quad \text {otherwise}. \end{aligned} \end{aligned}$$
(77)

We will first prove that \(\{G_{i}^{\circ }\}_{i=1}^{d}\) forms a TTNS tensor core for \(p^{\star }\) in the sense of Definition 5. Suppose one has \(\{\Phi ^{\star }_{k \rightarrow \mathcal {P}(k)}\}_{k \not = \text { root}}\) defined according to Condition 2. Then, Theorem 9 proves that \(\{G_{i}^{\star }\}_{i=1}^{d}\) solves the CDE (5) in Theorem 7 with the gauge chosen as \(\{\Phi ^{\star }_{k \rightarrow \mathcal {P}(k)}\}_{k \not = \text { root}}\). Now consider (5) for a rotated choice of gauge \(\{\Phi ^{\circ }_{k \rightarrow \mathcal {P}(k)}:= \Phi ^{\star }_{k \rightarrow \mathcal {P}(k)}R_{k}\}_{k \not = \text { root}}\). One can directly check that the sketched-down equation coincides with (77), and Theorem 9 ensures that the solution \(\{G_{i}^{\circ }\}_{i=1}^{d}\) is unique and forms a TTNS tensor core for \(p^{\star }\).

Next, we relate \(G_k^\circ \) to \(G^{\star }_k\), using the 3-tensor view for \(G_k^\circ , G^{\star }_k\) as in Definition 25. As the coefficients \(A_k^\circ \) and right-hand sides \(B_k^\circ \) are simply rotations of the \(A_k^{\star }\) and \(B_k^{\star }\) in (21), one can verify that \(G_k^\circ \) is a rotation of \(G_k^{\star }\). If \(k\) is neither a leaf nor the root, the equation for \(G^{\circ }_k\) can be rewritten as

$$\begin{aligned} \left( \bigotimes _{w \in \mathcal {C}(k)} s_w B_w^{\star } R_{w}\right) G_{k}^\circ = B^{\star }_{k}R_{k}, \end{aligned}$$
(78)

whereas the equation for \(G_{k}^{\star }\) is

$$\begin{aligned} \left( \bigotimes _{w \in \mathcal {C}(k)} s_w B_w^{\star } \right) G_{k}^\star = B^{\star }_{k}. \end{aligned}$$

For the rotation matrix \(R_{k}(\alpha ; \beta ) \in {\text {O}}(r_{k})\), one gives it a 3-tensor view as \(R_{k}(\alpha , 1, \beta )\) in the sense of Definition 24. One can directly verify that the following expression for \(G_k^\circ \) solves (78):

$$\begin{aligned} G_k^\circ = \left( \bigotimes _{w \in \mathcal {C}(k)} R_{w}^\top \right) \circ G^{\star }_k \circ R_{k}. \end{aligned}$$
(79)

Likewise, \(G_k^\circ = G^{\star }_k \circ R_{k}\) if \(k\) is a leaf, and \(G_k^\circ = \left( \bigotimes _{w \in \mathcal {C}(k)} R_{w}^\top \right) \circ G^{\star }_k\) if \(k\) is the root. This gives the desired relation between \(G_k^\circ \) and \(G^{\star }_k\). The constructive form in (79) also gives a more intuitive sense of why \(\{G_{i}^{\circ }\}_{i=1}^{d}\) forms a TTNS tensor core in the same way as \(\{G_{i}^{\star }\}_{i=1}^{d}\): each \(R_{k}\) comes paired with \(R_{k}^{\top }\), which does not change the formed TTNS tensor itself.
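For completeness, the direct verification mentioned before (79) can be carried out slice-wise in \(x_{k}\) (a supplementary computation using the mixed-product property of the Kronecker product and \(R_{w}R_{w}^{\top } = I\)): substituting (79) into the left-hand side of (78) gives, for each fixed \(x_{k}\),

$$\begin{aligned} \left( \bigotimes _{w \in \mathcal {C}(k)} s_w B_w^{\star } R_{w}\right) \left( \bigotimes _{w \in \mathcal {C}(k)} R_{w}^\top \right) G^{\star }_{k}(\cdot , x_{k}, \cdot )\, R_{k} = \left( \bigotimes _{w \in \mathcal {C}(k)} s_w B_w^{\star }\right) G^{\star }_{k}(\cdot , x_{k}, \cdot )\, R_{k} = B^{\star }_{k}(\cdot , x_{k}, \cdot )\, R_{k}, \end{aligned}$$

which is exactly the corresponding slice of the right-hand side \(B^{\star }_{k}R_{k}\) of (78).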

Next, we prove that, for \(\Delta B_k^\circ := \hat{B}_k - B_k^\circ \) and \(\Delta A_k^\circ := \hat{A}_k - A_k^\circ \), the assumption (70) leads to the following bound:

$$\begin{aligned} \Vert \Delta B_k^\circ \Vert \le \xi _{} \delta , \quad \Vert \Delta A_k^\circ \Vert \le c_{k; S}\xi \delta . \end{aligned}$$
(80)

First, for \(\Delta B_k^\circ \)’s, we will derive a tighter bound

$$\begin{aligned} \Vert \Delta B_k^\circ \Vert \le \frac{\xi _{}\delta }{3 c_{\mathcal {C}}}, \end{aligned}$$
(81)

which implies \( \Vert \Delta B_k^\circ \Vert \le \xi _{} \delta \) as \(3 c_{\mathcal {C}} \ge 1\). To see this, using (75), one has for any non-root k,

$$\begin{aligned} \Vert \Delta B_k^\circ \Vert \le \frac{2 \Vert \Delta Z_{k}^{\star }\Vert }{\sigma _{r_{k}}(Z^{\star }_{k})} \le \frac{2\zeta _{k} \delta }{c_{k;Z}} = \frac{2}{c_{k;Z}} \frac{c_{k;Z}}{6 c_{\mathcal {C}}} \xi _{}\delta \le \frac{\xi _{}\delta }{3 c_{\mathcal {C}}}. \end{aligned}$$

If k is the root, recall that \(\Delta B_k^\circ = \Delta Z_k^\star \), hence using \(c_{k;Z} = 1\),

$$\begin{aligned} \Vert \Delta B_k^\circ \Vert = \Vert \Delta Z_k^\star \Vert \le \zeta _k \delta = \frac{c_{k;Z}}{6 c_{\mathcal {C}}} \xi _{}\delta \le \frac{\xi _{}\delta }{3 c_{\mathcal {C}}}. \end{aligned}$$

Therefore, (81) holds.

Next, for a non-leaf k, recall that

$$\begin{aligned} \Delta A_k^\circ&= \bigotimes _{w \in \mathcal {C}(k)} (s_w B_w^\circ + s_w \Delta B^{\circ }_{w}) - \bigotimes _{w \in \mathcal {C}(k)} s_w B_w^\circ \\&=\left( \bigotimes _{w \in \mathcal {C}(k)}{s_{w}}\right) \left( \bigotimes _{w \in \mathcal {C}(k)} (B_w^\circ + \Delta B^{\circ }_{w}) - \bigotimes _{w \in \mathcal {C}(k)} B_w^\circ \right) \end{aligned}$$

By definition, one has \(\Vert \bigotimes _{w \in \mathcal {C}(k)}{s_{w}}\Vert = c_{k; S}\). Note that \(\Vert B_w^\circ \Vert = 1\) and \(\Vert \Delta B_w^\circ \Vert \le \frac{\xi _{}}{3 c_{\mathcal {C}}} \delta \). Hence, one can apply Lemma 29, which shows

$$\begin{aligned} \Vert \Delta A_k^\circ \Vert \le c_{k; S}\left( \sum _{w \in \mathcal {C}(k)} \Vert \Delta B^{\circ }_{w}\Vert \right) \exp \left( \sum _{w \in \mathcal {C}(k)} \Vert \Delta B^{\circ }_{w}\Vert \right) . \end{aligned}$$

Using (81),

$$\begin{aligned} \sum _{w \in \mathcal {C}(k)} \Vert \Delta B^{\circ }_{w}\Vert \le c_{\mathcal {C}} \cdot \max _{w \in [d]} \Vert \Delta B^{\circ }_{w}\Vert \le c_{\mathcal {C}} \frac{\xi \delta }{3 c_{\mathcal {C}}} \le \frac{\xi \delta }{3}. \end{aligned}$$

Hence,

$$\begin{aligned} \Vert \Delta A_k^\circ \Vert&\le c_{k; S}\left( \sum _{w \in \mathcal {C}(k)} \Vert \Delta B^{\circ }_{w}\Vert \right) \exp \left( \sum _{w \in \mathcal {C}(k)} \Vert \Delta B^{\circ }_{w}\Vert \right) \\&\le c_{k; S}\frac{\xi _{} \delta }{3} \exp (1)\\&\le c_{k; S}\xi _{} \delta , \end{aligned}$$

where the last two steps hold because \(\frac{\xi \delta }{3} < 1\) and \(\exp (1) < 3\).

It remains to show how (80) leads to (72).

If k is a leaf,

where the first equation follows from \(\Delta G_k^\circ = \Delta B_k^\circ \) in (79), the first inequality comes from (V) in Lemma 23, and the last inequality uses \(c_{k;A} = c_{k;S} = 1\) and \(\xi \le \left( 2 c_{k;A} \left( c_{k; S} + c_{k;G}\right) \right) ^{-1} = \frac{1}{2}\left( 1 + c_{k;G}\right) ^{-1}\).

Importantly, note that

$$\begin{aligned} A_k^\circ = \left( \bigotimes _{w \in \mathcal {C}(k)} s_w B_w^\star \right) \left( \bigotimes _{w \in \mathcal {C}(k)} R_w \right) , \end{aligned}$$

and so the fact that each \(R_w\) is orthogonal implies \(\Vert \left( A_k^{\circ }\right) ^{\dag }\Vert = \Vert \left( A_k^{\star }\right) ^{\dag }\Vert = c_{k;A}\). For any non-leaf k, note that \( \xi ,\delta \le 1\) implies \(\Vert \left( A_k^{\circ }\right) ^{\dag }\Vert \Vert \Delta A_k^{\circ }\Vert \le c_{k;A}c_{k;S}\xi \delta \le 1 / 2\). From Lemma 28 and (V) in Lemma 23, it follows that

and so we are done.

\(\square \)

E.4 Remarks on the sample complexity bound for total variation distance

Using the proof technique outlined before, one can derive a sample complexity upper bound in total variation distance via the \(l_{1}\) distance between \(p^\star \) and \(\hat{p}_{\textrm{TS}}\). Note that one can define a new norm by

which is defined similarly to the norm used above.

The proofs in Sect. 8 are also written such that the adaptation to this new norm is straightforward. First, all of the results in Lemma 23 will hold in this new norm, with only a change in the constant in (v). Second, from this version of Lemma 23, one can bound the global \(\Vert \cdot \Vert _{1}\) error by the core-wise error via an adaptation of Lemma 27. Finally, for the local error on cores, the proof of Lemma 28 also shows that Lemma 28 holds if one replaces the norm there by the new norm. Importantly, the \(N = O(d^{2})\) scaling will still hold under the \(l_{1}\)-norm.

E.5 Proof of results

Proof

(of Lemma 23)

In the notation of Definition 19, one can write the definition of \(\circ \) by

$$\begin{aligned} G \circ G'(\cdot , (x, y), \cdot ) = G(\cdot , x, \cdot ) G'(\cdot , y, \cdot ). \end{aligned}$$

Associativity of \(\circ \) thus follows from the associativity of the matrix product, and likewise the inequality for \(\circ \) comes from the submultiplicativity of the spectral norm under matrix products:

$$\begin{aligned} \max _{(x,y)} \Vert G \circ G'(\cdot , (x, y), \cdot )\Vert&= \max _{(x,y)} \Vert G(\cdot , x, \cdot ) G'(\cdot , y, \cdot )\Vert \\&\le \left( \max _{x} \Vert G(\cdot , x, \cdot )\Vert \right) \left( \max _{y} \Vert G'(\cdot , y, \cdot )\Vert \right) . \end{aligned}$$

Likewise, by abuse of notation, we also use \(\otimes \) for the Kronecker product of matrices. Then one can write the definition of \(\otimes \) simply as

$$\begin{aligned} G \otimes G'(\cdot , (x, y), \cdot ) = G(\cdot , x, \cdot ) \otimes G'(\cdot , y, \cdot ). \end{aligned}$$

Associativity of \(\otimes \) likewise follows from the associativity of the Kronecker product of matrices. The equality for \(\otimes \) comes from the multiplicativity of the spectral norm under Kronecker products (the singular values of \(A \otimes B\) are the pairwise products of the singular values of \(A\) and \(B\)):

$$\begin{aligned} \Vert G \otimes G'(\cdot , (x, y), \cdot )\Vert = \Vert G(\cdot , x, \cdot )\Vert \cdot \Vert G'(\cdot , y, \cdot )\Vert \end{aligned}$$

We now prove (V). For a vector \(v \in \mathbb {R}^{r_{2}}\), one can view the vector \(G(\alpha , x; \beta )v\) as the concatenation of \(n\) smaller vectors of the form \(G(\cdot , x; \cdot )v\). For the upper bound, one has

where the last step follows after taking the supremum over \(v\) with \(\Vert v\Vert = 1\).

For the lower bound, one has

and likewise one is done after taking supremum over \(v\) with \(\Vert v\Vert = 1\).

\(\square \)

Proof

(of Lemma 26) Suppose that in \(T\), the maximum distance from the root node is \(L\). At level \(l \in \{1, \ldots , L\}\), suppose there are \(d_{l}\) nodes in \(T\) at distance \(l\) from the root, denoted by the set \(\{v^{l}_{i}\}_{i = 1}^{d_{l}}\). Then, viewing \(p\) as a 3-tensor of size \(\{1\} \times \left[ \prod _{i = 1}^{d}n_{i}\right] \times \{1\} \rightarrow \mathbb {R}\), one has

$$\begin{aligned} p = G_{\text {root}(T)} \circ \bigotimes _{i \in [d_{1}]}G_{v^{1}_{i}} \circ \bigotimes _{j \in [d_{2}]}G_{v^{2}_{j}} \circ \ldots \circ \bigotimes _{k \in [d_{L}]}G_{v^{L}_{k}}, \end{aligned}$$
(82)

which is simply a consequence of the structure of the TTNS ansatz and the 3-tensor structure of the TTNS tensor cores in Definition 25. Then, by the chosen 3-tensor structure of \(p\), one has . By Lemma 23, one has

(83)

where the first inequality and the second equality follow from (63) and (64) in Lemma 23.

\(\square \)
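As a minimal illustration of the layered form (82) (a supplementary example, not part of the proof), consider a tree whose root has exactly two children, both leaves, labelled \(v^{1}_{1}, v^{1}_{2}\); then \(L = 1\), \(d_{1} = 2\), and (82) reads

$$\begin{aligned} p = G_{\text {root}(T)} \circ \left( G_{v^{1}_{1}} \otimes G_{v^{1}_{2}} \right) , \end{aligned}$$

where, writing \(r_{1}, r_{2}\) for the two bond dimensions, \(G_{\text {root}(T)}\) is viewed as a 3-tensor of size \(1 \times n_{\text {root}(T)} \times r_{1}r_{2}\) and \(G_{v^{1}_{1}} \otimes G_{v^{1}_{2}}\) as one of size \(r_{1}r_{2} \times n_{v^{1}_{1}}n_{v^{1}_{2}} \times 1\).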

Proof

(of Lemma 27) Let \(p_{0}, \ldots , p_{d}\) be a sequence of tensors such that \(p_{0} = p\), and \(p_{k}\) is the tensor formed by the TTNS tensor cores \(\{G_{i} + \Delta G_{i}\}_{i = 1}^{k} \cup \{G_{j}\}_{j \not \in [k]}\). One is interested in the error \(\Vert p_{d} - p_{0}\Vert _{\infty }\), which one can bound by

$$\begin{aligned} \Vert p_{d} - p_{0}\Vert _{\infty } \le \sum _{k = 1}^{d}\Vert p_{k} - p_{k-1}\Vert _{\infty }. \end{aligned}$$

One can then bound the magnitude of each term in this telescoping sum. Note that \(p_{k} - p_{k-1}\) is a TTNS ansatz formed by cores \(\{G_{i} + \Delta G_{i}\}_{i = 1}^{k-1} \cup \{\Delta G_{k}\} \cup \{G_{j}\}_{j = k+1}^{d}\), and thus by Lemma 26

Therefore, using \(1 + x \le \exp (x)\), one has

\(\square \)
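For instance (a supplementary illustration), for \(d = 2\) the telescoping terms are themselves TTNS tensors: writing \(\text {TTNS}(\cdot )\) as a shorthand for the tensor formed by the indicated cores,

$$\begin{aligned} p_{1} - p_{0} = \text {TTNS}(\Delta G_{1}, G_{2}), \qquad p_{2} - p_{1} = \text {TTNS}(G_{1} + \Delta G_{1}, \Delta G_{2}), \end{aligned}$$

since the formed tensor is linear in each individual core.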

Proof

(of Lemma 28) Note that (68) is only a corollary of (67). To prove (67), it suffices to prove that for any \(i \in [n]\), one has

$$\begin{aligned} \Vert \Delta X^{\star }(\cdot , i, \cdot )\Vert \le \frac{\Vert \left( A^{\star }\right) ^{\dag }\Vert }{1 - \Vert \left( A^{\star }\right) ^{\dag }\Vert \Vert \Delta A^{\star }\Vert } \left( \Vert \Delta A^{\star }\Vert \Vert X^{\star }(\cdot , i, \cdot )\Vert + \Vert \Delta B^{\star }(\cdot , i, \cdot )\Vert \right) , \end{aligned}$$
(84)

whereby (67) is obtained by taking maximum over \(i \in [n]\) on both sides.

Based on the above observation, one can simplify notation and reduce the argument to one in the regular spectral norm over matrices. For a fixed \(i \in [n]\), define \(C^\star := B^{\star }(\cdot , i, \cdot )\). Let \(Y^\star \) be the matrix which is the unique exact solution of the linear equation \(A^{\star }Y = C^{\star }\). Naturally, one has \(Y^\star = X^{\star }(\cdot , i, \cdot )\).

Likewise, define \(\Delta C^{\star } = \Delta B^{\star }(\cdot , i, \cdot )\) as the corresponding perturbation to \(C^{\star }\), and let \(Y^\star + \Delta Y^{\star }\) be the matrix which is the unique solution of the linear equation \((A^{\star }+\Delta A^{\star })Y = (C^{\star }+\Delta C^{\star })\) in the least-squares sense. As before, one has \(Y^\star + \Delta Y^\star = X^{\star }(\cdot , i, \cdot ) +\Delta X^{\star }(\cdot , i, \cdot )\). Then, (84) is equivalent to the following inequality:

$$\begin{aligned} \Vert \Delta Y^{\star }\Vert \le \frac{\Vert \left( A^{\star }\right) ^{\dag }\Vert }{1 - \Vert \left( A^{\star }\right) ^{\dag }\Vert \Vert \Delta A^{\star }\Vert } \left( \Vert \Delta A^{\star }\Vert \Vert X^{\star }\Vert + \Vert \Delta C^{\star }\Vert \right) . \end{aligned}$$
(85)

To reduce further, for an arbitrary \(v \in \mathbb {R}^m\), note that it suffices to prove the following result

$$\begin{aligned} \Vert \Delta Y^{\star }v\Vert \le \frac{\Vert \left( A^{\star }\right) ^{\dag }\Vert }{1 - \Vert \left( A^{\star }\right) ^{\dag }\Vert \Vert \Delta A^{\star }\Vert } \left( \Vert \Delta A^{\star }\Vert \Vert Y^{\star }v\Vert + \Vert \Delta C^{\star }v\Vert \right) , \end{aligned}$$
(86)

and (85) follows by taking supremum over \(v\) with \(\Vert v\Vert = 1\).

To simplify further, define \(b^{\star }:= C^{\star }v \in \mathbb {R}^{l}\), and let \(x^{\star }:= Y^{\star }v \in \mathbb {R}^{r}\) be the unique exact solution to \(A^{\star }x = b^{\star }\). Moreover, let \(\Delta b^{\star }:= \Delta C^{\star }v\) and \(\Delta x^{\star }:= \Delta Y^{\star }v\); then \(x^{\star }+ \Delta x^{\star }\) solves the linear equation \((A^{\star }+\Delta A^{\star })x = (b^{\star }+\Delta b^{\star })\) in the least-squares sense. This is exactly the setting of Theorem 3.48 in [31], from which (86) follows as a corollary. Thus we are done. \(\square \)

Proof

(of Lemma 29 )

Let \(C'_i = C_i + E_i\); then

$$\begin{aligned} \begin{aligned} \bigotimes _{i = 1}^{n} (C_i + E_i) - \bigotimes _{i = 1}^{n} C_i&= (C'_1 \otimes \cdots \otimes C'_n) - (C_1 \otimes C'_2 \otimes \cdots \otimes C'_n) \\&\quad + (C_1 \otimes C'_2 \otimes \cdots \otimes C'_n) - (C_1 \otimes C_2 \otimes C'_3 \otimes \cdots \otimes C'_n) \\&\quad + \cdots \\&\quad + (C_1 \otimes \cdots \otimes C_{n - 1} \otimes C'_n) - (C_1 \otimes \cdots \otimes C_n). \end{aligned} \end{aligned}$$
(87)

The first line on the right-hand side of (87) reduces to \(E_1 \otimes C'_2 \otimes \cdots \otimes C'_n\). Since \(\Vert C'_i\Vert \le \Vert C_i\Vert + \Vert E_i\Vert \le 1 + \delta _i\),

$$\begin{aligned} \Vert E_1 \otimes C'_2 \otimes \cdots \otimes C'_n\Vert = \Vert E_1\Vert \Vert C'_2\Vert \cdots \Vert C'_n\Vert \le \delta _1 (1 + \delta _2) \cdots (1 + \delta _n) \le \delta _1 \cdot \prod _{i = 1}^{n} (1 + \delta _i). \end{aligned}$$

Similarly, the j-th line on the right-hand side of (87) equals \(C_1 \otimes \cdots \otimes C_{j - 1} \otimes E_j \otimes C'_{j+1} \otimes \cdots \otimes C'_n\), whose norm is upper bounded by \(\delta _j \cdot \prod _{i = 1}^{n} (1 + \delta _i)\). Therefore, using \(1 + x \le \exp (x)\), one has

$$\begin{aligned} \left\| \bigotimes _{i = 1}^{n} (C_i + E_i) - \bigotimes _{i = 1}^{n} C_i\right\| \le \left( \sum _{i = 1}^{n} \delta _i\right) \cdot \prod _{i = 1}^{n} (1 + \delta _i) \le \left( \sum _{i = 1}^{n} \delta _i\right) \exp \left( \sum _{i = 1}^{n} \delta _i\right) . \end{aligned}$$

\(\square \)
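As a quick check of the bound (a supplementary example), take \(n = 2\): the telescoping in (87) reads

$$\begin{aligned} (C_1 + E_1) \otimes (C_2 + E_2) - C_1 \otimes C_2 = E_1 \otimes C'_2 + C_1 \otimes E_2, \end{aligned}$$

whose norm is at most \(\delta _1(1 + \delta _2) + \delta _2 \le (\delta _1 + \delta _2)(1 + \delta _1)(1 + \delta _2) \le (\delta _1 + \delta _2)\exp (\delta _1 + \delta _2)\).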

Proof

(of Corollary 31 and Corollary 32)

For Corollary 31, we apply Theorem 2.9, (2.26a) in [5]: if \( \Vert \Delta Z^{\star }\Vert \le (1 - 1 / \sqrt{2}) \sigma _{r}(Z^{\star })\), then

$$\begin{aligned}\textrm{dist}(B, B^{\star }) \le \frac{2 \Vert \left( B^{\star }\right) ^{\top }\Delta Z^{\star }\Vert }{\sigma _{r}( Z^{\star }) - \sigma _{r+1}(Z^{\star })} \le \frac{2 \Vert \left( B^{\star }\right) ^{\top }\Vert \Vert \Delta Z^{\star }\Vert }{\sigma _{r}( Z^{\star }) - \sigma _{r+1}(Z^{\star })},\end{aligned}$$

and we are done by applying \(\sigma _{r+1}(Z^{\star }) = 0\) and \(\Vert B^{\star }\Vert = 1\).

For Corollary 32, only (69) is new, and one only needs to justify \(\gamma \le L^2\). By Jensen's inequality and the sub-multiplicativity of the spectral norm, one has

$$\begin{aligned} \left\| \mathbb {E}\left[ Z^{(i)}\left( Z^{(i)}\right) ^{\top }\right] \right\| \le \mathbb {E}\left[ \left\| Z^{(i)}\left( Z^{(i)}\right) ^{\top }\right\| \right] \le \mathbb {E}\left[ \left\| Z^{(i)}\right\| ^2\right] \le L^2. \end{aligned}$$

\(\square \)
