1 Introduction

Statistical machine learning frameworks using probabilistic graphical models are useful for many applications, including information communication technologies [1,2,3], compressed sensing [4, 5] and neural information processing systems [6,7,8,9,10] in data-driven sciences.

Most probabilistic graphical models belong to the exponential family [11] and can be regarded as classical spin systems in statistical mechanical informatics [12,13,14,15,16,17]. Moreover, many practical formulations in the data sciences, as well as in the computational sciences, can be reduced to combinatorial optimization problems with constraint conditions, which can be regarded as Ising models in statistical mechanical informatics [18, 19]. Much interest has therefore focused on applying quantum annealing, a novel high-speed optimization technology, to massive optimization problems [20,21,22,23,24].

2 Statistical Machine Learning

Most mathematical frameworks for statistical machine learning are based on the maximum likelihood principle [25, 26] from the statistical sciences. The key questions are how to choose the prior distribution and the data generative probability distribution, and how to express the joint probability of the parameters and the data vector. In this section, we review maximum likelihood frameworks in terms of model selection and parameter estimation from a given data vector.

2.1 Bayesian Statistics and Maximization of Marginal Likelihood

Let us consider a graph specified by nodes and edges, (V,E), where V is the set of all nodes i and E is the set of all edges \(\{i,j\}\). State variables \(s_{i}\) and \(d_{i}\) are associated with each node i. The vectors \({\boldsymbol{s}} = {\left( \begin{array}{ccccc} s_{1} \\ s_{2} \\ {\vdots } \\ s_{|V|} \end{array} \right) }\) and \({\boldsymbol{d}} = {\left( \begin{array}{ccccc} d_{1} \\ d_{2} \\ {\vdots } \\ d_{|V|} \end{array} \right) }\) correspond to the parameters and the data vector, respectively. The state spaces of \(s_{i}\) and \(d_{i}\) are given by \({\Omega }\) and \((-{\infty },+{\infty })\), respectively. Now \({\rho }({\boldsymbol{d}}|{\boldsymbol{s}},{\beta })\) and \(P({\boldsymbol{s}}|{\alpha })\), which correspond to the data generative and prior models, respectively, are assumed to be as follows:

$$\begin{aligned} {\rho }({\boldsymbol{d}}|{\boldsymbol{s}},{\beta }) = {\prod _{i{\in }V}} {\sqrt{{\frac{{\beta }}{2{\pi }}}}} {\exp }{\left( -{\frac{1}{2}}{\beta }{\left( d_{i}-s_{i}\right) }^{2} \right) }, \end{aligned}$$
(10.1)
$$\begin{aligned} P({\boldsymbol{s}}|{\alpha }) = {\frac{ {\displaystyle { {\prod _{\{i,j\}{\in }E}} {\exp }{\left( -{\frac{1}{2}}{\alpha }{\left( s_{i}-s_{j}\right) }^{2} \right) } }} }{ {\displaystyle { {\sum _{s_{1}{\in }{\Omega }}} {\sum _{s_{2}{\in }{\Omega }}} {\cdots } {\sum _{s_{|V|}{\in }{\Omega }}} {\prod _{\{i,j\}{\in }E}} {\exp }{\left( -{\frac{1}{2}}{\alpha }{\left( s_{i}-s_{j}\right) }^{2} \right) } }} } }. \end{aligned}$$
(10.2)
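To make Eq. (10.2) concrete, the prior can be evaluated by brute-force enumeration when the graph is small. The sketch below uses a hypothetical 3-node chain, the state space \({\Omega }=\{-1,+1\}\), and an arbitrary value of \({\alpha }\); none of these choices are fixed by the chapter.

```python
import itertools
import math

# Illustrative 3-node chain graph (V, E) with a binary state space
V = [0, 1, 2]
E = [(0, 1), (1, 2)]
Omega = [-1, +1]
alpha = 0.5  # hypothetical hyperparameter value

def prior_weight(s, alpha):
    # Unnormalized numerator of Eq. (10.2): product over edges of
    # exp(-(alpha/2) * (s_i - s_j)^2)
    return math.prod(math.exp(-0.5 * alpha * (s[i] - s[j]) ** 2)
                     for i, j in E)

# Denominator of Eq. (10.2): sum over all |Omega|^|V| configurations
Z = sum(prior_weight(s, alpha)
        for s in itertools.product(Omega, repeat=len(V)))

def prior(s, alpha):
    # Normalized prior P(s | alpha)
    return prior_weight(s, alpha) / Z

# The prior sums to one over Omega^{|V|}
total = sum(prior(s, alpha) for s in itertools.product(Omega, repeat=len(V)))
```

Enumeration over \({\Omega }^{|V|}\) is exponential in \(|V|\) and serves only to make the definition explicit; aligned configurations receive larger prior probability than configurations with disagreeing neighbors.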

The expressions for the posterior probability \(P({\boldsymbol{s}}|{\boldsymbol{d}},{\alpha },{\beta })\), joint probability \({\rho }({\boldsymbol{s}},{\boldsymbol{d}}|{\alpha },{\beta })\), and marginal likelihood \({\rho }({\boldsymbol{d}}|{\alpha },{\beta })\) are given by Bayes formulas as follows:

$$\begin{aligned} P({\boldsymbol{s}}|{\boldsymbol{d}},{\alpha },{\beta }) = {\frac{{\rho }({\boldsymbol{s}},{\boldsymbol{d}}|{\alpha },{\beta })}{{\rho }({\boldsymbol{d}}|{\alpha },{\beta })}} = {\frac{{\rho }({\boldsymbol{d}}|{\boldsymbol{s}},{\beta })P({\boldsymbol{s}}|{\alpha })}{{\rho }({\boldsymbol{d}}|{\alpha },{\beta })}}, \end{aligned}$$
(10.3)
$$\begin{aligned} {\rho }({\boldsymbol{s}},{\boldsymbol{d}}|{\alpha },{\beta }) ={\rho }({\boldsymbol{d}}|{\boldsymbol{s}},{\beta })P({\boldsymbol{s}}|{\alpha }), \end{aligned}$$
(10.4)
$$\begin{aligned} {\rho }({\boldsymbol{d}}|{\alpha },{\beta })&= {\sum _{s_{1}{\in }{\Omega }}} {\sum _{s_{2}{\in }{\Omega }}} {\cdots } {\sum _{s_{|V|}{\in }{\Omega }}} {\rho }({\boldsymbol{s}},{\boldsymbol{d}}|{\alpha },{\beta }) \nonumber \\&= {\sum _{s_{1}{\in }{\Omega }}} {\sum _{s_{2}{\in }{\Omega }}} {\cdots } {\sum _{s_{|V|}{\in }{\Omega }}} {\rho }({\boldsymbol{d}}|{\boldsymbol{s}},{\beta })P({\boldsymbol{s}}|{\alpha }). \end{aligned}$$
(10.5)

Estimates of the hyperparameters and the parameter vector \({\widehat{{\alpha }}}\), \({\widehat{{\beta }}}\), \({\boldsymbol{{\widehat{s}}}}={\left( {\widehat{s}}_{1},{\widehat{s}}_{2},{\cdots },{\widehat{s}}_{|V|}\right) }\) are determined by

$$\begin{aligned} {\big (} {\widehat{{\alpha }}}({\boldsymbol{d}}), {\widehat{{\beta }}}({\boldsymbol{d}}) {\big )}= & {} {\arg }{\max _{{\left( {\alpha },{\beta } \right) }}} {\rho }{\big (} {\boldsymbol{d}}{\big |}{\alpha },{\beta } {\big )}, \end{aligned}$$
(10.6)
$$\begin{aligned} {\widehat{s}}_{i}({\boldsymbol{d}})= & {} {\arg }{\max _{s_{i}{\in }{\Omega }}}P_{i}{\big (}s_{i}{\big |}{\boldsymbol{d}},{\widehat{{\alpha }}}({\boldsymbol{d}}),{\widehat{{\beta }}}({\boldsymbol{d}}){\big )}~(i{\in }V). \end{aligned}$$
(10.7)

Equations (10.6) and (10.7) are referred to as the maximization of marginal likelihood (MML) [25, 26] and the maximization of posterior marginal (MPM) [27], respectively.
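On a small graph, the MPM estimate of Eq. (10.7) can be computed exactly by enumerating the posterior of Eq. (10.3). The following sketch fixes the hyperparameters rather than estimating them by MML; the chain graph, data values, and hyperparameter values are all illustrative.

```python
import itertools
import math

# Hypothetical 3-node chain with Omega = {-1, +1}
V, E, Omega = [0, 1, 2], [(0, 1), (1, 2)], [-1, +1]
alpha, beta = 0.5, 1.0   # illustrative hyperparameters
d = [0.9, 0.8, -0.2]     # illustrative observed data vector

def joint_weight(s):
    # rho(d|s,beta) times the unnormalized prior of Eq. (10.2);
    # all normalization constants cancel in the posterior of Eq. (10.3)
    like = math.prod(math.exp(-0.5 * beta * (d[i] - s[i]) ** 2) for i in V)
    pri = math.prod(math.exp(-0.5 * alpha * (s[i] - s[j]) ** 2) for i, j in E)
    return like * pri

states = list(itertools.product(Omega, repeat=len(V)))
Z = sum(joint_weight(s) for s in states)

def posterior_marginal(i, si):
    # P_i(s_i | d, alpha, beta): posterior summed over states with s_i fixed
    return sum(joint_weight(s) for s in states if s[i] == si) / Z

# MPM estimate of Eq. (10.7): argmax of each posterior marginal
s_hat = [max(Omega, key=lambda si: posterior_marginal(i, si)) for i in V]
```

Note how the smoothing prior can flip a label relative to the data term alone: with these illustrative numbers the third node's data slightly favors \(-1\), but its posterior marginal favors \(+1\) because of its neighbors.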

2.2 Expectation-Maximization Algorithm

The expectation-maximization (EM) algorithm is often used to maximize the marginal likelihood in Eq. (10.6) [25, 26]. The \(\mathcal{{Q}}\)-function for the EM algorithm in the present framework is defined by

$$\begin{aligned} \mathcal{{Q}}{\left( {\alpha },{\beta }{\big |}{\alpha }',{\beta }',{\boldsymbol{d}}\right) } \equiv {\sum _{s_{1}{\in }{\Omega }}} {\sum _{s_{2}{\in }{\Omega }}} {\cdots } {\sum _{s_{|V|}{\in }{\Omega }}} P{\left( {\boldsymbol{s}}{\big |}{\boldsymbol{d}},{\alpha }',{\beta }'\right) } {\ln }{\left( {\rho }{\big (}{\boldsymbol{s}},{\boldsymbol{d}}{\big |}{\alpha },{\beta }{\big )}\right) }. \end{aligned}$$
(10.8)

The EM algorithm repeats the following E-step and M-step for \(t=0,1,2,{\cdots }\) until \({\widehat{\alpha }}({\boldsymbol{d}})\) and \({\widehat{\beta }}({\boldsymbol{d}})\) converge:

     E-step::

Compute \(\mathcal{{Q}}{\big (}{\alpha },{\beta }{\big |}{\alpha }({\boldsymbol{d}},t),{\beta }({\boldsymbol{d}},t),{\boldsymbol{d}}{\big )}\) for various values of \({\alpha }\) and \({\beta }\).

     M-step::

Determine \({\big (}{\alpha }({\boldsymbol{d}},t+1),{\beta }({\boldsymbol{d}},t+1){\big )}\) so as to satisfy the extremum conditions of \(\mathcal{{Q}}{\big (}{\alpha },{\beta }{\big |}{\alpha }({\boldsymbol{d}},t),{\beta }({\boldsymbol{d}},t),{\boldsymbol{d}}{\big )}\) with respect to \({\alpha }\) and \({\beta }\). Update \({\widehat{\alpha }}({\boldsymbol{d}}){\leftarrow }{\alpha }({\boldsymbol{d}},t+1)\) and \({\widehat{\beta }}({\boldsymbol{d}}){\leftarrow }{\beta }({\boldsymbol{d}},t+1)\).

The update rule from \({\left( {\alpha }({\boldsymbol{d}},t),{\beta }({\boldsymbol{d}},t)\right) }\) to \({\left( {\alpha }({\boldsymbol{d}},t+1),{\beta }({\boldsymbol{d}},t+1)\right) }\) for the extremum conditions can be written as

$$\begin{aligned}&{\frac{1}{|E|}}{\sum _{\{i,j\}{\in }E}} {\sum _{s_{1}{\in }{\Omega }}} {\sum _{s_{2}{\in }{\Omega }}}{\cdots }{\sum _{s_{|V|}{\in }{\Omega }}} {\left( s_{i}-s_{j}\right) }^{2} P{\left( s_{1},s_{2},{\cdots },s_{|V|}{\big |}{\alpha }({\boldsymbol{d}},t+1)\right) } \nonumber \\&\qquad\quad{} = {\frac{1}{|E|}} {\sum _{\{i,j\}{\in }E}} {\sum _{s_{1}{\in }{\Omega }}} {\sum _{s_{2}{\in }{\Omega }}}{\cdots }{\sum _{s_{|V|}{\in }{\Omega }}} {\left( s_{i}-s_{j}\right) }^{2} P{\left( s_{1},s_{2},{\cdots },s_{|V|}{\big |}{\boldsymbol{d}},{\alpha }({\boldsymbol{d}},t),{\beta }({\boldsymbol{d}},t)\right) }, \nonumber \\ \end{aligned}$$
(10.9)
$$\begin{aligned} {\frac{1}{{\beta }({\boldsymbol{d}},t+1)}} = {\frac{1}{|V|}} {\sum _{i{\in }V}} {\sum _{s_{1}{\in }{\Omega }}} {\sum _{s_{2}{\in }{\Omega }}}{\cdots }{\sum _{s_{|V|}{\in }{\Omega }}} {\left( s_{i}-d_{i}\right) }^{2} P{\left( s_{1},s_{2},{\cdots },s_{|V|}{\big |}{\boldsymbol{d}},{\alpha }({\boldsymbol{d}},t),{\beta }({\boldsymbol{d}},t)\right) }. \nonumber \\ \end{aligned}$$
(10.10)

The marginal probability distributions of \(P{\left( {\boldsymbol{s}}{\big |}{\boldsymbol{d}},{\alpha },{\beta }\right) }\) and \(P{\left( {\boldsymbol{s}}{\big |}{\alpha }\right) }\) are introduced as

$$\begin{aligned} P_{i}{\left( s_{i}{\big |}{\boldsymbol{d}},{\alpha },{\beta }\right) } \equiv {\sum _{{\tau }_{1}{\in }{\Omega }}} {\sum _{{\tau }_{2}{\in }{\Omega }}} {\cdots } {\sum _{{\tau }_{|V|}{\in }{\Omega }}} {\delta }_{s_{i},{\tau }_{i}} P{\left( {\tau }_{1},{\tau }_{2},{\cdots },{\tau }_{|V|}{\big |}{\boldsymbol{d}},{\alpha },{\beta }\right) } ~(i{\in }V), \end{aligned}$$
(10.11)
$$\begin{aligned} P_{ij}{\left( s_{i},s_{j}{\big |}{\boldsymbol{d}},{\alpha },{\beta }\right) }= & {} P_{ji}{\left( s_{j},s_{i}{\big |}{\boldsymbol{d}},{\alpha },{\beta }\right) } \nonumber \\\equiv & {} {\sum _{{\tau }_{1}{\in }{\Omega }}} {\sum _{{\tau }_{2}{\in }{\Omega }}} {\cdots } {\sum _{{\tau }_{|V|}{\in }{\Omega }}} {\delta }_{s_{i},{\tau }_{i}}{\delta }_{s_{j},{\tau }_{j}} P{\left( {\tau }_{1},{\tau }_{2},{\cdots },{\tau }_{|V|}{\big |}{\boldsymbol{d}},{\alpha },{\beta }\right) } ~(\{i,j\}{\in }E), \nonumber \\&\end{aligned}$$
(10.12)
$$\begin{aligned} P_{ij}{\left( s_{i},s_{j}{\big |}{\alpha }\right) }= & {} P_{ji}{\left( s_{j},s_{i}{\big |}{\alpha }\right) } \nonumber \\\equiv & {} {\sum _{{\tau }_{1}{\in }{\Omega }}} {\sum _{{\tau }_{2}{\in }{\Omega }}} {\cdots } {\sum _{{\tau }_{|V|}{\in }{\Omega }}} {\delta }_{s_{i},{\tau }_{i}}{\delta }_{s_{j},{\tau }_{j}} P{\left( {\tau }_{1},{\tau }_{2},{\cdots },{\tau }_{|V|}{\big |}{\alpha }\right) } ~(\{i,j\}{\in }E). \nonumber \\&\end{aligned}$$
(10.13)

In this way, the extremum conditions can be reduced to

$$\begin{aligned}&{\frac{1}{|E|}}{\sum _{\{i,j\}{\in }E}} {\sum _{s_{i}{\in }{\Omega }}} {\sum _{s_{j}{\in }{\Omega }}} {\left( s_{i}-s_{j}\right) }^{2} P_{ij}{\left( s_{i},s_{j}{\big |}{\alpha }({\boldsymbol{d}},t+1)\right) } \nonumber \\&\qquad\quad{} = {\frac{1}{|E|}} {\sum _{\{i,j\}{\in }E}} {\sum _{s_{i}{\in }{\Omega }}} {\sum _{s_{j}{\in }{\Omega }}} {\left( s_{i}-s_{j}\right) }^{2} P_{ij}{\left( s_{i},s_{j}{\big |}{\boldsymbol{d}},{\alpha }({\boldsymbol{d}},t),{\beta }({\boldsymbol{d}},t)\right) },\qquad \end{aligned}$$
(10.14)
$$\begin{aligned} {\frac{1}{{\beta }({\boldsymbol{d}},t+1)}} = {\frac{1}{|V|}} {\sum _{i{\in }V}} {\sum _{s_{i}{\in }{\Omega }}} {\left( s_{i}-d_{i}\right) }^{2} P_{i}{\left( s_{i}{\big |}{\boldsymbol{d}},{\alpha }({\boldsymbol{d}},t),{\beta }({\boldsymbol{d}},t)\right) }. \end{aligned}$$
(10.15)
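For small graphs, the update rules (10.14) and (10.15) can be implemented directly: Eq. (10.15) gives \({\beta }({\boldsymbol{d}},t+1)\) in closed form, while Eq. (10.14) is a one-dimensional equation for \({\alpha }({\boldsymbol{d}},t+1)\) that can be solved by bisection, since the prior expectation of \((s_{i}-s_{j})^{2}\) decreases monotonically in \({\alpha }\). A brute-force sketch on a hypothetical 3-node chain with \({\Omega }=\{-1,+1\}\) (all numerical values illustrative):

```python
import itertools
import math

V, E, Omega = [0, 1, 2], [(0, 1), (1, 2)], [-1, +1]
d = [0.9, 0.8, -0.2]  # illustrative data vector
states = list(itertools.product(Omega, repeat=len(V)))

def expect_sq_diff(prob):
    # (1/|E|) * sum over edges of E[(s_i - s_j)^2] under distribution `prob`
    return sum(prob(s) * sum((s[i] - s[j]) ** 2 for i, j in E)
               for s in states) / len(E)

def prior(alpha):
    w = {s: math.prod(math.exp(-0.5 * alpha * (s[i] - s[j]) ** 2)
                      for i, j in E) for s in states}
    Z = sum(w.values())
    return lambda s: w[s] / Z

def posterior(alpha, beta):
    w = {}
    for s in states:
        like = math.prod(math.exp(-0.5 * beta * (d[i] - s[i]) ** 2) for i in V)
        pri = math.prod(math.exp(-0.5 * alpha * (s[i] - s[j]) ** 2) for i, j in E)
        w[s] = like * pri
    Z = sum(w.values())
    return lambda s: w[s] / Z

def em_step(alpha_t, beta_t):
    post = posterior(alpha_t, beta_t)
    # Eq. (10.15): 1/beta(t+1) is the posterior expectation of (s_i - d_i)^2
    inv_beta = sum(post(s) * sum((s[i] - d[i]) ** 2 for i in V)
                   for s in states) / len(V)
    # Eq. (10.14): find alpha(t+1) whose prior expectation of (s_i - s_j)^2
    # matches the posterior expectation, here by bisection on alpha
    target = expect_sq_diff(post)
    lo, hi = 1e-6, 50.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if expect_sq_diff(prior(mid)) > target:
            lo = mid  # larger alpha -> smoother prior -> smaller expectation
        else:
            hi = mid
    return 0.5 * (lo + hi), 1.0 / inv_beta
```

Iterating `em_step` until the hyperparameters stop changing realizes the EM procedure above by exact enumeration; for realistic \(|V|\) the marginals must instead come from MCMC or mean-field approximations.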

To realize the EM procedure as a practical algorithm, Markov chain Monte Carlo (MCMC) methods, which are powerful probabilistic techniques [28, 29], are often used. In some recent developments, advanced mean-field methods from statistical mechanical informatics are also used as powerful deterministic algorithms, as shown in Sect. 10.3. Consider the expectation values of both sides of Eqs. (10.9) and (10.10) with respect to the data vector \({\boldsymbol{d}}\) distributed according to the following probability density function, in which the hyperparameters \({\alpha }\) and \({\beta }\) are set to their true values \({\alpha }^{*}\) and \({\beta }^{*}\), respectively:
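As an illustration of the MCMC route, a single-site Gibbs sampler draws each \(s_{i}\) from its conditional distribution given its neighbors and \(d_{i}\), and the posterior marginals \(P_{i}(s_{i}|{\boldsymbol{d}},{\alpha },{\beta })\) needed in Eq. (10.15) are estimated from sample frequencies. The graph, data, and sampling schedule below are illustrative choices, not taken from the chapter.

```python
import math
import random

random.seed(0)
# Hypothetical 3-node chain with Omega = {-1, +1}
V, Omega = [0, 1, 2], [-1, +1]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
d = [0.9, 0.8, -0.2]
alpha, beta = 0.5, 1.0

def local_weight(si, i, s):
    # Unnormalized conditional weight of s_i given its neighbors and d_i
    e = 0.5 * beta * (d[i] - si) ** 2
    e += sum(0.5 * alpha * (si - s[j]) ** 2 for j in neighbors[i])
    return math.exp(-e)

s = [random.choice(Omega) for _ in V]
counts = {i: {v: 0 for v in Omega} for i in V}
n_sweeps, burn_in = 5000, 500
for sweep in range(n_sweeps):
    for i in V:
        w = [local_weight(v, i, s) for v in Omega]
        s[i] = random.choices(Omega, weights=w)[0]
    if sweep >= burn_in:  # discard burn-in sweeps
        for i in V:
            counts[i][s[i]] += 1

# Monte Carlo estimate of the posterior marginals P_i(s_i | d, alpha, beta)
n_kept = n_sweeps - burn_in
marginal = {i: {v: counts[i][v] / n_kept for v in Omega} for i in V}
```

The estimated marginals can then be plugged into the right-hand sides of Eqs. (10.14) and (10.15) in place of the exact sums.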

$$\begin{aligned} {\rho }({\boldsymbol{d}}|{\alpha }^{*},{\beta }^{*})= & {} {\sum _{{\boldsymbol{\tau }}{\in }{\Omega }^{|V|}}} {\rho }{\left( {\boldsymbol{d}}|{\boldsymbol{\tau }},{\alpha }^{*},{\beta }^{*}\right) }P{\left( {\boldsymbol{\tau }}|{\alpha }^{*}\right) }, \end{aligned}$$
(10.16)

such that

$$\begin{aligned} {\rho }{\left( d_{1},d_{2},{\cdots },d_{|V|}{\big |}{\alpha }^{*},{\beta }^{*}\right) }= & {} {\sum _{{\tau }_{1}{\in }{\Omega }}}{\sum _{{\tau }_{2}{\in }{\Omega }}}{\cdots }{\sum _{{\tau }_{|V|}{\in }{\Omega }}} {\rho }{\left( d_{1},d_{2},{\cdots },d_{|V|}{\big |}{\tau }_{1},{\tau }_{2},{\cdots },{\tau }_{|V|},{\beta }^{*}\right) } \nonumber \\&\quad \quad \quad \quad \quad \quad \quad \quad \quad \times P{\left( {\tau }_{1},{\tau }_{2},{\cdots },{\tau }_{|V|}{\big |}{\alpha }^{*}\right) } \nonumber \\&{\left( d_{1}{\in }(-{\infty },+{\infty }),d_{2}{\in }(-{\infty },+{\infty }),{\cdots },d_{|V|}{\in }(-{\infty },+{\infty })\right) }. \end{aligned}$$
(10.17)

We can then derive simultaneous equations for the statistical trajectory \(\{{\overline{\alpha }}({\alpha }^{*},{\beta }^{*},t),{\overline{\beta }}({\alpha }^{*},{\beta }^{*},t)|t=1,2,3,{\cdots }\}\) of the convergence process \(\{({\alpha }({\boldsymbol{d}},t),{\beta }({\boldsymbol{d}},t))|t=1,2,3,{\cdots }\}\) of the above EM algorithm.

Equations (10.9) and (10.10) can be rewritten as follows:

$$\begin{aligned} {\frac{1}{|E|}} {\frac{{\partial }}{{\partial }{\alpha }({\boldsymbol{d}},t+1)}} {\left( {\ln }{\left( Z{\left( {\alpha }({\boldsymbol{d}},t+1) \right) }\right) }\right) } = {\frac{1}{|E|}} {\frac{{\partial }}{{\partial }{\alpha }({\boldsymbol{d}},t)}} {\left( {\ln }{\left( Z{\left( {\boldsymbol{d}}, {\alpha }({\boldsymbol{d}},t), {\beta }({\boldsymbol{d}},t) \right) }\right) }\right) }, \nonumber \\ \end{aligned}$$
(10.18)
$$\begin{aligned} {\frac{1}{{\beta }({\boldsymbol{d}},t+1)}} = {\frac{1}{|V|}} {\frac{{\partial }}{{\partial }{\beta }({\boldsymbol{d}},t)}} {\left( {\ln }{\left( Z{\left( {\boldsymbol{d}}, {\alpha }({\boldsymbol{d}},t), {\beta }({\boldsymbol{d}},t) \right) }\right) }\right) }, \end{aligned}$$
(10.19)

where

$$\begin{aligned} Z({\alpha }) \equiv {\sum _{s_{1}{\in }{\Omega }}} {\sum _{s_{2}{\in }{\Omega }}} {\cdots } {\sum _{s_{|V|}{\in }{\Omega }}} {\prod _{\{i,j\}{\in }E}} {\exp }{\left( -{\frac{1}{2}}{\alpha }{\left( s_{i}-s_{j}\right) }^{2} \right) }, \end{aligned}$$
(10.20)
$$\begin{aligned} Z({\boldsymbol{d}},{\alpha },{\beta }) \equiv {\sum _{s_{1}{\in }{\Omega }}} {\sum _{s_{2}{\in }{\Omega }}} {\cdots } {\sum _{s_{|V|}{\in }{\Omega }}} w{\left( s_{1},s_{2},{\cdots },s_{|V|}{\big |} {\boldsymbol{d}},{\alpha },{\beta }\right) }, \end{aligned}$$
(10.21)
$$\begin{aligned} w{\left( {\boldsymbol{s}}{\big |}{\boldsymbol{d}},{\alpha },{\beta }\right) }= & {} w{\left( s_{1},s_{2},{\cdots },s_{|V|}{\big |} d_{1},d_{2},{\cdots },d_{|V|},{\alpha },{\beta }\right) } \nonumber \\\equiv & {} {\left( {\prod _{i{\in }V}} {\exp }{\left( -{\frac{1}{2}}{\beta }{\left( s_{i}-d_{i}\right) }^{2} \right) } \right) } {\left( {\prod _{\{i,j\}{\in }E}} {\exp }{\left( -{\frac{1}{2}}{\alpha }{\left( s_{i}-s_{j}\right) }^{2} \right) } \right) }.\nonumber \\ \end{aligned}$$
(10.22)

By taking the expectation values of both sides of Eqs. (10.18) and (10.19) with respect to the state vector of the data point \({\boldsymbol{d}}\) in the probability density function \({\rho }({\boldsymbol{d}}|{\alpha }^{*},{\beta }^{*})\), the simultaneous deterministic equation for the statistical trajectory \(\{{\overline{\alpha }}({\alpha }^{*},{\beta }^{*},t), {\overline{\beta }}({\alpha }^{*},{\beta }^{*},t) | t=1,2,3,{\cdots }\}\) of the EM procedure can be derived as follows:

$$\begin{aligned}&{\frac{1}{|E|}} {\frac{{\partial }}{{\partial }{\overline{\alpha }}({\alpha }^{*},{\beta }^{*},t+1)}} {\left( {\ln }{\left( Z{\left( {\overline{\alpha }}({\alpha }^{*},{\beta }^{*},t+1) \right) }\right) }\right) } \nonumber \\&= {\frac{1}{|E|}} {\frac{{\partial }}{{\partial }{\overline{\alpha }}({\alpha }^{*},{\beta }^{*},t)}} {\left( {\int _{-{\infty }}^{+{\infty }}}{\int _{-{\infty }}^{+{\infty }}}{\cdots }{\int _{-{\infty }}^{+{\infty }}} {\rho }{\left( {\boldsymbol{d}}{\big |}{\alpha }^{*},{\beta }^{*}\right) } {\left( {\ln }{\left( Z{\left( {\boldsymbol{d}}, {\overline{\alpha }}({\alpha }^{*},{\beta }^{*},t), {\overline{\beta }}({\alpha }^{*},{\beta }^{*},t) \right) }\right) }\right) }dd_{1}dd_{2}{\cdots }dd_{|V|} \right) }, \nonumber \\ \end{aligned}$$
(10.23)
$$\begin{aligned}&{\frac{1}{{\overline{\beta }}({\alpha }^{*},{\beta }^{*},t+1)}} = {\frac{1}{|V|}} {\frac{{\partial }}{{\partial }{\overline{\beta }}({\alpha }^{*},{\beta }^{*},t)}}\nonumber \\&\quad {\left( {\int _{-{\infty }}^{+{\infty }}}{\int _{-{\infty }}^{+{\infty }}}{\cdots }{\int _{-{\infty }}^{+{\infty }}} {\rho }{\left( {\boldsymbol{d}}{\big |}{\alpha }^{*},{\beta }^{*}\right) } {\left( {\ln }{\left( Z{\left( {\boldsymbol{d}}, {\overline{\alpha }}({\alpha }^{*},{\beta }^{*},t), {\overline{\beta }}({\alpha }^{*},{\beta }^{*},t) \right) }\right) }\right) }dd_{1}dd_{2}{\cdots }dd_{|V|} \right) }. \nonumber \\ \end{aligned}$$
(10.24)

In the case of a continuous state space \({\Omega }=(-{\infty },+{\infty })\), the posterior and prior probabilistic models correspond to Gaussian graphical models, and the statistical trajectory in Eqs. (10.23) and (10.24) can be exactly computed by means of the multi-dimensional Gaussian integral formula [30].
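For example, with \({\Omega }=(-{\infty },+{\infty })\) the posterior in Eq. (10.3) is a multivariate Gaussian whose mean solves the linear system \(({\beta }I+{\alpha }L){\boldsymbol{s}}={\beta }{\boldsymbol{d}}\), where \(L\) is the graph Laplacian, so the MPM estimate reduces to a linear solve. A minimal sketch on a hypothetical 3-node chain (illustrative numbers):

```python
# Posterior mean of the Gaussian graphical model: solve (beta*I + alpha*L) s = beta*d
V, E = [0, 1, 2], [(0, 1), (1, 2)]
alpha, beta = 0.5, 1.0
d = [0.9, 0.8, -0.2]

# Build A = beta*I + alpha*L, where L is the graph Laplacian
n = len(V)
A = [[beta if i == j else 0.0 for j in V] for i in V]
for i, j in E:
    A[i][i] += alpha
    A[j][j] += alpha
    A[i][j] -= alpha
    A[j][i] -= alpha
b = [beta * di for di in d]

# Solve A s = b by Gauss-Jordan elimination (A is small and positive definite)
for k in range(n):
    piv = A[k][k]
    for j in range(k, n):
        A[k][j] /= piv
    b[k] /= piv
    for i in range(n):
        if i != k and A[i][k] != 0.0:
            f = A[i][k]
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
            b[i] -= f * b[k]
s_hat = b  # posterior mean = MPM estimate in the Gaussian case
```

The solution smooths the data toward neighboring values, the continuous analogue of the discrete MPM behavior above; in practice one would use a dedicated sparse linear solver rather than dense elimination.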

For a discrete state space \({\Omega }\), it is generally hard to treat Eqs. (10.23) and (10.24) analytically. To estimate Eqs. (10.23) and (10.24), the following quantity is often introduced in statistical mechanical informatics [13, 17]:

$$\begin{aligned} {\int _{-{\infty }}^{+{\infty }}}{\int _{-{\infty }}^{+{\infty }}}{\cdots }{\int _{-{\infty }}^{+{\infty }}} {\rho }{\left( {\boldsymbol{d}}{\big |}{\alpha }^{*},{\beta }^{*}\right) } {\left( {\ln }{\left( Z{\left( {\boldsymbol{d}}, {\alpha }, {\beta } \right) }\right) }\right) }dd_{1}dd_{2}{\cdots }dd_{|V|}. \end{aligned}$$
(10.25)

The quantity in Eq. (10.25) can be rewritten as follows:

$$\begin{aligned}&{\int _{-{\infty }}^{+{\infty }}}{\int _{-{\infty }}^{+{\infty }}}{\cdots }{\int _{-{\infty }}^{+{\infty }}} {\rho }{\left( {\boldsymbol{d}}{\big |}{\alpha }^{*},{\beta }^{*}\right) } {\left( {\lim _{n{\rightarrow }+0}}{\frac{1}{n}}{\left( Z{\left( {\boldsymbol{d}}, {\alpha },{\beta } \right) }^{n}-1\right) }\right) }dd_{1}dd_{2}{\cdots }dd_{|V|} \nonumber \\&= {\lim _{n{\rightarrow }+0}}{\frac{1}{n}} {\int _{-{\infty }}^{+{\infty }}}{\int _{-{\infty }}^{+{\infty }}}{\cdots }{\int _{-{\infty }}^{+{\infty }}} {\rho }{\left( {\boldsymbol{d}}{\big |}{\alpha }^{*},{\beta }^{*}\right) } {\left( {\left( Z{\left( {\boldsymbol{d}}, {\alpha },{\beta } \right) }^{n}-1\right) }\right) }dd_{1}dd_{2}{\cdots }dd_{|V|} \nonumber \\&= {\lim _{n{\rightarrow }+0}}{\frac{1}{n}} {\int _{-{\infty }}^{+{\infty }}}{\int _{-{\infty }}^{+{\infty }}}{\cdots }{\int _{-{\infty }}^{+{\infty }}} {\rho }{\left( {\boldsymbol{d}}{\big |}{\alpha }^{*},{\beta }^{*}\right) } {\left( {\sum _{s_{1}{\in }{\Omega }}}{\sum _{s_{2}{\in }{\Omega }}}{\cdots }{\sum _{s_{|V|}{\in }{\Omega }}} w{\left( {\boldsymbol{s}} {\big |} {\boldsymbol{d}}, {\alpha },{\beta } \right) }\right) }^{n}dd_{1}dd_{2}{\cdots }dd_{|V|}-1 \nonumber \\&= {\lim _{n{\rightarrow }+0}}{\frac{1}{n}}{\Bigg (} {\int _{-{\infty }}^{+{\infty }}}{\int _{-{\infty }}^{+{\infty }}}{\cdots }{\int _{-{\infty }}^{+{\infty }}} {\rho }{\left( {\boldsymbol{d}}{\big |}{\alpha }^{*},{\beta }^{*}\right) } \nonumber \\&\qquad\qquad{}{\times } {\prod _{j=1}^{n}} {\left( {\sum _{s_{1,j}{\in }{\Omega }}}{\sum _{s_{2,j}{\in }{\Omega }}}{\cdots }{\sum _{s_{|V|,j}{\in }{\Omega }}} w{\left( s_{1,j},s_{2,j},{\cdots }s_{|V|,j} {\big |} {\boldsymbol{d}}, {\alpha },{\beta } \right) }\right) } dd_{1}dd_{2}{\cdots }dd_{|V|}{\Bigg )}-1 \nonumber \\&= {\frac{1}{Z({\alpha }^{*})}}{\left( {\sqrt{{\frac{{\beta }^{*}}{2{\pi }}}}}\right) }^{|V|} \nonumber \\&{\times }{\Bigg \{} {\lim _{n{\rightarrow }+0}}{\frac{1}{n}}{\Bigg (} {\int _{-{\infty }}^{+{\infty }}}{\int _{-{\infty }}^{+{\infty }}}{\cdots }{\int _{-{\infty }}^{+{\infty }}} {\left( {\sum _{{\tau }_{1}{\in }{\Omega }}}{\sum _{{\tau }_{2}{\in }{\Omega }}}{\cdots }{\sum _{{\tau }_{|V|}{\in }{\Omega }}} w{\left( {\tau }_{1},{\tau }_{2},{\cdots }{\tau }_{|V|} {\big |} {\boldsymbol{d}}, {\alpha }^{*},{\beta }^{*} \right) }\right) } \nonumber \\&\qquad\qquad{}{\times } {\prod _{j=1}^{n}} {\left( {\sum _{s_{1,j}{\in }{\Omega }}}{\sum _{s_{2,j}{\in }{\Omega }}}{\cdots }{\sum _{s_{|V|,j}{\in }{\Omega }}} w{\left( s_{1,j},s_{2,j},{\cdots }s_{|V|,j} {\big |} {\boldsymbol{d}}, {\alpha },{\beta } \right) }\right) } {\Bigg )}dd_{1}dd_{2}{\cdots }dd_{|V|}{\Bigg \}}-1. \end{aligned}$$
(10.26)

Equation (10.26) means that computation of the statistical quantity in Eq. (10.25) can be reduced, up to some normalization constant, to computation of the statistical quantity in the probabilistic model given by the weight factor

$$\begin{aligned} w{\left( {\tau }_{1},{\tau }_{2},{\cdots }{\tau }_{|V|} {\big |} {\boldsymbol{d}}, {\alpha }^{*},{\beta }^{*} \right) } {\prod _{j=1}^{n}}w{\left( s_{1,j},s_{2,j},{\cdots }s_{|V|,j} {\big |} {\boldsymbol{d}}, {\alpha },{\beta } \right) .} \end{aligned}$$
(10.27)

We remark that the weight factor (10.27) is obtained by considering replicas of the posterior probabilistic model \(P({\boldsymbol{s}}|{\boldsymbol{d}},{\alpha },{\beta })\), and the analysis starting from the weight factor (10.27) is referred to as the replica method [13, 17]. One case that admits analytical treatment is the EM algorithm with the prior and posterior probabilistic models in Eqs. (10.2) and (10.3) on the complete graph (V,E). The dynamics of the EM algorithm with the MCMC method can be analyzed by using the replica method and the master equations for Glauber dynamics [31].

2.3 Expectation-Maximization Algorithm for Probabilistic Image Segmentations

This section extends the framework of the previous section to statistical machine learning for probabilistic image segmentation. In probabilistic image segmentation, we consider a square grid graph (V,E) in which a light intensity vector \({\boldsymbol{d_{i}}}={\left( d_{i\mathrm{{R}}}, d_{i\mathrm{{G}}}, d_{i\mathrm{{B}}} \right) }\) for the three components red \(d_{i\mathrm{{R}}}\), green \(d_{i\mathrm{{G}}}\), and blue \(d_{i\mathrm{{B}}}\) is assigned to each node i. The state vector \({\boldsymbol{s}}\) for the labeled configuration and the data matrix \({\boldsymbol{D}}\) for the color image configuration are expressed as

$$\begin{aligned} {\boldsymbol{s}} = {\left( \begin{array}{ccc} s_{1} \\ s_{2} \\ s_{3} \\ {\vdots } \\ s_{|V|} \end{array} \right) },~ {\boldsymbol{D}} = {\left( \begin{array}{ccccccc} {\boldsymbol{d_{1}}} \\ {\boldsymbol{d_{2}}} \\ {\boldsymbol{d_{3}}} \\ {\vdots } \\ {\boldsymbol{d_{|V|}}} \\ \end{array} \right) } = {\left( \begin{array}{ccccccc} d_{1\mathrm{{R}}} &{} d_{1\mathrm{{G}}} &{} d_{1\mathrm{{B}}} \\ d_{2\mathrm{{R}}} &{} d_{2\mathrm{{G}}} &{} d_{2\mathrm{{B}}} \\ d_{3\mathrm{{R}}} &{} d_{3\mathrm{{G}}} &{} d_{3\mathrm{{B}}} \\ {\vdots } &{} {\vdots } &{} {\vdots } \\ d_{|V|\mathrm{{R}}} &{} d_{|V|\mathrm{{G}}} &{} d_{|V|\mathrm{{B}}} \\ \end{array} \right) }. \end{aligned}$$
(10.28)

Here \({\rho }({\boldsymbol{D}}|{\boldsymbol{s}},{\boldsymbol{a}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1))\) and \(P({\boldsymbol{s}}|{\alpha })\) are assumed to be as follows:

$$\begin{aligned} {\rho }{\left( {\boldsymbol{D}}|{\boldsymbol{s}},{\boldsymbol{a}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) } = {\prod _{i{\in }V}} g{\left( {\boldsymbol{d_{i}}} {\big |} s_{i},{\boldsymbol{a}}(s_{i}),{\boldsymbol{C}}(s_{i}) \right) }, \end{aligned}$$
(10.29)
$$\begin{aligned} P({\boldsymbol{s}}|{\alpha }) = {\frac{ {\displaystyle { {\prod _{\{i,j\}{\in }E}} {\exp }{\left( -2{\alpha }{\big (}1-{\delta }_{s_{i},s_{j}}{\big )} \right) } }} }{ {\displaystyle { {\sum _{s_{1}{\in }{\Omega }}} {\sum _{s_{2}{\in }{\Omega }}} {\cdots } {\sum _{s_{|V|}{\in }{\Omega }}} {\prod _{\{i,j\}{\in }E}} {\exp }{\left( -2{\alpha } {\big (}1-{\delta }_{s_{i},s_{j}}{\big )} \right) } }} } }, \end{aligned}$$
(10.30)

where

$$\begin{aligned}&g{\left( {\boldsymbol{d_{i}}}{\big |}s_{i},{\boldsymbol{a}}(s_{i}),{\boldsymbol{C}}(s_{i}) \right) } \equiv {\sqrt{{\frac{1}{{\det }{\left( 2{\pi } {\boldsymbol{C}}(s_{i}) \right) }}}}} {\exp }{\left( -{\frac{1}{2}}{\left( {\boldsymbol{d_{i}}}-{\boldsymbol{a}}(s_{i})\right) } {\boldsymbol{C^{-1}}}(s_{i}){\left( {\boldsymbol{d_{i}}}-{\boldsymbol{a}}(s_{i})\right) }^\mathrm{{T}} \right) }, \nonumber \\&\end{aligned}$$
(10.31)
$$\begin{aligned} {\boldsymbol{a}}(+1) = {\left( \begin{array}{ccc} a_\mathrm{{R}}(+1) \\ a_\mathrm{{G}}(+1) \\ a_\mathrm{{B}}(+1) \end{array} \right) },~ {\boldsymbol{a}}(-1) = {\left( \begin{array}{ccc} a_\mathrm{{R}}(-1) \\ a_\mathrm{{G}}(-1) \\ a_\mathrm{{B}}(-1) \end{array} \right) }, \end{aligned}$$
(10.32)
$$\begin{aligned} {\boldsymbol{C}}(+1)= {\left( \begin{array}{ccc} C_{\mathrm{{R}}\mathrm{{R}}}(+1) &{} C_{\mathrm{{R}}\mathrm{{G}}}(+1) &{} C_{\mathrm{{R}}\mathrm{{B}}}(+1) \\ C_{\mathrm{{G}}\mathrm{{R}}}(+1) &{} C_{\mathrm{{G}}\mathrm{{G}}}(+1) &{} C_{\mathrm{{G}}\mathrm{{B}}}(+1) \\ C_{\mathrm{{B}}\mathrm{{R}}}(+1) &{} C_{\mathrm{{B}}\mathrm{{G}}}(+1) &{} C_{\mathrm{{B}}\mathrm{{B}}}(+1) \\ \end{array} \right) },~ {\boldsymbol{C}}(-1)= {\left( \begin{array}{ccc} C_{\mathrm{{R}}\mathrm{{R}}}(-1) &{} C_{\mathrm{{R}}\mathrm{{G}}}(-1) &{} C_{\mathrm{{R}}\mathrm{{B}}}(-1) \\ C_{\mathrm{{G}}\mathrm{{R}}}(-1) &{} C_{\mathrm{{G}}\mathrm{{G}}}(-1) &{} C_{\mathrm{{G}}\mathrm{{B}}}(-1) \\ C_{\mathrm{{B}}\mathrm{{R}}}(-1) &{} C_{\mathrm{{B}}\mathrm{{G}}}(-1) &{} C_{\mathrm{{B}}\mathrm{{B}}}(-1) \\ \end{array} \right) }. \nonumber \\ \end{aligned}$$
(10.33)
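Evaluating the class-conditional density \(g\) of Eq. (10.31) for a single pixel is a small computation. The sketch below simplifies to diagonal covariance matrices for brevity (the chapter allows full \(3{\times }3\) covariances), and the class means, variances, and pixel value are hypothetical.

```python
import math

def g_diag(d, a, c_diag):
    # Eq. (10.31) specialized to a diagonal covariance matrix:
    # d, a are RGB vectors; c_diag holds the diagonal of C(s_i)
    det = math.prod(2.0 * math.pi * c for c in c_diag)
    quad = sum((di - ai) ** 2 / c for di, ai, c in zip(d, a, c_diag))
    return math.exp(-0.5 * quad) / math.sqrt(det)

# Two hypothetical segment classes s_i = +1 / -1
a_plus, c_plus = [0.8, 0.2, 0.2], [0.02, 0.02, 0.02]    # "reddish" class
a_minus, c_minus = [0.2, 0.2, 0.8], [0.02, 0.02, 0.02]  # "bluish" class

d_pixel = [0.75, 0.25, 0.3]  # hypothetical observed RGB value
# The likelihood ratio indicates which label the data term favors
ratio = g_diag(d_pixel, a_plus, c_plus) / g_diag(d_pixel, a_minus, c_minus)
```

In the full model, this per-pixel data term competes with the Potts prior of Eq. (10.30), which rewards neighboring pixels for sharing a label.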

Note that the probabilistic graphical model in Eq. (10.30) is referred to as a Potts model [33].

In probabilistic segmentation and clustering, \({\rho }{\left( {\boldsymbol{D}}|{\boldsymbol{s}},{\boldsymbol{a}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) }\) in Eq. (10.29) and \(P({\boldsymbol{s}}|{\alpha })\) in Eq. (10.30) correspond to the data generative and prior models, respectively. The joint probability of \({\boldsymbol{s}}\) and \({\boldsymbol{D}}\) is expressed in terms of the data generative and prior distributions, \({\rho }{\left( {\boldsymbol{D}}|{\boldsymbol{s}},{\boldsymbol{a}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) }\) and \(P({\boldsymbol{s}}|{\alpha })\), as follows:

$$\begin{aligned} {\rho }{\left( {\boldsymbol{s}},{\boldsymbol{D}}|{\alpha },{\boldsymbol{a}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) }\equiv {\rho }{\left( {\boldsymbol{D}}|{\boldsymbol{s}},{\boldsymbol{a}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) }P{\left( {\boldsymbol{s}}|{\alpha }\right) }.\nonumber \\ \end{aligned}$$
(10.34)

By using the joint probability distribution, the posterior probability \(P{\left( {\boldsymbol{s}}|{\boldsymbol{D}},{\alpha },{\boldsymbol{a}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) }\) and the marginal likelihood \({\rho }{\left( {\boldsymbol{D}}|{\alpha },{\boldsymbol{a}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) }\) are defined by using Bayes formulas as follows:

$$\begin{aligned}&P{\left( {\boldsymbol{s}}|{\boldsymbol{D}},{\alpha },{\boldsymbol{a}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) } \equiv {\frac{{\rho }{\left( {\boldsymbol{s}},{\boldsymbol{D}}|{\alpha },{\boldsymbol{a}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) }}{{\rho }{\left( {\boldsymbol{D}}|{\alpha },{\boldsymbol{a}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) }}}, \nonumber \\ \end{aligned}$$
(10.35)
$$\begin{aligned}&{} {\rho }{\left( {\boldsymbol{D}}|{\alpha },{\boldsymbol{a}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) } \nonumber \\&\qquad\quad \equiv {\sum _{s_{1}{\in }{\Omega }}} {\sum _{s_{2}{\in }{\Omega }}} {\cdots } {\sum _{s_{|V|}{\in }{\Omega }}} {\rho }{\left( {\boldsymbol{s}},{\boldsymbol{D}}|{\alpha },{\boldsymbol{a}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) }. \end{aligned}$$
(10.36)

Estimates of the hyperparameters and parameter vector, namely, \({\widehat{{\alpha }}}({\boldsymbol{D}})\), \({\boldsymbol{{\widehat{a}}}}(+1|{\boldsymbol{D}})\), \({\boldsymbol{{\widehat{a}}}}(-1|{\boldsymbol{D}})\), \({\boldsymbol{{\widehat{C}}}}(+1|{\boldsymbol{D}})\), \({\boldsymbol{{\widehat{C}}}}(-1|{\boldsymbol{D}})\), \({\boldsymbol{{\widehat{s}}}}({\boldsymbol{D}}) = {\left( {\widehat{s}}_{1}({\boldsymbol{D}}),{\widehat{s}}_{2}({\boldsymbol{D}}),{\cdots },{\widehat{s}}_{|V|}({\boldsymbol{D}}) \right) }\) are determined by

$$\begin{aligned}&{\left( {\widehat{{\alpha }}}({\boldsymbol{D}}), {\boldsymbol{{\widehat{a}}}}(+1|{\boldsymbol{D}}), {\boldsymbol{{\widehat{a}}}}(-1|{\boldsymbol{D}}), {\boldsymbol{{\widehat{C}}}}(+1|{\boldsymbol{D}}), {\boldsymbol{{\widehat{C}}}}(-1|{\boldsymbol{D}}) \right) } \nonumber \\&\qquad\qquad{} ={\arg }{\max _{{\left( {\alpha }, {\boldsymbol{a}}(+1), {\boldsymbol{a}}(-1), {\boldsymbol{C}}(+1), {\boldsymbol{C}}(-1) \right) }}} {\rho }{\left( {\boldsymbol{D}}{\big |} {\alpha }, {\boldsymbol{a}}(+1), {\boldsymbol{a}}(-1), {\boldsymbol{C}}(+1), {\boldsymbol{C}}(-1) \right) }, \end{aligned}$$
(10.37)
$$\begin{aligned}&{\widehat{s}}_{i}({\boldsymbol{D}}) ={\arg }{\max _{s_{i}{\in }{\Omega }}}P_{i}{\left( s_{i}{\big |}{\boldsymbol{D}}, {\widehat{{\alpha }}}({\boldsymbol{D}}), {\boldsymbol{{\widehat{a}}}}(+1|{\boldsymbol{D}}), {\boldsymbol{{\widehat{a}}}}(-1|{\boldsymbol{D}}), {\boldsymbol{{\widehat{C}}}}(+1|{\boldsymbol{D}}), {\boldsymbol{{\widehat{C}}}}(-1|{\boldsymbol{D}})\right) }~(i{\in }V). \nonumber \\&\end{aligned}$$
(10.38)

The \(\mathcal{{Q}}\)-function for the EM algorithm in the present framework is defined by

$$\begin{aligned}&\mathcal{{Q}}{\left( {\alpha },{\boldsymbol{a}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1) |{\alpha }',{\boldsymbol{a'}}(+1),{\boldsymbol{a'}}(-1),{\boldsymbol{C'}}(+1),{\boldsymbol{C'}}(-1),{\boldsymbol{D}} \right) } \nonumber \\&\qquad\quad{} \equiv {\sum _{s_{1}{\in }{\Omega }}} {\sum _{s_{2}{\in }{\Omega }}} {\cdots } {\sum _{s_{|V|}{\in }{\Omega }}} P{\left( {\boldsymbol{s}}|{\boldsymbol{D}},{\alpha }',{\boldsymbol{a'}}(+1),{\boldsymbol{a'}}(-1),{\boldsymbol{C'}}(+1),{\boldsymbol{C'}}(-1)\right) } \nonumber \\&\qquad\qquad\qquad{}{\times } {\ln }{\left( {\rho }{\left( {\boldsymbol{s}},{\boldsymbol{D}}|{\alpha },{\boldsymbol{a}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) } \right) }. \end{aligned}$$
(10.39)

The EM algorithm is a procedure that performs the following E-step and M-step repeatedly for \(t=0,1,2,{\cdots }\) until \({\widehat{\alpha }}({\boldsymbol{D}})\), \({\boldsymbol{\widehat{a}}}(+1|{\boldsymbol{D}})\), \({\boldsymbol{\widehat{a}}}(-1|{\boldsymbol{D}})\), \({\boldsymbol{\widehat{C}}}(+1|{\boldsymbol{D}})\), and \({\boldsymbol{\widehat{C}}}(-1|{\boldsymbol{D}})\) converge:

     E-step::

Compute \(\mathcal{{Q}}\left( {\alpha },{\boldsymbol{a}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1) {\big |}{\alpha }(t),{\boldsymbol{a}}(+1,t),{\boldsymbol{a}}(-1,t),\right. \left. {\boldsymbol{C}}(+1,t),{\boldsymbol{C}}(-1,t),{\boldsymbol{D}} \right) \) for various values of \({\alpha }\), \({\boldsymbol{a}}(+1)\), \({\boldsymbol{a}}(-1)\), \({\boldsymbol{C}}(+1)\), and \({\boldsymbol{C}}(-1)\).

     M-step::

Determine \({\alpha }(t+1)\), \({\boldsymbol{a}}(+1,t+1)\), \({\boldsymbol{a}}(-1,t+1)\), \({\boldsymbol{C}}(+1,t+1)\), and \({\boldsymbol{C}}(-1,t+1)\) so as to satisfy the extremum conditions of the \(\mathcal{{Q}}\)-function with respect to \({\alpha }\), \({\boldsymbol{a}}(+1)\), \({\boldsymbol{a}}(-1)\), \({\boldsymbol{C}}(+1)\), and \({\boldsymbol{C}}(-1)\), as follows:

$$\begin{aligned}&{\left( {\alpha }(t+1),{\boldsymbol{a}}(+1,t+1),{\boldsymbol{a}}(-1,t+1),{\boldsymbol{C}}(+1,t+1),{\boldsymbol{C}}(-1,t+1) \right) } \nonumber \\&{\leftarrow } {\mathop {{\mathrm{extremum}}}\limits _{{\alpha },{\boldsymbol{a}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)}} \nonumber \\&\mathcal{{Q}}{\left( {\alpha },{\boldsymbol{a}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1) {\big |} {\alpha }(t),{\boldsymbol{a}}(+1,t),{\boldsymbol{a}}(-1,t),{\boldsymbol{C}}(+1,t),{\boldsymbol{C}}(-1,t),{\boldsymbol{D}} \right) }. \nonumber \\&\end{aligned}$$
(10.40)

Update \({\widehat{\alpha }}({\boldsymbol{D}}){\leftarrow }{\alpha }(t+1)\), \({\boldsymbol{\widehat{a}}}(+1,{\boldsymbol{D}}){\leftarrow }{\boldsymbol{a}}(+1,t+1)\), \({\boldsymbol{\widehat{a}}}(-1,{\boldsymbol{D}}){\leftarrow }{\boldsymbol{a}}(-1,t+1)\), \({\boldsymbol{\widehat{C}}}(+1,{\boldsymbol{D}}){\leftarrow }{\boldsymbol{C}}(+1,t+1)\) and \({\boldsymbol{\widehat{C}}}(-1,{\boldsymbol{D}}){\leftarrow }{\boldsymbol{C}}(-1,t+1)\).

By using the equalities in Eqs. (10.29), (10.30), (10.34), and (10.35), the EM algorithm by the \(\mathcal{{Q}}\)-function can be reduced to the following simultaneous update rules:

$$\begin{aligned}&{\frac{1}{|E|}}{\sum _{\{i,j\}{\in }E}} {\sum _{s_{i}{\in }{\Omega }}} {\sum _{s_{j}{\in }{\Omega }}} {\left( 1-{\delta }_{s_{i},s_{j}}\right) } P_{ij}{\left( s_{i},s_{j}{\big |}{\alpha }(t+1)\right) } \nonumber \\&= {\frac{1}{|E|}} {\sum _{\{i,j\}{\in }E}} {\sum _{s_{i}{\in }{\Omega }}} {\sum _{s_{j}{\in }{\Omega }}} {\left( 1-{\delta }_{s_{i},s_{j}}\right) } P_{ij}{\left( s_{i},s_{j}{\big |}{\boldsymbol{D}},{\alpha }(t),{\boldsymbol{a}}(+1,t),{\boldsymbol{a}}(-1,t),{\boldsymbol{C}}(+1,t),{\boldsymbol{C}}(-1,t)\right) }, \nonumber \\&\end{aligned}$$
(10.41)
$$\begin{aligned} {\boldsymbol{a}}(s_{i},t+1) = {\frac{ {\displaystyle { {\sum _{i{\in }V}} {\boldsymbol{d_{i}}} P_{i}{\left( s_{i}{\big |}{\boldsymbol{D}},{\alpha }(t),{\boldsymbol{a}}(+1,t),{\boldsymbol{a}}(-1,t),{\boldsymbol{C}}(+1,t),{\boldsymbol{C}}(-1,t)\right) } }} }{ {\displaystyle { {\sum _{i{\in }V}} P_{i}{\left( s_{i}{\big |}{\boldsymbol{D}},{\alpha }(t),{\boldsymbol{a}}(+1,t),{\boldsymbol{a}}(-1,t),{\boldsymbol{C}}(+1,t),{\boldsymbol{C}}(-1,t)\right) } }} } } ~(s_{i}{\in }{\Omega }), \nonumber \\ \end{aligned}$$
(10.42)
$$\begin{aligned}&{\boldsymbol{C}}(s_{i},t+1) \nonumber \\&= {\frac{ {\displaystyle { {\sum _{i{\in }V}} {\left( {\boldsymbol{d_{i}}}-{\boldsymbol{a}}(s_{i},t)\right) }^\mathrm{{T}}{\left( {\boldsymbol{d_{i}}}-{\boldsymbol{a}}(s_{i},t)\right) } P_{i}{\left( s_{i}{\big |}{\boldsymbol{D}},{\alpha }(t),{\boldsymbol{a}}(+1,t),{\boldsymbol{a}}(-1,t),{\boldsymbol{C}}(+1,t),{\boldsymbol{C}}(-1,t)\right) } }} }{ {\displaystyle { {\sum _{i{\in }V}} P_{i}{\left( s_{i}{\big |}{\boldsymbol{D}},{\alpha }(t),{\boldsymbol{a}}(+1,t),{\boldsymbol{a}}(-1,t),{\boldsymbol{C}}(+1,t),{\boldsymbol{C}}(-1,t)\right) } }} } } ~(s_{i}{\in }{\Omega }), \nonumber \\&\end{aligned}$$
(10.43)

where

$$\begin{aligned}&P_{i}(s_{i}|{\boldsymbol{D}},{\alpha },{\boldsymbol{a}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)) \nonumber \\&\equiv {\sum _{{\tau }_{1}{\in }{\Omega }}} {\sum _{{\tau }_{2}{\in }{\Omega }}} {\cdots } {\sum _{{\tau }_{|V|}{\in }{\Omega }}} {\delta }_{s_{i},{\tau }_{i}} P{\left( {\tau }_{1},{\tau }_{2},{\cdots },{\tau }_{|V|}{\big |}{\boldsymbol{D}},{\alpha },{\boldsymbol{a}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) } ~(i{\in }V), \nonumber \\ \end{aligned}$$
(10.44)
$$\begin{aligned}&P_{ij}(s_{i},s_{j}|{\boldsymbol{D}},{\alpha },{\boldsymbol{a}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)) \nonumber \\&=P_{ji}(s_{j},s_{i}|{\boldsymbol{D}},{\alpha },{\boldsymbol{a}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)) \nonumber \\&\equiv {\sum _{{\tau }_{1}{\in }{\Omega }}} {\sum _{{\tau }_{2}{\in }{\Omega }}} {\cdots } {\sum _{{\tau }_{|V|}{\in }{\Omega }}} {\delta }_{s_{i},{\tau }_{i}}{\delta }_{s_{j},{\tau }_{j}} P{\left( {\tau }_{1},{\tau }_{2},{\cdots },{\tau }_{|V|}{\big |}{\boldsymbol{D}},{\alpha },{\boldsymbol{a}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) } ~(\{i,j\}{\in }E), \nonumber \\&\end{aligned}$$
(10.45)
$$\begin{aligned} P_{ij}(s_{i},s_{j}|{\alpha })&=P_{ji}(s_{j},s_{i}|{\alpha }) \nonumber \\&\equiv {\sum _{{\tau }_{1}{\in }{\Omega }}} {\sum _{{\tau }_{2}{\in }{\Omega }}} {\cdots } {\sum _{{\tau }_{|V|}{\in }{\Omega }}} {\delta }_{s_{i},{\tau }_{i}}{\delta }_{s_{j},{\tau }_{j}} P{\left( {\tau }_{1},{\tau }_{2},{\cdots },{\tau }_{|V|}{\big |}{\alpha }\right) } ~(\{i,j\}{\in }E). \nonumber \\&\end{aligned}$$
(10.46)
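As a schematic illustration, the M-step updates in Eqs. (10.42) and (10.43) are posterior-weighted averages over the nodes. The following is a minimal sketch, not the chapter's implementation: it assumes the single-node posterior marginals \(P_{i}(s_{i}|{\boldsymbol{D}},{\cdots })\) have already been computed by some approximate method, that each \({\boldsymbol{d_{i}}}\) is a real vector, and that the covariance in Eq. (10.43) is the weighted outer-product average about the current mean \({\boldsymbol{a}}(s_{i},t)\). The helper name `m_step` and all numerical values are hypothetical.

```python
def m_step(D, post, a_old):
    """Sketch of the M-step updates of Eqs. (10.42)-(10.43).

    D     : list of data vectors d_i (each a list of floats).
    post  : dict mapping s in {+1, -1} to a list of posterior marginals
            P_i(s | D, current hyperparameters), one entry per node i.
    a_old : dict mapping s to the current mean vector a(s, t), which enters
            the covariance update as written in Eq. (10.43).
    """
    dim = len(D[0])
    a_new, C_new = {}, {}
    for s in (+1, -1):
        w = post[s]
        wsum = sum(w)
        # Eq. (10.42): posterior-weighted mean of the data vectors
        a_new[s] = [sum(wi * di[k] for wi, di in zip(w, D)) / wsum
                    for k in range(dim)]
        # Eq. (10.43): posterior-weighted covariance about the mean a(s, t)
        diffs = [[di[k] - a_old[s][k] for k in range(dim)] for di in D]
        C_new[s] = [[sum(wi * e[k] * e[l] for wi, e in zip(w, diffs)) / wsum
                     for l in range(dim)] for k in range(dim)]
    return a_new, C_new
```

In a full EM loop, the posteriors in `post` would be recomputed in every E-step from the updated hyperparameters, for example by the mean-field or belief propagation methods reviewed in Sect. 3.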

3 Statistical Mechanical Informatics

In statistical mechanical informatics [13,14,15,16,17], Ising models are very familiar probabilistic models for which computations are done by statistical mechanical techniques, including advanced mean-field methods, renormalization group methods, Monte Carlo simulations, and replica methods [36, 37]. This section reviews the framework of the Ising model and associated advanced mean-field methods.

3.1 Ising Model

Let us consider an Ising model defined by the following probability distribution for the state space \({\Omega }=\{+1,-1\}\) for the state variable \(s_{i}\) at each node \(i({\in }V)\):

$$\begin{aligned} P{\left( {\boldsymbol{s}}{\Big |} {\boldsymbol{d}},{\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}\right) }= & {} P{\left( s_{1},s_{2},{\cdots },s_{|V|} {\Big |} d_{1},d_{2},{\cdots },d_{|V|},{\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}\right) } \nonumber \\\equiv & {} {\frac{ {\displaystyle { {\exp }{\left( -{\frac{1}{k_\mathrm{{B}}T}}{\left( {\frac{1}{2}}J{\sum _{\{i,j\}{\in }E}}{\big (}s_{i}-s_{j}{\big )}^{2}+{\frac{1}{2}}h{\sum _{i{\in }V}}{\big (}s_{i}-d_{i}{\big )}^{2}\right) }\right) } }} }{ {\displaystyle { {\sum _{s_{1}{\in }{\Omega }}}{\sum _{s_{2}{\in }{\Omega }}}{\cdots }{\sum _{s_{|V|}{\in }{\Omega }}} {\exp }{\left( -{\frac{1}{k_\mathrm{{B}}T}}{\left( {\frac{1}{2}}J{\sum _{\{i,j\}{\in }E}}{\big (}s_{i}-s_{j}{\big )}^{2}+{\frac{1}{2}}h{\sum _{i{\in }V}}{\big (}s_{i}-d_{i}{\big )}^{2}\right) }\right) } }} }} \nonumber \\&{} {\big (}J>0,~h{\ge }0,~T>0,~d_{i}{\in }(-{\infty },+{\infty })~({\forall }i{\in }V){\big )}. \end{aligned}$$
(10.47)

Because \({s_{i}}^{2}=1\) (\(i{\in }V\)), the probability distribution \(P({\boldsymbol{s}})\) can be reduced to

$$\begin{aligned} P{\left( {\boldsymbol{s}}{\Big |} {\boldsymbol{d}}, {\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}\right) } = {\frac{1}{Z}}{\exp }{\left( -{\frac{1}{k_\mathrm{{B}}T}}H({\boldsymbol{s}})\right) }~(T>0), \end{aligned}$$
(10.48)
$$\begin{aligned} H({\boldsymbol{s}})= & {} H(s_{1},s_{2},{\cdots },s_{|V|}) \nonumber \\\equiv & {} -J{\sum _{\{i,j\}{\in }E}}s_{i}s_{j}-h{\sum _{i{\in }V}}d_{i}s_{i}~{\big (}J>0,~h{\ge }0,~d_{i}{\in }(-{\infty },+{\infty })~({\forall }i{\in }V){\big )}, \end{aligned}$$
(10.49)
$$\begin{aligned} Z \equiv {\sum _{s_{1}{\in }{\Omega }}} {\sum _{s_{2}{\in }{\Omega }}} {\cdots } {\sum _{s_{|V|}{\in }{\Omega }}} {\exp }{\left( -{\frac{1}{k_\mathrm{{B}}T}}H({\boldsymbol{s}})\right) }, \end{aligned}$$
(10.50)

where \(H({\boldsymbol{s}})\) and Z are referred to in statistical mechanical informatics as the energy function (or Hamiltonian) and the partition function, respectively. The probability distribution in Eq. (10.47) is called the Gibbs distribution; \(k_\mathrm{{B}}\) is the Boltzmann constant, T is the (absolute) temperature, J is the (ferromagnetic) interaction, and h is the external field.
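For small graphs the energy function, partition function, and Gibbs distribution of Eqs. (10.48)-(10.50) can be evaluated by exhaustive enumeration. The sketch below uses a toy \(2{\times }2\) grid with hypothetical values of \(d_{i}\), J, h, and \(k_\mathrm{{B}}T\); it is an illustration of the definitions, not a practical algorithm, since the sum over \({\Omega }^{|V|}\) has \(2^{|V|}\) terms.

```python
import itertools
import math

def energy(s, edges, d, J, h):
    """H(s) = -J sum_{{i,j} in E} s_i s_j - h sum_{i in V} d_i s_i, Eq. (10.49)."""
    return (-J * sum(s[i] * s[j] for i, j in edges)
            - h * sum(di * si for di, si in zip(d, s)))

def partition_function(n, edges, d, J, h, kBT):
    """Z of Eq. (10.50) by exhaustive summation over all 2^n configurations."""
    return sum(math.exp(-energy(s, edges, d, J, h) / kBT)
               for s in itertools.product((+1, -1), repeat=n))

def gibbs(s, n, edges, d, J, h, kBT):
    """Gibbs distribution P(s | d, J/kBT, h/kBT) of Eq. (10.48)."""
    Z = partition_function(n, edges, d, J, h, kBT)
    return math.exp(-energy(s, edges, d, J, h) / kBT) / Z
```

The exponential cost of this enumeration is precisely what motivates the advanced mean-field methods reviewed below.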

Let us consider the Kullback-Leibler divergence

$$\begin{aligned} \mathrm{{KL}}[P||R] \equiv {\sum _{s_{1}{\in }{\Omega }}} {\sum _{s_{2}{\in }{\Omega }}} {\cdots } {\sum _{s_{|V|}{\in }{\Omega }}} R({\boldsymbol{s}}) {\ln }{\left( {\frac{R({\boldsymbol{s}})}{P{\left( {\boldsymbol{s}}{\Big |} {\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}\right) }}} \right) }, \end{aligned}$$
(10.51)

which is always non-negative for two probability distributions \(P{\left( {\boldsymbol{s}} {\Big |} {\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}} \right) }\) and \(R({\boldsymbol{s}})\) and is regarded as a pseudo-distance between them. By substituting the explicit expression for \(P({\boldsymbol{s}})\) in Eqs. (10.48), (10.49) and (10.50) into Eq. (10.51), the expression for the Kullback-Leibler divergence (10.51) in terms of the partition function Z and the free energy functional \(\mathcal{{F}}[R]\) can be derived as follows:

$$\begin{aligned} \mathrm{{KL}}[P||R] = {\frac{1}{k_\mathrm{{B}}T}} {\big (}k_\mathrm{{B}}T{\ln }(Z) +\mathcal{{F}}[R] {\big )}, \end{aligned}$$
(10.52)

where

$$\begin{aligned} \mathcal{{F}}[R] \equiv {\sum _{s_{1}{\in }{\Omega }}} {\sum _{s_{2}{\in }{\Omega }}} {\cdots } {\sum _{s_{|V|}{\in }{\Omega }}} H({\boldsymbol{s}})R({\boldsymbol{s}}) + k_\mathrm{{B}}T {\sum _{s_{1}{\in }{\Omega }}} {\sum _{s_{2}{\in }{\Omega }}} {\cdots } {\sum _{s_{|V|}{\in }{\Omega }}} R({\boldsymbol{s}}){\ln }{\big (}R({\boldsymbol{s}}){\big )}. \end{aligned}$$
(10.53)

For the free energy functional \(\mathcal{{F}}[R]\), it is valid that

$$\begin{aligned} {\arg }{\min _{R}}{\left\{ \mathcal{{F}}[R] {\Big |} {\sum _{{\tau }_{1}{\in }{\Omega }}} {\sum _{{\tau }_{2}{\in }{\Omega }}} {\cdots } {\sum _{{\tau }_{|V|}{\in }{\Omega }}} R{\left( {\tau }_{1},{\tau }_{2},{\cdots },{\tau }_{|V|}\right) }=1 \right\} }= P{\left( {\boldsymbol{s}}{\Big |} {\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}\right) }, \nonumber \\ \end{aligned}$$
(10.54)
$$\begin{aligned} {\min _{R}}{\left\{ \mathcal{{F}}[R] {\Big |} {\sum _{{\tau }_{1}{\in }{\Omega }}} {\sum _{{\tau }_{2}{\in }{\Omega }}} {\cdots } {\sum _{{\tau }_{|V|}{\in }{\Omega }}} R{\left( {\tau }_{1},{\tau }_{2},{\cdots },{\tau }_{|V|}\right) }=1 \right\} } =-k_\mathrm{{B}}T{\ln }{\left( Z\right) }. \end{aligned}$$
(10.55)

Note that \(-k_\mathrm{{B}}T{\ln }{\left( Z\right) }\) is referred to as the free energy for the Gibbs distribution in Eq. (10.47).
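The variational equalities (10.54) and (10.55) can be checked numerically on a small graph: evaluating the free energy functional of Eq. (10.53) at the Gibbs distribution recovers \(-k_\mathrm{{B}}T{\ln }(Z)\), while any other normalized trial distribution gives a larger value. The chain graph and parameter values below are hypothetical, chosen only for the check.

```python
import itertools
import math

def energy(s, edges, d, J, h):
    """H(s) of Eq. (10.49)."""
    return (-J * sum(s[i] * s[j] for i, j in edges)
            - h * sum(di * si for di, si in zip(d, s)))

def free_energy_functional(R, states, H, kBT):
    """F[R] = sum_s H(s) R(s) + kBT sum_s R(s) ln R(s), Eq. (10.53)."""
    return sum(H[s] * R[s] + kBT * R[s] * math.log(R[s]) for s in states)

# toy chain with hypothetical data values
edges = [(0, 1), (1, 2)]
d = [0.5, -0.2, 1.0]
J, h, kBT = 1.0, 0.8, 1.5

states = list(itertools.product((+1, -1), repeat=3))
H = {s: energy(s, edges, d, J, h) for s in states}
Z = sum(math.exp(-H[s] / kBT) for s in states)
gibbs = {s: math.exp(-H[s] / kBT) / Z for s in states}

# Eq. (10.55): the minimum of F[R] is attained at the Gibbs distribution
F_gibbs = free_energy_functional(gibbs, states, H, kBT)
```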

3.2 Advanced Mean-Field Method

This section reviews the fundamental framework of advanced mean-field methods [12], including the mean-field approximation [35,36,37] and the Bethe approximation [35, 39,40,41]. Our framework is given for the Ising model in Eqs. (10.48), (10.49), and (10.50). It is known that a generalization of the present framework can be realized by using the cluster variation method in Refs. [42,43,44,45].

We introduce a trial probability distribution \(R({\boldsymbol{s}})=R(s_{1},s_{2},{\cdots },s_{|V|})\) which is restricted to the following functional form:

$$\begin{aligned} R({\boldsymbol{s}})=R{\left( s_{1},s_{2},{\cdots },s_{|V|}\right) }={\prod _{i{\in }V}}R_{i}(s_{i}), \end{aligned}$$
(10.56)
$$\begin{aligned} R_{i}(s_{i}) = {\sum _{{\tau }_{1}{\in }{\Omega }}} {\sum _{{\tau }_{2}{\in }{\Omega }}} {\cdots } {\sum _{{\tau }_{|V|}{\in }{\Omega }}} {\delta }_{s_{i},{\tau }_{i}} R{\left( {\tau }_{1},{\tau }_{2},{\cdots },{\tau }_{|V|}\right) } ~(i{\in }V). \end{aligned}$$
(10.57)

By using the definition of \(R_{i}(s_{i})\) and the normalization condition

$$\begin{aligned} {\sum _{{\tau }_{1}{\in }{\Omega }}} {\sum _{{\tau }_{2}{\in }{\Omega }}} {\cdots } {\sum _{{\tau }_{|V|}{\in }{\Omega }}} R{\left( {\tau }_{1},{\tau }_{2},{\cdots },{\tau }_{|V|}\right) } =1, \end{aligned}$$
(10.58)

we confirm that

$$\begin{aligned} {\sum _{{\tau }_{i}{\in }{\Omega }}} R_{i}({\tau }_{i}) =R_{i}(+1)+R_{i}(-1) = 1~(i{\in }V). \end{aligned}$$
(10.59)

By substituting the factorized expression for \(R({\boldsymbol{s}})\) in Eq. (10.56), written in terms of the marginal probability distributions \(R_{i}(s_{i})\) (\(i{\in }V\)) with \({\boldsymbol{R_{i}}}= {\left( \begin{array}{ccc} R_{i}(+1) &{} 0 \\ 0 &{} R_{i}(-1) \end{array} \right) }\), into Eq. (10.53), the free energy functional \(\mathcal{{F}}[R]\) can be reduced to the following mean-field free energy functional:

$$\begin{aligned} \mathcal{{F}}[R]=\mathcal{{F}}_\mathrm{{MF}}[\{{\boldsymbol{R}}_{i}|i{\in }V\}], \end{aligned}$$
(10.60)

where

$$\begin{aligned} \mathcal{{F}}_\mathrm{{MF}}[\{{\boldsymbol{R_{i}}}|i{\in }V\}]= & {} \mathcal{{F}}_\mathrm{{MF}}{\left[ {\left\{ {\left( \begin{array}{ccc} R_{i}(+1) &{} 0 \\ 0 &{} R_{i}(-1) \end{array} \right) } {\Bigg |}i{\in }V\right\} }\right] } \nonumber \\\equiv & {} -J{\sum _{\{i,j\}{\in }E}}{\left( {\sum _{{\tau }_{i}{\in }{\Omega }}}{\tau }_{i}R_{i}({\tau }_{i})\right) }{\left( {\sum _{{\tau }_{j}{\in }{\Omega }}}{\tau }_{j}R_{j}({\tau }_{j})\right) } -h{\sum _{i{\in }V}}d_{i}{\left( {\sum _{{\tau }_{i}{\in }{\Omega }}}{\tau }_{i}R_{i}({\tau }_{i})\right) } \nonumber \\&+k_\mathrm{{B}}T{\sum _{i{\in }V}}{\sum _{{\tau }_{i}{\in }{\Omega }}}R_{i}({\tau }_{i}){\ln }{\left( R_{i}({\tau }_{i})\right) }. \end{aligned}$$
(10.61)

Let us suppose the following conditional minimization of the free energy functional:

$$\begin{aligned}&{\left\{ {\boldsymbol{{\widehat{R}}_{i}}}{\Big |}i{\in }V\right\} } = {\left\{ {\left( \begin{array}{ccc} {\widehat{R}}_{i}(+1) &{} 0 \\ 0 &{} {\widehat{R}}_{i}(-1) \end{array} \right) } {\Big |}i{\in }V\right\} } \nonumber \\&={\arg }{\min _{\{{\boldsymbol{R_{i}}}|i{\in }V\}}}{\Bigg \{}\mathcal{{F}}_\mathrm{{MF}}[\{{\boldsymbol{R_{i}}}|i{\in }V\}]{\Bigg |}{\sum _{{\tau }_{i}{\in }{\Omega }}}R_{i}({\tau }_{i}) =1,i{\in }V{\Bigg \}}. \end{aligned}$$
(10.62)

First, we introduce Lagrange multipliers \({\lambda }_{i}\) (\(i{\in }V\)) to enforce the normalization conditions \({\displaystyle {{\sum _{{\tau }{\in }{\Omega }}}R_{i}({\tau }) =1}}\) (\(i{\in }V\)) as follows:

$$\begin{aligned} \mathcal{{L}}_\mathrm{{MF}}[\{{\boldsymbol{R_{i}}}|i{\in }V\}] \equiv \mathcal{{F}}_\mathrm{{MF}}[\{{\boldsymbol{R_{i}}}|i{\in }V\}] - {\sum _{i{\in }V}}{\lambda }_{i}{\big (}{\sum _{{\tau }_{i}{\in }{\Omega }}}R_{i}({\tau }_{i})-1{\big )}. \end{aligned}$$
(10.63)

\({\boldsymbol{{\widehat{R}}_{i}}}\) (\(i{\in }V\)) are determined so as to satisfy the following extremum condition:

$$\begin{aligned} {\Big [}{\frac{{\partial }}{{\partial }R_{i}(-1)}}\mathcal{{L}}_\mathrm{{MF}}[\{{\boldsymbol{R_{i}}}|i{\in }V\}]{\Big ]}_{\{{\boldsymbol{R_{i}}}={\boldsymbol{{\widehat{R}}_{i}}}|i{\in }V\}}=0 ~(i{\in }V), \end{aligned}$$
(10.64)
$$\begin{aligned} {\Big [}{\frac{{\partial }}{{\partial }R_{i}(+1)}}\mathcal{{L}}_\mathrm{{MF}}[\{{\boldsymbol{R_{i}}}|i{\in }V\}]{\Big ]}_{\{{\boldsymbol{R_{i}}}={\boldsymbol{{\widehat{R}}_{i}}}|i{\in }V\}}=0 ~(i{\in }V), \end{aligned}$$
(10.65)

which can be written compactly as

$$\begin{aligned} {\Big [}{\frac{{\partial }}{{\partial }{\boldsymbol{R_{i}}}}}\mathcal{{L}}_\mathrm{{MF}}[\{{\boldsymbol{R_{i}}}|i{\in }V\}]{\Big ]}_{\{{\boldsymbol{R_{i}}}={\boldsymbol{{\widehat{R}}_{i}}}|i{\in }V\}}=0 ~(i{\in }V). \end{aligned}$$
(10.66)

It can be shown that \({\boldsymbol{{\widehat{R}}_{i}}}\) (\(i{\in }V\)) are derived as follows:

$$\begin{aligned}&{\widehat{R}}_{i}(s_{i}) \nonumber \\&={\exp }{\left( -1+{\frac{{\lambda }_{i}}{k_\mathrm{{B}}T}}\right) } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\left( hd_{i}+J{\sum _{j{\in }{\partial }i}}{\left( {\sum _{{\tau }_{j}{\in }{\Omega }}}{\tau }_{j}{\widehat{R}}_{j}({\tau }_{j})\right) } \right) }s_{i}\right) } ~{\big (}i{\in }V, s_{i}{\in }{\Omega }{\big )}. \nonumber \\ \end{aligned}$$
(10.67)

Finally, \({\lambda }_{i}\) needs to be determined such that it satisfies the normalization condition of the marginal probability \({\widehat{R}}_{i}(s_{i})\). The marginal probabilities \({\big \{}{\boldsymbol{{\widehat{R}}_{i}}}{\big |}i{\in }V{\big \}}\) are derived as

$$\begin{aligned} {\widehat{R}}_{i}(s_{i}) ={\frac{ {\displaystyle { {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\left( hd_{i}+J{\sum _{j{\in }{\partial }i}}{\left( {\sum _{{\tau }_{j}{\in }{\Omega }}}{\tau }_{j}{\widehat{R}}_{j}({\tau }_{j})\right) } \right) }s_{i}\right) } }} }{ {\displaystyle { {\sum _{{\tau }_{i}{\in }{\Omega }}} {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\left( hd_{i}+J{\sum _{j{\in }{\partial }i}}{\left( {\sum _{{\tau }_{j}{\in }{\Omega }}}{\tau }_{j}{\widehat{R}}_{j}({\tau }_{j})\right) } \right) }{\tau }_{i}\right) } }} }} ~{\big (}i{\in }V, s_{i}{\in }{\Omega }{\big )}. \end{aligned}$$
(10.68)

We introduce the local magnetization

$$\begin{aligned} m_{i} \equiv {\sum _{{\tau }_{i}{\in }{\Omega }}}{\tau }_{i}{\widehat{R}}_{i}({\tau }_{i}). \end{aligned}$$
(10.69)

By solving the simultaneous equations

$$\begin{aligned} {\sum _{{\tau }_{i}{\in }{\Omega }}} {\widehat{R}}_{i}({\tau }_{i}) ={\widehat{R}}_{i}(+1)+{\widehat{R}}_{i}(-1) = 1~(i{\in }V), \end{aligned}$$
(10.70)
$$\begin{aligned} {\sum _{{\tau }_{i}{\in }{\Omega }}} {\tau }_{i}{\widehat{R}}_{i}({\tau }_{i}) ={\widehat{R}}_{i}(+1)-{\widehat{R}}_{i}(-1) = m_{i}~(i{\in }V), \end{aligned}$$
(10.71)

with respect to \({\widehat{R}}_{i}(+1)\) and \({\widehat{R}}_{i}(-1)\), we derive the following expression for the marginal probability:

$$\begin{aligned} {\widehat{R}}_{i}(s_{i}) = {\frac{1}{2}}{\big (} 1 + m_{i}s_{i} {\big )}~(i{\in }V). \end{aligned}$$
(10.72)

By substituting Eq. (10.72) into Eq. (10.68), the fixed-point equations can be reduced to the following simultaneous deterministic equations for \(\{m_{i}|i{\in }V\}\):

$$\begin{aligned} m_{i}={\tanh }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\left( hd_{i}+J{\sum _{j{\in }{\partial }i}}m_{j}\right) } \right) }~(i{\in }V), \end{aligned}$$
(10.73)

which is referred to as the mean-field equation.
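The mean-field equations (10.73) can be solved by fixed-point iteration. The sketch below uses damped updates; the damping factor is a practical convergence aid and an assumption of this sketch, not part of the derivation above, and the toy triangle graph and parameter values are hypothetical.

```python
import math

def mean_field_equations(edges, d, J, h, kBT, n_iter=1000, damping=0.5):
    """Solve m_i = tanh((h d_i + J sum_{j in ∂i} m_j) / kBT), Eq. (10.73),
    by damped fixed-point iteration over all nodes in parallel."""
    n = len(d)
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    m = [0.0] * n
    for _ in range(n_iter):
        m_new = [math.tanh((h * d[i] + J * sum(m[j] for j in nbrs[i])) / kBT)
                 for i in range(n)]
        # damped update: convex combination of old and new magnetizations
        m = [damping * mo + (1.0 - damping) * mn for mo, mn in zip(m, m_new)]
    return m
```

Once the local magnetizations \(\{m_{i}\}\) have converged, the mean-field marginals follow from Eq. (10.72) as \({\widehat{R}}_{i}(s_{i})={\frac{1}{2}}(1+m_{i}s_{i})\).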

By substituting Eq. (10.72) into Eq. (10.61), the mean-field free energy functional can be reduced to

$$\begin{aligned} \mathcal{{F}}_\mathrm{{MF}}{\left[ {\big \{}{\widehat{R}}_{i}(-1),{\widehat{R}}_{i}(+1){\big |}i{\in }V{\big \}}\right] } =F_\mathrm{{MF}}{\big (}m_{1},m_{2},{\cdots },m_{|V|}{\big )}, \end{aligned}$$
(10.74)
$$\begin{aligned} F_\mathrm{{MF}}{\big (}m_{1},m_{2},{\cdots },m_{|V|}{\big )}\equiv & {} -J{\sum _{\{i,j\}{\in }E}}m_{i}m_{j} -h{\sum _{i{\in }V}}d_{i}m_{i} \nonumber \\&+k_\mathrm{{B}}T{\sum _{i{\in }V}} {\frac{1}{2}}{\big (}1+m_{i}{\big )} {\ln }{\left( {\frac{1}{2}}{\big (}1+m_{i}{\big )}\right) } \nonumber \\&+k_\mathrm{{B}}T{\sum _{i{\in }V}} {\frac{1}{2}}{\big (}1-m_{i}{\big )} {\ln }{\left( {\frac{1}{2}}{\big (}1-m_{i}{\big )}\right) }. \end{aligned}$$
(10.75)

The extremum conditions

$$\begin{aligned} {\frac{{\partial }}{{\partial }m_{i}}} F_\mathrm{{MF}}{\big (}m_{1},m_{2},{\cdots },m_{|V|}{\big )}=0~(i{\in }V) \end{aligned}$$
(10.76)

can be reduced to the mean-field equations in Eq. (10.73).

We now explore the framework of the Bethe approximation for the Ising model in Eqs. (10.48), (10.49), and (10.50). Our framework is based on the cluster variation method [39, 42,43,44,45].

We introduce a trial probability distribution \(R({\boldsymbol{s}})=R(s_{1},s_{2},{\cdots },s_{|V|})\) that is restricted to the following functional form:

$$\begin{aligned} R({\boldsymbol{s}})=R(s_{1},s_{2},{\cdots },s_{|V|})= & {} {\left( {\prod _{i{\in }V}}R_{i}(s_{i})\right) } {\left( {\prod _{\{i,j\}{\in }E}} {\frac{ R_{ij}(s_{i},s_{j}) }{R_{i}(s_{i})R_{j}(s_{j})}} \right) } \nonumber \\= & {} {\left( {\prod _{i{\in }V}}R_{i}(s_{i})^{1-|{\partial }i|}\right) } {\left( {\prod _{\{i,j\}{\in }E}} R_{ij}(s_{i},s_{j}) \right) }, \end{aligned}$$
(10.77)

where

$$\begin{aligned} R_{ij}(s_{i},s_{j})= & {} R_{ji}(s_{j},s_{i}) \nonumber \\\equiv & {} {\sum _{{\tau }_{1}{\in }{\Omega }}} {\sum _{{\tau }_{2}{\in }{\Omega }}} {\cdots } {\sum _{{\tau }_{|V|}{\in }{\Omega }}} {\delta }_{s_{i},{\tau }_{i}}{\delta }_{s_{j},{\tau }_{j}} R{\left( {\tau }_{1},{\tau }_{2},{\cdots },{\tau }_{|V|}\right) } ~(\{i,j\}{\in }E). \end{aligned}$$
(10.78)

By using Eqs. (10.57) and (10.78), we can derive the normalization and reducibility conditions in the marginal probabilities as follows:

$$\begin{aligned} {\sum _{{\tau }_{i}{\in }{\Omega }}}R_{i}({\tau }_{i})=1~(i{\in }V), ~ {\sum _{{\tau }_{i}{\in }{\Omega }}}{\sum _{{\tau }_{j}{\in }{\Omega }}}R_{ij}({\tau }_{i},{\tau }_{j})=1~(\{i,j\}{\in }E), \end{aligned}$$
(10.79)
$$\begin{aligned} R_{i}(s_{i}) = {\sum _{{\tau }_{j}{\in }{\Omega }}}R_{ij}(s_{i},{\tau }_{j}),~ R_{j}(s_{j}) = {\sum _{{\tau }_{i}{\in }{\Omega }}}R_{ij}({\tau }_{i},s_{j})~(\{i,j\}{\in }E). \end{aligned}$$
(10.80)

By substituting the explicit expression for \(P({\boldsymbol{s}})\) and the expression \({\ln }{\left( R({\boldsymbol{s}})\right) }\) in terms of the marginal probability distributions \({\boldsymbol{R}}_{i}= {\left( \begin{array}{ccc} R_{i}(+1) &{} 0 \\ 0 &{} R_{i}(-1) \end{array} \right) }\) (\(i{\in }V\)) and \({\boldsymbol{R}}_{ij}= {\left( \begin{array}{cccc} R_{ij}(+1,+1) &{} 0 &{} 0 &{} 0 \\ 0 &{} R_{ij}(+1,-1) &{} 0 &{} 0 \\ 0 &{} 0 &{} R_{ij}(-1,+1) &{} 0 \\ 0 &{} 0 &{} 0 &{} R_{ij}(-1,-1) \end{array} \right) }\) (\(\{i,j\}{\in }E\)) in Eq. (10.77) into Eq. (10.51), the Kullback-Leibler divergence can be reduced to the following expression in terms of the partition function Z and the Bethe free energy functional \(\mathcal{{F}}_\mathrm{{Bethe}}[\{{\boldsymbol{R}}_{i}|i{\in }V\},\{{\boldsymbol{R}}_{ij}|\{i,j\}{\in }E\}]\):

$$\begin{aligned} \mathrm{{KL}}[P||R] = {\frac{1}{k_\mathrm{{B}}T}} {\big (}k_\mathrm{{B}}T{\ln }Z +\mathcal{{F}}_\mathrm{{Bethe}}{\big [}\{{\boldsymbol{R}}_{i}|i{\in }V\},\{{\boldsymbol{R}}_{ij}|\{i,j\}{\in }E\}{\big ]} {\big )}, \end{aligned}$$
(10.81)

where

$$\begin{aligned}&\mathcal{{F}}_\mathrm{{Bethe}}{\big [}\{{\boldsymbol{R}}_{i}|i{\in }V\},\{{\boldsymbol{R}}_{ij}|\{i,j\}{\in }E\}{\big ]} \nonumber \\&\equiv -J{\sum _{\{i,j\}{\in }E}}{\left( {\sum _{{\tau }_{i}{\in }{\Omega }}}{\sum _{{\tau }_{j}{\in }{\Omega }}}{\tau }_{i}{\tau }_{j}R_{ij}({\tau }_{i},{\tau }_{j})\right) } -h{\sum _{i{\in }V}}d_{i}{\left( {\sum _{{\tau }_{i}{\in }{\Omega }}}{\tau }_{i}R_{i}({\tau }_{i})\right) } \nonumber \\&\,\,{} +k_\mathrm{{B}}T{\sum _{i{\in }V}}{\left( 1-|{\partial }i|\right) } {\left( {\sum _{{\tau }_{i}{\in }{\Omega }}}R_{i}({\tau }_{i}){\ln }{\big (}R_{i}({\tau }_{i}){\big )}\right) } \nonumber \\&\,\,{} +k_\mathrm{{B}}T {\sum _{\{i,j\}{\in }E}} {\left( {\sum _{{\tau }_{i}{\in }{\Omega }}}{\sum _{{\tau }_{j}{\in }{\Omega }}}R_{ij}({\tau }_{i},{\tau }_{j}){\ln }{\left( R_{ij}({\tau }_{i},{\tau }_{j})\right) }\right) }. \end{aligned}$$
(10.82)

Let us suppose the following conditional minimization of the Bethe free energy functional:

$$\begin{aligned}&{\left( {\left\{ {\boldsymbol{\widehat{R}}}_{i}{\Big |}i{\in }V\right\} },{\left\{ {\boldsymbol{\widehat{R}}}_{ij}{\Big |}\{i,j\}{\in }E \right\} } \right) } \nonumber \\&={\arg }{\min _{\{{\boldsymbol{R}}_{i}|i{\in }V\},\{{\boldsymbol{R}}_{ij}|\{i,j\}{\in }E\}}} {\Bigg \{}\mathcal{{F}}_\mathrm{{Bethe}}{\big [}\{{\boldsymbol{R}}_{i}|i{\in }V\},\{{\boldsymbol{R}}_{ij}|\{i,j\}{\in }E\}{\big ]} \nonumber \\&\qquad\qquad\qquad\qquad\qquad\qquad{} {\Bigg |}{\sum _{{\tau }_{i}{\in }{\Omega }}}R_{i}({\tau }_{i}) =1~(i{\in }V), {\sum _{{\tau }_{i}{\in }{\Omega }}}{\sum _{{\tau }_{j}{\in }{\Omega }}}R_{ij}({\tau }_{i},{\tau }_{j}) =1~(\{i,j\}{\in }E), \nonumber \\&\qquad\qquad\qquad\qquad\qquad\qquad{} R_{i}(-1)={\sum _{{\tau }_{j}{\in }{\Omega }}}R_{ij}(-1,{\tau }_{j})~(j{\in }{\partial }i,~i{\in }V), \nonumber \\&\qquad\qquad\qquad\qquad\qquad\qquad{} R_{i}(+1)={\sum _{{\tau }_{j}{\in }{\Omega }}}R_{ij}(+1,{\tau }_{j})~(j{\in }{\partial }i,~i{\in }V) {\Bigg \}}. \end{aligned}$$
(10.83)

We introduce Lagrange multipliers \({\lambda }_{i}\) (\(i{\in }V\)) and \({\lambda }_{\{i,j\}}\), \({\lambda }_{i,ij}(-1)={\lambda }_{i,ji}(-1)\), \({\lambda }_{i,ij}(+1)={\lambda }_{i,ji}(+1)\) (\(\{i,j\}{\in }E\)) to enforce the normalization and reducibility conditions as follows:

$$\begin{aligned} \mathcal{{L}}_\mathrm{{Bethe}}{\big [}\{{\boldsymbol{R}}_{i}|i{\in }V\},\{{\boldsymbol{R}}_{ij}|\{i,j\}{\in }E\}{\big ]}\equiv & {} \mathcal{{F}}_\mathrm{{Bethe}}{\big [}\{{\boldsymbol{R}}_{i}|i{\in }V\},\{{\boldsymbol{R}}_{ij}|\{i,j\}{\in }E\}{\big ]} \nonumber \\&- {\sum _{i{\in }V}}{\lambda }_{i}{\big (}{\sum _{{\tau }_{i}{\in }{\Omega }}}R_{i}({\tau }_{i})-1{\big )} \nonumber \\&- {\sum _{\{i,j\}{\in }E}}{\lambda }_{\{i,j\}}{\big (}{\sum _{{\tau }_{i}{\in }{\Omega }}}{\sum _{{\tau }_{j}{\in }{\Omega }}}R_{ij}({\tau }_{i},{\tau }_{j})-1{\big )} \nonumber \\&- {\sum _{i{\in }V}}{\sum _{j{\in }{\partial }i}}{\lambda }_{i,ij}(-1){\big (}R_{i}(-1)-{\sum _{{\tau }_{j}{\in }{\Omega }}}R_{ij}(-1,{\tau }_{j}){\big )} \nonumber \\&- {\sum _{i{\in }V}}{\sum _{j{\in }{\partial }i}}{\lambda }_{i,ij}(+1){\big (}R_{i}(+1)-{\sum _{{\tau }_{j}{\in }{\Omega }}}R_{ij}(+1,{\tau }_{j}){\big )}. \end{aligned}$$
(10.84)

The marginal probabilities \({\boldsymbol{{\widehat{R}}_{i}}}\) (\(i{\in }V\)) and \({\boldsymbol{{\widehat{R}}_{ij}}}\) (\(\{i,j\}{\in }E\)) are determined so as to satisfy the following extremum condition:

$$\begin{aligned} {\Big [}{\frac{{\partial }}{{\partial }{\boldsymbol{R_{i}}}}} \mathcal{{L}}_\mathrm{{Bethe}}{\big [}\{{\boldsymbol{R_{i}}}|i{\in }V\},\{{\boldsymbol{R}}_{ij}|\{i,j\}{\in }E\}{\big ]} {\Big ]}_{\{{\boldsymbol{R}}_{i}={\boldsymbol{{\widehat{R}}_{i}}}|i{\in }V\},\{{\boldsymbol{R}}_{ij}={\boldsymbol{{\widehat{R}}_{ij}}}|\{i,j\}{\in }E\}}=0 ~(i{\in }V), \nonumber \\ \end{aligned}$$
(10.85)
$$\begin{aligned} {\Big [}{\frac{{\partial }}{{\partial }{\boldsymbol{R_{ij}}}}} \mathcal{{L}}_\mathrm{{Bethe}}{\big [}\{{\boldsymbol{R}}_{i}|i{\in }V\},\{{\boldsymbol{R}}_{ij}|\{i,j\}{\in }E\}{\big ]} {\Big ]}_{\{{\boldsymbol{R}}_{i}={\boldsymbol{{\widehat{R}}_{i}}}|i{\in }V\},\{{\boldsymbol{R}}_{ij}={\boldsymbol{{\widehat{R}}_{ij}}}|\{i,j\}{\in }E\}}=0 ~(\{i,j\}{\in }E). \nonumber \\ \end{aligned}$$
(10.86)

It can be shown that \({\widehat{R}}_{i}(s_{i})\) (\(i{\in }V\)) and \({\widehat{R}}_{ij}(s_{i},s_{j})={\widehat{R}}_{ji}(s_{j},s_{i})\) (\(\{i,j\}{\in }E\)) are derived as follows:

$$\begin{aligned} {\widehat{R}}_{i}(s_{i})= & {} {\exp }{\left( -1-{\frac{{\lambda }_{i}}{k_\mathrm{{B}}T{\left( |{\partial }i|-1\right) }}}\right) } \nonumber \\&{\times } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\left( -{\frac{1}{|{\partial }i|-1}}hd_{i}s_{i} -{\frac{1}{|{\partial }i|-1}}{\sum _{j{\in }{\partial }i}}{\lambda }_{i,ij}(s_{i}) \right) }\right) } ~(i{\in }V), \end{aligned}$$
(10.87)
$$\begin{aligned} {\widehat{R}}_{ij}(s_{i},s_{j}) ={\widehat{R}}_{ji}(s_{j},s_{i})= & {} {\exp }{\left( -1+{\frac{{\lambda }_{\{i,j\}}}{k_\mathrm{{B}}T}}\right) } \nonumber \\&{}{\times } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\left( Js_{i}s_{j} -{\lambda }_{i,ij}(s_{i}) -{\lambda }_{j,ij}(s_{j}) \right) }\right) } ~(\{i,j\}{\in }E). \end{aligned}$$
(10.88)

Finally, \({\lambda }_{i}\) and \({\lambda }_{\{i,j\}}\) need to be determined so as to satisfy the normalization condition of the marginal probabilities \({\widehat{R}}_{i}(s_{i})\) and \({\widehat{R}}_{ij}(s_{i},s_{j})\).

We introduce the messages \({\mu }_{k{\rightarrow }i}(s_{i})\) and \({\mu }_{l{\rightarrow }j}(s_{j})\) through the transformations

$$\begin{aligned} {\exp }{\left( -{\frac{{\lambda }_{i,ij}(s_{i})}{k_\mathrm{{B}}T}} \right) } ={\left( {\prod _{k{\in }{\partial }i{\setminus }\{j\}}}{\mu }_{k{\rightarrow }i}(s_{i})\right) }{\exp }{\left( {\frac{h}{k_\mathrm{{B}}T}}d_{i}s_{i} \right) }, \end{aligned}$$
(10.89)
$$\begin{aligned} {\exp }{\left( -{\frac{{\lambda }_{j,ij}(s_{j})}{k_\mathrm{{B}}T}} \right) } ={\left( {\prod _{l{\in }{\partial }j{\setminus }\{i\}}}{\mu }_{l{\rightarrow }j}(s_{j})\right) }{\exp }{\left( {\frac{h}{k_\mathrm{{B}}T}}d_{j}s_{j} \right) }. \end{aligned}$$
(10.90)

The expressions of the marginal probabilities \({\boldsymbol{\widehat{R}}}_{i}\) and \({\boldsymbol{\widehat{R}}}_{ij}\) in Eqs. (10.87) and (10.88) can be reduced to the following expressions:

$$\begin{aligned} {\widehat{R}}_{i}(s_{i}) = {\frac{1}{Z_{i}}} {\left( {\prod _{k{\in }{\partial }i}}{\mu }_{k{\rightarrow }i}(s_{i}) \right) }{\exp }{\left( {\frac{h}{k_\mathrm{{B}}T}}d_{i}s_{i}\right) } ~(i{\in }V), \end{aligned}$$
(10.91)
$$\begin{aligned} {\widehat{R}}_{ij}(s_{i},s_{j}) ={\widehat{R}}_{ji}(s_{j},s_{i})= & {} {\frac{1}{Z_{\{i,j\}}}} {\left( {\prod _{k{\in }{\partial }i{\setminus }\{j\}}}{\mu }_{k{\rightarrow }i}(s_{i})\right) } {\left( {\prod _{l{\in }{\partial }j{\setminus }\{i\}}}{\mu }_{l{\rightarrow }j}(s_{j})\right) } \nonumber \\&{\times } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\left( Js_{i}s_{j}+hd_{i}s_{i}+hd_{j}s_{j}\right) }\right) } ~(\{i,j\}{\in }E), \nonumber \\&\end{aligned}$$
(10.92)
$$\begin{aligned} Z_{i} \equiv {\sum _{{\tau }_{i}{\in }{\Omega }}} {\left( {\prod _{k{\in }{\partial }i}}{\mu }_{k{\rightarrow }i}({\tau }_{i}) \right) }{\exp }{\left( {\frac{h}{k_\mathrm{{B}}T}}d_{i}{\tau }_{i}\right) } ~(i{\in }V), \end{aligned}$$
(10.93)
$$\begin{aligned} Z_{\{i,j\}}\equiv & {} {\sum _{{\tau }_{i}{\in }{\Omega }}} {\sum _{{\tau }_{j}{\in }{\Omega }}} {\left( {\prod _{k{\in }{\partial }i{\setminus }\{j\}}}{\mu }_{k{\rightarrow }i}({\tau }_{i})\right) } {\left( {\prod _{l{\in }{\partial }j{\setminus }\{i\}}}{\mu }_{l{\rightarrow }j}({\tau }_{j})\right) } \nonumber \\&\qquad{}{\times } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\left( J{\tau }_{i}{\tau }_{j}+hd_{i}{\tau }_{i}+hd_{j}{\tau }_{j}\right) }\right) } ~(\{i,j\}{\in }E). \end{aligned}$$
(10.94)

By substituting Eqs. (10.91) and (10.92) into the reducibility conditions in Eq. (10.80), the simultaneous deterministic equations for the messages can be derived as follows:

$$\begin{aligned} {\mu }_{j{\rightarrow }i}(s_{i}) = {\frac{Z_{i}}{Z_{\{i,j\}}}}{\sum _{{\tau }_{j}{\in }{\Omega }}} {\left( {\prod _{l{\in }{\partial }j{\setminus }\{i\}}}{\mu }_{l{\rightarrow }j}({\tau }_{j})\right) } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\left( Js_{i}{\tau }_{j}+hd_{j}{\tau }_{j}\right) }\right) } ~(\{i,j\}{\in }E), \nonumber \\ \end{aligned}$$
(10.95)
$$\begin{aligned} {\mu }_{i{\rightarrow }j}(s_{j}) = {\frac{Z_{j}}{Z_{\{i,j\}}}}{\sum _{{\tau }_{i}{\in }{\Omega }}} {\left( {\prod _{k{\in }{\partial }i{\setminus }\{j\}}}{\mu }_{k{\rightarrow }i}({\tau }_{i})\right) } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\left( J{\tau }_{i}s_{j}+hd_{i}{\tau }_{i}\right) }\right) } ~(\{i,j\}{\in }E). \nonumber \\ \end{aligned}$$
(10.96)

The Bethe free energy functional is given by

$$\begin{aligned} \mathcal{{F}}_\mathrm{{Bethe}}{\big [}\{{\boldsymbol{\widehat{R}}}_{i}|i{\in }V\},\{{\boldsymbol{\widehat{R}}}_{ij}|\{i,j\}{\in }E\}{\big ]} = -k_\mathrm{{B}}T{\sum _{i{\in }V}}{\big (}1-|{\partial }i|{\big )}{\ln }Z_{i} -k_\mathrm{{B}}T{\sum _{\{i,j\}{\in }E}}{\ln }Z_{\{i,j\}}. \nonumber \\ \end{aligned}$$
(10.97)

The framework of the Bethe approximation using Eqs. (10.91), (10.92), (10.93), and (10.94) with Eqs. (10.95) and (10.96) is referred to as loopy belief propagation in statistical machine learning theory [12, 46,47,48]. The present derivation is based on the cluster variation method in Refs. [39, 42,43,44], and [45]. Recently, some novel approaches to loopy belief propagation have been proposed, including the approximate message passing algorithm [49] and the replica cluster variation method [50, 51]. A review summarizing recent developments in loopy belief propagation methods is given in Ref. [52].
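The message updates (10.95) and (10.96) together with the marginal expressions (10.91) and (10.92) translate directly into an iterative algorithm. The sketch below normalizes each message to sum to one at every update; this replaces the prefactors \(Z_{i}/Z_{\{i,j\}}\), which only rescale the messages and drop out of the normalized marginals. The toy graph and parameter values in the usage are hypothetical.

```python
import math

def loopy_bp(edges, d, J, h, kBT, n_iter=100):
    """Loopy belief propagation for the Ising model of Eq. (10.49).

    Iterates the message equations (10.95)/(10.96), normalizing each message,
    and reads off node marginals via Eq. (10.91).
    """
    n = len(d)
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    mu = {}
    for i, j in edges:
        mu[(i, j)] = {+1: 0.5, -1: 0.5}   # message from i to j
        mu[(j, i)] = {+1: 0.5, -1: 0.5}   # message from j to i
    for _ in range(n_iter):
        new_mu = {}
        for (j, i) in mu:                 # update mu_{j -> i}(s_i), Eq. (10.95)
            msg = {}
            for s in (+1, -1):
                msg[s] = sum(
                    math.exp((J * s * t + h * d[j] * t) / kBT)
                    * math.prod(mu[(l, j)][t] for l in nbrs[j] if l != i)
                    for t in (+1, -1))
            z = msg[+1] + msg[-1]
            new_mu[(j, i)] = {s: msg[s] / z for s in (+1, -1)}
        mu = new_mu
    marginals = []
    for i in range(n):                    # node marginals, Eq. (10.91)
        b = {s: math.exp(h * d[i] * s / kBT)
                * math.prod(mu[(k, i)][s] for k in nbrs[i])
             for s in (+1, -1)}
        zi = b[+1] + b[-1]
        marginals.append({s: b[s] / zi for s in (+1, -1)})
    return marginals
```

On a tree (cycle-free graph) these updates converge in a finite number of sweeps and reproduce the exact marginals; on graphs with cycles they give the Bethe approximation.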

By solving

$$\begin{aligned} {\sum _{{\tau }_{i}{\in }{\Omega }}}{\sum _{{\tau }_{j}{\in }{\Omega }}} {\widehat{R}}_{ij}({\tau }_{i},{\tau }_{j}) ={\widehat{R}}_{ij}(+1,+1)+{\widehat{R}}_{ij}(-1,+1)+{\widehat{R}}_{ij}(+1,-1)+{\widehat{R}}_{ij}(-1,-1)=1, \nonumber \\ \end{aligned}$$
(10.98)
$$\begin{aligned} {\sum _{{\tau }_{i}{\in }{\Omega }}}{\sum _{{\tau }_{j}{\in }{\Omega }}} {\tau }_{i}{\widehat{R}}_{ij}({\tau }_{i},{\tau }_{j}) ={\widehat{R}}_{ij}(+1,+1)-{\widehat{R}}_{ij}(-1,+1)+{\widehat{R}}_{ij}(+1,-1)-{\widehat{R}}_{ij}(-1,-1)=m_{i}, \nonumber \\ \end{aligned}$$
(10.99)
$$\begin{aligned} {\sum _{{\tau }_{i}{\in }{\Omega }}}{\sum _{{\tau }_{j}{\in }{\Omega }}} {\tau }_{j}{\widehat{R}}_{ij}({\tau }_{i},{\tau }_{j}) ={\widehat{R}}_{ij}(+1,+1)+{\widehat{R}}_{ij}(-1,+1)-{\widehat{R}}_{ij}(+1,-1)-{\widehat{R}}_{ij}(-1,-1)=m_{j}, \nonumber \\ \end{aligned}$$
(10.100)
$$\begin{aligned} {\sum _{{\tau }_{i}{\in }{\Omega }}}{\sum _{{\tau }_{j}{\in }{\Omega }}} {\tau }_{i}{\tau }_{j}{\widehat{R}}_{ij}({\tau }_{i},{\tau }_{j})&={\widehat{R}}_{ij}(+1,+1)-{\widehat{R}}_{ij}(-1,+1) \nonumber \\&\,-{\widehat{R}}_{ij}(+1,-1)+{\widehat{R}}_{ij}(-1,-1)=c_{\{i,j\}}=c_{\{j,i\}}, \nonumber \\ \end{aligned}$$
(10.101)

as simultaneous linear equations for \({\widehat{R}}_{ij}(+1,+1)\), \({\widehat{R}}_{ij}(-1,+1)\), \({\widehat{R}}_{ij}(+1,-1)\), and \({\widehat{R}}_{ij}(-1,-1)\), we can confirm the following equality:

$$\begin{aligned} {\widehat{R}}_{ij}(s_{i},s_{j}) = {\frac{1}{4}}{\big (} 1 + m_{i}s_{i} + m_{j}s_{j} + c_{\{i,j\}}s_{i}s_{j}{\big )}. \end{aligned}$$
(10.102)
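As a quick numerical sanity check, the closed form of Eq. (10.102) can be verified against the four moment conditions of Eqs. (10.98)–(10.101): since they form a linear system with a unique solution, reproducing all four moments confirms the equality. A minimal sketch, with arbitrary illustrative values for \(m_{i}\), \(m_{j}\), and \(c_{\{i,j\}}\):

```python
# Illustrative (hypothetical) values for the moments in Eqs. (10.98)-(10.101).
m_i, m_j, c_ij = 0.3, -0.2, 0.1

states = [(+1, +1), (-1, +1), (+1, -1), (-1, -1)]

def R_hat(si, sj):
    """Closed form of Eq. (10.102)."""
    return 0.25 * (1.0 + m_i * si + m_j * sj + c_ij * si * sj)

# Since Eqs. (10.98)-(10.101) are a linear system with a unique solution,
# reproducing all four moments confirms Eq. (10.102).
assert abs(sum(R_hat(si, sj) for si, sj in states) - 1.0) < 1e-12            # (10.98)
assert abs(sum(si * R_hat(si, sj) for si, sj in states) - m_i) < 1e-12       # (10.99)
assert abs(sum(sj * R_hat(si, sj) for si, sj in states) - m_j) < 1e-12       # (10.100)
assert abs(sum(si * sj * R_hat(si, sj) for si, sj in states) - c_ij) < 1e-12 # (10.101)
```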

By substituting Eqs. (10.72) and (10.102) into Eq. (10.82), the Bethe free energy functional can be reduced to

$$\begin{aligned} \mathcal{{F}}_\mathrm{{Bethe}}{\left[ {\big \{}{\boldsymbol{{\widehat{R}}_{i}}}{\big |}i{\in }V{\big \}},{\big \{}{\boldsymbol{{\widehat{R}}_{ij}}}{\big |}\{i,j\}{\in }E{\big \}}\right] } =F_\mathrm{{Bethe}} {\left( {\big \{} m_{i} {\big |} i{\in }V {\big \}}, {\big \{} c_{\{i,j\}} {\big |} \{i,j\}{\in }E {\big \}} \right) }, \end{aligned}$$
(10.103)
$$\begin{aligned}&F_\mathrm{{Bethe}} {\left( {\big \{} m_{i} {\big |} i{\in }V {\big \}}, {\big \{} c_{\{i,j\}} {\big |} \{i,j\}{\in }E {\big \}} \right) } \equiv -J{\sum _{\{i,j\}{\in }E}}c_{\{i,j\}} -h{\sum _{i{\in }V}}d_{i}m_{i} \nonumber \\&+k_\mathrm{{B}}T{\sum _{i{\in }V}}(1-|{\partial }i|){\sum _{s_{i}={\pm }1}} {\widehat{R}}_{i}(s_{i}){\ln }{\big (}{\widehat{R}}_{i}(s_{i}){\big )} +k_\mathrm{{B}}T{\sum _{\{i,j\}{\in }E}} {\sum _{s_{i}={\pm }1}}{\sum _{s_{j}={\pm }1}}{\widehat{R}}_{ij}(s_{i},s_{j}) {\ln }{\big (}{\widehat{R}}_{ij}(s_{i},s_{j}){\big )}. \nonumber \\&\end{aligned}$$
(10.104)

The extremum conditions

$$\begin{aligned} {\frac{{\partial }}{{\partial }m_{k}}} F_\mathrm{{Bethe}} {\left( {\big \{} m_{i} {\big |} i{\in }V {\big \}}, {\big \{} c_{\{i,j\}} {\big |} \{i,j\}{\in }E {\big \}} \right) } =0~(k{\in }V), \end{aligned}$$
(10.105)
$$\begin{aligned} {\frac{{\partial }}{{\partial }c_{\{k,l\}}}} F_\mathrm{{Bethe}} {\left( {\big \{} m_{i} {\big |} i{\in }V {\big \}}, {\big \{} c_{\{i,j\}} {\big |} \{i,j\}{\in }E {\big \}} \right) } =0~(\{k,l\}{\in }E) \end{aligned}$$
(10.106)

can be reduced to the following simultaneous equations:

$$\begin{aligned} {\frac{h}{k_\mathrm{{B}}T}}d_{i} ={\frac{1}{2}}{\left( 1-|{\partial }i|\right) }{\sum _{{\tau }_{i}{\in }{\Omega }}}{\tau }_{i}{\ln }{\left( {\widehat{R}}_{i}({\tau }_{i})\right) } +{\frac{1}{4}}{\sum _{j{\in }{\partial }i}}{\sum _{{\tau }_{i}{\in }{\Omega }}}{\sum _{{\tau }_{j}{\in }{\Omega }}}{\tau }_{i}{\ln }{\big (}{\widehat{R}}_{ij}({\tau }_{i},{\tau }_{j}){\big )} ~(i{\in }V), \end{aligned}$$
(10.107)
$$\begin{aligned} {\frac{J}{k_\mathrm{{B}}T}} ={\frac{1}{4}}{\sum _{{\tau }_{i}{\in }{\Omega }}}{\sum _{{\tau }_{j}{\in }{\Omega }}}{\tau }_{i}{\tau }_{j}{\ln }{\big (}{\widehat{R}}_{ij}({\tau }_{i},{\tau }_{j}){\big )} ~(\{i,j\}{\in }E). \end{aligned}$$
(10.108)

The schemes for the derivations of Eqs. (10.107) and (10.108) from the Bethe free energy (Eqs. (10.103)–(10.104)) are given in Refs. [41, 53,54,55].

In the advanced mean-field method, some researchers are interested in perturbative computation of the correction terms with respect to \({\frac{J}{k_\mathrm{{B}}T}}\) from the mean-field free energy [56, 57], which is referred to as the Thouless-Anderson-Palmer (TAP) free energy. The scheme used in the derivations has been extended to a classical Heisenberg model [58]. One familiar perturbative method in statistical mechanical informatics is the Plefka expansion, in which higher-order correction terms with respect to \({\frac{J}{k_\mathrm{{B}}T}}\) are obtained from the mean-field free energy [12]. By substituting Eq. (10.102) into Eq. (10.108), \(c_{\{i,j\}}\) can be expressed in terms of \(m_{i}\), \(m_{j}\), and \({\frac{J}{k_\mathrm{{B}}T}}\). It is known that the TAP equation can be derived by expanding this expression for \(c_{\{i,j\}}\) up to the second-order term \({\left( {\frac{J}{k_\mathrm{{B}}T}}\right) }^{2}\) in the small parameter \({\frac{J}{k_\mathrm{{B}}T}}\) and substituting it into Eq. (10.82) with Eqs. (10.72) and (10.102) [41]. The fundamental framework of the TAP free energy and its expansion using the advanced mean-field method has been clarified [59]. The Bethe free energy functional and the TAP free energy, as well as loopy belief propagation, have been applied to Boltzmann machine learning [53, 60,61,62,63,64]. Some recent developments appear in Chap. 7 of Part 3 in this book.

The EM schemes with advanced mean-field methods in the previous sections have been applied to noise reduction in probabilistic image processing [30, 51, 65,66,67,68,69]. The basic frameworks are based on Eqs. (10.14) and (10.15) with the two-body and one-body posterior marginal probability distributions in Eqs. (10.11) and (10.12) as well as the two-body prior marginal probability distribution in Eq. (10.13). They can be computed by means of the message passing algorithms in Eqs. (10.91) and (10.92) with Eqs. (10.93), (10.94), (10.95), and (10.96) for the Ising model in Eqs. (10.47), (10.48), and (10.50) with the prior and posterior probability distributions in Eqs. (10.2) and (10.3), respectively. The framework and some numerical experimental results are shown in Figs. 10.1 and 10.2, respectively. Moreover, loopy belief propagation is applicable to Bayesian image segmentation in the framework of Sect. 10.2.3 [70]. These methods are also useful for community detection by means of the stochastic block model for modular networks [71,72,73].

Fig. 10.1
figure 1

Fundamental framework of Bayesian noise reduction by generalized sparse prior and additive white Gaussian noise

Fig. 10.2
figure 2

Numerical experiments in Bayesian noise reduction by the generalized sparse prior and additive white Gaussian noise

3.3 Free Energy Landscapes and Phase Transitions in the Thermodynamic Limit

In this section, we consider the Ising model defined by

$$\begin{aligned} P{\left( {\boldsymbol{s}}{\Big |}{\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}\right) }\equiv & {} {\frac{1}{Z{\left( {\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}\right) }}} {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}}H({\boldsymbol{s}})\right) }, \end{aligned}$$
(10.109)

where

$$\begin{aligned} H({\boldsymbol{s}})= & {} H(s_{1},s_{2},{\cdots },s_{|V|}) \nonumber \\\equiv & {} -J{\sum _{\{i,j\}{\in }E}}s_{i}s_{j}-h{\sum _{i{\in }V}}s_{i}~{\left( J>0 \right) }, \end{aligned}$$
(10.110)
$$\begin{aligned} Z{\left( {\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}\right) }\equiv & {} {\sum _{s_{1}{\in }{\Omega }}}{\sum _{s_{2}{\in }{\Omega }}}{\dots }{\sum _{s_{|V|}{\in }{\Omega }}} {\exp }{\left( -{\frac{1}{k_\mathrm{{B}}T}}H({\boldsymbol{s}}) \right) }. \end{aligned}$$
(10.111)

The energy function in Eq. (10.110) corresponds to the one in Eq. (10.49) for the case of \(d_{i}=1\) for every node \(i({\in }V)\). In the present section, we consider a regular graph of degree 4 that includes a square grid graph with periodic boundary conditions along the x- and y-directions as shown in Fig. 10.3.

Fig. 10.3
figure 3

Square grid graph (VE) with periodic boundary conditions along the x- and y-direction in the case \(V=\{1,2,3,4,5,6,7,8,9,10,11,12\}\)

For the Ising model in Eq. (10.110) and its partition function in Eq. (10.50), we have the free energy per node

$$\begin{aligned} f(J,h,T) =-k_\mathrm{{B}}T {\times } {\lim _{|V|{\rightarrow }+{\infty }}}{\frac{1}{|V|}}{\ln }{\left( Z \right) }, \end{aligned}$$
(10.112)

the internal energy for zero external field

$$\begin{aligned} Ju\equiv & {} {\frac{{\partial }}{{\partial }{\left( {\frac{1}{k_\mathrm{{B}}T}}\right) }}}{\left( {\frac{1}{k_\mathrm{{B}}T}}f(J,h=0,T) \right) } \nonumber \\= & {} {\lim _{|V|{\rightarrow }+{\infty }}}{\frac{1}{|E|}}{\sum _{\{i,j\}{\in }E}}{\sum _{\boldsymbol{s}}}(-s_{i}s_{j}) P{\left( {\boldsymbol{s}} {\Big |} {\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}=0 \right) }, \end{aligned}$$
(10.113)

and the spontaneous magnetization

$$\begin{aligned} m_{\pm }\equiv & {} {\lim _{h{\rightarrow }{\pm }0}} {\frac{{\partial }}{{\partial }{\left( {\frac{h}{k_\mathrm{{B}}T}}\right) }}}{\left( {\frac{1}{k_\mathrm{{B}}T}}f(J,h,T) \right) } \nonumber \\= & {} {\lim _{h{\rightarrow }{\pm }0}}{\lim _{|V|{\rightarrow }+{\infty }}}{\frac{1}{|V|}}{\sum _{i{\in }V}}{\sum _{\boldsymbol{s}}}s_{i} P{\left( {\boldsymbol{s}}{\Big |}{\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}\right) }, \end{aligned}$$
(10.114)

as important statistical quantities in the thermodynamic limit \(|V|{\rightarrow }+{\infty }\). The existence of the thermodynamic limit \(|V|{\rightarrow }+{\infty }\) means that the limit on the right-hand side of Eq. (10.112) converges. Sufficient conditions for the existence of the thermodynamic limit of the Ising model of Eqs. (10.109), (10.110), and (10.111) have been given by Ruelle in Ref. [38].

In the thermodynamic limit \(|V|{\rightarrow }+{\infty }\) for the Ising model in Eq. (10.110) on a square grid graph with periodic boundary conditions along the x- and y-direction as shown in Fig. 10.3,

$$\begin{aligned} {\frac{u}{J}}= -{\coth }{\left( {\frac{2J}{k_\mathrm{{B}}T}} \right) } {\left( 1+{\left( 2{\tanh }^{2}{\left( {\frac{2J}{k_\mathrm{{B}}T}} \right) } -1 \right) } {\left( {\frac{2}{{\pi }}} \right) } {\int _{0}^{{\frac{{\pi }}{2}}}} {\left( 1 - {\left( {\frac{2{\sinh }{\left( {\frac{2J}{k_\mathrm{{B}}T}} \right) }}{{\cosh }^{2}{\left( {\frac{2J}{k_\mathrm{{B}}T}} \right) }}} \right) }^{2}{\sin }^{2}{\left( {\theta } \right) } \right) }^{-{\frac{1}{2}}} d{\theta } \right) }, \nonumber \\ \end{aligned}$$
(10.115)
$$\begin{aligned} {m_{\pm }}^{2}= & {} {\lim _{|{\boldsymbol{r}}_{i}-{\boldsymbol{r}}_{j}|{\rightarrow }+{\infty }}} {\lim _{|V|{\rightarrow }+{\infty }}}{\sum _{\boldsymbol{s}}}s_{i}s_{j} P{\left( {\boldsymbol{s}} {\Big |} {\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}=0 \right) } \nonumber \\= & {} {\left\{ \begin{array}{ll} 0 &{} {\left( {\frac{J}{k_\mathrm{{B}}T}}<{\frac{1}{2}}\mathrm{{arc}}{\sinh }(1) \right) } \\ {\left( 1- {\sinh }^{-4}{\left( {\frac{2J}{k_\mathrm{{B}}T}} \right) } \right) }^{{\frac{1}{4}}} &{} {\left( {\frac{J}{k_\mathrm{{B}}T}}>{\frac{1}{2}}\mathrm{{arc}}{\sinh }(1) \right) } \end{array}\right. } , \end{aligned}$$
(10.116)

where \({\boldsymbol{r}}_{i}\) is the position vector of each node \(i({\in }V)\) [34, 74, 75]. In Eq. (10.116), the spontaneous magnetizations \(m_{+}\) and \(m_{-}\) correspond to the branches \(m_{+}{\ge }0\) and \(m_{-}{\le }0\), respectively, and are shown in Fig. 10.4. Note that for the Ising model in Eq. (10.110) on such regular graphs,

$$\begin{aligned} m_{i,V}{\left( {\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}\right) } \equiv {\sum _{\boldsymbol{s}}}s_{i} P{\left( {\boldsymbol{s}}{\Big |}{\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}\right) }, \end{aligned}$$
(10.117)

for every \(i({\in }V)\), does not depend on i but can be expressed as \(m_{V}{\left( {\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}\right) }\).
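The exact results above are straightforward to evaluate numerically. The sketch below computes the critical coupling \(K_{c}={\frac{1}{2}}\mathrm{arcsinh}(1)={\frac{1}{2}}{\ln }(1+{\sqrt{2}})\) and the spontaneous magnetization \(m_{+}\) from Eq. (10.116), using only the standard library:

```python
import math

# Critical coupling of the square-lattice Ising model: K_c = (1/2) arcsinh(1).
K_c = 0.5 * math.asinh(1.0)  # = (1/2) ln(1 + sqrt(2)), approximately 0.4407
assert abs(K_c - 0.5 * math.log(1.0 + math.sqrt(2.0))) < 1e-12

def spontaneous_magnetization(K):
    """m_+ from Eq. (10.116): m^2 = (1 - sinh^{-4}(2K))^{1/4} for K > K_c,
    with K = J/(k_B T); zero in the disordered phase."""
    if K <= K_c:
        return 0.0
    m_sq = (1.0 - math.sinh(2.0 * K) ** (-4)) ** 0.25
    return math.sqrt(m_sq)

assert spontaneous_magnetization(0.40) == 0.0       # disordered phase
assert 0.0 < spontaneous_magnetization(0.45) < 1.0  # ordered phase
```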

Fig. 10.4
figure 4

Internal energy Ju in Eq. (10.113) and magnetization \(m_{\pm }\) in Eq. (10.114) in the Onsager solution in the Ising model of Eqs. (10.48) and (10.50) with Eq. (10.110) on the square grid graph (VE) with the periodic boundary conditions along the x- and y-direction

In the mean-field approximation of the previous subsection, the spontaneous magnetizations

$$\begin{aligned} m_{\pm }={\lim _{h{\rightarrow }{\pm }0}}{\lim _{|V|{\rightarrow }+{\infty }}}m_{V}{\left( {\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}\right) }, \end{aligned}$$
(10.118)

are given as solutions of the following mean-field equation:

$$\begin{aligned} m_{\pm }={\tanh }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\big (}h+4Jm_{\pm }{\big )} \right) }~(h{\rightarrow }{\pm }0), \end{aligned}$$
(10.119)

and the internal energy Ju in Eq. (10.113) in the mean-field approximation is given as

$$\begin{aligned} u=-{m_{\pm }}^{2}. \end{aligned}$$
(10.120)

The solutions of Eq. (10.119) correspond to the extremum values of the following mean-field free energy:

$$\begin{aligned} f_\mathrm{{MF}}(m)\equiv & {} {\frac{1}{|V|}}F_\mathrm{{MF}}{\left( m_{1},m_{2},{\cdots },m_{|V|} \right) } \nonumber \\= & {} -2Jm^{2} +k_\mathrm{{B}}T {\left( {\frac{1}{2}}{\big (}1+m{\big )}\right) } {\ln }{\left( {\frac{1}{2}}{\big (}1+m{\big )}\right) } \nonumber \\&\qquad\qquad{} +k_\mathrm{{B}}T {\left( {\frac{1}{2}}{\big (}1-m{\big )}\right) } {\ln }{\left( {\frac{1}{2}}{\big (}1-m{\big )}\right) }, \end{aligned}$$
(10.121)

which corresponds to \({\frac{1}{|V|}}F_\mathrm{{MF}}{\left( m_{1},m_{2},{\cdots },m_{|V|}\right) }\) in Eq. (10.75) for \(h=0\). The spontaneous magnetization \(m_{\pm }\) and the internal energy u for \(h=0\) are computed numerically by setting \(0<{\left| {\frac{h}{k_\mathrm{{B}}T}} \right| }<10^{-5}\) and using the iteration method for Eq. (10.119). The graphs of \({\left( {\frac{J}{k_\mathrm{{B}}T}},u\right) }\) and \({\left( {\frac{J}{k_\mathrm{{B}}T}},m_{\pm }\right) }\) are shown in Fig. 10.5. Moreover, the graphs of \({\left( m,{\frac{1}{k_\mathrm{{B}}T}}f_\mathrm{{MF}}(m)\right) }\) for \({\frac{J}{k_\mathrm{{B}}T}}=0.20\), 0.25, and 0.40 are shown in Fig. 10.6. The mean-field equation (10.119) always has the trivial solution \(m_{{\pm }}=0\); as can be seen by expanding its right-hand side around \(m=0\) and keeping the first-order term in m, it begins to have non-trivial solutions \(m_{+}>0\) and \(m_{-}<0\) in the region \({\frac{J}{k_\mathrm{{B}}T}}>{\frac{1}{4}}\).
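The iteration method for Eq. (10.119) described above can be sketched as follows; the small field \(h\) selecting the \(m_{+}\) branch plays the role of the \(0<{\left|{\frac{h}{k_\mathrm{{B}}T}}\right|}<10^{-5}\) setting in the text, and the onset of non-trivial solutions at \({\frac{J}{k_\mathrm{{B}}T}}={\frac{1}{4}}\) is visible directly:

```python
import math

def mean_field_magnetization(K, h=1e-6, iters=10000):
    """Fixed-point iteration of Eq. (10.119): m = tanh(h + 4 K m),
    with K = J/(k_B T), h in units of k_B T, and a small positive field
    selecting the m_+ branch."""
    m = 0.0
    for _ in range(iters):
        m = math.tanh(h + 4.0 * K * m)
    return m

# Below K = 1/4 only the trivial solution survives; above it, m_+ > 0.
assert abs(mean_field_magnetization(0.20)) < 1e-3
assert mean_field_magnetization(0.40) > 0.5
```

The internal energy of Eq. (10.120) then follows as \(u=-m_{\pm }^{2}\).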

Fig. 10.5
figure 5

Internal energy u from Eq. (10.113) and magnetization \(m_{\pm }\) from Eq. (10.118) in mean-field approximation for the Ising model in Eqs. (10.48), (10.50), and (10.110) on the regular graph (VE) of degree 4

Fig. 10.6
figure 6

Free energy from Eq. (10.121) in mean-field approximation for the Ising model in Eqs. (10.48), (10.50) and (10.110) on the regular graph (VE) of degree 4. a \({\frac{J}{k_\mathrm{{B}}T}}=0.2\), \({\frac{h}{k_\mathrm{{B}}T}}=0\). b \({\frac{J}{k_\mathrm{{B}}T}}=0.25\), \({\frac{h}{k_\mathrm{{B}}T}}=0\). c \({\frac{J}{k_\mathrm{{B}}T}}=0.4\), \({\frac{h}{k_\mathrm{{B}}T}}=0\)

Next, we consider the Bethe approximation for the Ising model in Eqs. (10.48), (10.50), and (10.110) on the regular graph (VE) of degree 4. In this case, the average \(m_i\), the correlation \(c_{ij}\) and the messages \({\mu }_{i{\rightarrow }j}(+1)\) and \({\mu }_{i{\rightarrow }j}(-1)\) do not depend on i and j, and can be expressed as m, c, \({\mu }(+1)\) and \({\mu }(-1)\). We now introduce

$$\begin{aligned} {\Lambda } \equiv {\frac{1}{2}}k_\mathrm{{B}}T{\ln }{\left( {\frac{{\mu }(+1)}{{\mu }(-1)}} \right) }. \end{aligned}$$
(10.122)

The message passing equations in Eqs. (10.95) and (10.96) and the magnetization are reduced to

$$\begin{aligned} {\frac{{\Lambda }}{k_\mathrm{{B}}T}} =\mathrm{{arc}}{\tanh }{\left( {\tanh }{\left( {\frac{J}{k_\mathrm{{B}}T}}\right) } {\tanh }{\left( {\frac{h+3{\Lambda }}{k_\mathrm{{B}}T}} \right) }\right) }. \end{aligned}$$
(10.123)

Moreover, since the marginal probabilities \({\widehat{R}}_{i}(+1)\) and \({\widehat{R}}_{i}(-1)\) are also independent of i, we can derive the expression for the magnetization in terms of \({\Lambda }\) as follows:

$$\begin{aligned} m_\mathrm{{Bethe}}{\left( {\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}} \right) } \equiv {\frac{1}{|V|}} {\sum _{i{\in }V}} {\left( {\sum _{s_{i}{\in }{\Omega }}}s_{i}{\widehat{R}}(s_{i}) \right) } ={\tanh }{\left( {\frac{h+4{\Lambda }}{k_\mathrm{{B}}T}}\right) }. \end{aligned}$$
(10.124)

In the limits of infinitesimal h, namely \(h{\rightarrow }+0\) and \(h{\rightarrow }-0\), the magnetization \(m_\mathrm{{Bethe}}{\left( {\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}} \right) }\) in Eq. (10.124) can be computed numerically by using the iteration method. Moreover, Eqs. (10.104) and (10.108) can be reduced to

$$\begin{aligned} f_\mathrm{{Bethe}}(m,c)= & {} {\frac{1}{|V|}}F_\mathrm{{Bethe}} {\left( {\big \{} m_{i} {\big |} i{\in }V {\big \}}, {\big \{} c_{\{i,j\}} {\big |} \{i,j\}{\in }E {\big \}} \right) } \nonumber \\= & {} -2Jc-hm \nonumber \\&-3k_\mathrm{{B}}T {\left( {\frac{1}{2}}{\big (}1+m{\big )}\right) } {\ln }{\left( {\frac{1}{2}}{\big (}1+m{\big )}\right) } -3k_\mathrm{{B}}T {\left( {\frac{1}{2}}{\big (}1-m{\big )}\right) } {\ln }{\left( {\frac{1}{2}}{\big (}1-m{\big )}\right) } \nonumber \\&+2k_\mathrm{{B}}T {\left( {\frac{1}{4}}{\big (}1+2m+c{\big )}\right) } {\ln }{\left( {\frac{1}{4}}{\big (}1+2m+c{\big )}\right) } \nonumber \\&+4k_\mathrm{{B}}T {\left( {\frac{1}{4}}{\big (}1-c{\big )}\right) } {\ln }{\left( {\frac{1}{4}}{\big (}1-c{\big )}\right) } \nonumber \\&+2k_\mathrm{{B}}T {\left( {\frac{1}{4}}{\big (}1-2m+c{\big )}\right) } {\ln }{\left( {\frac{1}{4}}{\big (}1-2m+c{\big )}\right) }, \end{aligned}$$
(10.125)

and

$$\begin{aligned} {\frac{J}{k_\mathrm{{B}}T}} ={\frac{1}{4}}{\ln }{\left( {\frac{(1+c)^{2} - 4m^{2}}{(1-c)^{2}}}\right) }, \end{aligned}$$
(10.126)

such that

$$\begin{aligned} c={\frac{1}{{\tanh }{\left( {\frac{2J}{k_\mathrm{{B}}T}}\right) }}} {\left( 1- {\sqrt{1-(1-2m^{2}){\tanh }^{2}{\left( {\frac{2J}{k_\mathrm{{B}}T}}\right) } -2m^{2}{\tanh }{\left( {\frac{2J}{k_\mathrm{{B}}T}}\right) } }} \right) }. \end{aligned}$$
(10.127)
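As a consistency check, the expression for c in Eq. (10.127) can be verified numerically: substituting Eq. (10.102) into Eq. (10.108) gives \({\frac{J}{k_\mathrm{{B}}T}}={\frac{1}{4}}{\ln }{\left({\frac{(1+c)^{2}-4m^{2}}{(1-c)^{2}}}\right)}\), and Eq. (10.127) is its solution for c. A sketch:

```python
import math

def c_bethe(K, m):
    """Eq. (10.127): correlation c as a function of m and K = J/(k_B T)."""
    t = math.tanh(2.0 * K)
    return (1.0 - math.sqrt(1.0 - (1.0 - 2.0 * m * m) * t * t
                            - 2.0 * m * m * t)) / t

# Check that c from Eq. (10.127) solves the relation obtained by
# substituting Eq. (10.102) into Eq. (10.108).
for K in (0.1, 0.3, 0.5):
    for m in (0.0, 0.2, 0.5):
        c = c_bethe(K, m)
        lhs = 0.25 * math.log(((1.0 + c) ** 2 - 4.0 * m * m) / (1.0 - c) ** 2)
        assert abs(lhs - K) < 1e-10
```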

The graphs of

$$\begin{aligned} {\left( m,{\frac{1}{k_\mathrm{{B}}T}}f_\mathrm{{Bethe}} {\Bigg (}m, {\frac{1}{{\tanh }{\left( {\frac{2J}{k_\mathrm{{B}}T}}\right) }}} {\Big (} 1- {\sqrt{1-(1-2m^{2}){\tanh }^{2}{\left( {\frac{2J}{k_\mathrm{{B}}T}}\right) } -2m^{2}{\tanh }{\left( {\frac{2J}{k_\mathrm{{B}}T}}\right) } }} {\Big )} {\Bigg )} \right) }, \nonumber \\ \end{aligned}$$
(10.128)

for \({\frac{J}{k_\mathrm{{B}}T}}=0.25\), \({\frac{J}{k_\mathrm{{B}}T}}=\mathrm{{arc}}{\tanh }{\left( {\frac{1}{3}} \right) }\), and \({\frac{J}{k_\mathrm{{B}}T}}=0.40\) in the case of \(h=0\) are shown in Figs. 10.7, 10.8, and 10.9, respectively. Figure 10.7 shows the internal energy u from Eq. (10.113) and the spontaneous magnetization \(m_{\pm }\) in Eq. (10.118) in loopy belief propagation (Bethe approximation) for the Ising model in Eqs. (10.48), (10.50), and (10.110) on the regular graph (VE) of degree 4. These quantities u and \(m_{\pm }\) are obtained by

$$\begin{aligned} u_\mathrm{{Bethe}}{\left( {\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}=0 \right) } = -{\frac{ {\cosh }{\left( {\frac{6{\Lambda }}{k_\mathrm{{B}}T}} \right) } -{\exp }{\left( -{\frac{2J}{k_\mathrm{{B}}T}} \right) } }{ {\cosh }{\left( {\frac{6{\Lambda }}{k_\mathrm{{B}}T}} \right) } +{\exp }{\left( -{\frac{2J}{k_\mathrm{{B}}T}} \right) } } }, \end{aligned}$$
(10.129)
$$\begin{aligned} m_\mathrm{{Bethe}}{\left( {\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}=0 \right) } ={\tanh }{\left( {\frac{4{\Lambda }}{k_\mathrm{{B}}T}}\right) }, \end{aligned}$$
(10.130)

where

$$\begin{aligned} {\frac{{\Lambda }}{k_\mathrm{{B}}T}} =\mathrm{{arc}}{\tanh }{\left( {\tanh }{\left( {\frac{J}{k_\mathrm{{B}}T}}\right) } {\tanh }{\left( {\frac{3{\Lambda }}{k_\mathrm{{B}}T}} \right) }\right) }. \end{aligned}$$
(10.131)

These equations give the same results for the Ising model of Eqs. (10.48), (10.50), and (10.110) on any regular graph (VE) of degree 4. In particular, it is known that the results on the regular tree graph (VE) of degree 4 are exact. Equation (10.123) always has the trivial solution \({\Lambda }=0\); as can be seen by expanding its right-hand side around \({\Lambda }=0\) and keeping the first-order term in \({\Lambda }\), it begins to have non-trivial solutions in the region \({\frac{J}{k_\mathrm{{B}}T}}>\mathrm{{arc}}{\tanh }{\left( {\frac{1}{3}}\right) }\). In Fig. 10.7, the blue curves correspond to global minimum states, which are stable, and the red lines correspond to local maximum states, which are unstable, for each value of \({\frac{J}{k_\mathrm{{B}}T}}\) in the Bethe free energy \(f_\mathrm{{Bethe}}(m,c)\) of Eq. (10.125) for the case of \(h=0\). The Bethe free energy landscapes \(f_\mathrm{{Bethe}}(m,c)\) of Eq. (10.125) in the case of \(h=0\) for several values of \({\frac{J}{k_\mathrm{{B}}T}}\) are shown in Figs. 10.8 and 10.9.
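The iteration of the message equation (10.123) with the magnetization formula (10.124) can be sketched as follows; a small symmetry-breaking field h selects the branch, and the onset of non-trivial solutions at \({\frac{J}{k_\mathrm{{B}}T}}=\mathrm{arctanh}{\left({\frac{1}{3}}\right)}\) is visible directly:

```python
import math

def bethe_magnetization(K, h=1e-6, iters=10000):
    """Iterate Eq. (10.123), L = arctanh(tanh(K) tanh(h + 3 L)), with
    K = J/(k_B T) and all quantities in units of k_B T, then evaluate the
    magnetization via Eq. (10.124): m = tanh(h + 4 L)."""
    L = 0.0
    for _ in range(iters):
        L = math.atanh(math.tanh(K) * math.tanh(h + 3.0 * L))
    return math.tanh(h + 4.0 * L)

K_c = math.atanh(1.0 / 3.0)  # approximately 0.3466: onset of non-trivial solutions
assert abs(bethe_magnetization(K_c - 0.05)) < 1e-3  # only the trivial solution
assert bethe_magnetization(K_c + 0.05) > 0.1        # non-trivial m_+ branch
```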

Fig. 10.7
figure 7

Internal energy u from Eq. (10.113) and magnetization m from Eq. (10.118) in loopy belief propagation (Bethe approximation) for the Ising model in Eqs. (10.48), (10.50), and (10.110) on the regular graph (VE) of degree 4

Fig. 10.8
figure 8

Bethe free energy for \(f_\mathrm{{Bethe}}(m,c)\) for the Ising model in Eqs. (10.48), (10.50), and (10.110) on the regular graph (VE) of degree 4. a \({\frac{J}{k_\mathrm{{B}}T}}=0.25\), \({\frac{h}{k_\mathrm{{B}}T}}=0\). b \({\frac{J}{k_\mathrm{{B}}T}}=\mathrm{{arc}}{\tanh }{\left( {\frac{1}{3}}\right) }\), \({\frac{h}{k_\mathrm{{B}}T}}=0\). c \({\frac{J}{k_\mathrm{{B}}T}}=0.4\), \({\frac{h}{k_\mathrm{{B}}T}}=0\)

Fig. 10.9
figure 9

Bethe free energy \({\mathop {{\mathrm{Extremum}}}\limits _{c}}f_\mathrm{{Bethe}}(m,c)\) for the Ising model of Eqs. (10.48), and (10.50) with Eqs. (10.110) on the regular graph (VE) with degree 4. a \({\frac{J}{k_\mathrm{{B}}T}}=0.25\), \({\frac{h}{k_\mathrm{{B}}T}}=0\). b \({\frac{J}{k_\mathrm{{B}}T}}=\mathrm{{arc}}{\tanh }{\left( {\frac{1}{3}}\right) }\), \({\frac{h}{k_\mathrm{{B}}T}}=0\). c \({\frac{J}{k_\mathrm{{B}}T}}=0.4\), \({\frac{h}{k_\mathrm{{B}}T}}=0\)

Now we consider the \(|{\Omega }|\)-state Potts model [33] given by

$$\begin{aligned}&P{\left( {\boldsymbol{s}}{\Big |}{\frac{J}{k_\mathrm{{B}}T}},{\frac{h_{0}}{k_\mathrm{{B}}T}}, {\frac{h_{1}}{k_\mathrm{{B}}T}},{\cdots },{\frac{h_{|{\Omega }|-1}}{k_\mathrm{{B}}T}}\right) } \nonumber \\&= {\frac{1}{Z}}{\left( {\prod _{\{i,j\}{\in }E}}{\exp }{\left( {\frac{J}{k_\mathrm{{B}}T}}{\delta }_{s_{i}s_{j}}\right) } \right) } {\left( {\prod _{i{\in }V}} {\prod _{n{\in }{\Omega }}} {\exp }{\left( {\frac{h_{n}}{k_\mathrm{{B}}T}}{\delta }_{s_{i},n} \right) } \right) }, \end{aligned}$$
(10.132)
$$\begin{aligned} Z \equiv {\sum _{s_{1}{\in }{\Omega }}} {\sum _{s_{2}{\in }{\Omega }}} {\cdots } {\sum _{s_{|V|}{\in }{\Omega }}} {\left( {\prod _{\{i,j\}{\in }E}} {\exp }{\left( {\frac{J}{k_\mathrm{{B}}T}}{\delta }_{s_{i}s_{j}} \right) } \right) } {\left( {\prod _{i{\in }V}} {\prod _{n{\in }{\Omega }}} {\exp }{\left( {\frac{h_{n}}{k_\mathrm{{B}}T}}{\delta }_{s_{i},n} \right) } \right) }, \end{aligned}$$
(10.133)

where \({\Omega }=\{0,1,2,{\cdots },|{\Omega }|-1\}\). By similar arguments to those for Eqs. (10.72) and (10.102), the marginal probabilities \({\widehat{R}}_{i}(s_{i})\) and \({\widehat{R}}_{ij}(s_{i},s_{j})\) can be expressed as orthonormal expansions as follows:

$$\begin{aligned} {\widehat{R}}_{i}(s_{i}) = {\left( {\frac{1}{|{\Omega }|}}\right) } + {\sum _{k{\in }{\Omega }{\setminus }\{0\}}}m_{i}^{(k)}{\Phi }_{k}(s_{i}), \end{aligned}$$
(10.134)
$$\begin{aligned} & {} {\widehat{R}}_{ij}(s_{i},s_{j})= {} {\left( {\frac{1}{|{\Omega }|}}\right) }^{2} +{\left( {\frac{1}{|{\Omega }|}}\right) }{\sum _{k{\in }{\Omega }{\setminus }\{0\}}} m_{i}^{(k)}{\Phi }_{k}(s_{i}) \nonumber \\&\qquad\quad{} + {\left( {\frac{1}{|{\Omega }|}}\right) }{\sum _{l{\in }{\Omega }{\setminus }\{0\}}}m_{j}^{(l)}{\Phi }_{l}(s_{j}) +{\sum _{k{\in }{\Omega }{\setminus }\{0\}}}{\sum _{l{\in }{\Omega }{\setminus }\{0\}}}c_{ij}^{(k,l)}{\Phi }_{k}(s_{i}){\Phi }_{l}(s_{j}), \end{aligned}$$
(10.135)

where \(\{{\Phi }_{k}(s_{i})|s_{i}{\in }{\Omega },k{\in }{\Omega }\}\) is the set of orthonormal polynomials satisfying the following relationships:

$$\begin{aligned} {\Phi }_{0}(s_{i}) \equiv {\sqrt{{\frac{1}{|{\Omega }|}}}}, \end{aligned}$$
(10.136)
$$\begin{aligned} {\sum _{s_{i}{\in }{\Omega }}}{\Phi }_{k}(s_{i}){\Phi }_{l}(s_{i})={\delta }_{k,l}~(k{\in }{\Omega },l{\in }{\Omega }). \end{aligned}$$
(10.137)

Because it is valid that

$$\begin{aligned} {\sum _{s_{i}{\in }{\Omega }}}{\sum _{s_{j}{\in }{\Omega }}}{\Phi }_{k}(s_{i}){\Phi }_{l}(s_{j}){\delta }_{s_{i},s_{j}} ={\sum _{s_{i}{\in }{\Omega }}}{\Phi }_{k}(s_{i}){\Phi }_{l}(s_{i}) ={\delta }_{k,l}~(k{\in }{\Omega },l{\in }{\Omega }), \end{aligned}$$
(10.138)

we have the following orthonormal expansion of \({\delta }_{s_{i},s_{j}}\):

$$\begin{aligned} {\delta }_{s_{i},s_{j}}= & {} {\sum _{k{\in }{\Omega }}}{\sum _{l{\in }{\Omega }}} {\left( {\sum _{s'_{i}{\in }{\Omega }}}{\sum _{s'_{j}{\in }{\Omega }}}{\Phi }_{k}(s'_{i}){\Phi }_{l}(s'_{j}){\delta }_{s'_{i},s'_{j}} \right) } {\Phi }_{k}(s_{i}){\Phi }_{l}(s_{j}) \nonumber \\= & {} {\sum _{k{\in }{\Omega }}}{\Phi }_{k}(s_{i}){\Phi }_{k}(s_{j})~(s_{i}{\in }{\Omega },s_{j}{\in }{\Omega }). \end{aligned}$$
(10.139)
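A concrete check of the orthonormality relation (10.137) and the completeness relation (10.139) for the three-state case \({\Omega }=\{0,1,2\}\). The basis below is one illustrative choice (the polynomials beyond the constant one, which is normalized as \({\Phi }_{0}={\sqrt{1/|{\Omega }|}}\) so that Eq. (10.137) holds, are not unique):

```python
import math

Omega = (0, 1, 2)
# One orthonormal basis on Omega; Phi[k][s] is Phi_k(s).
Phi = [
    [1 / math.sqrt(3)] * 3,                                   # Phi_0: constant
    [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0],               # Phi_1
    [1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6)],  # Phi_2
]

# Orthonormality, Eq. (10.137): sum_s Phi_k(s) Phi_l(s) = delta_{k,l}.
for k in Omega:
    for l in Omega:
        ip = sum(Phi[k][s] * Phi[l][s] for s in Omega)
        assert abs(ip - (1.0 if k == l else 0.0)) < 1e-12

# Completeness, Eq. (10.139): sum_k Phi_k(s) Phi_k(s') = delta_{s,s'}.
for s in Omega:
    for sp in Omega:
        val = sum(Phi[k][s] * Phi[k][sp] for k in Omega)
        assert abs(val - (1.0 if s == sp else 0.0)) < 1e-12
```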

By using Eqs. (10.134) and (10.135) and the orthonormal expansion of the two-body interaction part of the Potts model, the Bethe free energy functional for the Potts model in Eqs. (10.132) and (10.133) can be reduced to

$$\begin{aligned}&\mathcal{{F}}_\mathrm{{Bethe}}{\left[ {\big \{}{\boldsymbol{{\widehat{R}}_{i}}}{\big |}i{\in }V{\big \}},{\big \{}{\boldsymbol{{\widehat{R}}_{ij}}}{\big |}\{i,j\}{\in }E{\big \}}\right] } \nonumber \\&=F_\mathrm{{Bethe}} {\left( {\big \{} m_{i}^{(k)} {\big |} i{\in }V, k{\in }{\Omega }{\setminus }\{0\} {\big \}}, {\big \{} c_{\{i,j\}}^{(k,l)} {\big |} \{i,j\}{\in }E,~k,l{\in }{\Omega }{\setminus }\{0\}{\big \}} \right) }, \end{aligned}$$
(10.140)

where

$$\begin{aligned}&F_\mathrm{{Bethe}} {\left( {\big \{} m_{i}^{(k)} {\big |} i{\in }V,~k{\in }{\Omega }{\setminus }\{0\} {\big \}}, {\big \{} c_{\{i,j\}}^{(k,l)} {\big |} \{i,j\}{\in }E,~k,l{\in }{\Omega }{\setminus }\{0\} {\big \}} \right) } \nonumber \\&\equiv -J{\sum _{i{\in }V}}|{\partial }i|{\left( {\frac{1}{|{\Omega }|}}\right) }^{2} -J{\sum _{i{\in }V}}|{\partial }i|{\frac{1}{|{\Omega }|}}{\sum _{k{\in }{\Omega }{\setminus }\{0\}}}m_{i}^{(k)} -J{\sum _{\{i,j\}{\in }E}}{\sum _{k{\in }{\Omega }{\setminus }\{0\}}}c_{i,j}^{(k,k)} \nonumber \\&+k_\mathrm{{B}}T{\sum _{i{\in }V}}(1-|{\partial }i|){\sum _{s_{i}{\in }{\Omega }}} {\widehat{R}}_{i}(s_{i}){\ln }{\big (}{\widehat{R}}_{i}(s_{i}){\big )} \nonumber \\&+k_\mathrm{{B}}T{\sum _{\{i,j\}{\in }E}}{\sum _{s_{i}{\in }{\Omega }}}{\sum _{s_{j}{\in }{\Omega }}} {\widehat{R}}_{ij}(s_{i},s_{j}) {\ln }{\big (}{\widehat{R}}_{ij}(s_{i},s_{j}){\big )}. \end{aligned}$$
(10.141)

In the spatially uniform case, \(m_{i}^{(k)}\) and \(c_{ij}^{(k,l)}\) are independent of i and \(\{i,j\}\) and can be represented by \(m^{(k)}\) and \(c^{(k,l)}\), respectively, in the Bethe free energy in Eq. (10.141). For the three-state and four-state Potts models, the Bethe free energy in Eq. (10.141) can be represented by

$$\begin{aligned} F_\mathrm{{Bethe}} {\left( {\left( \begin{array}{cccc} m^{(1)} \\ m^{(2)} \end{array} \right) }, {\left( \begin{array}{ccc} c^{(1,1)} &{} c^{(1,2)} \\ c^{(2,1)} &{} c^{(2,2)} \end{array} \right) } \right) }, \nonumber \\ \end{aligned}$$
(10.142)

and

$$\begin{aligned} F_\mathrm{{Bethe}} {\left( {\left( \begin{array}{cccc} m^{(1)} \\ m^{(2)} \\ m^{(3)} \\ \end{array} \right) }, {\left( \begin{array}{ccc} c^{(1,1)} &{} c^{(1,2)} &{} c^{(1,3)} \\ c^{(2,1)} &{} c^{(2,2)} &{} c^{(2,3)} \\ c^{(3,1)} &{} c^{(3,2)} &{} c^{(3,3)} \end{array} \right) } \right) }, \end{aligned}$$
(10.143)

respectively. Figures 10.10 and 10.11 show the internal energy with no external fields

$$\begin{aligned} u={\lim _{|V|{\rightarrow }+{\infty }}}{\frac{1}{|E|}}{\sum _{\{i,j\}{\in }E}}{\sum _{\boldsymbol{s}}}{\left( -{\delta }_{s_{i},s_{j}}\right) } P{\left( {\boldsymbol{s}}{\Big |}{\frac{J}{k_\mathrm{{B}}T}},{\frac{h_{0}}{k_\mathrm{{B}}T}}=0, {\frac{h_{1}}{k_\mathrm{{B}}T}}=0,{\cdots },{\frac{h_{|{\Omega }|-1}}{k_\mathrm{{B}}T}}=0\right) }, \nonumber \\ \end{aligned}$$
(10.144)

in loopy belief propagation (Bethe approximation) on the regular graph (VE) of degree 4. We also consider the moments \(m^{(2)}\) and \(m^{(1)}\) as order parameters for the three-state and four-state Potts models, respectively, for the following cases:

$$\begin{aligned} {\left\{ \begin{array}{ll} \mathrm{{(I)}}~{\displaystyle {{\lim _{h_{0}{\rightarrow }+0}m^{(2)}}}}, h_{1}=h_{2}=0, \\ \mathrm{{(II)}}~{\displaystyle {{\lim _{h_{1}{\rightarrow }+0}m^{(2)}}}}, h_{0}=h_{2}=0, \\ \mathrm{{(III)}}~{\displaystyle {{\lim _{h_{2}{\rightarrow }+0}m^{(2)}}}}, h_{0}=h_{1}=0, \\ \mathrm{{(IV)}}~m^{(2)}~\mathrm{{under}}~h_{0}=h_{1}=h_{2}=0~\mathrm{{and}}~{\mu }(0)={\mu }(1)={\mu }(2)={\frac{1}{3}},\\ \end{array}\right. } \end{aligned}$$
(10.145)

for the three-state Potts model, and

$$\begin{aligned} {\left\{ \begin{array}{ll} \mathrm{{(I)}}~{\displaystyle {{\lim _{h_{0}{\rightarrow }+0}m^{(2)}}}}, h_{1}=h_{2}=h_{3}=0, \\ \mathrm{{(II)}}~{\displaystyle {{\lim _{h_{1}{\rightarrow }+0}m^{(2)}}}}, h_{0}=h_{2}=h_{3}=0, \\ \mathrm{{(III)}}~{\displaystyle {{\lim _{h_{2}{\rightarrow }+0}m^{(2)}}}}, h_{0}=h_{1}=h_{3}=0, \\ \mathrm{{(IV)}}~{\displaystyle {{\lim _{h_{3}{\rightarrow }+0}m^{(2)}}}}, h_{0}=h_{1}=h_{2}=0, \\ \mathrm{{(V)}}~m^{(2)}~\mathrm{{under}}~h_{0}=h_{1}=h_{2}=h_{3}=0~\mathrm{{and}}~{\mu }(0)={\mu }(1)={\mu }(2)={\mu }(3)={\frac{1}{4}},\\ \end{array}\right. } \end{aligned}$$
(10.146)

for the four-state Potts model. These are also shown in Figs. 10.10 and 10.11. In Figs. 10.10 and 10.11, blue, green, and red lines show the global minimum states, local minimum states, and local maximum states, respectively, of the Bethe free energies, which are given by Eq. (10.142) for the three-state Potts model and by Eq. (10.143) for the four-state Potts model. In the global minimum states, there exist discontinuous points in \(m^{(2)}\) and \(m^{(1)}\) as well as in u. In the Ising model, the first derivative Ju of the free energy with respect to \({\frac{1}{k_\mathrm{{B}}T}}\) is always continuous, but the second derivative diverges or has a discontinuity, as shown in Figs. 10.4, 10.5, and 10.7. This kind of singularity is referred to as a second-order phase transition in statistical mechanics. In the Potts model, by contrast, the first derivative Ju of the free energy with respect to \({\frac{1}{k_\mathrm{{B}}T}}\) itself has a discontinuity, as shown in Figs. 10.10 and 10.11. This singularity is referred to as a first-order phase transition in statistical mechanics. Figures 10.12 and 10.13 show the Bethe free energy landscapes

$$\begin{aligned} f_\mathrm{{Bethe}}{\left( m^{(1)},m^{(2)}\right) } \equiv {\frac{1}{|V|}} {\mathop {{\mathrm{extremum}}}\limits _{ {\left( \begin{array}{ccc} c^{(1,1)} &{} c^{(1,2)} \\ c^{(2,1)} &{} c^{(2,2)} \end{array} \right) } }} F_\mathrm{{Bethe}} {\left( {\left( \begin{array}{cccc} m^{(1)} \\ m^{(2)} \end{array} \right) }, {\left( \begin{array}{ccc} c^{(1,1)} &{} c^{(1,2)} \\ c^{(2,1)} &{} c^{(2,2)} \end{array} \right) } \right) }, \nonumber \\ \end{aligned}$$
(10.147)

for the three-state Potts model and

$$\begin{aligned} f_\mathrm{{Bethe}}{\left( m^{(1)},m^{(3)}\right) } \equiv {\frac{1}{|V|}} {\mathop {{\mathrm{extremum}}}\limits _{m^{(2)}, {\left( \begin{array}{ccc} c^{(1,1)} &{} c^{(1,2)} &{} c^{(1,3)} \\ c^{(2,1)} &{} c^{(2,2)} &{} c^{(2,3)} \\ c^{(3,1)} &{} c^{(3,2)} &{} c^{(3,3)} \end{array} \right) } }} F_\mathrm{{Bethe}} {\left( {\left( \begin{array}{cccc} m^{(1)} \\ m^{(2)} \\ m^{(3)} \\ \end{array} \right) }, {\left( \begin{array}{ccc} c^{(1,1)} &{} c^{(1,2)} &{} c^{(1,3)} \\ c^{(2,1)} &{} c^{(2,2)} &{} c^{(2,3)} \\ c^{(3,1)} &{} c^{(3,2)} &{} c^{(3,3)} \end{array} \right) } \right) }, \nonumber \\ \end{aligned}$$
(10.148)

for the four-state Potts model, respectively.
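The first-order character of the Potts transition in the Bethe approximation can be illustrated with a small fixed-point iteration. The recursion below is our own symmetric-message reformulation for the three-state model on the degree-4 regular graph, not an equation from the text: with a cavity message \({\mu }=(a,b,b)\) and ratio \(r=a/b\), the three incoming messages give the update \(r{\leftarrow }{\left({\frac{e^{K}r+2}{r+e^{K}+1}}\right)}^{3}\). Near \({\frac{J}{k_\mathrm{{B}}T}}{\approx }0.9\) a disordered and an ordered fixed point coexist, consistent with the coexisting local minima in Fig. 10.12:

```python
import math

def potts_cavity_ratio(K, r0, iters=2000):
    """Symmetric cavity recursion for the q = 3 Potts model on a degree-4
    regular graph: messages mu = (a, b, b), r = a/b, and (hypothetical
    reformulation, with K = J/(k_B T))
        r -> ((e^K r + 2) / (r + e^K + 1))**3.
    r = 1 is the disordered fixed point; r > 1 signals order."""
    eK = math.exp(K)
    r = r0
    for _ in range(iters):
        r = ((eK * r + 2.0) / (r + eK + 1.0)) ** 3
    return r

# At K = 0.90 both fixed points coexist: the iteration keeps a disordered
# start disordered and an ordered start ordered, a hallmark of a
# first-order transition (hysteresis).
assert abs(potts_cavity_ratio(0.90, 1.0) - 1.0) < 1e-9
assert potts_cavity_ratio(0.90, 10.0) > 2.0
```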

Fig. 10.10
figure 10

Internal energy u in Eq. (10.144) and (I) \({\displaystyle {{\lim _{h_{0}{\rightarrow }+0}m^{(2)}}}}\) for \(h_{1}=h_{2}=0\), (II) \({\displaystyle {{\lim _{h_{1}{\rightarrow }+0}m^{(2)}}}}\) for \(h_{0}=h_{2}=0\), (III) \({\displaystyle {{\lim _{h_{2}{\rightarrow }+0}m^{(2)}}}}\) for \(h_{0}=h_{1}=0\), (IV) \(m^{(2)}\) under \(h_{0}=h_{1}=h_{2}=0\) and \({\mu }(0)={\mu }(1)={\mu }(2)={\frac{1}{3}}\) such that \(P_{i}(0)=P_{i}(1)=P_{i}(2)\) in loopy belief propagation (Bethe approximation) for the three-state Potts model in Eqs. (10.132) and (10.133) on the regular graph (VE) of degree 4

Fig. 10.11

Internal energy u in Eq. (10.144) and order parameter \(m^{(1)}\) in loopy belief propagation (Bethe approximation) for the four-state Potts model in Eqs. (10.132) and (10.133) on the regular graph (VE) of degree 4

Fig. 10.12

Bethe free energy for \(f_\mathrm{{Bethe}}(m^{(1)},m^{(2)})\) for the three-state Potts model in Eqs. (10.132) and (10.133) on the regular graph (VE) of degree 4. a \({\frac{J}{k_\mathrm{{B}}T}}=0.850\). b \({\frac{J}{k_\mathrm{{B}}T}}=0.880\). c \({\frac{J}{k_\mathrm{{B}}T}}=0.881\). d \({\frac{J}{k_\mathrm{{B}}T}}=0.882\). e \({\frac{J}{k_\mathrm{{B}}T}}=0.885\). f \({\frac{J}{k_\mathrm{{B}}T}}=0.920\)

Fig. 10.13

Bethe free energy for \(f_\mathrm{{Bethe}}(m^{(1)},m^{(3)})\) in Eq. (10.148) for the four-state Potts model in Eqs. (10.132) and (10.133) on the regular graph (VE) of degree 4. a \({\frac{J}{k_\mathrm{{B}}T}}=0.900\). b \({\frac{J}{k_\mathrm{{B}}T}}=1.000\). c \({\frac{J}{k_\mathrm{{B}}T}}=1.010\). d \({\frac{J}{k_\mathrm{{B}}T}}=1.020\). e \({\frac{J}{k_\mathrm{{B}}T}}=1.050\). f \({\frac{J}{k_\mathrm{{B}}T}}=1.100\)

3.4 Ising Model on a Complete Graph

This section considers a complete graph (VE) for which the energy function \(H({\boldsymbol{s}})\) is defined by

$$\begin{aligned} H({\boldsymbol{s}}) = H(s_{1},s_{2},{\cdots },s_{|V|}) \equiv -{\frac{J}{|V|}}{\sum _{\{i,j\}{\in }E}}s_{i}s_{j}-h{\sum _{i{\in }V}}s_{i}~{\left( J>0\right) }, \end{aligned}$$
(10.149)

instead of Eq. (10.110) in Eqs. (10.109) and (10.111). Note that the interaction between every pair of connected nodes is set to \({\frac{J}{|V|}}\) to guarantee the existence of the thermodynamic limit \(|V|{\rightarrow }+{\infty }\) for the complete graph in the sense of Ruelle [38].

The free energy in Eq. (10.55) is expressed as follows:

$$\begin{aligned}&-k_\mathrm{{B}}T{\ln }{\left( Z{\left( {\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}\right) }\right) } \nonumber \\= & {} -k_\mathrm{{B}}T{\ln }{\left( {\sum _{{\tau }_{1}{\in }{\Omega }}}{\sum _{{\tau }_{2}{\in }{\Omega }}}{\cdots }{\sum _{{\tau }_{|V|}{\in }{\Omega }}} {\exp }{\left( {\frac{h}{k_\mathrm{{B}}T}} {\displaystyle {{\sum _{i{\in }V}}}}{\tau }_{i} +{\frac{J}{|V|k_\mathrm{{B}}T}}{\displaystyle {{\sum _{\{i,j\}{\in }E}}}}{\tau }_{i}{\tau }_{j} \right) } \right) } \nonumber \\= & {} {\frac{J}{2}} -k_\mathrm{{B}}T{\ln } {\left( {\sum _{{\tau }_{1}{\in }{\Omega }}}{\sum _{{\tau }_{2}{\in }{\Omega }}}{\cdots }{\sum _{{\tau }_{|V|}{\in }{\Omega }}} {\exp }{\left( {\frac{h}{k_\mathrm{{B}}T}}{\displaystyle {{\sum _{i{\in }V}}}}{\tau }_{i}\right) } {\exp }{\left( {\frac{1}{2}}{\left( {\sqrt{{\frac{J}{|V|k_\mathrm{{B}}T}}}}{\displaystyle {{\sum _{i{\in }V}}}}{\tau }_{i}\right) }^{2} \right) }\right) }. \nonumber \\&\end{aligned}$$
(10.150)

By using the Gauss integral formula

$$\begin{aligned} {\frac{1}{{\sqrt{2{\pi }}}}}{\int _{-{\infty }}^{+{\infty }}} {\exp }{\left( -{\frac{1}{2}}x^{2}+ax \right) }dx ={\exp }{\left( {\frac{1}{2}}a^{2} \right) }, \end{aligned}$$
(10.151)

the expression for the free energy is rewritten as

$$\begin{aligned}&{} -k_\mathrm{{B}}T{\ln }{\left( Z{\left( {\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}\right) }\right) } \nonumber \\&\qquad\quad {}= {\frac{1}{2}}J -k_\mathrm{{B}}T{\ln }{\Bigg (}{\sum _{{\tau }_{1}{\in }{\Omega }}}{\sum _{{\tau }_{2}{\in }{\Omega }}}{\cdots }{\sum _{{\tau }_{|V|}{\in }{\Omega }}} {\frac{1}{{\sqrt{2{\pi }}}}} {\int _{-{\infty }}^{+{\infty }}} e^{-{\frac{1}{2}}x^{2}} \nonumber \\&\qquad\qquad\qquad\qquad\quad{} {\times }{\exp }{\left( {\sum _{i{\in }V}} {\left( {\sqrt{{\frac{J}{|V|k_\mathrm{{B}}T}}}}x+{\frac{h}{k_\mathrm{{B}}T}}\right) } {\tau }_{i}\right) }dx{\Bigg )} \nonumber \\&\qquad\quad {}= {\frac{1}{2}}J -k_\mathrm{{B}}T{\ln }{\Bigg (} {\frac{1}{{\sqrt{2{\pi }}}}} {\int _{-{\infty }}^{+{\infty }}} e^{-{\frac{1}{2}}x^{2}} {\prod _{i{\in }V}} {\left( {\sum _{{\tau }_{i}{\in }{\Omega }}} {\exp }{\left( {\left( {\sqrt{{\frac{J}{|V|k_\mathrm{{B}}T}}}}x+{\frac{h}{k_\mathrm{{B}}T}}\right) } {\tau }_{i}\right) }\right) }dx{\Bigg )} \nonumber \\&\qquad\quad {}= {\frac{1}{2}}J -k_\mathrm{{B}}T{\ln }{\Bigg (} {\frac{1}{{\sqrt{2{\pi }}}}} {\int _{-{\infty }}^{+{\infty }}} e^{-{\frac{1}{2}}x^{2}} {\prod _{i{\in }V}} {\left( 2{\cosh }{\left( {\sqrt{{\frac{J}{|V|k_\mathrm{{B}}T}}}}x+{\frac{h}{k_\mathrm{{B}}T}} \right) }\right) } dx{\Bigg )}. \end{aligned}$$
(10.152)
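The Gauss integral formula in Eq. (10.151) that underlies this step can be checked numerically. The following is a minimal sketch using trapezoidal quadrature; the function name and the truncation of the integration range are illustrative choices:

```python
import math

def gauss_identity_lhs(a, half_width=12.0, n=100000):
    # Numerically integrate (1/sqrt(2*pi)) * exp(-x^2/2 + a*x)
    # over [-half_width, half_width] with the trapezoidal rule.
    h = 2.0 * half_width / n
    total = 0.0
    for k in range(n + 1):
        x = -half_width + k * h
        w = 0.5 if k in (0, n) else 1.0
        total += w * math.exp(-0.5 * x * x + a * x)
    return total * h / math.sqrt(2.0 * math.pi)

# Eq. (10.151): the integral equals exp(a^2/2).
for a in (0.0, 0.5, 1.0, 2.0):
    lhs = gauss_identity_lhs(a)
    rhs = math.exp(0.5 * a * a)
    assert abs(lhs - rhs) < 1e-6 * max(1.0, rhs)
```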

Note that the procedure in which a new continuous variable x is introduced in Eq. (10.152) is referred to as a Hubbard-Stratonovich transformation [13]. Moreover, by the change of variable \({\displaystyle {x=y{\sqrt{|V|{\left( {\frac{J}{k_\mathrm{{B}}T}}\right) }}}}}\), the free energy can be written, up to terms that are negligible in the thermodynamic limit, as

$$\begin{aligned} -k_\mathrm{{B}}T{\ln }{\left( Z{\left( {\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}\right) }\right) } ={\frac{1}{2}}J -k_\mathrm{{B}}T{\ln }{\Bigg (} {\frac{1}{{\sqrt{2{\pi }}}}} {\int _{-{\infty }}^{+{\infty }}} {\exp }{\big (}|V|{\psi }(y){\big )}dy{\Bigg )}, \end{aligned}$$
(10.153)

where

$$\begin{aligned} {\psi }(y) \equiv -{\frac{1}{2}}{\left( {\frac{J}{k_\mathrm{{B}}T}}\right) }y^{2} +{\frac{1}{|V|}}{\sum _{i{\in }V}} {\ln }{\left( 2{\cosh }{\left( {\frac{1}{k_\mathrm{{B}}T}}{\left( Jy+h\right) }\right) } \right) }. \end{aligned}$$
(10.154)

We now consider the magnetization \({\displaystyle { m{\left( {\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}\right) } ={\lim _{|V|{\rightarrow }+{\infty }}}{\frac{1}{|V|}} {\sum _{i{\in }V}}{\sum _{s_{1}{\in }{\Omega }}}{\sum _{s_{2}{\in }{\Omega }}}{\cdots }{\sum _{s_{|V|}{\in }{\Omega }}}s_{i}P{\left( {\boldsymbol{s}}{\Big |}{\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}\right) } }}\) for Eqs. (10.109) and (10.111) with Eq. (10.149) as follows:

$$\begin{aligned} m{\left( {\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}\right) }= & {} {\lim _{|V|{\rightarrow }+{\infty }}} {\frac{1}{|V|}} {\frac{{\partial }}{{\partial }{\left( {\frac{h}{k_\mathrm{{B}}T}}\right) }}} {\left( -k_\mathrm{{B}}T{\ln }{\left( Z{\left( {\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}\right) }\right) }\right) } \nonumber \\= & {} {\lim _{|V|{\rightarrow }+{\infty }}} {\frac{ {\displaystyle { {\int _{-{\infty }}^{+{\infty }}} {\exp }{\left( |V|{\left( {\psi }(y) +{\frac{1}{|V|}}{\ln }{\left( {\tanh }{\left( {\frac{1}{k_\mathrm{{B}}T}}{\left( Jy+h\right) }\right) } \right) } \right) } \right) }dy }} }{ {\displaystyle { {\int _{-{\infty }}^{+{\infty }}} {\exp }{\left( |V|{\psi }(y)\right) }dy }} }}. \end{aligned}$$
(10.155)

Because it is valid that

$$\begin{aligned} {\lim _{|V|{\rightarrow }+{\infty }}} {\frac{{\partial }}{{\partial }y}} {\psi }(y)= & {} {\lim _{|V|{\rightarrow }+{\infty }}} {\frac{{\partial }}{{\partial }y}} {\left( {\psi }(y) +{\frac{1}{|V|}}{\ln }{\left( {\tanh }{\left( {\frac{1}{k_\mathrm{{B}}T}}{\left( Jy+h\right) }\right) } \right) }\right) } \nonumber \\= & {} -{\frac{J}{k_\mathrm{{B}}T}}{\left( y-{\tanh }{\left( {\frac{1}{k_\mathrm{{B}}T}}{\left( Jy+h\right) } \right) } \right) }, \end{aligned}$$
(10.156)

we obtain the magnetization as

$$\begin{aligned} m{\left( {\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}\right) }= & {} {\lim _{|V|{\rightarrow }+{\infty }}} {\frac{ {\displaystyle { {\exp }{\left( |V|{\left( {\psi }(y_\mathrm{{max}}) +{\frac{1}{|V|}}{\ln }{\left( {\tanh }{\left( {\frac{1}{k_\mathrm{{B}}T}}{\left( Jy_\mathrm{{max}}+h\right) }\right) } \right) } \right) } \right) } }} }{ {\exp }{\Big (}|V|{\psi }(y_\mathrm{{max}}){\Big )} }} \nonumber \\= & {} {\tanh }{\left( {\frac{1}{k_\mathrm{{B}}T}}{\left( Jy_\mathrm{{max}}+h\right) }\right) }, \end{aligned}$$
(10.157)

where

$$\begin{aligned} y_\mathrm{{max}} ={\tanh }{\left( {\frac{1}{k_\mathrm{{B}}T}}{\left( Jy_\mathrm{{max}}+h\right) }\right) } \end{aligned}$$
(10.158)

by using a saddle point method [37]. Equations (10.157) and (10.158) reduce to the following mean-field equation for \(m{\left( {\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}\right) }\):

$$\begin{aligned} m{\left( {\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}\right) } ={\tanh }{\left( {\frac{1}{k_\mathrm{{B}}T}}{\left( Jm{\left( {\frac{J}{k_\mathrm{{B}}T}},{\frac{h}{k_\mathrm{{B}}T}}\right) }+h\right) }\right) }. \end{aligned}$$
(10.159)

This means that the Ising model on the complete graph can be treated analytically in the thermodynamic limit by the mean-field method.
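The mean-field equation (10.159) can be solved by simple fixed-point iteration. The following minimal sketch (in units with \(k_\mathrm{{B}}=1\); the function name, initial value, and iteration count are illustrative choices) shows that the paramagnetic solution \(m=0\) survives for \({\frac{J}{k_\mathrm{{B}}T}}<1\), while a spontaneous magnetization appears for \({\frac{J}{k_\mathrm{{B}}T}}>1\):

```python
import math

def mean_field_magnetization(beta_J, beta_h, m0=0.5, n_iter=2000):
    # Fixed-point iteration of Eq. (10.159), m = tanh(beta_J*m + beta_h),
    # written in terms of beta_J = J/(k_B T) and beta_h = h/(k_B T).
    m = m0
    for _ in range(n_iter):
        m = math.tanh(beta_J * m + beta_h)
    return m

# Above the mean-field critical point (beta_J < 1), h -> +0 gives m = 0;
# below it (beta_J > 1), a spontaneous magnetization survives.
m_high_T = mean_field_magnetization(0.5, 1e-12)
m_low_T = mean_field_magnetization(1.5, 1e-12)
assert abs(m_high_T) < 1e-6
assert m_low_T > 0.5
```

The same iteration also solves the saddle-point equation (10.158) for \(y_\mathrm{{max}}\), since the two equations coincide.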

By combining the replica method with the Hubbard-Stratonovich transformation and the saddle point method, it is possible to treat the random average in Eq. (10.25) for the Ising model with non-uniform external fields on the complete graph analytically [13, 76]. In statistical mechanics, this kind of approach has been developed as spin glass theory [77,78,79,80]. Such computational techniques, which use the replica method for Ising models with spatially non-uniform interactions and external fields on the complete graph, have been used in the statistical performance analysis of many probabilistic information processing systems [13, 15,16,17].

Next, we consider the belief propagation method for the Ising model on the complete graph in Eq. (10.149) with Eqs. (10.109) and (10.111). For infinitesimally small \(|V|^{-1}\), the message passing rule in Eq. (10.95) can be expanded as

$$\begin{aligned} {\mu }_{j{\rightarrow }i}(s_{i})= & {} {\frac{Z_{i}}{Z_{\{i,j\}}}}{\sum _{{\tau }_{j}{\in }{\Omega }}} {\left( {\prod _{l{\in }{\partial }j{\setminus }\{i\}}}{\mu }_{l{\rightarrow }j}({\tau }_{j})\right) } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\left( {\frac{J}{|V|}}s_{i}{\tau }_{j}+h{\tau }_{j}\right) }\right) } \nonumber \\= & {} {\frac{Z_{i}}{Z_{\{i,j\}}}}{\sum _{{\tau }_{j}{\in }{\Omega }}} {\left( {\prod _{l{\in }{\partial }j{\setminus }\{i\}}}{\mu }_{l{\rightarrow }j}({\tau }_{j})\right) } {\exp }{\left( {\frac{h}{k_\mathrm{{B}}T}}{\tau }_{j}\right) } {\left( 1+{\frac{1}{|V|}}{\frac{J}{k_\mathrm{{B}}T}}s_{i}{\tau }_{j}+ \mathcal{{O}}{\left( |V|^{-2}\right) }\right) } \nonumber \\= & {} {\frac{Z_{i}Z_{j}}{Z_{\{i,j\}}}}{\sum _{{\tau }_{j}{\in }{\Omega }}} {\widehat{R}}_{j}({\tau }_{j}) {\left( 1+{\frac{1}{|V|}}{\frac{J}{k_\mathrm{{B}}T}}s_{i}{\tau }_{j}+ \mathcal{{O}}{\left( |V|^{-2}\right) }\right) } \nonumber \\= & {} {\frac{Z_{i}Z_{j}}{Z_{\{i,j\}}}} {\left( 1+{\frac{1}{k_\mathrm{{B}}T}}{\left( {\frac{J}{|V|}}{\sum _{{\tau }_{j}{\in }{\Omega }}} {\tau }_{j}{\widehat{R}}_{j}({\tau }_{j}) \right) }s_{i} + \mathcal{{O}}{\left( |V|^{-2}\right) }\right) } \nonumber \\= & {} {\frac{Z_{i}Z_{j}}{Z_{\{i,j\}}}} {\left( {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}}{\left( {\frac{J}{|V|}}{\sum _{{\tau }_{j}{\in }{\Omega }}}{\tau }_{j}{\widehat{R}}_{j}({\tau }_{j}) \right) }s_{i}\right) } + \mathcal{{O}}{\left( |V|^{-2}\right) }\right) }. \end{aligned}$$
(10.160)

By substituting Eq. (10.160) into Eq. (10.91), the marginal probabilities can be expressed as follows:

$$\begin{aligned} {\widehat{R}}_{i}(s_{i})&= {\frac{ {\displaystyle { {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}}{\left( {\frac{J}{|V|}}{\sum _{j{\in }V{\setminus }\{i\}}}{\sum _{{\tau }_{j}{\in }{\Omega }}}{\tau }_{j}{\widehat{R}}_{j}({\tau }_{j}) +h\right) }s_{i}\right) } }} }{ {\displaystyle { {\sum _{{\tau }_{i}{\in }{\Omega }}} {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}}{\left( {\frac{J}{|V|}}{\sum _{j{\in }V{\setminus }\{i\}}}{\sum _{{\tau }_{j}{\in }{\Omega }}}{\tau }_{j}{\widehat{R}}_{j}({\tau }_{j}) +h \right) }{\tau }_{i}\right) } }} }} \nonumber \\&+\mathcal{{O}}{\left( |V|^{-1}\right) }~(|V|{\rightarrow }+{\infty },~s_{i}{\in }{\Omega },i{\in }V), \nonumber \\ \end{aligned}$$
(10.161)
$$\begin{aligned} {\widehat{R}}_{ij}{\left( s_{i},s_{j}\right) } ={\widehat{R}}_{i}{\left( s_{i}\right) }{\widehat{R}}_{j}{\left( s_{j}\right) } +\mathcal{{O}}{\left( |V|^{-1}\right) }~(|V|{\rightarrow }+{\infty },~s_{i}{\in }{\Omega },s_{j}{\in }{\Omega },\{i,j\}{\in }E). \end{aligned}$$
(10.162)

Equation (10.161) can be regarded as a system of simultaneous deterministic equations for \({\left\{ {\widehat{R}}_{i}(s_{i}){\Big |}s_{i}{\in }{\Omega },i{\in }V\right\} }\) and is equivalent to the mean-field equation in Eq. (10.68) for Eq. (10.149) with Eqs. (10.109) and (10.111).
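This equivalence can be illustrated numerically. Assuming a homogeneous fixed point of Eq. (10.161), each marginal takes the form \({\widehat{R}}_{i}(s_{i}){\propto }{\exp }(bs_{i})\), so the magnetization obeys \(m={\tanh }{\left( {\frac{J}{k_\mathrm{{B}}T}}{\left( 1-{\frac{1}{|V|}}\right) }m+{\frac{h}{k_\mathrm{{B}}T}}\right) }\). The following sketch (function names and parameter values are illustrative) confirms that this approaches the mean-field solution of Eq. (10.159) as \(|V|\) grows:

```python
import math

def bp_complete_graph_m(beta_J, beta_h, n_nodes, n_iter=2000):
    # Homogeneous fixed point of Eq. (10.161): the sum over j in V\{i}
    # contributes (|V|-1) identical terms, so the magnetization obeys
    # m = tanh(beta_J*(1 - 1/|V|)*m + beta_h).
    m = 0.5
    for _ in range(n_iter):
        m = math.tanh(beta_J * (1.0 - 1.0 / n_nodes) * m + beta_h)
    return m

# Mean-field solution of Eq. (10.159) for comparison.
m_mf = 0.5
for _ in range(2000):
    m_mf = math.tanh(1.5 * m_mf + 0.1)

# For large |V|, the belief propagation fixed point and the
# mean-field magnetization agree.
m_bp = bp_complete_graph_m(1.5, 0.1, n_nodes=10**6)
assert abs(m_bp - m_mf) < 1e-4
```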

3.5 Probabilistic Segmentation by Potts Prior and Loopy Belief Propagation

In Sect. 10.2.3, we gave the fundamental framework of probabilistic segmentation based on the Potts prior, and reduced the framework of the EM procedure for estimating hyperparameters to the extremum conditions of the \(\mathcal{{Q}}\)-function as shown in Eqs. (10.41), (10.42), and (10.43) with Eqs. (10.44), (10.45), and (10.46). These frameworks can be realized by combining them with the loopy belief propagation in Sect. 10.3.2 to give the following practical procedures [70]:

$$ {\boxed {\mathbf {Probabilistic \, segmentation \, algorithm \, (Input:}{\boldsymbol{D}}, \mathbf {Output:}\, {\widehat{\alpha }}({\boldsymbol{D}}), {\widehat{u}}({\boldsymbol{D}}), {\widehat{\boldsymbol{a}}}({\boldsymbol{D}}), {\widehat{\boldsymbol{C}}}({\boldsymbol{D}}), {\widehat{\boldsymbol{s}}}({\boldsymbol{D}})) }} $$

 

Step 1:

Input the data \({\boldsymbol{D}}\) and set the initial values of the hyperparameters \({\widehat{\alpha }}({\boldsymbol{D}})\), \({\widehat{\boldsymbol{a}}}({\boldsymbol{D}})\), and \({\widehat{\boldsymbol{C}}}({\boldsymbol{D}})\), and of the messages \(\{{\widehat{\mu }}_{j{\rightarrow }i}(s_{i},{\boldsymbol{D}})| i{\in }V, j{\in }{\partial }i,s_{i}{\in }{\Omega }\}\) in the loopy belief propagation for the posterior probability distribution. We set \(t \leftarrow 0\) as the number of iterations of the EM procedure.

Step 2 (E-step):

Set \(t{\leftarrow }t+1\) and update \({\widehat{u}}({\boldsymbol{D}})\), \({\widehat{\boldsymbol{a}}}({\boldsymbol{D}})\), \({\widehat{\boldsymbol{C}}}({\boldsymbol{D}})\), and \(\{{\widehat{\mu }}_{j{\rightarrow }i}(s_{i},{\boldsymbol{D}})| s_{i}{\in }{\Omega },i{\in }V,j{\in }{\partial }i\}\) using the following procedures:

$$\begin{aligned} {\mu }_{j{\rightarrow }i}(s_{i})\leftarrow & {} {\frac{ {\displaystyle {{\sum _{{\tau }_{j}{\in }{\Omega }}}}} {\exp }{\left( 2{\widehat{\alpha }}({\boldsymbol{D}}){\delta }_{s_{i},{\tau }_{j}}\right) } g{\left( {\boldsymbol{d_{j}}}{\big |}{\tau }_{j},{\widehat{\boldsymbol{a}}}({\tau }_{j},{\boldsymbol{D}}), {\widehat{\boldsymbol{C}}}({\tau }_{j},{\boldsymbol{D}})\right) } {\displaystyle { {\prod _{k{\in }{\partial }j{\backslash }\{i\}}} }} {\widehat{\mu }}_{k{\rightarrow }j}({\tau }_{j},{\boldsymbol{D}}) }{ {\displaystyle {{\sum _{{\tau }_{i}{\in }{\Omega }}}}} {\displaystyle {{\sum _{{\tau }_{j}{\in }{\Omega }}}}} {\exp }{\left( 2{\widehat{\alpha }}({\boldsymbol{D}}){\delta }_{{\tau }_{i},{\tau }_{j}}\right) } g{\left( {\boldsymbol{d_{j}}}{\big |}{\tau }_{j},{\widehat{\boldsymbol{a}}}({\tau }_{j},{\boldsymbol{D}}), {\widehat{\boldsymbol{C}}}({\tau }_{j},{\boldsymbol{D}})\right) } {\displaystyle { {\prod _{k{\in }{\partial }j{\backslash }\{i\}}} }} {\widehat{\mu }}_{k{\rightarrow }j}({\tau }_{j},{\boldsymbol{D}}) } } \nonumber \\&\qquad\qquad\qquad\qquad\qquad\qquad{} (s_{i}{\in }{\Omega }, i{\in }V,j{\in }{\partial }i), \end{aligned}$$
(10.163)
$$\begin{aligned} {\widehat{\mu }}_{j{\rightarrow }i}(s_{i},{\boldsymbol{D}})\leftarrow & {} {\mu }_{j{\rightarrow }i}(s_{i})~ (s_{i}{\in }{\Omega }, ~i{\in }V,~j{\in }{\partial }i), \end{aligned}$$
(10.164)
$$\begin{aligned} B_{i}\leftarrow & {} {\displaystyle {{\sum _{{\tau }_{i}{\in }{\Omega }}}}} g{\left( {\boldsymbol{d_{i}}}{\big |}{\tau }_{i},{\widehat{\boldsymbol{a}}}({\tau }_{i},{\boldsymbol{D}}), {\widehat{\boldsymbol{C}}}({\tau }_{i},{\boldsymbol{D}})\right) } {\prod _{k{\in }{\partial }i}} {\widehat{\mu }}_{k{\rightarrow }i}({\tau }_{i},{\boldsymbol{D}}) {} (i{\in }V), \end{aligned}$$
(10.165)
$$\begin{aligned} B_{\{i,j\}}\leftarrow & {} {\displaystyle {{\sum _{{\tau }_{i}{\in }{\Omega }}}}} {\displaystyle {{\sum _{{\tau }_{j}{\in }{\Omega }}}}} {\Big (} {\prod _{k{\in }{\partial }i{\backslash }\{j\}}} {\widehat{\mu }}_{k{\rightarrow }i}({\tau }_{i},{\boldsymbol{D}}) {\Big )} g{\left( {\boldsymbol{d_{i}}} {\big |} {\tau }_{i},{\widehat{\boldsymbol{a}}}({\tau }_{i},{\boldsymbol{D}}), {\widehat{\boldsymbol{C}}}({\tau }_{i},{\boldsymbol{D}})\right) } \nonumber \\&\qquad{}{\times } {\exp }{\left( 2{\widehat{\alpha }}({\boldsymbol{D}}){\delta }_{{\tau }_{i},{\tau }_{j}}\right) } \nonumber \\&\qquad{}{\times } g{\left( {\boldsymbol{d_{j}}}{\big |}{\tau }_{j},{\widehat{\boldsymbol{a}}}({\tau }_{j},{\boldsymbol{D}}), {\widehat{\boldsymbol{C}}}({\tau }_{j},{\boldsymbol{D}})\right) } {\Big (} {\prod _{k{\in }{\partial }j{\backslash }\{i\}}} {\widehat{\mu }}_{k{\rightarrow }j}({\tau }_{j},{\boldsymbol{D}}) {\Big )} ~(\{i,j\}{\in }E), \nonumber \\&\end{aligned}$$
(10.166)
$$\begin{aligned} {\boldsymbol{a}}(s_{i})\leftarrow & {} {\frac{ {\displaystyle {{\sum _{i{\in }V}}}} {\displaystyle {{\frac{1}{B_{i}}}}} {\boldsymbol{d_{i}}} g{\left( {\boldsymbol{d_{i}}}{\big |}s_{i},{\widehat{\boldsymbol{a}}}(s_{i},{\boldsymbol{D}}), {\widehat{\boldsymbol{C}}}(s_{i},{\boldsymbol{D}})\right) } {\Big (} {\displaystyle {{\prod _{k{\in }{\partial }i}}}} {\widehat{\mu }}_{k{\rightarrow }i}(s_{i},{\boldsymbol{D}}) {\Big )} }{ {\displaystyle {{\sum _{i{\in }V}}}} {\displaystyle {{\frac{1}{B_{i}}}}} g{\left( {\boldsymbol{d_{i}}} {\big |} s_{i},{\widehat{\boldsymbol{a}}}(s_{i},{\boldsymbol{D}}),{\widehat{\boldsymbol{C}}}(s_{i},{\boldsymbol{D}}) \right) } {\Big (} {\displaystyle {{\prod _{k{\in }{\partial }i}}}} {\widehat{\mu }}_{k{\rightarrow }i}(s_{i},{\boldsymbol{D}}) {\Big )} }} ~(s_{i}{\in }{\Omega }), \nonumber \\&\end{aligned}$$
(10.167)
$$\begin{aligned} {\boldsymbol{C}}(s_{i})\leftarrow & {} {\frac{ {\displaystyle {{\sum _{i{\in }V}}}} {\displaystyle {{\frac{1}{B_{i}}}}} {\big (}{\boldsymbol{d_{i}}}-{\widehat{a}}(s_{i},{\boldsymbol{D}}){\big )} {\big (}{\boldsymbol{d_{i}}}-{\widehat{a}}(s_{i},{\boldsymbol{D}}){\big )}^\mathrm{{T}} g{\left( {\boldsymbol{d_{i}}}{\big |}s_{i},{\widehat{\boldsymbol{a}}}(s_{i},{\boldsymbol{D}}), {\widehat{\boldsymbol{C}}}(s_{i},{\boldsymbol{D}})\right) } {\Big (} {\displaystyle {{\prod _{k{\in }{\partial }i}}}} {\widehat{\mu }}_{k{\rightarrow }i}(s_{i},{\boldsymbol{D}}) {\Big )} }{ {\displaystyle {{\sum _{i{\in }V}}}} {\displaystyle {{\frac{1}{B_{i}}}}} g{\left( {\boldsymbol{d_{i}}}{\big |}s_{i},{\widehat{\boldsymbol{a}}}(s_{i},{\boldsymbol{D}}), {\widehat{\boldsymbol{C}}}(s_{i},{\boldsymbol{D}})\right) } {\Big (} {\displaystyle {{\prod _{k{\in }{\partial }i}}}} {\widehat{\mu }}_{k{\rightarrow }i}(s_{i},{\boldsymbol{D}}) {\Big )} }} ~(s_{i}{\in }{\Omega }), \nonumber \\&\end{aligned}$$
(10.168)
$$\begin{aligned} {\widehat{u}}({\boldsymbol{D}})\leftarrow & {} {\frac{1}{|E|}} {\sum _{\{i,j\}{\in }E}} {\Bigg (} {\frac{1}{B_{\{i,j\}}}} {\displaystyle {{\sum _{{\tau }_{i}{\in }{\Omega }}}}} {\displaystyle {{\sum _{{\tau }_{j}{\in }{\Omega }}}}} {\big (}-{\delta }_{{\tau }_{i},{\tau }_{j}}{\big )} {\exp }{\left( 2{\widehat{\alpha }}({\boldsymbol{D}}){\delta }_{{\tau }_{i},{\tau }_{j}}\right) } \nonumber \\&\qquad\qquad\quad{}{\times } {\Big (} {\prod _{k{\in }{\partial }i{\backslash }\{j\}}} {\widehat{\mu }}_{k{\rightarrow }i}({\tau }_{i},{\boldsymbol{D}}) {\Big )} g{\left( {\boldsymbol{d_{i}}}{\big |}{\tau }_{i},{\widehat{\boldsymbol{a}}}({\tau }_{i},{\boldsymbol{D}}),{\widehat{\boldsymbol{C}}}({\tau }_{i},{\boldsymbol{D}})\right) } \nonumber \\&\qquad\qquad\quad{}{\times } {\Big (} {\prod _{k{\in }{\partial }j{\backslash }\{i\}}} {\widehat{\mu }}_{k{\rightarrow }j}({\tau }_{j},{\boldsymbol{D}}) {\Big )} g{\left( {\boldsymbol{d_{j}}}{\big |}{\tau }_{j},{\widehat{\boldsymbol{a}}}({\tau }_{j},{\boldsymbol{D}}),{\widehat{\boldsymbol{C}}}({\tau }_{j},{\boldsymbol{D}})\right) } {\Bigg )}, \nonumber \\&\end{aligned}$$
(10.169)
$$\begin{aligned} {\boldsymbol{\widehat{a}}}(s_{i},{\boldsymbol{D}}) \leftarrow {\boldsymbol{a}}(s_{i}) ~ (s_{i}{\in }{\Omega }), \end{aligned}$$
(10.170)
$$\begin{aligned} {\widehat{\boldsymbol{C}}}(s_{i},{\boldsymbol{D}}) \leftarrow {\boldsymbol{C}}(s_{i}) ~ (s_{i}{\in }{\Omega }). \end{aligned}$$
(10.171)

Here, \(g{\left( {\boldsymbol{d_{i}}}{\big |}{\xi },{\widehat{\boldsymbol{a}}}({\xi },{\boldsymbol{D}}), {\widehat{\boldsymbol{C}}}({\xi },{\boldsymbol{D}})\right) }\) is defined by Eq. (10.31) for each state \({\xi }({\in }{\Omega })\).

Step 3 (M-step):

Set the initial values of the messages \(\{{\widehat{\lambda }}({\xi },{\boldsymbol{D}})|{\xi }{\in }{\Omega }\}\) in the loopy belief propagation for the Potts prior and repeat the following procedure until \({\widehat{\alpha }}({\boldsymbol{D}})\) and \(\{{\widehat{\lambda }}({\xi },{\boldsymbol{D}})|{\xi }{\in }{\Omega }\}\) converge:

$$\begin{aligned} {\lambda }(s_{i})\leftarrow & {} {\frac{ {\displaystyle {{\sum _{{\tau }_{j}{\in }{\Omega }}}}} {\exp }{\left( 2{\widehat{\alpha }}({\boldsymbol{D}}){\delta }_{s_{i},{\tau }_{j}}\right) } {\widehat{\lambda }}({\tau }_{j},{\boldsymbol{D}})^{3} }{ {\displaystyle {{\sum _{{\tau }_{i}{\in }{\Omega }}}}} {\displaystyle {{\sum _{{\tau }_{j}{\in }{\Omega }}}}} {\exp }{\left( 2{\widehat{\alpha }}({\boldsymbol{D}}){\delta }_{{\tau }_{i},{\tau }_{j}}\right) } {\widehat{\lambda }}({\tau }_{j},{\boldsymbol{D}})^{3} } } ~(s_{i}{\in }{\Omega }), \end{aligned}$$
(10.172)
$$\begin{aligned} {\widehat{\lambda }}(s_{i},{\boldsymbol{D}})\leftarrow & {} {\lambda }(s_{i})~ (s_{i}{\in }{\Omega }), \end{aligned}$$
(10.173)
$$\begin{aligned} {\widehat{\alpha }}({\boldsymbol{D}})\leftarrow & {} {\widehat{\alpha }}({\boldsymbol{D}}) {\times } {\Bigg (} {\frac{1}{1+{\widehat{u}}({\boldsymbol{D}})}} {\frac{ {\displaystyle {{\sum _{{\tau }_{i}{\in }{\Omega }}}}} {\displaystyle {{\sum _{{\tau }_{j}{\in }{\Omega }}}}} {\big (}1-{\delta }_{{\tau }_{i},{\tau }_{j}}{\big )} {\widehat{\lambda }}({\tau }_{i},{\boldsymbol{D}})^{3} {\exp }{\left( 2{\widehat{\alpha }}({\boldsymbol{D}}){\delta }_{{\tau }_{i},{\tau }_{j}}\right) } {\widehat{\lambda }}({\tau } _{j},{\boldsymbol{D}})^{3} }{ {\displaystyle {{\sum _{{\tau }_{i}{\in }{\Omega }}}}} {\displaystyle {{\sum _{{\tau }_{j}{\in }{\Omega }}}}} {\widehat{\lambda }}({\tau }_{i},{\boldsymbol{D}})^{3} {\exp }{\left( 2{\widehat{\alpha }}({\boldsymbol{D}}){\delta }_{{\tau }_{i},{\tau }_{j}}\right) } {\widehat{\lambda }}({\tau }_{j},{\boldsymbol{D}})^{3} } } {\Bigg )}^{1/4}. \nonumber \\&\end{aligned}$$
(10.174)
Step 4:

Compute the output \({\widehat{\boldsymbol{s}}}({\boldsymbol{D}}) ={\big (}{\widehat{s}}_{1}({\boldsymbol{D}}),{\widehat{s}}_{2}({\boldsymbol{D}}), {\cdots },{\widehat{s}}_{|V|}({\boldsymbol{D}}){\big )}\) as follows:

$$\begin{aligned} {\widehat{s}}_{i}({\boldsymbol{D}}) \leftarrow {\arg }{\max _{s_{i}{\in }{\Omega }}} ~g{\left( {\boldsymbol{d_{i}}}{\big |}s_{i},{\boldsymbol{\widehat{a}}}(s_{i},{\boldsymbol{D}}),{\boldsymbol{\widehat{C}}}(s_{i},{\boldsymbol{D}})\right) } {\prod _{k{\in }{\partial }i}} {\widehat{\mu }}_{k{\rightarrow }i}(s_{i},{\boldsymbol{D}}) ~(i{\in }V). \end{aligned}$$
(10.175)

Stop if the hyperparameters \({\widehat{\alpha }}({\boldsymbol{D}})\), \({\widehat{\boldsymbol{a}}}(s_{i},{\boldsymbol{D}})\) (\(s_{i}{\in }{\Omega }\)), and \({\widehat{\boldsymbol{C}}}(s_{i},{\boldsymbol{D}})\) (\(s_{i}{\in }{\Omega }\)) have converged; otherwise return to Step 2.

 

Some numerical experimental results are shown in Fig. 10.14. The Potts prior exhibits a first-order phase transition, as shown in Sect. 10.3.6. Figure 10.14 shows how the hyperparameter \(2{\alpha }={\frac{J}{k_\mathrm{{B}}T}}\) converges in the EM procedure with loopy belief propagation in the presence of the first-order phase transition.

Fig. 10.14

Numerical experimental results of probabilistic segmentations by Potts prior and loopy belief propagation. The graph (VE) is a square grid graph with periodic boundary conditions along the x- and y-directions

3.6 Real-Space Renormalization Group Method and Sublinear Modeling of Statistical Machine Learning

First, we explore the most fundamental real-space renormalization procedure for the Ising model in Eq. (10.49) on the ring graph (VE), where

$$\begin{aligned} E \equiv {\Big \{} \{1,2\}, \{2,3\},\{3,4\},{\cdots }, \{|V|-1,|V|\}, \{|V|,1\} {\Big \}}, \end{aligned}$$
(10.176)

in the case of \(|V|=2^{L}\). We have the following equality:

$$\begin{aligned}&{\sum _{s_{2}{\in }{\Omega }}} {\sum _{s_{4}{\in }{\Omega }}} {\sum _{s_{6}{\in }{\Omega }}} {\cdots } {\sum _{s_{|V|}{\in }{\Omega }}} {\prod _{\{i,j\}{\in }E}} {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}}Js_{i}s_{j}\right) } \nonumber \\&= {\left( {\sum _{s_{2}{\in }{\Omega }}} {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}}J{\left( s_{1} + s_{3}\right) }s_{2}\right) } \right) } {\left( {\sum _{s_{4}{\in }{\Omega }}} {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}}J{\left( s_{3} + s_{5}\right) }s_{4}\right) } \right) } \nonumber \\&\qquad\quad{} {\times } {\cdots } {\times } {\left( {\sum _{s_{|V|-2}{\in }{\Omega }}} {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}}J{\left( s_{|V|-3} + s_{|V|-1}\right) }s_{|V|-2}\right) } \right) } \nonumber \\&\qquad\quad{} {\left( {\sum _{s_{|V|}{\in }{\Omega }}} {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}}J{\left( s_{|V|-1} + s_{1}\right) }s_{|V|}\right) } \right) } \nonumber \\&= 2^{{\frac{|V|}{2}}} {\left( {\cosh }{\left( {\frac{2}{k_\mathrm{{B}}T}}J\right) } \right) }^{{\frac{1}{2}}{\left( 1+s_{1}s_{3}\right) }} {\left( {\cosh }{\left( {\frac{2}{k_\mathrm{{B}}T}}J\right) } \right) }^{{\frac{1}{2}}{\left( 1+s_{3}s_{5}\right) }} \nonumber \\&\qquad\quad{} {\times } {\cdots } {\times } {\left( {\cosh }{\left( {\frac{2}{k_\mathrm{{B}}T}}J\right) } \right) }^{{\frac{1}{2}}{\left( 1+s_{|V|-3}s_{|V|-1}\right) }} {\left( {\cosh }{\left( {\frac{2}{k_\mathrm{{B}}T}}J\right) } \right) }^{{\frac{1}{2}}{\left( 1+s_{|V|-1}s_{1}\right) }} \nonumber \\&= 2^{{\frac{|V|}{2}}} {\left( {\prod _{i=0}^{{\frac{|V|}{2}}-2}} {\exp }{\left( {\big (}1+s_{2i+1}s_{2i+3}{\big )} {\times }{\frac{1}{2}}{\ln }{\left( {\cosh }{\left( {\frac{2}{k_\mathrm{{B}}T}}J\right) } \right) } \right) } \right) } \nonumber \\&{\exp }{\left( {\big (}1+s_{|V|-1}s_{1}{\big )} {\times }{\frac{1}{2}}{\ln }{\left( {\cosh }{\left( {\frac{2}{k_\mathrm{{B}}T}}J\right) } \right) } \right) }. \nonumber \\&\end{aligned}$$
(10.177)

For the \({\frac{|V|}{2}}\)-dimensional state vector \((s_{1},s_{3},s_{5},{\cdots },s_{|V|-3},s_{|V|-1})\), the marginal probability distribution \(P_{\{1,3,5,{\cdots },|V|-3,|V|-1\}}{\big (}s_{1},s_{3},s_{5}, {\cdots },s_{|V|-3},s_{|V|-1}{\big )}\) is expressed as

$$\begin{aligned}&P_{\{1,3,5,{\cdots },|V|-3,|V|-1\}} {\left( s_{1},s_{3},s_{5},{\cdots },s_{|V|-3},s_{|V|-1}\right) } \nonumber \\&\equiv {\sum _{s_{2}{\in }{\Omega }}} {\sum _{s_{4}{\in }{\Omega }}} {\sum _{s_{6}{\in }{\Omega }}} {\cdots } {\sum _{s_{|V|-2}{\in }{\Omega }}} {\sum _{s_{|V|}{\in }{\Omega }}} P{\left( s_{1},s_{2},s_{3},s_{4},s_{5},s_{6},{\cdots },s_{|V|-3},s_{|V|-2},s_{|V|-1},s_{|V|}\right) } \nonumber \\&= {\frac{ {\left( {\displaystyle {{\prod _{i=0}^{{\frac{|V|}{2}}-2}}}} {\exp }{\left( {\alpha }^{(1)}s_{2i+1}s_{2i+3}\right) } \right) } {\exp }{\left( {\alpha }^{(1)}s_{|V|-1}s_{1}\right) } }{ {\displaystyle { {\sum _{a_{1}{\in }{\Omega }}}{\sum _{a_{3}{\in }{\Omega }}} {\cdots } {\sum _{a_{|V|-1}{\in }{\Omega }}} }} {\left( {\displaystyle {{\prod _{i=0}^{{\frac{|V|}{2}}-2}}}} {\exp }{\left( {\alpha }^{(1)}a_{2i+1}a_{2i+3}\right) } \right) } {\exp }{\left( {\alpha }^{(1)}a_{|V|-1}a_{1}\right) } } }, \end{aligned}$$
(10.178)

where

$$\begin{aligned} {\alpha }^{(1)} \equiv {\frac{1}{2}}{\ln }{\left( {\cosh }{\left( {\frac{2}{k_\mathrm{{B}}T}}J\right) }\right) }. \end{aligned}$$
(10.179)

The remaining nodes, which are denoted by odd numbers, are now renumbered by replacing i with \({\frac{i+1}{2}}\) for \(i=1,3,5,{\cdots },|V|-3,|V|-1\), and new sets \(V^{(1)}\) and \(E^{(1)}\) of nodes and edges and a new state vector \({\boldsymbol{s^{(1)}}}={\left( s_{1}^{(1)},s_{2}^{(1)},s_{3}^{(1)},{\cdots },s_{{\frac{|V|}{2}}-1}^{(1)},s_{{\frac{|V|}{2}}}^{(1)}\right) }^\mathrm{{T}}\) are introduced as follows:

$$\begin{aligned} V^{(1)} \equiv {\left\{ 1,2,3,4,{\cdots },{\frac{|V|}{2}}-1,{\frac{|V|}{2}}\right\} }, \end{aligned}$$
(10.180)
$$\begin{aligned} E^{(1)} \equiv {\left\{ \{1,2\},\{2,3\},\{3,4\},{\cdots }, \{{\frac{|V|}{2}}-1,{\frac{|V|}{2}}\}, \{{\frac{|V|}{2}},1\} \right\} }, \end{aligned}$$
(10.181)
$$\begin{aligned} s_{i}^{(1)}=s_{2i-1}~(i=1,2,{\cdots },|V|/2). \end{aligned}$$
(10.182)

For the \({\frac{|V|}{2}}\)-dimensional state vector \({\boldsymbol{s^{(1)}}}={\left( s_{1}^{(1)},s_{2}^{(1)},s_{3}^{(1)},{\cdots },s_{{\frac{|V|}{2}}-1}^{(1)},s_{{\frac{|V|}{2}}}^{(1)}\right) }^\mathrm{{T}}\), we define a new renormalized probability distribution by

$$\begin{aligned} P^{(1)}{\left( {\boldsymbol{s}}^{(1)}\right) }\equiv & {} {\frac{ {\displaystyle {{\prod _{\{i,j\}{\in }E^{(1)}}}}} {\exp }{\left( {\alpha }^{(1)}s_{i}^{(1)}s_{j}^{(1)}\right) } }{ {\displaystyle { {\sum _{a_{1}^{(1)}{\in }{\Omega }}}{\sum _{a_{2}^{(1)}{\in }{\Omega }}} {\cdots } {\sum _{a_{|V|/2}^{(1)}{\in }{\Omega }}} }} {\displaystyle {{\prod _{\{i,j\}{\in }E^{(1)}}}}} {\exp }{\left( {\alpha }^{(1)}a_{i}^{(1)}a_{j}^{(1)}\right) } } }. \end{aligned}$$
(10.183)

By repeating the above renormalizing procedures,

$$\begin{aligned}&{\sum _{s_{2}^{(r-1)}{\in }{\Omega }}} {\sum _{s_{4}^{(r-1)}{\in }{\Omega }}} {\sum _{s_{6}^{(r-1)}{\in }{\Omega }}} {\cdots } {\sum _{s_{|V^{(r-1)}|}^{(r-1)}{\in }{\Omega }}} {\prod _{\{i,j\}{\in }E^{(r-1)}}} {\exp }{\left( {\alpha }^{(r-1)}s_{i}^{(r-1)}s_{j}^{(r-1)}\right) } \nonumber \\&= 2^{{\frac{|V|}{2^{r}}}} {\left( {\prod _{i=0}^{{\frac{|V|}{2^{r}}}-2}} {\exp }{\left( {\big (}1+s_{2i+1}^{(r-1)}s_{2i+3}^{(r-1)}{\big )} {\times }{\frac{1}{2}}{\ln }{\left( {\cosh }{\left( 2{\alpha }^{(r-1)}\right) } \right) } \right) } \right) } \nonumber \\&\qquad\quad{} {\exp }{\left( {\big (}1+s_{|V^{(r-1)}|-1}^{(r-1)}s_{1}^{(r-1)}{\big )} {\times }{\frac{1}{2}}{\ln }{\left( {\cosh }{\left( 2{\alpha }^{(r-1)}\right) } \right) } \right) }, \nonumber \\&\end{aligned}$$
(10.184)
$$\begin{aligned} {\alpha }^{(r)} \equiv {\frac{1}{2}}{\ln }{\left( {\cosh }{\left( 2 {\alpha }^{(r-1)} \right) } \right) }, \end{aligned}$$
(10.185)
$$\begin{aligned} s_{i}^{(r)}=s_{2i-1}^{(r-1)}~{\left( i=1,2,{\cdots },{\frac{|V|}{2^{r}}} \right) }, \end{aligned}$$
(10.186)

the renormalized probability of the r-th step is generated as follows:

$$\begin{aligned} P^{(r)}{\left( {\boldsymbol{s}}^{(r)} \right) }\equiv & {} {\frac{ {\displaystyle {{\prod _{\{i,j\}{\in }E^{(r)}}}}} {\exp }{\left( {\alpha }^{(r)}s_{i}^{(r)}s_{j}^{(r)} \right) } }{ {\displaystyle { {\sum _{a_{1}^{(r)}{\in }{\Omega }}}{\sum _{a_{2}^{(r)}{\in }{\Omega }}} {\cdots } {\sum _{a_{|V|/2^{r}}^{(r)}{\in }{\Omega }}} }} {\displaystyle {{\prod _{\{i,j\}{\in }E^{(r)}}}}} {\exp }{\left( {\alpha }^{(r)}a_{i}^{(r)}a_{j}^{(r)} \right) } } }, \end{aligned}$$
(10.187)

where

$$\begin{aligned} V^{(r)} \equiv {\big \{}1,2,{\cdots },{\frac{|V|}{2^{r}}}{\big \}}, \end{aligned}$$
(10.188)
$$\begin{aligned} E^{(r)} \equiv {\Big \{}\{1,2\},\{2,3\},\{3,4\},{\cdots }, \{{\frac{|V|}{2^{r}}}-1,{\frac{|V|}{2^{r}}}\}, \{{\frac{|V|}{2^{r}}},1\} {\Big \}}. \end{aligned}$$
(10.189)

Note that \(V^{(0)}=V\), \(E^{(0)}=E\), \({\alpha }^{(0)}={\frac{J}{k_\mathrm{{B}}T}}\), \({\boldsymbol{s}}^{(0)}={\boldsymbol{s}}\), and \(P^{(0)}{\left( {\boldsymbol{s}}^{(0)}\right) }=P{\left( {\boldsymbol{s}}\right) }\).
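The decimation step can be verified by brute-force enumeration on a small ring: marginalizing the even-numbered spins of an \(|V|=8\) ring with coupling \({\alpha }^{(0)}\) reproduces a \({\frac{|V|}{2}}\)-spin ring with the renormalized coupling \({\alpha }^{(1)}\) of Eq. (10.179). A minimal sketch (the coupling value and function names are illustrative):

```python
import math
from itertools import product

def ring_boltzmann(spins, alpha):
    # Unnormalized weight prod_{i} exp(alpha * s_i * s_{i+1}) on a ring.
    n = len(spins)
    e = sum(spins[i] * spins[(i + 1) % n] for i in range(n))
    return math.exp(alpha * e)

def marginal_odd(alpha, n=8):
    # Marginal of the odd-numbered spins of an n-spin ring, Eq. (10.178).
    probs = {}
    z = 0.0
    for s in product((-1, +1), repeat=n):
        w = ring_boltzmann(s, alpha)
        z += w
        key = s[0::2]          # spins s_1, s_3, s_5, ... (0-based even slots)
        probs[key] = probs.get(key, 0.0) + w
    return {k: v / z for k, v in probs.items()}

alpha0 = 0.7
alpha1 = 0.5 * math.log(math.cosh(2.0 * alpha0))   # Eq. (10.179)

# Renormalized distribution on the 4-spin ring, Eq. (10.183).
z1 = sum(ring_boltzmann(s, alpha1) for s in product((-1, +1), repeat=4))
for key, p in marginal_odd(alpha0).items():
    p_renorm = ring_boltzmann(key, alpha1) / z1
    assert abs(p - p_renorm) < 1e-12
```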

Equation (10.185) corresponds to the update rule from \({\alpha }^{(r-1)}\) to \({\alpha }^{(r)}\). By solving Eq. (10.185) with respect to \({\alpha }^{(r-1)}\), we can derive the inverse transformation rule of the real-space renormalization group procedure as follows:

$$\begin{aligned} {\alpha }^{(r-1)} = {\frac{1}{2}} \mathrm{{arc}}{\cosh }{\left( {\exp }{\left( 2{\alpha }^{(r)} \right) } \right) }. \end{aligned}$$
(10.190)

If the hyperparameter \({\alpha }^{(r)}\) in the r-th renormalized probability distribution \(P^{(r)}{\left( {\boldsymbol{s}}^{(r)} \right) }\) has been estimated from given data vectors by means of the EM algorithm for renormalized probabilistic graphical models on ring graphs \({\left( V^{(r)},E^{(r)}\right) }\), we can estimate the hyperparameter \({\alpha }^{(0)}={\frac{J}{k_\mathrm{{B}}T}}\) of the probabilistic graphical models (10.49) on ring graphs (VE) by using the inverse transformation rule of the real-space renormalization group procedure (10.190).
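The forward recursion in Eq. (10.185) and the inverse transformation in Eq. (10.190) can be checked to be mutually inverse numerically; a minimal sketch (the starting coupling is an illustrative choice):

```python
import math

def rg_step(alpha):
    # Eq. (10.185): forward decimation step on the ring graph.
    return 0.5 * math.log(math.cosh(2.0 * alpha))

def rg_step_inverse(alpha):
    # Eq. (10.190): inverse of the decimation step; the argument of
    # acosh is exp(2*alpha) >= 1, so the inverse is always well defined.
    return 0.5 * math.acosh(math.exp(2.0 * alpha))

alpha0 = 0.9   # illustrative value of J/(k_B T)
a = alpha0
for _ in range(3):
    a = rg_step(a)          # renormalize three times
for _ in range(3):
    a = rg_step_inverse(a)  # undo the three steps
assert abs(a - alpha0) < 1e-12
```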

Now, we extend the real-space renormalization group scheme for the probabilistic graphical model on the ring graph to the square grid graph, using a pair approximation in the real-space renormalization group framework, as follows:

$$\begin{aligned}&{} {\exp }{\left( {\alpha }^{(r)}s_{1}s_{3} \right) } \varpropto {\sum _{s_{2}{\in }{\Omega }}} {\sum _{s_{4}{\in }{\Omega }}} {\exp }{\left( {\alpha }^{(r-1)} {\left( s_{1}s_{2} +s_{2}s_{3} +s_{1}s_{4} +s_{4}s_{3}\right) }\right) }. \end{aligned}$$
(10.191)

Equation (10.191) can be reduced to

$$\begin{aligned} {\alpha }^{(r)} = {\ln }{\left( {\cosh }{\left( 2 {\alpha }^{(r-1)} \right) } \right) }. \end{aligned}$$
(10.192)

The r-th renormalized probability distribution for Eq. (10.49) is expressed as

$$\begin{aligned} P^{(r)}{\left( {\boldsymbol{s}}^{(r)}\right) } \propto {\prod _{\{i,j\}{\in }E^{(r)}}} {\exp }{\left( {\alpha }^{(r)}s_{i}^{(r)}s_{j}^{(r)} \right) }. \end{aligned}$$
(10.193)

The inversion formula in Eq. (10.192) can be derived as

$$\begin{aligned} {\alpha }^{(r-1)} = {\frac{1}{2}} \mathrm{{arc}}{\cosh }{\left( {\exp }{\left( {\alpha }^{(r)} \right) } \right) }. \end{aligned}$$
(10.194)
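The pair-approximation update (10.192) and its inversion (10.194) can be checked numerically by carrying out the sums over the two traced-out spins in Eq. (10.191) explicitly. A minimal sketch, assuming \({\Omega }=\{-1,+1\}\) and illustrative function names:

```python
import math

def renormalized_coupling(alpha_prev):
    # Pair-approximation update on the square grid, Eq. (10.192):
    # alpha^(r) = ln cosh(2 alpha^(r-1))
    return math.log(math.cosh(2.0 * alpha_prev))

def pair_sum(s1, s3, alpha_prev):
    # Right-hand side of Eq. (10.191): sum over the bridging spins s2 and s4
    return sum(
        math.exp(alpha_prev * (s1 * s2 + s2 * s3 + s1 * s4 + s4 * s3))
        for s2 in (-1, 1) for s4 in (-1, 1)
    )

alpha_prev = 0.45  # illustrative value of alpha^(r-1)
alpha_r = renormalized_coupling(alpha_prev)

# Proportionality in Eq. (10.191): the ratio of the aligned (s1 = s3) to the
# anti-aligned (s1 = -s3) weight must equal exp(2 alpha^(r))
ratio = pair_sum(1, 1, alpha_prev) / pair_sum(1, -1, alpha_prev)
print(abs(ratio - math.exp(2.0 * alpha_r)) < 1e-12)

# Inversion formula, Eq. (10.194): alpha^(r-1) = (1/2) arccosh(exp(alpha^(r)))
print(abs(0.5 * math.acosh(math.exp(alpha_r)) - alpha_prev) < 1e-12)
```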

The above framework can be extended to the \(|{\Omega }|\)-state Potts model, as shown in Fig. 10.15. The inverse renormalization group transformation can also be applied to the probabilistic segmentations in Eqs. (10.41), (10.42), and (10.43) with Eqs. (10.44), (10.45), and (10.46) in Sect. 10.2.3 [81]. One of the numerical experimental results in the inverse renormalization group transformation in probabilistic segmentations is shown in Fig. 10.16.

Fig. 10.15

Fundamental framework of sublinear computational modeling by the inverse renormalization group transformation in probabilistic segmentations

Fig. 10.16

Numerical experimental results of sublinear computational modeling in the inverse renormalization group transformation in probabilistic segmentations

4 Quantum Statistical Machine Learning

This section explores the fundamental frameworks of quantum probabilistic graphical models based on energy matrices and density matrices. Note that every energy matrix must be Hermitian, and its density matrix is defined in terms of all the eigenvalues and all the eigenvectors of that energy matrix. If all the off-diagonal elements of the density matrix are zero, the diagonal elements correspond to the probability distribution in the probabilistic graphical model. First, we explain general frameworks of density matrices and their differentiations and define the minimization of free energies of density matrices. Second, we give the definitions of tensor products of matrices as well as vectors. By using Pauli spin matrices as well as tensor products, we introduce quantum probabilistic graphical models. Finally, we extend the conventional EM algorithm to a quantum expectation-maximization (QEM) algorithm.

4.1 Elementary Function and Differentiations of Hermitian Matrices

Before proceeding with the quantum statistical mechanical extension of statistical machine learning, we need to explore some essential formulas for Hermitian matrices and their derivatives. Some fundamental properties of matrices for statistical inference have appeared in Ref. [82]. In the present section, we give some useful formulas for treating the entropy in quantum probabilistic graphical models.

We consider the \(M{\times }M\) Hermitian matrix \({\boldsymbol{A}}\)

$$\begin{aligned} {\boldsymbol{A}}= {\left( \begin{array}{ccccc} A_{11} &{} A_{12} &{} {\cdots } &{} A_{1M}\\ A_{21} &{} A_{22} &{} {\cdots } &{} A_{2M}\\ {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ A_{M1} &{} A_{M2} &{} {\cdots } &{} A_{MM} \end{array} \right) }, \end{aligned}$$
(10.195)

which satisfies \({\boldsymbol{A}}={\boldsymbol{\overline{A}}}^\mathrm{{T}}\). Here we remark that \({\boldsymbol{A}}^\mathrm{{T}}\) and \({\boldsymbol{\overline{A}}}\) are the transpose and conjugate matrix of \({\boldsymbol{A}}\), respectively. We introduce vertical and horizontal basis vectors in the M-dimensional space as follows:

$$\begin{aligned} |1{\rangle } ={\left( \begin{array}{ccccc} 1 \\ 0 \\ 0 \\ 0 \\ {\vdots } \\ 0 \\ 0 \\ 0 \\ \end{array} \right) }, ~ |2{\rangle } ={\left( \begin{array}{ccccc} 0 \\ 1 \\ 0 \\ 0 \\ {\vdots } \\ 0 \\ 0 \\ 0 \\ \end{array} \right) }, ~ |3{\rangle } ={\left( \begin{array}{ccccc} 0 \\ 0 \\ 1 \\ 0 \\ {\vdots } \\ 0 \\ 0 \\ 0 \\ \end{array} \right) }, {\cdots }, |M-1{\rangle } ={\left( \begin{array}{ccccc} 0 \\ 0 \\ 0 \\ 0 \\ {\vdots } \\ 0 \\ 1 \\ 0 \\ \end{array} \right) }, |M{\rangle } ={\left( \begin{array}{ccccc} 0 \\ 0 \\ 0 \\ 0 \\ {\vdots } \\ 0 \\ 0 \\ 1 \\ \end{array} \right) }, \end{aligned}$$
(10.196)

and

$$\begin{aligned} \left\{ \begin{array}{rll} {\langle }1| &{} = &{} {\left( 1,0,0,0,{\cdots },0,0,0\right) }, \\ {\langle }2| &{} = &{} {\left( 0,1,0,0,{\cdots },0,0,0\right) }, \\ {\langle }3| &{} = &{} {\left( 0,0,1,0,{\cdots },0,0,0\right) }, \\ &{} {\vdots } &{} \\ {\langle }M-1| &{} = &{} {\left( 0,0,0,0,{\cdots },0,1,0\right) }, \\ {\langle }M| &{} = &{} {\left( 0,0,0,0,{\cdots },0,0,1\right) }. \\ \end{array} \right. \end{aligned}$$
(10.197)

We can confirm that

$$\begin{aligned} {\langle }i|{\boldsymbol{A}}|j{\rangle }=A_{ij}~{\left( i{\in }\{1,2,{\cdots },M\},~j{\in }\{1,2,{\cdots },M\} \right) }. \end{aligned}$$
(10.198)

The Hermitian matrix \({\boldsymbol{A}}\) is diagonalized as

$$\begin{aligned} {\boldsymbol{A}}={\boldsymbol{U}}{\boldsymbol{\Lambda }}{\boldsymbol{U}}^{-1}, \end{aligned}$$
(10.199)
$$\begin{aligned} {\boldsymbol{\Lambda }} \equiv {\left( \begin{array}{ccccccccc} {\lambda }_{1} &{} 0 &{} 0 &{} {\cdots } &{} 0 \\ 0 &{} {\lambda }_{2} &{} 0 &{} {\cdots } &{} 0 \\ 0 &{} 0 &{} {\lambda }_{3} &{} {\cdots } &{} 0 \\ {\vdots } &{} {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ 0 &{} 0 &{} 0 &{} {\cdots } &{} {\lambda }_{M} \end{array} \right) }, \end{aligned}$$
(10.200)

where all the eigenvalues, \({\lambda }_{1},{\lambda }_{2},{\cdots },{\lambda }_{M}\), are always real numbers. For the eigenvector \({\boldsymbol{u}}_{i}= {\left( \begin{array}{ccccccccc} U_{1i} \\ U_{2i} \\ {\vdots } \\ U_{Mi} \end{array} \right) }\) corresponding to the eigenvalue \({\lambda }_{i}\), such that \({\boldsymbol{A}}{\boldsymbol{u}}_{i}={\lambda }_{i}{\boldsymbol{u}}_{i}\), for every \(i{\in }\{1,2,3,{\cdots },M\}\) the matrix \({\boldsymbol{U}}\) is defined by

$$\begin{aligned} {\boldsymbol{U}} \equiv {\left( {\boldsymbol{u}}_{1},{\boldsymbol{u}}_{2},{\boldsymbol{u}}_{3},{\cdots },{\boldsymbol{u}}_{M} \right) } = {\left( \begin{array}{ccccccccc} U_{11} &{} U_{12} &{} U_{13} &{} {\cdots } &{} U_{1M} \\ U_{21} &{} U_{22} &{} U_{23} &{} {\cdots } &{} U_{2M} \\ U_{31} &{} U_{32} &{} U_{33} &{} {\cdots } &{} U_{3M} \\ {\vdots } &{} {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ U_{M1} &{} U_{M2} &{} U_{M3} &{} {\cdots } &{} U_{MM} \end{array} \right) }. \end{aligned}$$
(10.201)

It is known that \({\boldsymbol{U}}\) is a unitary matrix that satisfies \({\boldsymbol{U}}^{-1}={\boldsymbol{\overline{U}}}^\mathrm{{T}}\) for any Hermitian matrix \({\boldsymbol{A}}\). If \({\lambda }_{1}\) is the maximum eigenvalue, its corresponding eigenvector \({\boldsymbol{u_{1}}}\) is expressed using the following notation:

$$\begin{aligned} {\boldsymbol{u_{1}}} = {\arg }{\max }{\boldsymbol{A}}. \end{aligned}$$
(10.202)

Note that \({\arg }{\max }{\boldsymbol{A}}\) is the eigenvector that corresponds to the maximum eigenvalue of \({\boldsymbol{A}}\).

For any Hermitian matrix \({\boldsymbol{A}}\), the exponential function is defined by

$$\begin{aligned} {\exp }{\left( {\boldsymbol{A}}\right) }\equiv & {} {\sum _{n=0}^{+{\infty }}}{\frac{1}{n!}}{\boldsymbol{A}}^{n} \nonumber \\= & {} {\boldsymbol{U}} {\left( \begin{array}{ccccccccc} {\exp }({\lambda }_{1}) &{} 0 &{} 0 &{} {\cdots } &{} 0 \\ 0 &{} {\exp }({\lambda }_{2}) &{} 0 &{} {\cdots } &{} 0 \\ 0 &{} 0 &{} {\exp }({\lambda }_{3}) &{} {\cdots } &{} 0 \\ {\vdots } &{} {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ 0 &{} 0 &{} 0 &{} {\cdots } &{} {\exp }({\lambda }_{M}) \end{array} \right) } {\boldsymbol{U}}^{-1}, \end{aligned}$$
(10.203)

and \({\ln }{\left( {\boldsymbol{A}}\right) }\) is defined by the inverse function of \({\exp }{\left( {\boldsymbol{A}} \right) }\) such that

$$\begin{aligned} {\exp }{\left( {\ln }{\left( {\boldsymbol{A}}\right) }\right) }={\boldsymbol{A}}. \end{aligned}$$
(10.204)

In the present definition, we have

$$\begin{aligned} {\exp }{\left( {\boldsymbol{A}}{\otimes }{\boldsymbol{I}}\right) } ={\left( {\exp }{\left( {\boldsymbol{A}}\right) }\right) }{\otimes }{\boldsymbol{I}}, {\exp }{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{A}}\right) } ={\boldsymbol{I}}{\otimes }{\left( {\exp }{\left( {\boldsymbol{A}}\right) }\right) }, \end{aligned}$$
(10.205)

where \({\boldsymbol{I}}\) is an identity matrix.

For \(|1-{\lambda }_{1}|<1,|1-{\lambda }_{2}|<1,{\cdots },|1-{\lambda }_{M}|<1\), \({\ln }{\left( {\boldsymbol{A}}\right) }\) is defined by

$$\begin{aligned} {\ln }{\left( {\boldsymbol{A}}\right) } = {\ln }{\left( {\boldsymbol{I}} - {\left( {\boldsymbol{I}}-{\boldsymbol{A}}\right) }\right) }\equiv & {} - {\sum _{n=1}^{+{\infty }}}{\frac{1}{n}}{\left( {\boldsymbol{I}} - {\boldsymbol{A}}\right) }^{n} \nonumber \\= & {} {\boldsymbol{U}} {\left( \begin{array}{ccccccccc} {\ln }({\lambda }_{1}) &{} 0 &{} 0 &{} {\cdots } &{} 0 \\ 0 &{} {\ln }({\lambda }_{2}) &{} 0 &{} {\cdots } &{} 0 \\ 0 &{} 0 &{} {\ln }({\lambda }_{3}) &{} {\cdots } &{} 0 \\ {\vdots } &{} {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ 0 &{} 0 &{} 0 &{} {\cdots } &{} {\ln }({\lambda }_{M}) \end{array} \right) } {\boldsymbol{U}}^{-1}. \end{aligned}$$
(10.206)

By using Eqs. (10.203) and (10.206), we can confirm that

$$\begin{aligned} {\exp }{\left( {\ln }{\left( {\boldsymbol{A}}\right) } \right) }= & {} {\sum _{n=0}^{+{\infty }}}{\frac{1}{n!}} {\left( {\ln }{\left( {\boldsymbol{A}} \right) } \right) }^{n} \nonumber \\= & {} {\sum _{n=0}^{+{\infty }}} {\frac{1}{n!}} {\boldsymbol{U}} {\left( \begin{array}{ccccccccc} {\left( {\ln }({\lambda }_{1})\right) }^{n} &{} 0 &{} 0 &{} {\cdots } &{} 0 \\ 0 &{} {\left( {\ln }({\lambda }_{2})\right) }^{n} &{} 0 &{} {\cdots } &{} 0 \\ 0 &{} 0 &{} {\left( {\ln }({\lambda }_{3})\right) }^{n} &{} {\cdots } &{} 0 \\ {\vdots } &{} {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ 0 &{} 0 &{} 0 &{} {\cdots } &{} {\left( {\ln }({\lambda }_{M})\right) }^{n} \end{array} \right) } {\boldsymbol{U}}^{-1} \nonumber \\= & {} {\boldsymbol{U}} {\left( \begin{array}{ccccccccc} {\exp }{\left( {\ln }({\lambda }_{1})\right) } &{} 0 &{} 0 &{} {\cdots } &{} 0 \\ 0 &{} {\exp }{\left( {\ln }({\lambda }_{2})\right) } &{} 0 &{} {\cdots } &{} 0 \\ 0 &{} 0 &{} {\exp }{\left( {\ln }({\lambda }_{3})\right) } &{} {\cdots } &{} 0 \\ {\vdots } &{} {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ 0 &{} 0 &{} 0 &{} {\cdots } &{} {\exp }{\left( {\ln }({\lambda }_{M})\right) } \end{array} \right) } {\boldsymbol{U}}^{-1} \nonumber \\= & {} {\boldsymbol{U}} {\left( \begin{array}{ccccccccc} {\lambda }_{1} &{} 0 &{} 0 &{} {\cdots } &{} 0 \\ 0 &{} {\lambda }_{2} &{} 0 &{} {\cdots } &{} 0 \\ 0 &{} 0 &{} {\lambda }_{3} &{} {\cdots } &{} 0 \\ {\vdots } &{} {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ 0 &{} 0 &{} 0 &{} {\cdots } &{} {\lambda }_{M} \end{array} \right) } {\boldsymbol{U}}^{-1} ={\boldsymbol{A}}. \end{aligned}$$
(10.207)
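The series definitions (10.203) and (10.206) and the identity (10.207) can be checked numerically for a small matrix whose eigenvalues satisfy \(|1-{\lambda }_{i}|<1\). Below is a minimal sketch with hand-rolled \(2{\times }2\) matrix arithmetic; the helper names are illustrative:

```python
I = [[1.0, 0.0], [0.0, 1.0]]

def matmul(X, Y):
    # Product of two 2x2 matrices
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_exp(A, terms=60):
    # Truncated power series exp(A) = sum_{n>=0} A^n / n!, Eq. (10.203)
    S = [[0.0, 0.0], [0.0, 0.0]]
    P = [row[:] for row in I]
    fact = 1.0
    for n in range(terms):
        for i in range(2):
            for j in range(2):
                S[i][j] += P[i][j] / fact
        P = matmul(P, A)
        fact *= (n + 1)
    return S

def mat_log(A, terms=400):
    # Truncated series ln(A) = -sum_{n>=1} (I - A)^n / n, Eq. (10.206);
    # converges when every eigenvalue satisfies |1 - lambda_i| < 1
    D = [[I[i][j] - A[i][j] for j in range(2)] for i in range(2)]
    S = [[0.0, 0.0], [0.0, 0.0]]
    P = [row[:] for row in D]
    for n in range(1, terms + 1):
        for i in range(2):
            for j in range(2):
                S[i][j] -= P[i][j] / n
        P = matmul(P, D)
    return S

A = [[0.8, 0.1], [0.1, 0.9]]  # real symmetric, eigenvalues within |1 - lambda| < 1
B = mat_exp(mat_log(A))
# Identity (10.207): exp(ln(A)) = A
print(all(abs(B[i][j] - A[i][j]) < 1e-10 for i in range(2) for j in range(2)))
```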

Moreover, we have

$$\begin{aligned}&{\sum _{n=0}^{N}}{\left( {\boldsymbol{I}}-{\boldsymbol{A}}\right) }^{n} -{\left( {\boldsymbol{I}}-{\boldsymbol{A}}\right) }{\sum _{n=0}^{N}}{\left( {\boldsymbol{I}}-{\boldsymbol{A}}\right) }^{n} \nonumber \\&={\boldsymbol{I}}+{\left( {\boldsymbol{I}}-{\boldsymbol{A}}\right) }+{\left( {\boldsymbol{I}}-{\boldsymbol{A}}\right) }^{2}+{\cdots }+{\left( {\boldsymbol{I}}-{\boldsymbol{A}}\right) }^{N} \nonumber \\&\quad{} -{\left( {\boldsymbol{I}}-{\boldsymbol{A}}\right) }-{\left( {\boldsymbol{I}}-{\boldsymbol{A}}\right) }^{2}-{\cdots }-{\left( {\boldsymbol{I}}-{\boldsymbol{A}}\right) }^{N}-{\left( {\boldsymbol{I}}-{\boldsymbol{A}}\right) }^{N+1} \nonumber \\&={\boldsymbol{I}}-{\left( {\boldsymbol{I}}-{\boldsymbol{A}}\right) }^{N+1}, \end{aligned}$$
(10.208)

such that

$$\begin{aligned} {\sum _{n=0}^{N}}{\left( {\boldsymbol{I}}-{\boldsymbol{A}}\right) }^{n} ={\left( {\boldsymbol{I}} - {\left( {\boldsymbol{I}}-{\boldsymbol{A}}\right) }\right) }^{-1}{\left( {\boldsymbol{I}} - {\left( {\boldsymbol{I}}-{\boldsymbol{A}}\right) }^{N+1} \right) }. \end{aligned}$$
(10.209)

We have

$$\begin{aligned} {\left( {\boldsymbol{I}}-{\boldsymbol{A}}\right) }^{N+1}={\boldsymbol{U}} {\left( \begin{array}{ccccccccc} {(1-{\lambda }_{1})}^{N+1} &{} 0 &{} 0 &{} {\cdots } &{} 0 \\ 0 &{} {(1-{\lambda }_{2})}^{N+1} &{} 0 &{} {\cdots } &{} 0 \\ 0 &{} 0 &{} {(1-{\lambda }_{3})}^{N+1} &{} {\cdots } &{} 0 \\ {\vdots } &{} {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ 0 &{} 0 &{} 0 &{} {\cdots } &{} {(1-{\lambda }_{M})}^{N+1} \end{array} \right) } {\boldsymbol{U}}^{-1}{\rightarrow }0~(N{\rightarrow }+{\infty }), \nonumber \\ \end{aligned}$$
(10.210)

so it is valid that

$$\begin{aligned} {\boldsymbol{A}}^{-1}={\left( {\boldsymbol{I}} - {\left( {\boldsymbol{I}}-{\boldsymbol{A}}\right) }\right) }^{-1} ={\sum _{n=0}^{+{\infty }}}{\left( {\boldsymbol{I}}-{\boldsymbol{A}}\right) }^{n}. \end{aligned}$$
(10.211)

Note that \({\exp }({\boldsymbol{A}})\) and \({\ln }({\boldsymbol{A}})\) as well as \({\boldsymbol{A}}^{-1}\) are also Hermitian matrices in the present case. (This can be shown by using \({\overline{{\boldsymbol{A}}^{n}}}^\mathrm{{T}}={\big (}{\overline{{\boldsymbol{A}}}}^\mathrm{{T}}{\big )}^{n}\).)
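The Neumann series (10.211) can be verified in the same spirit: the sketch below truncates the series at a finite order and checks that the result multiplies \({\boldsymbol{A}}\) back to the identity (helper names are illustrative):

```python
I = [[1.0, 0.0], [0.0, 1.0]]

def matmul(X, Y):
    # Product of two 2x2 matrices
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def neumann_inverse(A, terms=500):
    # Truncated series A^{-1} = sum_{n>=0} (I - A)^n, Eq. (10.211);
    # valid when every eigenvalue satisfies |1 - lambda_i| < 1
    D = [[I[i][j] - A[i][j] for j in range(2)] for i in range(2)]
    S = [[0.0, 0.0], [0.0, 0.0]]
    P = [row[:] for row in I]
    for _ in range(terms):
        for i in range(2):
            for j in range(2):
                S[i][j] += P[i][j]
        P = matmul(P, D)
    return S

A = [[0.8, 0.1], [0.1, 0.9]]  # eigenvalues within |1 - lambda| < 1
AinvA = matmul(neumann_inverse(A), A)
print(all(abs(AinvA[i][j] - I[i][j]) < 1e-10 for i in range(2) for j in range(2)))
```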

We now introduce a Hermitian matrix function \({\boldsymbol{G}}(x)\) for any real number x as follows:

$$\begin{aligned} {\boldsymbol{G}}(x)\equiv & {} {\left( \begin{array}{ccccccccc} G_{11}(x) &{} G_{12}(x) &{} G_{13}(x) &{} {\cdots } &{} G_{1M}(x) \\ G_{21}(x) &{} G_{22}(x) &{} G_{23}(x) &{} {\cdots } &{} G_{2M}(x) \\ G_{31}(x) &{} G_{32}(x) &{} G_{33}(x) &{} {\cdots } &{} G_{3M}(x) \\ {\vdots } &{} {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ G_{M1}(x) &{} G_{M2}(x) &{} G_{M3}(x) &{} {\cdots } &{} G_{MM}(x) \end{array} \right) }. \end{aligned}$$
(10.212)

We have

$$\begin{aligned} G_{ij}(x)={\overline{G}}_{ji}(x)~(i{\in }\{1,2,{\cdots },M\},~j{\in }\{1,2,{\cdots },M\}), \end{aligned}$$
(10.213)

such that

$$\begin{aligned} {{\langle }}i{{|}}{\boldsymbol{G}}(x){{|}}j{{\rangle }} ={{\langle }}j{{|}}{\boldsymbol{\overline{G}}}(x){{|}}i{{\rangle }}~(i{\in }\{1,2,{\cdots },M\},~j{\in }\{1,2,{\cdots },M\}). \end{aligned}$$
(10.214)

It is obvious that the derivative of the matrix \({\boldsymbol{G}}(x)\) with respect to x, namely,

$$\begin{aligned} {\frac{d}{dx}}{\boldsymbol{G}}(x)\equiv & {} {\left( \begin{array}{ccccccccc} {\frac{d}{dx}}G_{11}(x) &{} {\frac{d}{dx}}G_{12}(x) &{} {\frac{d}{dx}}G_{13}(x) &{} {\cdots } &{} {\frac{d}{dx}}G_{1M}(x) \\ {\frac{d}{dx}}G_{21}(x) &{} {\frac{d}{dx}}G_{22}(x) &{} {\frac{d}{dx}}G_{23}(x) &{} {\cdots } &{} {\frac{d}{dx}}G_{2M}(x) \\ {\frac{d}{dx}}G_{31}(x) &{} {\frac{d}{dx}}G_{32}(x) &{} {\frac{d}{dx}}G_{33}(x) &{} {\cdots } &{} {\frac{d}{dx}}G_{3M}(x) \\ {\vdots } &{} {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ {\frac{d}{dx}}G_{M1}(x) &{} {\frac{d}{dx}}G_{M2}(x) &{} {\frac{d}{dx}}G_{M3}(x) &{} {\cdots } &{} {\frac{d}{dx}}G_{MM}(x) \end{array} \right) } \end{aligned}$$
(10.215)

is also a Hermitian matrix such that

$$\begin{aligned} {{\langle }}i{{|}}{\frac{d}{dx}}{\boldsymbol{G}}(x){{|}}j{{\rangle }} ={{\langle }}j{{|}}{\frac{d}{dx}}{\boldsymbol{\overline{G}}}(x){{|}}i{{\rangle }}~(i{\in }\{1,2,{\cdots },M\},~j{\in }\{1,2,{\cdots },M\}). \nonumber \\ \end{aligned}$$
(10.216)

We have the following equalities:

$$\begin{aligned} {\frac{d}{dx}}{\left( \mathrm{{Tr}}{\left[ {\boldsymbol{G}}(x)\right] }\right) }=\mathrm{{Tr}}{\left[ {\frac{d}{dx}}{\boldsymbol{G}}(x)\right] }, \end{aligned}$$
(10.217)

and

$$\begin{aligned} \mathrm{{Tr}}{\left( {\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)\right) }{\boldsymbol{G}}(x)\right) } =\mathrm{{Tr}}{\left( {\boldsymbol{G}}(x){\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)\right) }\right) }. \end{aligned}$$
(10.218)

Equation (10.218) can be confirmed as follows:

$$\begin{aligned} \mathrm{{Tr}}{\left( {\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)\right) } {\boldsymbol{G}}(x) \right) }= & {} {\sum _{i=1}^{M}}{{\langle }}i{{|}}{\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)\right) }{\boldsymbol{G}}(x){{|}}i{{\rangle }} ={\sum _{i=1}^{M}}{\sum _{j=1}^{M}} {{\langle }}i{{|}}{\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)\right) }{{|}}j{{\rangle }}{{\langle }}j{{|}}{\boldsymbol{G}}(x){{|}}i{{\rangle }} \nonumber \\= & {} {\sum _{j=1}^{M}}{\sum _{i=1}^{M}} {{\langle }}j{{|}}{\boldsymbol{G}}(x){{|}}i{{\rangle }} {{\langle }}i{{|}}{\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)\right) }{{|}}j{{\rangle }} =\mathrm{{Tr}}{\left( {\boldsymbol{G}}(x){\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)\right) } \right) }. \nonumber \\&\end{aligned}$$
(10.219)

By using Eqs. (10.217) and (10.219), we derive the following fundamental formula

$$\begin{aligned} {\frac{d}{dx}}\mathrm{{Tr}}{\left( {\boldsymbol{G}}(x)^{n}\right) }= & {} \mathrm{{Tr}}{\left( {\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)\right) }{\left( {\boldsymbol{G}}(x)^{n-1}\right) }\right) } +\mathrm{{Tr}}{\left( {\boldsymbol{G}}(x){\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)^{n-1}\right) }\right) } \nonumber \\= & {} \mathrm{{Tr}}{\left( {\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)\right) }{\left( {\boldsymbol{G}}(x)^{n-1}\right) }\right) } \nonumber \\&+\mathrm{{Tr}}{\left( {\boldsymbol{G}}(x) {\left( {\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)\right) }{\left( {\boldsymbol{G}}(x)^{n-2}\right) } +{\boldsymbol{G}}(x){\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)^{n-2}\right) } \right) }\right) } \nonumber \\= & {} \mathrm{{Tr}}{\left( {\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)\right) }{\left( {\boldsymbol{G}}(x)^{n-1}\right) }\right) } \nonumber \\&+\mathrm{{Tr}}{\left( {\boldsymbol{G}}(x) {\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)\right) }{\left( {\boldsymbol{G}}(x)^{n-2}\right) }\right) } +\mathrm{{Tr}}{\left( {\boldsymbol{G}}(x)^{2}{\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)^{n-2}\right) }\right) } \nonumber \\= & {} \mathrm{{Tr}}{\left( {\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)\right) }{\boldsymbol{G}}(x)^{n-1}\right) } \nonumber \\&+\mathrm{{Tr}}{\left( {\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)\right) }{\boldsymbol{G}}(x){\boldsymbol{G}}(x)^{n-2}\right) } +\mathrm{{Tr}}{\left( {\boldsymbol{G}}(x)^{2}{\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)^{n-2}\right) }\right) } \nonumber \\= & {} \mathrm{{Tr}}{\left( {\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)\right) }2{\boldsymbol{G}}(x)^{n-1}\right) } +\mathrm{{Tr}}{\left( {\boldsymbol{G}}(x)^{2}{\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)^{n-2}\right) }\right) } \nonumber \\= & {} {\cdots } \nonumber \\= & {} \mathrm{{Tr}}{\left( {\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)\right) }(n-1){\boldsymbol{G}}(x)^{n-1}\right) } +\mathrm{{Tr}}{\left( {\boldsymbol{G}}(x)^{n-1}{\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)\right) }\right) } \nonumber \\= & {} \mathrm{{Tr}}{\left( {\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)\right) }(n-1){\boldsymbol{G}}(x)^{n-1}\right) } +\mathrm{{Tr}}{\left( {\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)\right) }{\boldsymbol{G}}(x)^{n-1}\right) } \nonumber \\= & {} \mathrm{{Tr}}{\left( {\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)\right) }n{\boldsymbol{G}}(x)^{n-1}\right) }. \end{aligned}$$
(10.220)
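Formula (10.220) can be confirmed numerically by comparing it against a central finite difference for a concrete matrix function; the particular \({\boldsymbol{G}}(x)\) below is an illustrative choice, not part of the formulation:

```python
def matmul(X, Y):
    # Product of two 2x2 matrices
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(X):
    return X[0][0] + X[1][1]

def G(x):
    # An illustrative real symmetric (hence Hermitian) matrix function
    return [[1.0 + x * x, x], [x, 2.0]]

def dG(x):
    # Entrywise derivative of G(x), Eq. (10.215)
    return [[2.0 * x, 1.0], [1.0, 0.0]]

def tr_Gn(x, n):
    P = G(x)
    for _ in range(n - 1):
        P = matmul(P, G(x))
    return trace(P)

x, h, n = 0.3, 1e-6, 3
# Central finite difference of Tr(G(x)^n)
numeric = (tr_Gn(x + h, n) - tr_Gn(x - h, n)) / (2.0 * h)
# Closed form Tr((dG/dx) n G^{n-1}), Eq. (10.220)
formula = n * trace(matmul(dG(x), matmul(G(x), G(x))))
print(abs(numeric - formula) < 1e-5)
```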

From Eqs. (10.217) and (10.220), we can confirm the following equality:

$$\begin{aligned} {\frac{d}{dx}}\mathrm{{Tr}}{\left( {\ln }{\left( {\boldsymbol{G}}(x)\right) }\right) }= & {} {\frac{d}{dx}}\mathrm{{Tr}}{\left( {\ln }{\left( {\boldsymbol{I}}-{\left( {\boldsymbol{I}}-{\boldsymbol{G}}(x)\right) }\right) }\right) } \nonumber \\= & {} {\frac{d}{dx}}\mathrm{{Tr}}{\left( -{\sum _{n=1}^{+{\infty }}}{\frac{1}{n}}{\left( {\boldsymbol{I}}-{\boldsymbol{G}}(x)\right) }^{n} \right) } \nonumber \\= & {} -{\sum _{n=1}^{+{\infty }}}{\frac{1}{n}} {\frac{d}{dx}}\mathrm{{Tr}}{\left( {\left( {\boldsymbol{I}}-{\boldsymbol{G}}(x)\right) }^{n} \right) } \nonumber \\= & {} -{\sum _{n=1}^{+{\infty }}}{\frac{1}{n}} \mathrm{{Tr}}{\left( {\left( {\frac{d}{dx}}{\left( {\boldsymbol{I}}-{\boldsymbol{G}}(x)\right) }\right) } n{\left( {\boldsymbol{I}}-{\boldsymbol{G}}(x)\right) }^{n-1} \right) } \nonumber \\= & {} \mathrm{{Tr}}{\left( {\left( -{\frac{d}{dx}}{\left( {\boldsymbol{I}}-{\boldsymbol{G}}(x)\right) }\right) } {\left( {\sum _{n=1}^{+{\infty }}} {\left( {\boldsymbol{I}}-{\boldsymbol{G}}(x)\right) }^{n-1} \right) }\right) } \nonumber \\= & {} \mathrm{{Tr}}{\left( {\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)\right) } {\left( {\sum _{n=1}^{+{\infty }}} {\left( {\boldsymbol{I}}-{\boldsymbol{G}}(x)\right) }^{n-1} \right) } \right) } \nonumber \\= & {} \mathrm{{Tr}}{\left( {\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)\right) }{\left( {\boldsymbol{I}}-{\left( {\boldsymbol{I}}-{\boldsymbol{G}}(x)\right) }\right) }^{-1}\right) } \nonumber \\= & {} \mathrm{{Tr}}{\left( {\left( {\frac{d}{dx}}{\boldsymbol{G}}(x)\right) }{\left( {\boldsymbol{G}}(x)^{-1}\right) }\right) }. \end{aligned}$$
(10.221)
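Formula (10.221) admits the same kind of numerical check. For a positive definite \(2{\times }2\) matrix one can use \(\mathrm{{Tr}}{\left[ {\ln }{\left( {\boldsymbol{G}}(x)\right) }\right] }={\ln }{\det }{\boldsymbol{G}}(x)\) to avoid computing the matrix logarithm explicitly; the matrix function below is an illustrative choice:

```python
import math

def G(x):
    # An illustrative positive definite symmetric matrix function
    return [[1.0 + x * x, x], [x, 2.0]]

def dG(x):
    # Entrywise derivative of G(x)
    return [[2.0 * x, 1.0], [1.0, 0.0]]

def det(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def inv(X):
    d = det(X)
    return [[X[1][1] / d, -X[0][1] / d], [-X[1][0] / d, X[0][0] / d]]

def trace_ln_G(x):
    # For positive definite G, Tr(ln G) = ln(det G)
    return math.log(det(G(x)))

x, h = 0.3, 1e-6
# Central finite difference of Tr(ln G(x))
numeric = (trace_ln_G(x + h) - trace_ln_G(x - h)) / (2.0 * h)
# Closed form Tr((dG/dx) G^{-1}), Eq. (10.221)
Gi, Gp = inv(G(x)), dG(x)
formula = sum(Gp[i][k] * Gi[k][i] for i in range(2) for k in range(2))
print(abs(numeric - formula) < 1e-8)
```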

By using Eq. (10.221), we can confirm the following equality:

$$\begin{aligned} {\frac{d}{d{\boldsymbol{A}}}}\mathrm{{Tr}}{\left[ {\boldsymbol{A}}{\left( {\ln }{\left( {\boldsymbol{A}}\right) }\right) }\right] }\equiv & {} {\left( \begin{array}{ccccccccc} {\frac{d}{dA_{11}}}\mathrm{{Tr}}{\left[ {\boldsymbol{A}}{\ln }{\left( {\boldsymbol{A}}\right) }\right] } &{} {\frac{d}{dA_{12}}}\mathrm{{Tr}}{\left[ {\boldsymbol{A}}{\ln }{\left( {\boldsymbol{A}}\right) }\right] } &{} {\cdots } &{} {\frac{d}{dA_{1M}}}\mathrm{{Tr}}{\left[ {\boldsymbol{A}}{\ln }{\left( {\boldsymbol{A}}\right) }\right] } \\ {\frac{d}{dA_{21}}}\mathrm{{Tr}}{\left[ {\boldsymbol{A}}{\ln }{\left( {\boldsymbol{A}}\right) }\right] } &{} {\frac{d}{dA_{22}}}\mathrm{{Tr}}{\left[ {\boldsymbol{A}}{\ln }{\left( {\boldsymbol{A}}\right) }\right] } &{} {\cdots } &{} {\frac{d}{dA_{2M}}}\mathrm{{Tr}}{\left[ {\boldsymbol{A}}{\ln }{\left( {\boldsymbol{A}}\right) }\right] } \\ {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ {\frac{d}{dA_{M1}}}\mathrm{{Tr}}{\left[ {\boldsymbol{A}}{\ln }{\left( {\boldsymbol{A}}\right) }\right] } &{} {\frac{d}{dA_{M2}}}\mathrm{{Tr}}{\left[ {\boldsymbol{A}}{\ln }{\left( {\boldsymbol{A}}\right) }\right] } &{} {\cdots } &{} {\frac{d}{dA_{MM}}}\mathrm{{Tr}}{\left[ {\boldsymbol{A}}{\ln }{\left( {\boldsymbol{A}}\right) }\right] } \end{array} \right) } \nonumber \\= & {} {\ln }{\left( {\boldsymbol{A}}\right) }+{\boldsymbol{I}}. \end{aligned}$$
(10.222)

4.2 Minimization of Free Energy Functionals for Density Matrices

For any \(M{\times }M\) Hermitian matrix \({\boldsymbol{H}}\) that satisfies \({\boldsymbol{H}}={\boldsymbol{{\overline{H}}}}^\mathrm{{T}}\), the free energy functional for an \(M{\times }M\) trial density matrix

$$\begin{aligned} {\boldsymbol{R}}= {\left( \begin{array}{ccccccccc} R_{11} &{} R_{12} &{} {\cdots } &{} R_{1M} \\ R_{21} &{} R_{22} &{} {\cdots } &{} R_{2M} \\ {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ R_{M1} &{} R_{M2} &{} {\cdots } &{} R_{MM} \end{array} \right) } \end{aligned}$$
(10.223)

is defined by

$$\begin{aligned} \mathcal{{F}}[{\boldsymbol{R}}] =\mathrm{{Tr}}{\left[ {\boldsymbol{R}}{\left( {\boldsymbol{H}}+k_\mathrm{{B}}T{\ln }{\left( {\boldsymbol{R}} \right) } \right) } \right] }. \end{aligned}$$
(10.224)

The density matrix \({\boldsymbol{P}}\) is determined by the following conditional minimization under the normalization condition:

$$\begin{aligned} {\boldsymbol{P}} ={\arg }{\min _{{\boldsymbol{R}}}}{\big \{} \mathcal{{F}}[{\boldsymbol{R}}] {\big |} \mathrm{{Tr}}{\left[ {\boldsymbol{R}}\right] }=1 {\big \}}, \end{aligned}$$
(10.225)

and this reduces to

$$\begin{aligned} {\boldsymbol{P}}={\frac{1}{Z}}{\exp }{\left( -{\frac{1}{k_\mathrm{{B}}T}}{\boldsymbol{H}}\right) }, \end{aligned}$$
(10.226)
$$\begin{aligned} Z \equiv \mathrm{{Tr}}{\left[ {\exp }{\left( -{\frac{1}{k_\mathrm{{B}}T}}{\boldsymbol{H}}\right) }\right] }. \end{aligned}$$
(10.227)

First, we introduce the Lagrange multiplier \({\lambda }\) to ensure the normalization condition as follows:

$$\begin{aligned} \mathcal{{L}}[{\boldsymbol{R}}] \equiv \mathcal{{F}}[{\boldsymbol{R}}] - {\lambda }{\big (}\mathrm{{Tr}}{\left[ {\boldsymbol{R}}\right] }-1{\big )}. \end{aligned}$$
(10.228)

\({\boldsymbol{{\widehat{R}}}}\) is determined so as to satisfy the following extremum condition:

$$\begin{aligned} {\frac{{\partial }}{{\partial }R_{mm'}}}\mathcal{{L}}[{\boldsymbol{R}}]=0~(m=1,2,{\cdots },M,~m'=1,2,{\cdots },M). \end{aligned}$$
(10.229)

Finally, by determining \({\lambda }\) so as to satisfy the normalization condition \(\mathrm{{Tr}}{\left[ {\boldsymbol{{\widehat{R}}}}\right] }=1\), Eqs. (10.226) and (10.227) can be derived.

Because the energy matrix \({\boldsymbol{H}}\) is a Hermitian matrix, all the eigenvalues \(h^{(m)}\) are always real numbers, and all the eigenvectors \({\left( \begin{array}{ccccccccc} {\psi }^{(m)}(1) \\ {\psi }^{(m)}(2) \\ {\vdots } \\ {\psi }^{(m)}(M) \\ \end{array} \right) }\) can be chosen as real vectors; they are defined by

$$\begin{aligned} {\boldsymbol{H}} {\left( \begin{array}{ccccccccc} {\psi }^{(m)}(1) \\ {\psi }^{(m)}(2) \\ {\vdots } \\ {\psi }^{(m)}(M) \\ \end{array} \right) } = h^{(m)} {\left( \begin{array}{ccccccccc} {\psi }^{(m)}(1) \\ {\psi }^{(m)}(2) \\ {\vdots } \\ {\psi }^{(m)}(M) \\ \end{array} \right) }~(m=1,2,{\cdots },M), \end{aligned}$$
(10.230)

where

$$\begin{aligned} {\left( {\psi }^{(m)}(1), {\psi }^{(m)}(2), {\cdots }, {\psi }^{(m)}(M) \right) } {\left( \begin{array}{ccccccccc} {\psi }^{(m)}(1) \\ {\psi }^{(m)}(2) \\ {\vdots } \\ {\psi }^{(m)}(M) \\ \end{array} \right) }=1~(m=1,2,{\cdots },M). \end{aligned}$$
(10.231)

By using these eigenvalues and eigenvectors of \({\boldsymbol{H}}\), the density matrix can be expressed as

$$\begin{aligned} {\boldsymbol{{\widehat{R}}}}= & {} {\left( \begin{array}{ccccccccc} {\psi }^{(1)}(1) &{} {\psi }^{(2)}(1) &{} {\cdots } &{} {\psi }^{(M)}(1) \\ {\psi }^{(1)}(2) &{} {\psi }^{(2)}(2) &{} {\cdots } &{} {\psi }^{(M)}(2) \\ {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ {\psi }^{(1)}(M) &{} {\psi }^{(2)}(M) &{} {\cdots } &{} {\psi }^{(M)}(M) \\ \end{array} \right) } {\left( \begin{array}{ccccccccc} p^{(1)} &{} 0 &{} {\cdots } &{} 0 \\ 0 &{} p^{(2)} &{} {\cdots } &{} 0\\ {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ 0 &{} 0 &{} {\cdots } &{} p^{(M)} \end{array} \right) } {\left( \begin{array}{ccccccccc} {\psi }^{(1)}(1) &{} {\psi }^{(2)}(1) &{} {\cdots } &{} {\psi }^{(M)}(1) \\ {\psi }^{(1)}(2) &{} {\psi }^{(2)}(2) &{} {\cdots } &{} {\psi }^{(M)}(2) \\ {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ {\psi }^{(1)}(M) &{} {\psi }^{(2)}(M) &{} {\cdots } &{} {\psi }^{(M)}(M) \\ \end{array} \right) }^{\mathrm{{T}}}, \nonumber \\&\end{aligned}$$
(10.232)

where

$$\begin{aligned} p^{(m)} ={\frac{ {\exp }{\left( -{\frac{1}{k_\mathrm{{B}}T}}h^{(m)} \right) } }{ {\displaystyle {\sum _{m'=1}^{M}}} {\exp }{\left( -{\frac{1}{k_\mathrm{{B}}T}}h^{(m')} \right) } } }~(m=1,2,{\cdots },M). \end{aligned}$$
(10.233)

This means that the probability of each state \({\left( \begin{array}{ccccccccc} {\psi }^{(m)}(1) \\ {\psi }^{(m)}(2) \\ {\vdots } \\ {\psi }^{(m)}(M) \\ \end{array} \right) }\) is \(p^{(m)}\) for \(m=1,2,{\cdots },M\).
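The variational characterization (10.225) and its solution (10.226) with the Boltzmann weights (10.233) can be illustrated for a \(2{\times }2\) energy matrix. The sketch below diagonalizes \({\boldsymbol{H}}\) in closed form and checks that the minimized free energy \(-k_\mathrm{{B}}T{\ln }Z\) lies below the free energy of diagonal trial density matrices; the numerical values of \({\boldsymbol{H}}\) are illustrative:

```python
import math

kBT = 1.0
H = [[0.0, 0.4], [0.4, 1.0]]  # an illustrative 2x2 real symmetric energy matrix

# Closed-form eigenvalues of a symmetric 2x2 matrix
a, b, d = H[0][0], H[0][1], H[1][1]
r = math.hypot((a - d) / 2.0, b)
lam = [(a + d) / 2.0 - r, (a + d) / 2.0 + r]

# Boltzmann weights p^(m), Eq. (10.233), and the partition function Z
Z = sum(math.exp(-l / kBT) for l in lam)
p = [math.exp(-l / kBT) / Z for l in lam]
print(abs(sum(p) - 1.0) < 1e-12)  # Tr[P] = 1

# Minimized free energy: F[P] = sum_m p_m h^(m) + kBT sum_m p_m ln p_m = -kBT ln Z
F_min = (sum(pm * lm for pm, lm in zip(p, lam))
         + kBT * sum(pm * math.log(pm) for pm in p))
print(abs(F_min + kBT * math.log(Z)) < 1e-12)

def F_diag(q):
    # Free energy (10.224) for the diagonal trial density matrix R = diag(q, 1-q)
    return (q * H[0][0] + (1.0 - q) * H[1][1]
            + kBT * (q * math.log(q) + (1.0 - q) * math.log(1.0 - q)))

# Every trial lies at or above the minimum, consistent with Eq. (10.225)
print(all(F_diag(q) >= F_min - 1e-12 for q in (0.1, 0.3, 0.5, 0.7, 0.9)))
```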

4.3 Tensor Products

This section explores tensor products (Kronecker products) [82]. Tensor products include some fundamental mathematical concepts for achieving quantum statistical mechanical extensions of probabilistic graphical models.

We introduce tensor products for matrices and vectors by the following definitions:

$$\begin{aligned} {\left( \begin{array}{ccc} A_{11} &{} A_{12} \\ A_{21} &{} A_{22} \end{array} \right) } {\otimes } {\left( \begin{array}{ccc} B_{11} &{} B_{12} \\ B_{21} &{} B_{22} \end{array} \right) }= & {} {\left( \begin{array}{ccc} A_{11} {\left( \begin{array}{ccc} B_{11} &{} B_{12} \\ B_{21} &{} B_{22} \end{array} \right) } &{} A_{12}{\left( \begin{array}{ccc} B_{11} &{} B_{12} \\ B_{21} &{} B_{22} \end{array} \right) } \\ A_{21} {\left( \begin{array}{ccc} B_{11} &{} B_{12} \\ B_{21} &{} B_{22} \end{array} \right) } &{} A_{22}{\left( \begin{array}{ccc} B_{11} &{} B_{12} \\ B_{21} &{} B_{22} \end{array} \right) } \end{array} \right) } \nonumber \\= & {} {\left( \begin{array}{cccc} A_{11}B_{11} &{} A_{11}B_{12} &{} A_{12}B_{11} &{} A_{12}B_{12} \\ A_{11}B_{21} &{} A_{11}B_{22} &{} A_{12}B_{21} &{} A_{12}B_{22} \\ A_{21}B_{11} &{} A_{21}B_{12} &{} A_{22}B_{11} &{} A_{22}B_{12} \\ A_{21}B_{21} &{} A_{21}B_{22} &{} A_{22}B_{21} &{} A_{22}B_{22} \end{array} \right) }, \end{aligned}$$
(10.234)
$$\begin{aligned} {\left( \begin{array}{ccc} A_{1} \\ A_{2} \end{array} \right) } {\otimes } {\left( \begin{array}{ccc} B_{1} \\ B_{2} \end{array} \right) } = {\left( \begin{array}{ccc} A_{1} {\left( \begin{array}{ccc} B_{1} \\ B_{2} \end{array} \right) } \\ A_{2} {\left( \begin{array}{ccc} B_{1} \\ B_{2} \end{array} \right) } \end{array} \right) } ={\left( \begin{array}{cccc} A_{1}B_{1} \\ A_{1}B_{2} \\ A_{2}B_{1} \\ A_{2}B_{2} \end{array} \right) }. \end{aligned}$$
(10.235)

We remark that

$$\begin{aligned}&{\left( {\left( \begin{array}{ccc} A_{11} &{} A_{12} \\ A_{21} &{} A_{22} \end{array} \right) } {\otimes } {\left( \begin{array}{ccc} B_{11} &{} B_{12} \\ B_{21} &{} B_{22} \end{array} \right) } \right) } {\left( {\left( \begin{array}{ccc} C_{11} &{} C_{12} \\ C_{21} &{} C_{22} \end{array} \right) } {\otimes } {\left( \begin{array}{ccc} D_{11} &{} D_{12} \\ D_{21} &{} D_{22} \end{array} \right) } \right) } \nonumber \\&= {\left( {\left( \begin{array}{ccc} A_{11} &{} A_{12} \\ A_{21} &{} A_{22} \end{array} \right) } {\left( \begin{array}{ccc} C_{11} &{} C_{12} \\ C_{21} &{} C_{22} \end{array} \right) } \right) } {\otimes } {\left( {\left( \begin{array}{ccc} B_{11} &{} B_{12} \\ B_{21} &{} B_{22} \end{array} \right) } {\left( \begin{array}{ccc} D_{11} &{} D_{12} \\ D_{21} &{} D_{22} \end{array} \right) } \right) }. \end{aligned}$$
(10.236)
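The mixed-product property (10.236) is easy to verify numerically. A minimal sketch with an index-based definition of the Kronecker product; the helper names are illustrative:

```python
def kron(A, B):
    # Kronecker (tensor) product of two square matrices, cf. Eq. (10.238):
    # entry (i, j) of A (x) B is A[i // n][j // n] * B[i % n][j % n]
    n = len(B)
    m = len(A)
    return [[A[i // n][j // n] * B[i % n][j % n]
             for j in range(m * n)] for i in range(m * n)]

def matmul(X, Y):
    # Product of two square matrices of the same size
    k = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(k)) for j in range(k)]
            for i in range(k)]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.0, 1.0], [1.0, 0.0]]
C = [[2.0, 0.0], [1.0, 1.0]]
D = [[1.0, 1.0], [0.0, 2.0]]

# Mixed-product property, Eq. (10.236): (A (x) B)(C (x) D) = (AC) (x) (BD)
lhs = matmul(kron(A, B), kron(C, D))
rhs = kron(matmul(A, C), matmul(B, D))
print(lhs == rhs)  # exact equality: all entries are small exact floats
```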

Moreover, for the following general matrices \({\boldsymbol{A}}\) and \({\boldsymbol{B}}\),

$$\begin{aligned} {\boldsymbol{A}}= {\left( \begin{array}{ccccc} A_{11} &{} A_{12} &{} {\cdots } &{} A_{1M}\\ A_{21} &{} A_{22} &{} {\cdots } &{} A_{2M}\\ {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ A_{M1} &{} A_{M2} &{} {\cdots } &{} A_{MM} \end{array} \right) },~ {\boldsymbol{B}}= {\left( \begin{array}{ccccc} B_{11} &{} B_{12} &{} {\cdots } &{} B_{1N}\\ B_{21} &{} B_{22} &{} {\cdots } &{} B_{2N}\\ {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ B_{N1} &{} B_{N2} &{} {\cdots } &{} B_{NN} \end{array} \right) }, \end{aligned}$$
(10.237)

we define the tensor product \({\boldsymbol{A}}{\otimes }{\boldsymbol{B}}\) as

$$\begin{aligned}&{\boldsymbol{A}}{\otimes }{\boldsymbol{B}} = {\left( \begin{array}{ccccc} A_{11} &{} A_{12} &{} {\cdots } &{} A_{1M}\\ A_{21} &{} A_{22} &{} {\cdots } &{} A_{2M}\\ {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ A_{M1} &{} A_{M2} &{} {\cdots } &{} A_{MM} \end{array} \right) } {\otimes } {\left( \begin{array}{ccccc} B_{11} &{} B_{12} &{} {\cdots } &{} B_{1N}\\ B_{21} &{} B_{22} &{} {\cdots } &{} B_{2N}\\ {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ B_{N1} &{} B_{N2} &{} {\cdots } &{} B_{NN} \end{array} \right) } \nonumber \\&= {\left( \begin{array}{ccccc} A_{11}{\boldsymbol{B}} &{} A_{12}{\boldsymbol{B}} &{} {\cdots } &{} A_{1M}{\boldsymbol{B}} \\ A_{21}{\boldsymbol{B}} &{} A_{22}{\boldsymbol{B}} &{} {\cdots } &{} A_{2M}{\boldsymbol{B}} \\ {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ A_{M1}{\boldsymbol{B}} &{} A_{M2}{\boldsymbol{B}} &{} {\cdots } &{} A_{MM}{\boldsymbol{B}} \end{array} \right) } \nonumber \\&={\left( \begin{array}{cccccccccccccccccccc} A_{11}B_{11} &{} A_{11}B_{12} &{} {\cdots } &{} A_{11}B_{1N} &{} A_{12}B_{11} &{} A_{12}B_{12} &{} {\cdots } &{} A_{12}B_{1N} &{} {\cdots } &{} A_{1M}B_{11} &{} A_{1M}B_{12} &{} {\cdots } &{} A_{1M}B_{1N}\\ A_{11}B_{21} &{} A_{11}B_{22} &{} {\cdots } &{} A_{11}B_{2N} &{} A_{12}B_{21} &{} A_{12}B_{22} &{} {\cdots } &{} A_{12}B_{2N} &{} {\cdots } &{} A_{1M}B_{21} &{} A_{1M}B_{22} &{} {\cdots } &{} A_{1M}B_{2N}\\ {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } &{} {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } &{} {\ddots } &{} {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ A_{11}B_{N1} &{} A_{11}B_{N2} &{} {\cdots } &{} A_{11}B_{NN} &{} A_{12}B_{N1} &{} A_{12}B_{N2} &{} {\cdots } &{} A_{12}B_{NN} &{} {\cdots } &{} A_{1M}B_{N1} &{} A_{1M}B_{N2} &{} {\cdots } &{} A_{1M}B_{NN}\\ A_{21}B_{11} &{} A_{21}B_{12} &{} {\cdots } &{} A_{21}B_{1N} &{} A_{22}B_{11} &{} A_{22}B_{12} &{} {\cdots } &{} A_{22}B_{1N} &{} {\cdots } &{} A_{2M}B_{11} &{} A_{2M}B_{12} &{} {\cdots } 
&{} A_{2M}B_{1N}\\ A_{21}B_{21} &{} A_{21}B_{22} &{} {\cdots } &{} A_{21}B_{2N} &{} A_{22}B_{21} &{} A_{22}B_{22} &{} {\cdots } &{} A_{22}B_{2N} &{} {\cdots } &{} A_{2M}B_{21} &{} A_{2M}B_{22} &{} {\cdots } &{} A_{2M}B_{2N}\\ {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } &{} {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } &{} {\ddots } &{} {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ A_{21}B_{N1} &{} A_{21}B_{N2} &{} {\cdots } &{} A_{21}B_{NN} &{} A_{22}B_{N1} &{} A_{22}B_{N2} &{} {\cdots } &{} A_{22}B_{NN} &{} {\cdots } &{} A_{2M}B_{N1} &{} A_{2M}B_{N2} &{} {\cdots } &{} A_{2M}B_{NN}\\ {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } &{} {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } &{} {\ddots } &{} {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ A_{M1}B_{11} &{} A_{M1}B_{12} &{} {\cdots } &{} A_{M1}B_{1N} &{} A_{M2}B_{11} &{} A_{M2}B_{12} &{} {\cdots } &{} A_{M2}B_{1N} &{} {\cdots } &{} A_{MM}B_{11} &{} A_{MM}B_{12} &{} {\cdots } &{} A_{MM}B_{1N}\\ A_{M1}B_{21} &{} A_{M1}B_{22} &{} {\cdots } &{} A_{M1}B_{2N} &{} A_{M2}B_{21} &{} A_{M2}B_{22} &{} {\cdots } &{} A_{M2}B_{2N} &{} {\cdots } &{} A_{MM}B_{21} &{} A_{MM}B_{22} &{} {\cdots } &{} A_{MM}B_{2N}\\ {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } &{} {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } &{} {\ddots } &{} {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ A_{M1}B_{N1} &{} A_{M1}B_{N2} &{} {\cdots } &{} A_{M1}B_{NN} &{} A_{M2}B_{N1} &{} A_{M2}B_{N2} &{} {\cdots } &{} A_{M2}B_{NN} &{} {\cdots } &{} A_{MM}B_{N1} &{} A_{MM}B_{N2} &{} {\cdots } &{} A_{MM}B_{NN} \end{array} \right) }, \nonumber \\&\end{aligned}$$
(10.238)
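The block structure in Eq. (10.238) is exactly what NumPy's `np.kron` computes; a small numerical check (the matrix sizes and random seed are illustrative):

```python
import numpy as np

# Eq. (10.238): the (I, J) block of A ⊗ B is the scalar A[I, J] times B.
rng = np.random.default_rng(0)
M, N = 3, 2
A = rng.standard_normal((M, M))
B = rng.standard_normal((N, N))

AB = np.kron(A, B)                     # (M*N) x (M*N) Kronecker product

for I in range(M):
    for J in range(M):
        block = AB[I * N:(I + 1) * N, J * N:(J + 1) * N]
        assert np.allclose(block, A[I, J] * B)
```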

Similarly, for vectors

$$\begin{aligned} {\boldsymbol{a}} ={\left( \begin{array}{ccccc} a_{1} \\ a_{2} \\ {\vdots } \\ a_{M} \end{array} \right) }, ~ {\boldsymbol{b}} = {\left( \begin{array}{ccccc} b_{1}\\ b_{2}\\ {\vdots } \\ b_{N} \end{array} \right) }, \end{aligned}$$
(10.239)

the tensor product \({\boldsymbol{a}}{\otimes }{\boldsymbol{b}}\) is defined as

$$\begin{aligned} {\boldsymbol{a}}{\otimes }{\boldsymbol{b}}= {\left( \begin{array}{ccccc} a_{1} \\ a_{2} \\ {\vdots } \\ a_{M} \end{array} \right) } {\otimes } {\left( \begin{array}{ccccc} b_{1}\\ b_{2}\\ {\vdots } \\ b_{N} \end{array} \right) }= {\left( \begin{array}{ccccc} a_{1}{\boldsymbol{b}} \\ a_{2}{\boldsymbol{b}} \\ {\vdots } \\ a_{M}{\boldsymbol{b}} \end{array} \right) } = {\left( \begin{array}{ccccc} a_{1} {\left( \begin{array}{ccccc} b_{1}\\ b_{2}\\ {\vdots } \\ b_{N} \end{array} \right) } \\ a_{2} {\left( \begin{array}{ccccc} b_{1}\\ b_{2}\\ {\vdots } \\ b_{N} \end{array} \right) } \\ {\vdots } \\ a_{M} {\left( \begin{array}{ccccc} b_{1}\\ b_{2}\\ {\vdots } \\ b_{N} \end{array} \right) } \end{array} \right) } ={\left( \begin{array}{cccccccccccccccccccc} a_{1}b_{1} \\ a_{1}b_{2} \\ {\vdots } \\ a_{1}b_{N} \\ a_{2}b_{1} \\ a_{2}b_{2} \\ {\vdots } \\ a_{2}b_{N} \\ {\vdots } \\ a_{M}b_{1} \\ a_{M}b_{2} \\ {\vdots } \\ a_{M}b_{N} \end{array} \right) }, \end{aligned}$$
(10.240)
$$\begin{aligned}&{\boldsymbol{a}}^\mathrm{{T}}{\otimes }{\boldsymbol{b}}^\mathrm{{T}}= {\left( a_{1},a_{2},{\cdots },a_{M} \right) } {\otimes } {\left( b_{1},b_{2},{\cdots },b_{N} \right) }= {\left( a_{1}{\boldsymbol{b}}^\mathrm{{T}},a_{2}{\boldsymbol{b}}^\mathrm{{T}},{\cdots },a_{M}{\boldsymbol{b}}^\mathrm{{T}} \right) } \nonumber \\&\quad{} ={\big (} a_{1}b_{1}, a_{1}b_{2}, {\cdots }, a_{1}b_{N}, a_{2}b_{1}, a_{2}b_{2}, {\cdots }, a_{2}b_{N}, {\cdots }, a_{M}b_{1}, a_{M}b_{2}, {\cdots }, a_{M}b_{N} {\big )}. \nonumber \\&\end{aligned}$$
(10.241)
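The vector case of Eqs. (10.240) and (10.241) is the same operation applied to one-dimensional arrays; `np.kron` stacks the scaled copies \(a_{m}{\boldsymbol{b}}\) (the example values are illustrative):

```python
import numpy as np

# Eq. (10.240): a ⊗ b = (a_1 b, a_2 b, ..., a_M b) stacked into one vector.
a = np.array([1.0, 2.0, 3.0])          # M = 3
b = np.array([4.0, 5.0])               # N = 2
ab = np.kron(a, b)                     # [a1*b1, a1*b2, a2*b1, ..., a3*b2]
assert np.allclose(ab, [4.0, 5.0, 8.0, 10.0, 12.0, 15.0])
```

Since NumPy does not distinguish row from column 1-D arrays, the same call also realizes Eq. (10.241).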

We introduce the following two-dimensional fundamental vectors:

$$\begin{aligned} |1{\rangle } \equiv {\left( \begin{array}{ccccc} 1\\ 0 \end{array} \right) },~ |2{\rangle } \equiv {\left( \begin{array}{ccccc} 0\\ 1 \end{array} \right) }, \end{aligned}$$
(10.242)
$$\begin{aligned} {\langle }1| \equiv {\left( 1, 0 \right) },~ {\langle }2| \equiv {\left( 0, 1 \right) }. \end{aligned}$$
(10.243)

By using the fundamental vectors in two-dimensional space, we define the vertical and horizontal fundamental vectors in four-dimensional space by using the tensor product as follows:

$$\begin{aligned} \left\{ \begin{array}{ll} |1,1{\rangle } \equiv |1{\rangle }{\otimes }|1{\rangle } = {\left( \begin{array}{ccccc} 1\\ 0\\ 0\\ 0 \end{array} \right) }, &{} |1,2{\rangle } \equiv |1{\rangle }{\otimes }|2{\rangle } = {\left( \begin{array}{ccccc} 0\\ 1\\ 0\\ 0 \end{array} \right) },\\ |2,1{\rangle } \equiv |2{\rangle }{\otimes }|1{\rangle } = {\left( \begin{array}{ccccc} 0\\ 0\\ 1\\ 0 \end{array} \right) }, &{} |2,2{\rangle } \equiv |2{\rangle }{\otimes }|2{\rangle } = {\left( \begin{array}{ccccc} 0\\ 0\\ 0\\ 1 \end{array} \right) }, \\ \end{array} \right. \end{aligned}$$
(10.244)
$$\begin{aligned} \left\{ \begin{array}{ll} {\langle }1,1| \equiv {\langle }1|{\otimes }{\langle }1| = {\left( 1, 0, 0, 0 \right) }, &{} {\langle }1,2| \equiv {\langle }1|{\otimes }{\langle }2| = {\left( 0, 1, 0, 0 \right) },\\ {\langle }2,1| \equiv {\langle }2|{\otimes }{\langle }1| = {\left( 0, 0, 1, 0 \right) }, &{} {\langle }2,2| \equiv {\langle }2|{\otimes }{\langle }2| = {\left( 0, 0, 0, 1 \right) }. \end{array} \right. \end{aligned}$$
(10.245)

It is easy to confirm the following equality:

$$\begin{aligned} {\langle }i,j| {\left( \begin{array}{ccc} A_{11} &{} A_{12} \\ A_{21} &{} A_{22} \end{array} \right) } {\otimes } {\left( \begin{array}{ccc} B_{11} &{} B_{12} \\ B_{21} &{} B_{22} \end{array} \right) } |i',j'{\rangle } = A_{i,i'}B_{j,j'}. \end{aligned}$$
(10.246)
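Equation (10.246) can be checked numerically: with zero-based indices, the fundamental vectors are rows of the identity matrix, and the bra-ket sandwich is an ordinary quadratic form (the sample matrices are illustrative):

```python
import numpy as np

# Eq. (10.246): <i,j| (A ⊗ B) |i',j'> = A[i,i'] * B[j,j'] (indices zero-based here).
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])
AB = np.kron(A, B)

def ket(i, j):
    """|i,j> = |i> ⊗ |j> as a 4-dimensional fundamental vector."""
    e = np.eye(2)
    return np.kron(e[i], e[j])

for i in range(2):
    for j in range(2):
        for ip in range(2):
            for jp in range(2):
                assert np.isclose(ket(i, j) @ AB @ ket(ip, jp), A[i, ip] * B[j, jp])
```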

By extending the above example to general-dimensional fundamental vectors, the \((i,j|i',j')\)-components of \({\boldsymbol{A}}{\otimes }{\boldsymbol{B}}\) for any \(M{\times }M\) matrix \({\boldsymbol{A}}\) and \(N{\times }N\) matrix \({\boldsymbol{B}}\) are expressed as

$$\begin{aligned} {\langle }i,j|{\boldsymbol{A}}{\otimes }{\boldsymbol{B}}|i',j'{\rangle } ={\langle }i|{\boldsymbol{A}}|i'{\rangle }{\langle }j|{\boldsymbol{B}}|j'{\rangle } =A_{i,i'}B_{j,j'}. \end{aligned}$$
(10.247)

For \(M{\times }M\) matrices \({\boldsymbol{A}}\) and \({\boldsymbol{C}}\) and \(N{\times }N\) matrices \({\boldsymbol{B}}\) and \({\boldsymbol{D}}\), we have

$$\begin{aligned} {\left( {\boldsymbol{A}}{\otimes }{\boldsymbol{B}}\right) } {\left( {\boldsymbol{C}}{\otimes }{\boldsymbol{D}}\right) } = {\left( {\boldsymbol{A}}{\boldsymbol{C}}\right) } {\otimes } {\left( {\boldsymbol{B}}{\boldsymbol{D}}\right) }, \end{aligned}$$
(10.248)

and

$$\begin{aligned} \mathrm{{Tr}}{\left[ {\boldsymbol{A}}{\otimes }{\boldsymbol{B}}\right] } ={\left( \mathrm{{Tr}}{\left[ {\boldsymbol{A}}\right] }\right) }{\left( \mathrm{{Tr}}{\left[ {\boldsymbol{B}}\right] }\right) }. \end{aligned}$$
(10.249)
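Both identities are easy to confirm numerically (random matrices, illustrative sizes):

```python
import numpy as np

# Eq. (10.248): (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD); Eq. (10.249): Tr[A ⊗ B] = Tr[A] Tr[B].
rng = np.random.default_rng(1)
A, C = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
B, D = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))

assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))
assert np.isclose(np.trace(np.kron(A, B)), np.trace(A) * np.trace(B))
```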

To derive the equality in Eq. (10.248), note that the \((i,j|i',j')\)-components of the \(MN{\times }MN\) matrix \({\left( {\boldsymbol{A}}{\otimes }{\boldsymbol{B}}\right) }{\left( {\boldsymbol{C}}{\otimes }{\boldsymbol{D}}\right) }\) are given by

$$\begin{aligned} {\langle }i,j|{\left( {\boldsymbol{A}}{\otimes }{\boldsymbol{B}}\right) } {\left( {\boldsymbol{C}}{\otimes }{\boldsymbol{D}}\right) }|i',j'{\rangle }= {} {\sum _{i''=1}^{M}}{\sum _{j''=1}^{N}} {\langle }i,j|{\left( {\boldsymbol{A}}{\otimes }{\boldsymbol{B}}\right) } |i'',j''{\rangle } {\langle }i'',j''| {\left( {\boldsymbol{C}}{\otimes }{\boldsymbol{D}}\right) }|i',j'{\rangle } \nonumber \\&\qquad{} {\big (}i{\in }\{1,2,{\cdots },M\},~ i'{\in }\{1,2,{\cdots },M\},~ j{\in }\{1,2,{\cdots },N\},~ j'{\in }\{1,2,{\cdots },N\}{\big )}. \nonumber \\&\end{aligned}$$
(10.250)

Applying Eq. (10.247) to each factor of the summand yields \({\sum _{i''=1}^{M}}{\sum _{j''=1}^{N}}A_{i,i''}B_{j,j''}C_{i'',i'}D_{j'',j'}=({\boldsymbol{A}}{\boldsymbol{C}})_{i,i'}({\boldsymbol{B}}{\boldsymbol{D}})_{j,j'}\), which is exactly the \((i,j|i',j')\)-component of \(({\boldsymbol{A}}{\boldsymbol{C}}){\otimes }({\boldsymbol{B}}{\boldsymbol{D}})\).

For the \(M{\times }M\) and \(N{\times }N\) identity matrices \({\boldsymbol{I^{(M)}}}\) and \({\boldsymbol{I^{(N)}}}\), it holds that

$$\begin{aligned} {\left( {\boldsymbol{A}}{\otimes }{\boldsymbol{I^{(N)}}} \right) } {\left( {\boldsymbol{I^{(M)}}}{\otimes }{\boldsymbol{B}} \right) } = {\left( {\boldsymbol{I^{(M)}}}{\otimes }{\boldsymbol{B}} \right) } {\left( {\boldsymbol{A}}{\otimes }{\boldsymbol{I^{(N)}}} \right) } ={\boldsymbol{A}}{\otimes }{\boldsymbol{B}}. \end{aligned}$$
(10.251)

Moreover, because \({\boldsymbol{A}}{\otimes }{\boldsymbol{I^{(N)}}}\) and \({\boldsymbol{I^{(M)}}}{\otimes }{\boldsymbol{B}}\) commute by Eq. (10.251), we can confirm the following binomial expansion by mathematical induction:

$$\begin{aligned} {\left( {\boldsymbol{A}}{\otimes }{\boldsymbol{I^{(N)}}} + {\boldsymbol{I^{(M)}}}{\otimes }{\boldsymbol{B}} \right) }^{n} = {\sum _{k=0}^{n}}{\frac{n!}{k!(n-k)!}} {\left( {\boldsymbol{A}}{\otimes }{\boldsymbol{I^{(N)}}} \right) }^{k} {\left( {\boldsymbol{I^{(M)}}}{\otimes }{\boldsymbol{B}} \right) }^{n-k}. \end{aligned}$$
(10.252)

By using Eq. (10.252), we can derive the following equality:

$$\begin{aligned} {\exp } {\left( {\boldsymbol{A}}{\otimes }{\boldsymbol{I^{(N)}}} + {\boldsymbol{I^{(M)}}}{\otimes }{\boldsymbol{B}} \right) }= & {} {\sum _{n=0}^{+{\infty }}} {\frac{1}{n!}} {\left( {\boldsymbol{A}}{\otimes }{\boldsymbol{I^{(N)}}} + {\boldsymbol{I^{(M)}}}{\otimes }{\boldsymbol{B}} \right) }^{n} \nonumber \\= & {} {\sum _{n=0}^{+{\infty }}} {\frac{1}{n!}} {\sum _{k=0}^{n}}{\frac{n!}{k!(n-k)!}} {\left( {\boldsymbol{A}}{\otimes }{\boldsymbol{I^{(N)}}} \right) }^{k} {\left( {\boldsymbol{I^{(M)}}}{\otimes }{\boldsymbol{B}} \right) }^{n-k} \nonumber \\= & {} {\sum _{k=0}^{+{\infty }}} {\sum _{n=k}^{+{\infty }}} {\frac{1}{n!}} {\frac{n!}{k!(n-k)!}} {\left( {\boldsymbol{A}}{\otimes }{\boldsymbol{I^{(N)}}} \right) }^{k} {\left( {\boldsymbol{I^{(M)}}}{\otimes }{\boldsymbol{B}} \right) }^{n-k} \nonumber \\= & {} {\sum _{k=0}^{+{\infty }}} {\sum _{n=k}^{+{\infty }}} {\frac{1}{k!(n-k)!}} {\left( {\boldsymbol{A}}{\otimes }{\boldsymbol{I^{(N)}}} \right) }^{k} {\left( {\boldsymbol{I^{(M)}}}{\otimes }{\boldsymbol{B}} \right) }^{n-k} \nonumber \\= & {} {\sum _{k=0}^{+{\infty }}} {\sum _{l=0}^{+{\infty }}} {\frac{1}{k!l!}} {\left( {\boldsymbol{A}}{\otimes }{\boldsymbol{I^{(N)}}} \right) }^{k} {\left( {\boldsymbol{I^{(M)}}}{\otimes }{\boldsymbol{B}} \right) }^{l} \nonumber \\= & {} {\left( {\sum _{k=0}^{+{\infty }}} {\frac{1}{k!}} {\left( {\boldsymbol{A}}{\otimes }{\boldsymbol{I^{(N)}}} \right) }^{k} \right) } {\left( {\sum _{l=0}^{+{\infty }}} {\frac{1}{l!}} {\left( {\boldsymbol{I^{(M)}}}{\otimes }{\boldsymbol{B}} \right) }^{l} \right) } \nonumber \\= & {} {\left( {\sum _{k=0}^{+{\infty }}} {\frac{1}{k!}} {\left( {\boldsymbol{A}}^{k} {\otimes }{\boldsymbol{I^{(N)}}} \right) } \right) } {\left( {\sum _{l=0}^{+{\infty }}} {\frac{1}{l!}} {\left( {\boldsymbol{I^{(M)}}}{\otimes } {\boldsymbol{B}}^{l} \right) } \right) } \nonumber \\= & {} {\left( {\left( {\sum _{k=0}^{+{\infty }}} {\frac{1}{k!}} {\boldsymbol{A}}^{k} \right) } {\otimes }{\boldsymbol{I^{(N)}}} \right) } 
{\left( {\boldsymbol{I^{(M)}}}{\otimes } {\left( {\sum _{l=0}^{+{\infty }}} {\frac{1}{l!}} {\boldsymbol{B}}^{l} \right) } \right) } \nonumber \\= & {} {\left( {\exp }{\left( {\boldsymbol{A}} \right) } {\otimes }{\boldsymbol{I^{(N)}}} \right) } {\left( {\boldsymbol{I^{(M)}}}{\otimes } {\exp }{\left( {\boldsymbol{B}} \right) } \right) } \nonumber \\= & {} {\exp }{\left( {\boldsymbol{A}} \right) } {\otimes } {\exp }{\left( {\boldsymbol{B}} \right) }. \end{aligned}$$
(10.253)
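Equation (10.253) can be verified with the matrix exponential, assuming SciPy is available; the key point is that \({\boldsymbol{A}}{\otimes }{\boldsymbol{I^{(N)}}}\) and \({\boldsymbol{I^{(M)}}}{\otimes }{\boldsymbol{B}}\) commute:

```python
import numpy as np
from scipy.linalg import expm

# Eq. (10.253): exp(A ⊗ I + I ⊗ B) = exp(A) ⊗ exp(B), since the two terms commute.
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((2, 2))
IM, IN = np.eye(3), np.eye(2)

lhs = expm(np.kron(A, IN) + np.kron(IM, B))
rhs = np.kron(expm(A), expm(B))
assert np.allclose(lhs, rhs)
```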

By taking the logarithm of both sides of Eq. (10.253), we have

$$\begin{aligned} {\ln }{\left( {\exp }{\left( {\boldsymbol{A}}\right) }\right) }{\otimes }{\boldsymbol{I^{(N)}}} + {\boldsymbol{I^{(M)}}}{\otimes }{\ln }{\left( {\exp }{\left( {\boldsymbol{B}}\right) }\right) } = {\ln } {\left( {\exp }{\left( {\boldsymbol{A}} \right) } {\otimes } {\exp }{\left( {\boldsymbol{B}} \right) } \right) }. \end{aligned}$$
(10.254)

4.4 Quantum Probabilistic Graphical Models and Quantum Expectation-Maximization Algorithm

This section explores a type of probabilistic graphical modeling based on Pauli spin matrices from the quantum statistical mechanical point of view. Our review focuses on the transverse Ising model in statistical mechanical informatics [37, 83, 84]. Note that generalization of the framework is possible.

Consider a graph specified by nodes and edges \((V,E)\), where V is the set of all nodes i and E is the set of all edges \(\{i,j\}\). We introduce Pauli spin matrices \({\boldsymbol{{\sigma }^{z}}}\) and \({\boldsymbol{{\sigma }^{x}}}\) as well as an identity matrix \({\boldsymbol{I}}\) defined by

$$\begin{aligned} {\boldsymbol{{\sigma }^{z}}}={\left( \begin{array}{ccc} +1 &{} 0 \\ 0 &{} -1 \end{array} \right) },~ {\boldsymbol{{\sigma }^{x}}}={\left( \begin{array}{ccc} 0 &{} +1 \\ +1 &{} 0 \end{array} \right) },~ {\boldsymbol{I}}={\left( \begin{array}{ccc} +1 &{} 0 \\ 0 &{} +1 \end{array} \right) }. \end{aligned}$$
(10.255)

The Pauli spin matrices at each node \(i{\in }V{\equiv }\{1,2,{\cdots },|V|\}\) are defined by

$$\begin{aligned} \left\{ \begin{array}{ccccccccccc} {\boldsymbol{{\sigma }_{1}^{x}}} &{} \equiv &{} {\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\otimes }{\cdots }{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}, &{} {\boldsymbol{{\sigma }_{1}^{y}}} &{} \equiv &{} {\boldsymbol{{\sigma }^{y}}}{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\otimes }{\cdots }{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}, &{} {\boldsymbol{{\sigma }_{1}^{z}}} &{} \equiv &{} {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\otimes }{\cdots }{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{I}},\\ {\boldsymbol{{\sigma }_{2}^{x}}} &{} \equiv &{} {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}}{\otimes }{\cdots }{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}, &{} {\boldsymbol{{\sigma }_{2}^{y}}} &{} \equiv &{} {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{y}}}{\otimes }{\boldsymbol{I}}{\otimes }{\cdots }{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}, &{} {\boldsymbol{{\sigma }_{2}^{z}}} &{} \equiv &{} {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}{\otimes }{\cdots }{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{I}},\\ &{} {\vdots } &{} &{} {\vdots } &{} &{} {\vdots } &{} \\ {\boldsymbol{{\sigma }_{|V|}^{x}}} &{} \equiv &{} {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\otimes }{\cdots }{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{x}}}, &{} {\boldsymbol{{\sigma }_{|V|}^{y}}} &{} \equiv &{} {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\otimes }{\cdots }{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{y}}}, &{} {\boldsymbol{{\sigma }_{|V|}^{z}}} &{} \equiv &{} {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\otimes }{\cdots }{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}. \end{array} \right. 
\nonumber \\&\end{aligned}$$
(10.256)
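The single-node operators of Eq. (10.256) can be built as Kronecker products with the Pauli matrix at slot i and \(2{\times }2\) identities elsewhere; a sketch with zero-based node indices:

```python
import numpy as np
from functools import reduce

# Eq. (10.256): sigma_i is I ⊗ ... ⊗ sigma ⊗ ... ⊗ I with sigma at slot i.
sz = np.array([[1.0, 0.0], [0.0, -1.0]])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)

def site_op(op, i, n):
    """Operator `op` acting on node i (zero-based) of an n-node system."""
    return reduce(np.kron, [op if k == i else I2 for k in range(n)])

n = 3
assert site_op(sz, 0, n).shape == (2 ** n, 2 ** n)
# Operators attached to different nodes commute:
assert np.allclose(site_op(sz, 0, n) @ site_op(sx, 2, n),
                   site_op(sx, 2, n) @ site_op(sz, 0, n))
```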

The vertical and horizontal \(2^{|V|}\)-dimensional state vectors are defined by

$$\begin{aligned} |s_{1},s_{2},{\cdots },s_{|V|}{\rangle } \equiv |s_{1}{\rangle }{\otimes }|s_{2}{\rangle }{\otimes }{\cdots }{\otimes }|s_{|V|}{\rangle } ~(s_{1}{\in }{\Omega },s_{2}{\in }{\Omega },{\cdots },s_{|V|}{\in }{\Omega }), \end{aligned}$$
(10.257)
$$\begin{aligned} {\langle }s_{1},s_{2},{\cdots },s_{|V|}| \equiv {\langle }s_{1}|{\otimes }{\langle }s_{2}|{\otimes }{\cdots }{\otimes }{\langle }s_{|V|}| ~(s_{1}{\in }{\Omega },s_{2}{\in }{\Omega },{\cdots },s_{|V|}{\in }{\Omega }), \end{aligned}$$
(10.258)

where

$$\begin{aligned} {\langle }+1| \equiv {\left( 1,0\right) },~ {\langle }-1| \equiv {\left( 0,1\right) },~ |+1{\rangle } \equiv {\left( \begin{array}{ccccccccc} 1 \\ 0 \end{array} \right) },~ |-1{\rangle } \equiv {\left( \begin{array}{ccccccccc} 0 \\ 1 \end{array} \right) }. \end{aligned}$$
(10.259)

By using the state vector representations, the \((s_{1},s_{2},{\cdots },s_{|V|}|s'_{1},s'_{2},{\cdots },s'_{|V|})\)-elements of \({\boldsymbol{{\sigma }_{i}^{x}}}\), \({\boldsymbol{{\sigma }_{i}^{z}}}\) and \({\boldsymbol{{\sigma }_{i}^{z}}}{\boldsymbol{{\sigma }_{j}^{z}}}\) are given as

$$\begin{aligned}&{\langle }s_{1},s_{2},{\cdots },s_{|V|}| {\boldsymbol{{\sigma }_{i}^{x}}} |s'_{1},s'_{2},{\cdots },s'_{|V|}{\rangle } = {\left( {\prod _{k{\in }V{\setminus }\{i\}}}{\delta }_{s_{k},s'_{k}}\right) } {\langle }s_{i}|{\boldsymbol{{\sigma }^{x}}}|s'_{i}{\rangle } \nonumber \\&\qquad{} (s_{1}{\in }{\Omega },s_{2}{\in }{\Omega },{\cdots },s_{|V|}{\in }{\Omega };~s'_{1}{\in }{\Omega },s'_{2}{\in }{\Omega },{\cdots },s'_{|V|}{\in }{\Omega }), \end{aligned}$$
(10.260)
$$\begin{aligned}&{\langle }s_{1},s_{2},{\cdots },s_{|V|}| {\boldsymbol{{\sigma }_{i}^{z}}} |s'_{1},s'_{2},{\cdots },s'_{|V|}{\rangle } = {\left( {\prod _{k{\in }V{\setminus }\{i\}}}{\delta }_{s_{k},s'_{k}}\right) } {\langle }s_{i}|{\boldsymbol{{\sigma }^{z}}}|s'_{i}{\rangle } \nonumber \\&\qquad{} (s_{1}{\in }{\Omega },s_{2}{\in }{\Omega },{\cdots },s_{|V|}{\in }{\Omega };~s'_{1}{\in }{\Omega },s'_{2}{\in }{\Omega },{\cdots },s'_{|V|}{\in }{\Omega }), \end{aligned}$$
(10.261)
$$\begin{aligned}&{\langle }s_{1},s_{2},{\cdots },s_{|V|}| {\boldsymbol{{\sigma }_{i}^{z}}}{\boldsymbol{{\sigma }_{j}^{z}}} |s'_{1},s'_{2},{\cdots },s'_{|V|}{\rangle } = {\left( {\prod _{k{\in }V{\setminus }\{i,j\}}}{\delta }_{s_{k},s'_{k}}\right) } {\langle }s_{i},s_{j}|({\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}})({\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}})|s'_{i},s'_{j}{\rangle } \nonumber \\&\qquad{} (s_{1}{\in }{\Omega },s_{2}{\in }{\Omega },{\cdots },s_{|V|}{\in }{\Omega };~s'_{1}{\in }{\Omega },s'_{2}{\in }{\Omega },{\cdots },s'_{|V|}{\in }{\Omega }). \end{aligned}$$
(10.262)

The prior density matrix \({\boldsymbol{P}}({\alpha },{\gamma })\) and the data generative density matrix \({\boldsymbol{P}}({\boldsymbol{d}}|{\beta })\) for a given data vector \({\boldsymbol{d}}\) are assumed to be

$$\begin{aligned} {\boldsymbol{P}}({\alpha },{\gamma })= {\frac{ {\exp }{\Big (} {\displaystyle { -{\frac{1}{2}}{\alpha }{\sum _{\{i,j\}{\in }E}} {\left( {\boldsymbol{{\sigma }_{i}^{z}}}-{\boldsymbol{{\sigma }_{j}^{z}}}\right) }^{2} +{\gamma }{\sum _{i{\in }V}}{\boldsymbol{{\sigma }_{i}^{x}}} }} {\Big )} }{ \mathrm{{Tr}}{\Big [} {\exp }{\Big (} {\displaystyle { -{\frac{1}{2}}{\alpha }{\sum _{\{i,j\}{\in }E}} {\left( {\boldsymbol{{\sigma }_{i}^{z}}}-{\boldsymbol{{\sigma }_{j}^{z}}}\right) }^{2} +{\gamma }{\sum _{i{\in }V}}{\boldsymbol{{\sigma }_{i}^{x}}} }} {\Big )} {\Big ]} } }, \end{aligned}$$
(10.263)
$$\begin{aligned} {\boldsymbol{P}}({\boldsymbol{d}}|{\beta })= {\left( {\displaystyle { {\sqrt{{\frac{{\beta }}{2{\pi }}}}} }} \right) }^{|V|} {\exp }{\left( {\displaystyle { -{\frac{1}{2}}{\beta }{\sum _{i{\in }V}}{\left( d_{i}{\boldsymbol{I^{(2^{|V|})}}}-{\boldsymbol{{\sigma }_{i}^{z}}}\right) }^{2} }} \right) }, \end{aligned}$$
(10.264)

where \({\alpha }\), \({\beta }\), and \({\gamma }\) are hyperparameters. The data generative density matrix \({\boldsymbol{P}}({\boldsymbol{d}}|{\beta })\) is a \(|{\Omega }|^{|V|}{\times }|{\Omega }|^{|V|}\) diagonal matrix. Each diagonal element \({\langle }s_{1},s_{2},{\cdots },s_{|V|}|{\boldsymbol{P}}({\boldsymbol{d}}|{\beta })|s_{1},s_{2},{\cdots },s_{|V|}{\rangle }\) (\((s_{1},s_{2},{\cdots },s_{|V|})^\mathrm{{T}}{\in }{\Omega }^{|V|}\)) corresponds to the probability density of the data vector \({\boldsymbol{d}}\) under additive white Gaussian noise for the given state vector \((s_{1},s_{2},{\cdots },s_{|V|})\), and \({\beta }\) corresponds to the inverse of the variance of that noise. By considering a quantum statistical mechanical extension of the Bayes formula, a posterior density matrix \({\boldsymbol{P}}({\boldsymbol{d}},{\alpha },{\beta },{\gamma })\) and a joint density matrix \({\boldsymbol{P}}({\boldsymbol{d}}|{\alpha },{\beta },{\gamma })\) can be expressed as follows:

$$\begin{aligned} {\boldsymbol{P}}({\boldsymbol{d}},{\alpha },{\beta },{\gamma })\equiv & {} {\frac{ {\exp }{\left( {\ln }{\Big (} {\boldsymbol{P}}{\left( {\boldsymbol{d}}|{\beta } \right) } {\Big )} +{\ln }{\Big (} {\boldsymbol{P}}{\left( {\alpha },{\gamma } \right) } {\Big )} \right) } }{ \mathrm{{Tr}}{\Big [} {\exp }{\left( {\ln }{\Big (} {\boldsymbol{P}}{\left( {\boldsymbol{d}}|{\beta } \right) } {\Big )} +{\ln }{\Big (} {\boldsymbol{P}}{\left( {\alpha },{\gamma } \right) } {\Big )} \right) } {\Big ]} } } \nonumber \\= & {} {\frac{ {\exp }{\Big (} {\displaystyle { -{\frac{1}{2}}{\alpha }{\sum _{\{i,j\}{\in }E}} {\left( {\boldsymbol{{\sigma }_{i}^{z}}}-{\boldsymbol{{\sigma }_{j}^{z}}}\right) }^{2} -{\frac{1}{2}}{\beta }{\sum _{i{\in }V}}{\left( d_{i}{\boldsymbol{I^{(2^{|V|})}}}-{\boldsymbol{{\sigma }_{i}^{z}}}\right) }^{2} +{\gamma }{\sum _{i{\in }V}}{\boldsymbol{{\sigma }_{i}^{x}}} }} {\Big )} }{ \mathrm{{Tr}}{\Big [} {\exp }{\Big (} {\displaystyle { -{\frac{1}{2}}{\alpha }{\sum _{\{i,j\}{\in }E}} {\left( {\boldsymbol{{\sigma }_{i}^{z}}}-{\boldsymbol{{\sigma }_{j}^{z}}}\right) }^{2} -{\frac{1}{2}}{\beta }{\sum _{i{\in }V}}{\left( d_{i}{\boldsymbol{I^{(2^{|V|})}}}-{\boldsymbol{{\sigma }_{i}^{z}}}\right) }^{2} +{\gamma }{\sum _{i{\in }V}}{\boldsymbol{{\sigma }_{i}^{x}}} }} {\Big )} {\Big ]} } }, \nonumber \\&\end{aligned}$$
(10.265)
$$\begin{aligned} {\boldsymbol{P}}{\left( {\boldsymbol{d}} {\big |} {\alpha },{\beta },{\gamma } \right) }\equiv & {} {\exp }{\left( {\ln }{\Big (} {\boldsymbol{P}}{\left( {\boldsymbol{d}}|{\beta } \right) } {\Big )} +{\ln }{\Big (} {\boldsymbol{P}}{\left( {\alpha },{\gamma } \right) } {\Big )} \right) } \nonumber \\= & {} {\frac{ {\exp }{\Big (} {\displaystyle { -{\frac{1}{2}}{\alpha }{\sum _{\{i,j\}{\in }E}} {\left( {\boldsymbol{{\sigma }_{i}^{z}}}-{\boldsymbol{{\sigma }_{j}^{z}}}\right) }^{2} -{\frac{1}{2}}{\beta }{\sum _{i{\in }V}}(d_{i}{\boldsymbol{I^{(2^{|V|})}}}-{\boldsymbol{{\sigma }_{i}^{z}}})^{2} +{\gamma }{\sum _{i{\in }V}}{\boldsymbol{{\sigma }_{i}^{x}}} }} {\Big )} }{ {\left( {\displaystyle { {\sqrt{{\frac{2{\pi }}{{\beta }}}}} }} \right) }^{|V|} \mathrm{{Tr}}{\Big [} {\exp }{\Big (} {\displaystyle { -{\frac{1}{2}}{\alpha }{\sum _{\{i,j\}{\in }E}} {\left( {\boldsymbol{{\sigma }_{i}^{z}}}-{\boldsymbol{{\sigma }_{j}^{z}}}\right) }^{2} +{\gamma }{\sum _{i{\in }V}}{\boldsymbol{{\sigma }_{i}^{x}}} }} {\Big )} {\Big ]} } }. \nonumber \\&\end{aligned}$$
(10.266)
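For a system small enough to store the \(2^{|V|}{\times }2^{|V|}\) matrices explicitly, the density matrices of Eqs. (10.263) and (10.264) can be built directly. A minimal sketch, assuming \({\Omega }=\{+1,-1\}\), SciPy for the matrix exponential, and an illustrative 3-node chain with toy data and hyperparameter values:

```python
import numpy as np
from functools import reduce
from scipy.linalg import expm

sz = np.diag([1.0, -1.0])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)

def site_op(op, i, n):
    """sigma acting on node i (zero-based), as in Eq. (10.256)."""
    return reduce(np.kron, [op if k == i else I2 for k in range(n)])

n, edges = 3, [(0, 1), (1, 2)]            # illustrative chain graph
alpha, beta, gamma = 0.5, 1.0, 0.2        # illustrative hyperparameters
d = np.array([0.8, -1.1, 0.9])            # illustrative data vector
Id = np.eye(2 ** n)

# Prior density matrix, Eq. (10.263)
Hp = gamma * sum(site_op(sx, i, n) for i in range(n))
for i, j in edges:
    diff = site_op(sz, i, n) - site_op(sz, j, n)
    Hp = Hp - 0.5 * alpha * diff @ diff
P_prior = expm(Hp) / np.trace(expm(Hp))
assert np.isclose(np.trace(P_prior), 1.0)

# Data generative density matrix, Eq. (10.264); it is diagonal because it
# involves only sigma^z operators.
Hd = sum(-0.5 * beta * (d[i] * Id - site_op(sz, i, n)) @
         (d[i] * Id - site_op(sz, i, n)) for i in range(n))
P_data = (beta / (2.0 * np.pi)) ** (n / 2) * expm(Hd)
assert np.allclose(P_data, np.diag(np.diag(P_data)))
```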

The estimates \({\left( {\widehat{{\alpha }}}({\boldsymbol{d}}),{\widehat{{\beta }}}({\boldsymbol{d}}),{\widehat{{\gamma }}}({\boldsymbol{d}}) \right) }\) of the hyperparameters are determined so as to maximize the marginal likelihood \(\mathrm{{Tr}}{\Big [} {\boldsymbol{P}}{\big (} {\boldsymbol{d}} {\big |} {\alpha },{\beta },{\gamma } {\big )} {\Big ]}\) as follows:

$$\begin{aligned} {\left( {\widehat{{\alpha }}}({\boldsymbol{d}}),{\widehat{{\beta }}}({\boldsymbol{d}}),{\widehat{{\gamma }}}({\boldsymbol{d}}) \right) } \equiv {\arg }{\max _{{\left( {\alpha },{\beta },{\gamma } \right) }}} \mathrm{{Tr}}{\Big [} {\boldsymbol{P}}{\big (} {\boldsymbol{d}} {\big |} {\alpha },{\beta },{\gamma } {\big )} {\Big ]}. \end{aligned}$$
(10.267)

To achieve the estimation criterion for the hyperparameters \({\alpha }\), \({\beta }\), and \({\gamma }\) in Eq. (10.267), we extend the \(\mathcal{{Q}}\)-function in Eq. (10.8) to the following expression from a quantum statistical mechanical point of view:

$$\begin{aligned} \mathcal{{Q}}{\Big (} {\alpha },{\beta },{\gamma } {\Big |}{\alpha }',{\beta }',{\gamma }',{\boldsymbol{d}} {\Big )}\equiv & {} \mathrm{{Tr}}{\Big [} {\boldsymbol{P}}{\big (}{\boldsymbol{d}},{\alpha }',{\beta }',{\gamma }'{\big )} {\ln } {\Big (} {\boldsymbol{P}}{\big (} {\boldsymbol{d}} {\big |} {\alpha },{\beta },{\gamma } {\big )} {\Big )}{\Big ]}. \end{aligned}$$
(10.268)

The quantum EM algorithm can be summarized as a procedure consisting of the following E- and M-steps, which are repeated for \(t=0,1,2,{\cdots }\) until \({\widehat{\alpha }}\), \({\widehat{\beta }}\), and \({\widehat{\gamma }}\) converge:

     E-step::

Compute \(\mathcal{{Q}}{\big (}{\alpha },{\beta },{\gamma }{\big |}{\alpha }(t),{\beta }(t),{\gamma }(t),{\boldsymbol{d}}{\big )}\) for various values of \({\alpha }\), \({\beta }\), and \({\gamma }\).

     M-step::

Determine \({\big (}{\alpha }(t+1),{\beta }(t+1),{\gamma }(t+1){\big )}\) so as to satisfy the extremum conditions of \(\mathcal{{Q}}{\big (}{\alpha },{\beta },{\gamma }{\big |}{\alpha }(t),{\beta }(t),{\gamma }(t),{\boldsymbol{d}}{\big )}\) with respect to \({\alpha }\), \({\beta }\), and \({\gamma }\). Update \({\widehat{\alpha }}{\leftarrow }{\alpha }(t+1)\), \({\widehat{\beta }}{\leftarrow }{\beta }(t+1)\), and \({\widehat{\gamma }}{\leftarrow }{\gamma }(t+1)\).

  The quantum EM algorithm obtains a solution of the extremum conditions of the marginal likelihood \(\mathrm{{Tr}}{\Big [} {\boldsymbol{P}}{\big (} {\boldsymbol{d}} {\big |} {\alpha },{\beta },{\gamma } {\big )} {\Big ]}\), because we have the following equalities:

$$\begin{aligned} {\left\{ \begin{array}{ccc} {\left[ {\frac{{\partial }}{{\partial }{\alpha }}} \mathcal{{Q}}{\left( {\alpha },{\beta },{\gamma } {\Big |}{\widehat{{\alpha }}},{\widehat{{\beta }}},{\widehat{{\gamma }}},{\boldsymbol{d}} \right) } \right] }_{({\alpha },{\beta },{\gamma })=({\widehat{{\alpha }}},{\widehat{{\beta }}},{\widehat{{\gamma }}})} ={\left[ {\frac{{\partial }}{{\partial }{\alpha }}}{\ln }{\left( \mathrm{{Tr}}{\left[ {\boldsymbol{P}}{\left( {\boldsymbol{d}} {\big |} {\alpha },{\beta },{\gamma }\right) } \right] } \right) } \right] }_{({\alpha },{\beta },{\gamma })=({\widehat{{\alpha }}},{\widehat{{\beta }}},{\widehat{{\gamma }}})}, \\ {\left[ {\frac{{\partial }}{{\partial }{\beta }}} \mathcal{{Q}}{\left( {\alpha },{\beta },{\gamma } {\Big |}{\widehat{{\alpha }}},{\widehat{{\beta }}},{\widehat{{\gamma }}},{\boldsymbol{d}} \right) } \right] }_{({\alpha },{\beta },{\gamma })=({\widehat{{\alpha }}},{\widehat{{\beta }}},{\widehat{{\gamma }}})} ={\left[ {\frac{{\partial }}{{\partial }{\beta }}}{\ln }{\left( \mathrm{{Tr}}{\left[ {\boldsymbol{P}}{\left( {\boldsymbol{d}} {\big |} {\alpha },{\beta },{\gamma } \right) } \right] } \right) } \right] }_{({\alpha },{\beta },{\gamma })=({\widehat{{\alpha }}},{\widehat{{\beta }}},{\widehat{{\gamma }}})}, \\ {\left[ {\frac{{\partial }}{{\partial }{\gamma }}} \mathcal{{Q}}{\left( {\alpha },{\beta },{\gamma } {\Big |}{\widehat{{\alpha }}},{\widehat{{\beta }}},{\widehat{{\gamma }}},{\boldsymbol{d}} \right) } \right] }_{({\alpha },{\beta },{\gamma })=({\widehat{{\alpha }}},{\widehat{{\beta }}},{\widehat{{\gamma }}})} ={\left[ {\frac{{\partial }}{{\partial }{\gamma }}}{\ln }{\left( \mathrm{{Tr}}{\left[ {\boldsymbol{P}}{\left( {\boldsymbol{d}} {\big |} {\alpha },{\beta },{\gamma } \right) } \right] } \right) } \right] }_{({\alpha },{\beta },{\gamma })=({\widehat{{\alpha }}},{\widehat{{\beta }}},{\widehat{{\gamma }}})}. \\ \end{array} \right. } \nonumber \\ \end{aligned}$$
(10.269)

By substituting Eq. (10.265) into Eq. (10.268), the \(\mathcal{{Q}}\)-function can be rewritten as follows:

$$\begin{aligned} \mathcal{{Q}}{\big (} {\alpha },{\beta },{\gamma } {\big |}{\alpha }',{\beta }',{\gamma }',{\boldsymbol{d}} {\big )}= & {} -{\frac{1}{2}}{\alpha }{\sum _{\{i,j\}{\in }E}} \mathrm{{Tr}}{\left[ {\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}-{\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}\right) }^{2}{\boldsymbol{P_{ij}}}({\boldsymbol{d}},{\alpha }',{\beta }',{\gamma }')\right] } \nonumber \\&-{\frac{1}{2}}{\beta }{\sum _{i{\in }V}}\mathrm{{Tr}}{\left[ (d_{i}{\boldsymbol{I}}-{\boldsymbol{{\sigma }^{z}}})^{2}{\boldsymbol{P_{i}}}({\boldsymbol{d}},{\alpha }',{\beta }',{\gamma }')\right] } \nonumber \\&+{\gamma }{\sum _{i{\in }V}}\mathrm{{Tr}}{\left[ {\boldsymbol{{\sigma }^{x}}}{\boldsymbol{P_{i}}}({\boldsymbol{d}},{\alpha }',{\beta }',{\gamma }')\right] } \nonumber \\&- |V| {\ln } {\left( {\displaystyle { {\sqrt{{\frac{2{\pi }}{{\beta }}}}} }} \right) } \nonumber \\&-{\ln }{\Big (} \mathrm{{Tr}}{\Big [} {\exp }{\Big (} {\displaystyle { -{\frac{1}{2}}{\alpha }{\sum _{\{i,j\}{\in }E}} {\left( {\boldsymbol{{\sigma }_{i}^{z}}}-{\boldsymbol{{\sigma }_{j}^{z}}}\right) }^{2} +{\gamma }{\sum _{i{\in }V}}{\boldsymbol{{\sigma }_{i}^{x}}} }} {\Big )} {\Big ]} {\Big )}. \nonumber \\&\end{aligned}$$
(10.270)

The extremum conditions of \(\mathcal{{Q}}{\left( {\alpha },{\beta },{\gamma } {\Big |}{\alpha }(t),{\beta }(t),{\gamma }(t),{\boldsymbol{d}} \right) }\) with respect to \({\alpha }\), \({\beta }\), and \({\gamma }\), namely,

$$\begin{aligned} {\left\{ \begin{array}{lll} {\frac{{\partial }}{{\partial }{\alpha }}} \mathcal{{Q}}{\left( {\alpha },{\beta },{\gamma } {\big |}{\alpha }(t),{\beta }(t),{\gamma }(t),{\boldsymbol{d}} \right) }=0, \\ {\frac{{\partial }}{{\partial }{\beta }}} \mathcal{{Q}}{\left( {\alpha },{\beta },{\gamma } {\big |}{\alpha }(t),{\beta }(t),{\gamma }(t),{\boldsymbol{d}} \right) }=0, \\ {\frac{{\partial }}{{\partial }{\gamma }}} \mathcal{{Q}}{\left( {\alpha },{\beta },{\gamma } {\big |}{\alpha }(t),{\beta }(t),{\gamma }(t),{\boldsymbol{d}} \right) }=0 \\ \end{array} \right. } \end{aligned}$$
(10.271)

can be reduced to the following simultaneous update rules in the quantum EM algorithm:

$$\begin{aligned}&{\sum _{\{i,j\}{\in }E}} \mathrm{{Tr}}{\left[ {\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}-{\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}\right) }^{2}{\boldsymbol{P}}_{ij}({\alpha }(t+1),{\gamma }(t+1))\right] } \nonumber \\&= {\sum _{\{i,j\}{\in }E}} \mathrm{{Tr}}{\left[ {\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}-{\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}\right) }^{2}{\boldsymbol{P}}_{ij}({\boldsymbol{d}},{\alpha }(t),{\beta }(t),{\gamma }(t))\right] }, \nonumber \\ \end{aligned}$$
(10.272)
$$\begin{aligned} {\frac{1}{{\beta }(t+1)}}= {\frac{1}{|V|}}{\sum _{i{\in }V}} \mathrm{{Tr}}{\left[ (d_{i}{\boldsymbol{I}}-{\boldsymbol{{\sigma }^{z}}})^{2}{\boldsymbol{P}}_{i}({\boldsymbol{d}},{\alpha }(t),{\beta }(t),{\gamma }(t))\right] }, \end{aligned}$$
(10.273)
$$\begin{aligned} {\sum _{i{\in }V}} \mathrm{{Tr}}{\left[ {\boldsymbol{{\sigma }^{x}}}{\boldsymbol{P}}_{i}({\alpha }(t+1),{\gamma }(t+1))\right] } = {\sum _{i{\in }V}} \mathrm{{Tr}}{\left[ {\boldsymbol{{\sigma }^{x}}}{\boldsymbol{P}}_{i}({\boldsymbol{d}},{\alpha }(t),{\beta }(t),{\gamma }(t))\right] }, \end{aligned}$$
(10.274)

where

$$\begin{aligned} {\langle }s_{i}|{\boldsymbol{P_{i}}}({\boldsymbol{d}},{\alpha },{\beta },{\gamma })|s'_{i}{\rangle }= & {} {\langle }s_{i}|\mathrm{{Tr}}_{{\setminus }i}{\boldsymbol{P}}({\boldsymbol{d}},{\alpha },{\beta },{\gamma })|s'_{i}{\rangle } \nonumber \\\equiv & {} {\sum _{{\tau }_{1}{\in }{\Omega }}}{\sum _{{\tau }_{2}{\in }{\Omega }}}{\cdots }{\sum _{{\tau }_{|V|}{\in }{\Omega }}} {\sum _{{\tau }'_{1}{\in }{\Omega }}}{\sum _{{\tau }'_{2}{\in }{\Omega }}}{\cdots }{\sum _{{\tau }'_{|V|}{\in }{\Omega }}} {\delta }_{s_{i},{\tau }_{i}}{\delta }_{s'_{i},{\tau }'_{i}} \nonumber \\&{\times } {\left( {\prod _{j{\in }V{\setminus }\{i\}}}{\delta }_{{\tau }_{j},{\tau }'_{j}}\right) } {\langle }{\tau }_{1},{\tau }_{2},{\cdots },{\tau }_{|V|}|{\boldsymbol{P}}({\boldsymbol{d}},{\alpha },{\beta },{\gamma }) |{\tau }'_{1},{\tau }'_{2},{\cdots },{\tau }'_{|V|}{\rangle } \nonumber \\&(s_{i}{\in }{\Omega },~s'_{i}{\in }{\Omega },~i{\in }V), \end{aligned}$$
(10.275)
$$\begin{aligned} {\langle }s_{i},s_{j}|{\boldsymbol{P_{ij}}}({\boldsymbol{d}},{\alpha },{\beta },{\gamma })|s'_{i},s'_{j}{\rangle }= & {} {\langle }s_{i},s_{j}|{\boldsymbol{P_{ji}}}({\boldsymbol{d}},{\alpha },{\beta },{\gamma })|s'_{i},s'_{j}{\rangle } \nonumber \\= & {} {\langle }s_{i},s_{j}|\mathrm{{Tr}}_{{\setminus }\{i,j\}}{\boldsymbol{P}}({\boldsymbol{d}},{\alpha },{\beta },{\gamma })|s'_{i},s'_{j}{\rangle } \nonumber \\\equiv & {} {\sum _{{\tau }_{1}{\in }{\Omega }}}{\sum _{{\tau }_{2}{\in }{\Omega }}}{\cdots }{\sum _{{\tau }_{|V|}{\in }{\Omega }}} {\sum _{{\tau }'_{1}{\in }{\Omega }}}{\sum _{{\tau }'_{2}{\in }{\Omega }}}{\cdots }{\sum _{{\tau }'_{|V|}{\in }{\Omega }}} {\delta }_{s_{i},{\tau }_{i}}{\delta }_{s_{j},{\tau }_{j}}{\delta }_{s'_{i},{\tau }'_{i}}{\delta }_{s'_{j},{\tau }'_{j}} \nonumber \\&{\times } {\left( {\prod _{k{\in }V{\setminus }\{i,j\}}}{\delta }_{{\tau }_{k},{\tau }'_{k}}\right) } {\langle }{\tau }_{1},{\tau }_{2},{\cdots },{\tau }_{|V|}|{\boldsymbol{P}}({\boldsymbol{d}},{\alpha },{\beta },{\gamma }) |{\tau }'_{1},{\tau }'_{2},{\cdots },{\tau }'_{|V|}{\rangle } \nonumber \\&(s_{i}{\in }{\Omega },~s_{j}{\in }{\Omega },~s'_{i}{\in }{\Omega },~s'_{j}{\in }{\Omega },~i{\in }V,~j{\in }V,~i<j), \end{aligned}$$
(10.276)
$$\begin{aligned} {\langle }s_{i},s_{j}|{\boldsymbol{P_{ij}}}({\alpha },{\gamma })|s'_{i},s'_{j}{\rangle }= & {} {\langle }s_{i},s_{j}|{\boldsymbol{P_{ji}}}({\alpha },{\gamma })|s'_{i},s'_{j}{\rangle } \nonumber \\= & {} {\langle }s_{i},s_{j}|\mathrm{{Tr}}_{{\setminus }\{i,j\}}{\boldsymbol{P}}({\alpha },{\gamma })|s'_{i},s'_{j}{\rangle } \nonumber \\\equiv & {} {\sum _{{\tau }_{1}{\in }{\Omega }}}{\sum _{{\tau }_{2}{\in }{\Omega }}}{\cdots }{\sum _{{\tau }_{|V|}{\in }{\Omega }}} {\sum _{{\tau }'_{1}{\in }{\Omega }}}{\sum _{{\tau }'_{2}{\in }{\Omega }}}{\cdots }{\sum _{{\tau }'_{|V|}{\in }{\Omega }}} {\delta }_{s_{i},{\tau }_{i}}{\delta }_{s_{j},{\tau }_{j}}{\delta }_{s'_{i},{\tau }'_{i}}{\delta }_{s'_{j},{\tau }'_{j}} \nonumber \\&{\times } {\left( {\prod _{k{\in }V{\setminus }\{i,j\}}}{\delta }_{{\tau }_{k},{\tau }'_{k}}\right) } {\langle }{\tau }_{1},{\tau }_{2},{\cdots },{\tau }_{|V|}|{\boldsymbol{P}}({\alpha },{\gamma }) |{\tau }'_{1},{\tau }'_{2},{\cdots },{\tau }'_{|V|}{\rangle } \nonumber \\&(s_{i}{\in }{\Omega }~s_{j}{\in }{\Omega },~s'_{i}{\in }{\Omega },~s'_{j}{\in }{\Omega },~i{\in }V,~j{\in }V,~i<j). \end{aligned}$$
(10.277)

Finally, we explain how the state at each node is estimated from the reduced posterior density matrix \({\boldsymbol{P_{i}}}({\boldsymbol{d}},{\alpha },{\beta },{\gamma })\) in Eq. (10.275) for each node \(i({\in }V)\). The reduced posterior density matrix \({\boldsymbol{P_{i}}}({\boldsymbol{d}},{\alpha },{\beta },{\gamma })\) is a real symmetric matrix and can be diagonalized as

$$\begin{aligned} {\boldsymbol{P_{i}}}({\boldsymbol{d}},{\alpha },{\beta },{\gamma })= & {} {\left( \begin{array}{ccccccccc} {\psi }_{i}^{(1)}(+1|{\boldsymbol{d}},{\alpha },{\beta },{\gamma }) &{} {\psi }_{i}^{(2)}(+1|{\boldsymbol{d}},{\alpha },{\beta },{\gamma }) \\ {\psi }_{i}^{(1)}(-1|{\boldsymbol{d}},{\alpha },{\beta },{\gamma }) &{} {\psi }_{i}^{(2)}(-1|{\boldsymbol{d}},{\alpha },{\beta },{\gamma }) \\ \end{array} \right) } \nonumber \\&{\times } {\left( \begin{array}{ccccccccc} P_{i}^{(1)}({\boldsymbol{d}},{\alpha },{\beta },{\gamma }) &{} 0 \\ 0 &{} P_{i}^{(2)}({\boldsymbol{d}},{\alpha },{\beta },{\gamma }) \\ \end{array} \right) } \nonumber \\&{\times } {\left( \begin{array}{ccccccccc} {\psi }_{i}^{(1)}(+1|{\boldsymbol{d}},{\alpha },{\beta },{\gamma }) &{} {\psi }_{i}^{(2)}(+1|{\boldsymbol{d}},{\alpha },{\beta },{\gamma }) \\ {\psi }_{i}^{(1)}(-1|{\boldsymbol{d}},{\alpha },{\beta },{\gamma }) &{} {\psi }_{i}^{(2)}(-1|{\boldsymbol{d}},{\alpha },{\beta },{\gamma }) \\ \end{array} \right) }^{\mathrm{{T}}}, \end{aligned}$$
(10.278)

where the eigenvalues, \(P_{i}^{(1)}({\boldsymbol{d}},{\alpha },{\beta },{\gamma })\) and \(P_{i}^{(2)}({\boldsymbol{d}},{\alpha },{\beta },{\gamma })\), are always real numbers. The vectors \({\left( \begin{array}{c} {\psi }_{i}^{(1)}(+1|{\boldsymbol{d}},{\alpha },{\beta },{\gamma }) \\ {\psi }_{i}^{(1)}(-1|{\boldsymbol{d}},{\alpha },{\beta },{\gamma }) \\ \end{array} \right) }\) and \({\left( \begin{array}{c} {\psi }_{i}^{(2)}(+1|{\boldsymbol{d}},{\alpha },{\beta },{\gamma }) \\ {\psi }_{i}^{(2)}(-1|{\boldsymbol{d}},{\alpha },{\beta },{\gamma }) \\ \end{array} \right) }\) correspond to the eigenvectors for the eigenvalues \(P_{i}^{(1)}({\boldsymbol{d}},{\alpha },{\beta },{\gamma })\) and \(P_{i}^{(2)}({\boldsymbol{d}},{\alpha },{\beta },{\gamma })\), such that

$$\begin{aligned} {\boldsymbol{P_{i}}}({\boldsymbol{d}},{\alpha },{\beta },{\gamma }) {\left( \begin{array}{c} {\psi }_{i}^{(n)}(+1|{\boldsymbol{d}},{\alpha },{\beta },{\gamma }) \\ {\psi }_{i}^{(n)}(-1|{\boldsymbol{d}},{\alpha },{\beta },{\gamma }) \\ \end{array} \right) }= & {} P_{i}(n|{\boldsymbol{d}},{\alpha },{\beta },{\gamma }) {\left( \begin{array}{c} {\psi }_{i}^{(n)}(+1|{\boldsymbol{d}},{\alpha },{\beta },{\gamma }) \\ {\psi }_{i}^{(n)}(-1|{\boldsymbol{d}},{\alpha },{\beta },{\gamma }) \\ \end{array} \right) } \nonumber \\&{} (i{\in }V,~n{\in }\{1,2\}). \end{aligned}$$
(10.279)

This means that the eigenvectors correspond to the possible states at each node, and the probabilities of the states \({\left( \begin{array}{c} {\psi }_{i}^{(1)}(+1|{\boldsymbol{d}},{\alpha },{\beta },{\gamma }) \\ {\psi }_{i}^{(1)}(-1|{\boldsymbol{d}},{\alpha },{\beta },{\gamma }) \\ \end{array} \right) }\) and \({\left( \begin{array}{c} {\psi }_{i}^{(2)}(+1|{\boldsymbol{d}},{\alpha },{\beta },{\gamma }) \\ {\psi }_{i}^{(2)}(-1|{\boldsymbol{d}},{\alpha },{\beta },{\gamma }) \\ \end{array} \right) }\) are \(P_{i}^{(1)}({\boldsymbol{d}},{\alpha },{\beta },{\gamma })\) and \(P_{i}^{(2)}({\boldsymbol{d}},{\alpha },{\beta },{\gamma })\), respectively, in the reduced density matrix \({\boldsymbol{P_{i}}}({\boldsymbol{d}},{\alpha },{\beta },{\gamma })\). The estimates for the state at each node \(i({\in }V)\), \({\left( \begin{array}{ccccc} {\widehat{{\psi }}}_{i}{\left( +1{\big |}{\boldsymbol{d}},{\widehat{{\alpha }}},{\widehat{{\beta }}},{\widehat{{\gamma }}} \right) } \\ {\widehat{{\psi }}}_{i}{\left( -1{\big |}{\boldsymbol{d}},{\widehat{{\alpha }}},{\widehat{{\beta }}},{\widehat{{\gamma }}} \right) } \\ \end{array} \right) }\), are given by

$$\begin{aligned} {\left( \begin{array}{ccccc} {\widehat{{\psi }}}_{i}{\left( +1{\big |}{\boldsymbol{d}},{\widehat{{\alpha }}},{\widehat{{\beta }}},{\widehat{{\gamma }}} \right) } \\ {\widehat{{\psi }}}_{i}{\left( -1{\big |}{\boldsymbol{d}},{\widehat{{\alpha }}},{\widehat{{\beta }}},{\widehat{{\gamma }}} \right) } \\ \end{array} \right) } \equiv {\arg }{\max }{\boldsymbol{P_{i}}}{\left( {\boldsymbol{d}},{\widehat{{\alpha }}},{\widehat{{\beta }}},{\widehat{{\gamma }}} \right) } ~(i{\in }V). \end{aligned}$$
(10.280)

These estimation criteria in Eqs. (10.267) and (10.280) correspond to quantum statistical mechanical extensions of the maximization of the marginal likelihood and of the posterior marginal, respectively.
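Equations (10.278)-(10.280) amount to diagonalizing a real symmetric \(2{\times }2\) matrix and keeping the eigenvector that belongs to the largest eigenvalue. The following minimal numerical sketch illustrates this; the matrix entries are purely illustrative and not taken from any actual model.

```python
import numpy as np

# Hypothetical 2x2 reduced posterior density matrix P_i for one node
# (the numerical entries are illustrative only).
P_i = np.array([[0.7, 0.1],
                [0.1, 0.3]])

# For a real symmetric matrix, eigh returns real eigenvalues (ascending)
# and orthonormal eigenvectors, as in the diagonalization of Eq. (10.278).
eigenvalues, eigenvectors = np.linalg.eigh(P_i)

# Eq. (10.280): the estimated state is the eigenvector belonging to the
# largest eigenvalue of the reduced density matrix.
psi_hat = eigenvectors[:, np.argmax(eigenvalues)]

# Reconstruct P_i from its spectral decomposition to confirm Eq. (10.278).
P_rec = eigenvectors @ np.diag(eigenvalues) @ eigenvectors.T
assert np.allclose(P_rec, P_i)
```

Since \({\boldsymbol{P_{i}}}\) has unit trace, the eigenvalues sum to one, which is what lets them be read as probabilities of the two eigenstates.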

4.5 Quantum Expectation-Maximization (EM) Algorithm for Probabilistic Image Segmentation

This section applies the framework of Sect. 10.4.5 to the EM algorithm for probabilistic image segmentation in Sect. 10.2.3. In the present framework, Hubbard operators [85] are used instead of Pauli spin matrices.

First, we introduce Hubbard operators \({\boldsymbol{X_{i}^{({\tau },{\tau }')}}}\) at each node \(i({\in }V)\) as follows:

$$\begin{aligned} \left\{ \begin{array}{ccccccccccc} {\boldsymbol{X_{1}^{({\tau },{\tau }')}}} &{} \equiv &{} {\boldsymbol{X^{({\tau },{\tau }')}}}{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\otimes }{\cdots }{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}, \\ {\boldsymbol{X_{2}^{({\tau },{\tau }')}}} &{} \equiv &{} {\boldsymbol{I}}{\otimes }{\boldsymbol{X^{({\tau },{\tau }')}}}{\otimes }{\boldsymbol{I}}{\otimes }{\cdots }{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}, \\ &{} {\vdots } &{} \\ {\boldsymbol{X_{|V|}^{({\tau },{\tau }')}}} &{} \equiv &{} {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\otimes }{\cdots }{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{X^{({\tau },{\tau }')}}}, \end{array} \right. ~({\tau }{\in }{\Omega },~{\tau }'{\in }{\Omega }), \end{aligned}$$
(10.281)

where

$$\begin{aligned} {\boldsymbol{X^{(+1,+1)}}} \equiv {\left( \begin{array}{cccc} 1 &{} 0 \\ 0 &{} 0 \end{array} \right) }, ~ {\boldsymbol{X^{(+1,-1)}}} \equiv {\left( \begin{array}{cccc} 0 &{} 0 \\ 1 &{} 0 \end{array} \right) }, ~ {\boldsymbol{X^{(-1,+1)}}} \equiv {\left( \begin{array}{cccc} 0 &{} 1 \\ 0 &{} 0 \end{array} \right) }, ~ {\boldsymbol{X^{(-1,-1)}}} \equiv {\left( \begin{array}{cccc} 0 &{} 0 \\ 0 &{} 1 \end{array} \right) }. \nonumber \\&\end{aligned}$$
(10.282)
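The Kronecker-product embedding of Eq. (10.281) can be sketched numerically. The helper `hubbard` and the toy size \(|V|=3\) below are assumptions of this illustration, not part of the original formulation.

```python
import numpy as np
from functools import reduce

# The four single-site Hubbard operators of Eq. (10.282),
# indexed by (tau, tau') with tau, tau' in {+1, -1}.
X = {(+1, +1): np.array([[1.0, 0.0], [0.0, 0.0]]),
     (+1, -1): np.array([[0.0, 0.0], [1.0, 0.0]]),
     (-1, +1): np.array([[0.0, 1.0], [0.0, 0.0]]),
     (-1, -1): np.array([[0.0, 0.0], [0.0, 1.0]])}

def hubbard(i, tau, tau_p, n):
    """Embed X^{(tau,tau')} at site i (0-based) of an n-site system,
    as in Eq. (10.281): identities everywhere except position i."""
    factors = [np.eye(2)] * n
    factors[i] = X[(tau, tau_p)]
    return reduce(np.kron, factors)

# Toy system with |V| = 3 sites, so operators are 8x8 matrices.
n = 3
X0 = hubbard(0, +1, +1, n)
assert X0.shape == (2**n, 2**n)

# The diagonal operators at one site resolve the identity:
# X^{(+1,+1)} + X^{(-1,-1)} = I, and this carries over to the embedding.
assert np.allclose(hubbard(1, +1, +1, n) + hubbard(1, -1, -1, n),
                   np.eye(2**n))
```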

In probabilistic segmentation and clustering, \({\rho }{\left( {\boldsymbol{D}}|{\boldsymbol{s}},{\boldsymbol{{a}}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) }\) in Eq. (10.29) and \(P({\boldsymbol{s}}|{\alpha })\) in Eq. (10.30) correspond to the data generative and prior models, respectively. By using the Hubbard operators and extending Eqs. (10.29) and (10.30) from the standpoint of quantum statistical mechanical informatics, the density matrices of the data generative model and the prior model in quantum machine learning systems for probabilistic image processing can be expressed as follows:

$$\begin{aligned}&{\boldsymbol{R}}{\left( {\boldsymbol{D}}|{\boldsymbol{{a}}}(+1),{\boldsymbol{{a}}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) } \nonumber \\&= {\prod _{i{\in }V}} {\sum _{s_{i}{\in }{\Omega }}} {\boldsymbol{X_{i}^{{(s_{i},s_{i})}}}} {\sqrt{{\frac{1}{{\det }{\left( 2{\pi } {\boldsymbol{C}}(s_{i}) \right) }}}}} {\exp }{\left( -{\frac{1}{2}}{\left( {\boldsymbol{d_{i}}}-{\boldsymbol{{a}}}(s_{i})\right) } {\boldsymbol{C}}^{-1}(s_{i}){\left( {\boldsymbol{d_{i}}}-{\boldsymbol{{a}}}(s_{i})\right) }^\mathrm{{T}} \right) } \nonumber \\&= {\exp }{\left( -{\frac{1}{2}} {\sum _{i{\in }V}} {\sum _{s_{i}{\in }{\Omega }}} {\left( {\left( {\boldsymbol{d_{i}}}-{\boldsymbol{{a}}}(s_{i})\right) } {\boldsymbol{C}}^{-1}(s_{i}){\left( {\boldsymbol{d_{i}}}-{\boldsymbol{{a}}}(s_{i})\right) }^\mathrm{{T}} +{\ln }{\left( {\det }{\left( 2{\pi } {\boldsymbol{C}}(s_{i}) \right) }\right) } \right) } {\boldsymbol{X_{i}^{{(s_{i},s_{i})}}}} \right) }, \nonumber \\&\end{aligned}$$
(10.283)
$$\begin{aligned}&{\boldsymbol{R}}({\alpha },{\gamma }) = {\frac{ {\displaystyle { {\exp }{\left( -2{\alpha }{\sum _{\{i,j\}{\in }E}}{\big (}{\boldsymbol{I}}^{(2^{|V|})}-{\boldsymbol{X_{i}^{(+1,+1)}}}{\boldsymbol{X_{j}^{(+1,+1)}}}-{\boldsymbol{X_{i}^{(-1,-1)}}}{\boldsymbol{X_{j}^{(-1,-1)}}}{\big )} +{\gamma }{\sum _{i{\in }V}}{\big (}{\boldsymbol{X_{i}^{(+1,-1)}}}+{\boldsymbol{X_{i}^{(-1,+1)}}}{\big )} \right) } }} }{ {\displaystyle { \mathrm{{Tr}}{\left[ {\exp }{\left( -2{\alpha }{\sum _{\{i,j\}{\in }E}}{\big (}{\boldsymbol{I}}^{(2^{|V|})}-{\boldsymbol{X_{i}^{(+1,+1)}}}{\boldsymbol{X_{j}^{(+1,+1)}}}-{\boldsymbol{X_{i}^{(-1,-1)}}}{\boldsymbol{X_{j}^{(-1,-1)}}}{\big )} +{\gamma }{\sum _{i{\in }V}}{\big (}{\boldsymbol{X_{i}^{(+1,-1)}}}+{\boldsymbol{X_{i}^{(-1,+1)}}}{\big )} \right) } \right] } }} } }, \nonumber \\&\end{aligned}$$
(10.284)

where

$$\begin{aligned} {\boldsymbol{{a}}}(+1) = {\left( \begin{array}{ccc} {a}_\mathrm{{R}}(+1) \\ {a}_\mathrm{{G}}(+1) \\ {a}_\mathrm{{B}}(+1) \end{array} \right) },~ {\boldsymbol{{a}}}(-1) = {\left( \begin{array}{ccc} {a}_\mathrm{{R}}(-1) \\ {a}_\mathrm{{G}}(-1) \\ {a}_\mathrm{{B}}(-1) \end{array} \right) }, \end{aligned}$$
(10.285)
$$\begin{aligned} {\boldsymbol{C}}(+1)= {\left( \begin{array}{ccc} C_{\mathrm{{R}}\mathrm{{R}}}(+1) &{} C_{\mathrm{{R}}\mathrm{{G}}}(+1) &{} C_{\mathrm{{R}}\mathrm{{B}}}(+1) \\ C_{\mathrm{{G}}\mathrm{{R}}}(+1) &{} C_{\mathrm{{G}}\mathrm{{G}}}(+1) &{} C_{\mathrm{{G}}\mathrm{{B}}}(+1) \\ C_{\mathrm{{B}}\mathrm{{R}}}(+1) &{} C_{\mathrm{{B}}\mathrm{{G}}}(+1) &{} C_{\mathrm{{B}}\mathrm{{B}}}(+1) \\ \end{array} \right) },~ {\boldsymbol{C}}(-1)= {\left( \begin{array}{ccc} C_{\mathrm{{R}}\mathrm{{R}}}(-1) &{} C_{\mathrm{{R}}\mathrm{{G}}}(-1) &{} C_{\mathrm{{R}}\mathrm{{B}}}(-1) \\ C_{\mathrm{{G}}\mathrm{{R}}}(-1) &{} C_{\mathrm{{G}}\mathrm{{G}}}(-1) &{} C_{\mathrm{{G}}\mathrm{{B}}}(-1) \\ C_{\mathrm{{B}}\mathrm{{R}}}(-1) &{} C_{\mathrm{{B}}\mathrm{{G}}}(-1) &{} C_{\mathrm{{B}}\mathrm{{B}}}(-1) \\ \end{array} \right) }. \nonumber \\ \end{aligned}$$
(10.286)
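Each diagonal factor in Eq. (10.283) is a three-dimensional Gaussian density in the RGB vector \({\boldsymbol{d_{i}}}\) with mean \({\boldsymbol{a}}(s_{i})\) and covariance \({\boldsymbol{C}}(s_{i})\). A minimal sketch of this per-pixel weight follows; the helper `gaussian_weight` and all numerical values are made up for illustration, cf. Eqs. (10.285) and (10.286).

```python
import numpy as np

def gaussian_weight(d, a, C):
    """Per-pixel factor appearing in Eq. (10.283): a 3-dimensional
    Gaussian density with mean vector a and covariance matrix C."""
    diff = d - a
    quad = diff @ np.linalg.inv(C) @ diff
    return np.exp(-0.5 * quad) / np.sqrt(np.linalg.det(2.0 * np.pi * C))

# Illustrative RGB parameters; a real segmentation would estimate them
# through the EM updates described below.
a_plus = np.array([0.8, 0.2, 0.1])   # mean colour of label s_i = +1
C_plus = 0.05 * np.eye(3)            # its covariance over R, G, B

d_i = np.array([0.75, 0.25, 0.12])   # observed colour at pixel i
w = gaussian_weight(d_i, a_plus, C_plus)
assert w > 0.0
```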

The joint density matrix of \({\boldsymbol{s}}\) and \({\boldsymbol{D}}\) is expressed in terms of the data generative and prior density matrices as follows:

$$\begin{aligned}&{\boldsymbol{P}}{\left( {\boldsymbol{D}}|{\alpha },{\gamma },{\boldsymbol{{a}}}(+1),{\boldsymbol{{a}}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) } \nonumber \\&\equiv {\exp }{\left( {\ln }{\left( {\boldsymbol{R}}{\left( {\boldsymbol{D}}|{\boldsymbol{{a}}}(+1),{\boldsymbol{{a}}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) }\right) } +{\ln }{\left( {\boldsymbol{R}}{\left( {\alpha },{\gamma }\right) }\right) } \right) }. \end{aligned}$$
(10.287)

By using the joint density matrix \({\boldsymbol{P}}{\left( {\boldsymbol{D}}|{\alpha },{\gamma },{\boldsymbol{{a}}}(+1),{\boldsymbol{{a}}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) }\), the posterior density matrix \({\boldsymbol{P}}{\left( {\boldsymbol{D}},{\alpha },{\gamma },{\boldsymbol{{a}}}(+1),{\boldsymbol{{a}}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) }\) is defined by using the Bayes formula as follows:

$$\begin{aligned} {\boldsymbol{P}}{\left( {\boldsymbol{D}},{\alpha },{\gamma },{\boldsymbol{{a}}}(+1),{\boldsymbol{{a}}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) } \equiv {\frac{{\boldsymbol{P}}{\left( {\boldsymbol{D}}|{\alpha },{\gamma },{\boldsymbol{{a}}}(+1),{\boldsymbol{{a}}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) }}{\mathrm{{Tr}}{\left[ {\boldsymbol{P}}{\left( {\boldsymbol{D}}|{\alpha },{\gamma },{\boldsymbol{{a}}}(+1),{\boldsymbol{{a}}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) }\right] }}}. \nonumber \\ \end{aligned}$$
(10.288)

Estimates of the hyperparameters and parameter vector, \({\widehat{{\alpha }}}({\boldsymbol{D}})\), \({\widehat{{\gamma }}}({\boldsymbol{D}})\), \({\boldsymbol{{\widehat{{a}}}}}(+1|{\boldsymbol{D}})\), \({\boldsymbol{{\widehat{{a}}}}}(-1|{\boldsymbol{D}})\), \({\boldsymbol{{\widehat{C}}}}(+1|{\boldsymbol{D}})\), \({\boldsymbol{{\widehat{C}}}}(-1|{\boldsymbol{D}})\), are given by

$$\begin{aligned}&{\left( {\widehat{{\alpha }}}({\boldsymbol{D}}), {\widehat{{\gamma }}}({\boldsymbol{D}}), {\boldsymbol{{\widehat{{a}}}}}(+1|{\boldsymbol{D}}), {\boldsymbol{{\widehat{{a}}}}}(-1|{\boldsymbol{D}}), {\boldsymbol{{\widehat{C}}}}(+1|{\boldsymbol{D}}), {\boldsymbol{{\widehat{C}}}}(-1|{\boldsymbol{D}}) \right) } \nonumber \\&={\arg }{\max _{{\left( {\alpha }, {\gamma }, {\boldsymbol{{a}}}(+1), {\boldsymbol{{a}}}(-1), {\boldsymbol{C}}(+1), {\boldsymbol{C}}(-1) \right) }}} \mathrm{{Tr}}{\left[ {\boldsymbol{P}}{\left( {\boldsymbol{D}}|{\alpha },{\gamma },{\boldsymbol{{a}}}(+1),{\boldsymbol{{a}}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) }\right] }. \nonumber \\&\end{aligned}$$
(10.289)

The parameter vector \({\boldsymbol{{\widehat{s}}}}({\boldsymbol{D}}) = {\left( {\widehat{s}}_{1}({\boldsymbol{D}}),{\widehat{s}}_{2}({\boldsymbol{D}}),{\cdots },{\widehat{s}}_{|V|}({\boldsymbol{D}}) \right) }\) can be estimated from the reduced posterior marginal density matrix at each node i of \({\boldsymbol{P}}{\left( {\boldsymbol{D}},{\alpha },{\gamma },{\boldsymbol{{a}}}(+1),{\boldsymbol{{a}}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) }\) by similar arguments to those for Eqs. (10.278), (10.279), and (10.280).

The \(\mathcal{{Q}}\)-function for the EM algorithm in the present framework is defined by

$$\begin{aligned}&\mathcal{{Q}}{\left( {\alpha },{\gamma },{\boldsymbol{{a}}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1) |{\alpha }',{\gamma }',{\boldsymbol{{a}'}}(+1),{\boldsymbol{{a}'}}(-1),{\boldsymbol{C'}}(+1),{\boldsymbol{C'}}(-1),{\boldsymbol{D}} \right) } \nonumber \\&\qquad\quad{} \equiv \mathrm{{Tr}} {\Big [} {\boldsymbol{P}}{\left( {\boldsymbol{D}},{\alpha }',{\gamma }',{\boldsymbol{{a}'}}(+1),{\boldsymbol{{a}'}}(-1),{\boldsymbol{C'}}(+1),{\boldsymbol{C'}}(-1)\right) } \nonumber \\&\qquad\qquad\qquad{}{\times } {\ln }{\left( {\boldsymbol{P}}{\left( {\boldsymbol{D}}|{\alpha },{\gamma },{\boldsymbol{{a}}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)\right) } \right) } {\Big ]}. \end{aligned}$$
(10.290)

The EM algorithm is a procedure that performs the following E- and M-steps repeatedly for \(t=0,1,2,{\cdots }\) until \({\widehat{\alpha }}({\boldsymbol{D}})\), \({\widehat{\gamma }}({\boldsymbol{D}})\), \({\boldsymbol{\widehat{a}}}(+1,{\boldsymbol{D}})\), \({\boldsymbol{\widehat{a}}}(-1,{\boldsymbol{D}})\), \({\boldsymbol{\widehat{C}}}(+1,{\boldsymbol{D}})\) and \({\boldsymbol{\widehat{C}}}(-1,{\boldsymbol{D}})\) converge:

     E-step:

Compute \(\mathcal{{Q}}{\left( {\alpha },{\gamma },{\boldsymbol{{a}}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1) {\big |}{\alpha }(t),{\gamma }(t),{\boldsymbol{{a}}}(+1,t),{\boldsymbol{{a}}}(-1,t),{\boldsymbol{C}}(+1,t),{\boldsymbol{C}}(-1,t),{\boldsymbol{D}} \right) }\) for various values of \({\alpha }\), \({\gamma }\), \({\boldsymbol{{a}}}(+1)\), \({\boldsymbol{a}}(-1)\), \({\boldsymbol{C}}(+1)\) and \({\boldsymbol{C}}(-1)\).

     M-step:

Determine \({\alpha }(t+1)\), \({\gamma }(t+1)\), \({\boldsymbol{{a}}}(+1,t+1)\), \({\boldsymbol{{a}}}(-1,t+1)\), \({\boldsymbol{C}}(+1,t+1)\) and \({\boldsymbol{C}}(-1,t+1)\) so as to satisfy the extremum conditions of the \(\mathcal{{Q}}\)-function with respect to \({\alpha }\), \({\gamma }\), \({\boldsymbol{{a}}}(+1)\), \({\boldsymbol{a}}(-1)\), \({\boldsymbol{C}}(+1)\) and \({\boldsymbol{C}}(-1)\) as follows:

$$\begin{aligned}&{\left( {\alpha }(t+1),{\gamma }(t+1),{\boldsymbol{{a}}}(+1,t+1),{\boldsymbol{{a}}}(-1,t+1),{\boldsymbol{C}}(+1,t+1),{\boldsymbol{C}}(-1,t+1) \right) } \nonumber \\&{\leftarrow } {\mathop {{\mathrm{extremum}}}\limits _{{\alpha },{\gamma },{\boldsymbol{{a}}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)}} \nonumber \\&\mathcal{{Q}}{\left( {\alpha },{\gamma },{\boldsymbol{{a}}}(+1),{\boldsymbol{a}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1) {\big |} {\alpha }(t),{\gamma }(t),{\boldsymbol{{a}}}(+1,t),{\boldsymbol{{a}}}(-1,t),{\boldsymbol{C}}(+1,t),{\boldsymbol{C}}(-1,t),{\boldsymbol{D}} \right) }. \nonumber \\&\end{aligned}$$
(10.291)

Update \({\widehat{\alpha }}({\boldsymbol{D}}){\leftarrow }{\alpha }(t+1)\), \({\widehat{\gamma }}({\boldsymbol{D}}){\leftarrow }{\gamma }(t+1)\), \({\boldsymbol{\widehat{a}}}(+1,{\boldsymbol{D}}){\leftarrow }{\boldsymbol{{a}}}(+1,t+1)\), \({\boldsymbol{\widehat{a}}}(-1,{\boldsymbol{D}}){\leftarrow }{\boldsymbol{{a}}}(-1,t+1)\), \({\boldsymbol{\widehat{C}}}(+1,{\boldsymbol{D}}){\leftarrow }{\boldsymbol{C}}(+1,t+1)\) and \({\boldsymbol{\widehat{C}}}(-1,{\boldsymbol{D}}){\leftarrow }{\boldsymbol{C}}(-1,t+1)\).
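The E-/M-step loop above can be made concrete in its classical limit: with \({\alpha }={\gamma }=0\) the prior factorizes and the updates corresponding to Eqs. (10.294) and (10.295) reduce to ordinary two-component Gaussian-mixture EM. The following sketch runs that limit on scalar synthetic pixel values; all data and starting values are made up for illustration.

```python
import numpy as np

# Synthetic scalar "pixel" data drawn from two labelled clusters.
rng = np.random.default_rng(0)
d = np.concatenate([rng.normal(-1.0, 0.3, 200),
                    rng.normal(+1.0, 0.3, 200)])

a = {+1: 0.5, -1: -0.5}      # initial means a(+1), a(-1)
c = {+1: 1.0, -1: 1.0}       # initial variances C(+1), C(-1)

for t in range(50):
    # E-step: posterior weight of each label at each pixel
    # (uniform prior in the alpha = gamma = 0 limit).
    lik = {xi: np.exp(-0.5 * (d - a[xi])**2 / c[xi]) / np.sqrt(c[xi])
           for xi in (+1, -1)}
    z = lik[+1] + lik[-1]
    w = {xi: lik[xi] / z for xi in (+1, -1)}
    # M-step: weighted means and variances, the scalar analogue of the
    # updates for a(xi, t+1) and C(xi, t+1).
    for xi in (+1, -1):
        a[xi] = np.sum(w[xi] * d) / np.sum(w[xi])
        c[xi] = np.sum(w[xi] * (d - a[xi])**2) / np.sum(w[xi])

# The means converge to the two cluster centres at roughly +1 and -1.
assert abs(a[+1] - 1.0) < 0.1 and abs(a[-1] + 1.0) < 0.1
```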

By using the equalities in Eqs. (10.283), (10.284), (10.287), and (10.288), the EM algorithm based on the \(\mathcal{{Q}}\)-function can be reduced to the following update equations:

$$\begin{aligned}&{\frac{1}{|E|}}{\sum _{\{i,j\}{\in }E}} \mathrm{{Tr}}{\left[ {\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}-{\boldsymbol{X^{(+1,+1)}}{\otimes }{\boldsymbol{X^{(+1,+1)}}}}-{\boldsymbol{X^{(-1,-1)}}{\otimes }{\boldsymbol{X^{(-1,-1)}}}}{\big )} {\boldsymbol{P_{ij}}}({\alpha }(t+1),{\gamma }(t+1)) \right] } \nonumber \\&= {\frac{1}{|E|}} {\sum _{\{i,j\}{\in }E}} \mathrm{{Tr}}{\Big [} {\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}-{\boldsymbol{X^{(+1,+1)}}{\otimes }{\boldsymbol{X^{(+1,+1)}}}}-{\boldsymbol{X^{(-1,-1)}}{\otimes }{\boldsymbol{X^{(-1,-1)}}}}\right) } \nonumber \\&\qquad\qquad\qquad\quad{}{\times } {\boldsymbol{P_{ij}}}({\boldsymbol{D}},{\alpha }(t),{\gamma }(t),{\boldsymbol{{a}}}(+1,t),{\boldsymbol{{a}}}(-1,t),{\boldsymbol{C}}(+1,t),{\boldsymbol{C}}(-1,t)){\Big ]}, \end{aligned}$$
(10.292)
$$\begin{aligned}&{\frac{1}{|V|}}{\sum _{i{\in }V}} \mathrm{{Tr}} {\left[ {\left( {\boldsymbol{X^{(+1,-1)}}}+{\boldsymbol{X^{(-1,+1)}}}\right) } {\boldsymbol{P_{i}}}({\alpha }(t+1),{\gamma }(t+1)) \right] } \nonumber \\&= {\frac{1}{|V|}} {\sum _{i{\in }V}} \mathrm{{Tr}} {\Big [} {\left( {\boldsymbol{X^{(+1,-1)}}}+{\boldsymbol{X^{(-1,+1)}}}\right) } \nonumber \\&\qquad\qquad{}{\times } {\boldsymbol{P_{i}}}({\boldsymbol{D}},{\alpha }(t),{\gamma }(t),{\boldsymbol{{a}}}(+1,t),{\boldsymbol{{a}}}(-1,t),{\boldsymbol{C}}(+1,t),{\boldsymbol{C}}(-1,t)) {\Big ]}, \end{aligned}$$
(10.293)
$$\begin{aligned} {\boldsymbol{{a}}}({\xi },t+1) = {\frac{ {\displaystyle { {\sum _{i{\in }V}} {\boldsymbol{d_{i}}} \mathrm{{Tr}}{\left[ {\boldsymbol{X^{({\xi },{\xi })}}} {\boldsymbol{P_{i}}}({\boldsymbol{D}},{\alpha }(t),{\gamma }(t),{\boldsymbol{{a}}}(+1,t),{\boldsymbol{{a}}}(-1,t),{\boldsymbol{C}}(+1,t),{\boldsymbol{C}}(-1,t)) \right] } }} }{ {\displaystyle { {\sum _{i{\in }V}} \mathrm{{Tr}}{\left[ {\boldsymbol{X^{({\xi },{\xi })}}} {\boldsymbol{P_{i}}}({\boldsymbol{D}},{\alpha }(t),{\gamma }(t),{\boldsymbol{{a}}}(+1,t),{\boldsymbol{{a}}}(-1,t),{\boldsymbol{C}}(+1,t),{\boldsymbol{C}}(-1,t)) \right] } }} } } ~({\xi }{\in }{\Omega }), \nonumber \\&\end{aligned}$$
(10.294)
$$\begin{aligned}&{\boldsymbol{C}}({\xi },t+1) \nonumber \\&= {\frac{ {\displaystyle { {\sum _{i{\in }V}} {\big (}{\boldsymbol{d_{i}}}-{\boldsymbol{{a}}}({\xi },t){\big )}^\mathrm{{T}}{\big (}{\boldsymbol{d_{i}}}-{\boldsymbol{{a}}}({\xi },t){\big )} \mathrm{{Tr}}{\left[ {\boldsymbol{X^{({\xi },{\xi })}}} {\boldsymbol{P_{i}}}({\boldsymbol{D}},{\alpha }(t),{\gamma }(t),{\boldsymbol{{a}}}(+1,t),{\boldsymbol{{a}}}(-1,t),{\boldsymbol{C}}(+1,t),{\boldsymbol{C}}(-1,t)) \right] } }} }{ {\displaystyle { {\sum _{i{\in }V}} \mathrm{{Tr}}{\left[ {\boldsymbol{X^{({\xi },{\xi })}}} {\boldsymbol{P_{i}}}({\boldsymbol{D}},{\alpha }(t),{\gamma }(t),{\boldsymbol{{a}}}(+1,t),{\boldsymbol{{a}}}(-1,t),{\boldsymbol{C}}(+1,t),{\boldsymbol{C}}(-1,t)) \right] } }} } } ~({\xi }{\in }{\Omega }), \nonumber \\&\end{aligned}$$
(10.295)

where

$$\begin{aligned}&{\langle }s_{i}|{\boldsymbol{P_{i}}}({\boldsymbol{D}},{\alpha },{\gamma },{\boldsymbol{{a}}}(+1),{\boldsymbol{{a}}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1))|s'_{i}{\rangle } \nonumber \\&={\langle }s_{i}|\mathrm{{Tr}}_{{\setminus }i}{\boldsymbol{P}}({\boldsymbol{D}},{\alpha },{\gamma },{\boldsymbol{{a}}}(+1),{\boldsymbol{{a}}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1))|s'_{i}{\rangle } \nonumber \\&\equiv {\sum _{{\tau }_{1}{\in }{\Omega }}}{\sum _{{\tau }_{2}{\in }{\Omega }}}{\cdots }{\sum _{{\tau }_{|V|}{\in }{\Omega }}} {\sum _{{\tau }'_{1}{\in }{\Omega }}}{\sum _{{\tau }'_{2}{\in }{\Omega }}}{\cdots }{\sum _{{\tau }'_{|V|}{\in }{\Omega }}} {\delta }_{s_{i},{\tau }_{i}}{\delta }_{s'_{i},{\tau }'_{i}} \nonumber \\&{\times } {\left( {\prod _{j{\in }V{\setminus }\{i\}}}{\delta }_{{\tau }_{j},{\tau }'_{j}}\right) } {\langle }{\tau }_{1},{\tau }_{2},{\cdots },{\tau }_{|V|}|{\boldsymbol{P}}({\boldsymbol{D}},{\alpha },{\gamma },{\boldsymbol{{a}}}(+1),{\boldsymbol{{a}}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)) |{\tau }'_{1},{\tau }'_{2},{\cdots },{\tau }'_{|V|}{\rangle } \nonumber \\&(s_{i}{\in }{\Omega },~s'_{i}{\in }{\Omega },~i{\in }V), \end{aligned}$$
(10.296)
$$\begin{aligned}&{\langle }s_{i},s_{j}|{\boldsymbol{P_{ij}}}({\boldsymbol{D}},{\alpha },{\gamma },{\boldsymbol{{a}}}(+1),{\boldsymbol{{a}}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1))|s'_{i},s'_{j}{\rangle } \nonumber \\&= {\langle }s_{i},s_{j}|{\boldsymbol{P_{ji}}}({\boldsymbol{D}},{\alpha },{\gamma },{\boldsymbol{{a}}}(+1),{\boldsymbol{{a}}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1))|s'_{i},s'_{j}{\rangle } \nonumber \\&={\langle }s_{i},s_{j}|\mathrm{{Tr}}_{{\setminus }\{i,j\}}{\boldsymbol{P}}({\boldsymbol{D}},{\alpha },{\gamma },{\boldsymbol{{a}}}(+1),{\boldsymbol{{a}}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1))|s'_{i},s'_{j}{\rangle } \nonumber \\&\equiv {\sum _{{\tau }_{1}{\in }{\Omega }}}{\sum _{{\tau }_{2}{\in }{\Omega }}}{\cdots }{\sum _{{\tau }_{|V|}{\in }{\Omega }}} {\sum _{{\tau }'_{1}{\in }{\Omega }}}{\sum _{{\tau }'_{2}{\in }{\Omega }}}{\cdots }{\sum _{{\tau }'_{|V|}{\in }{\Omega }}} {\delta }_{s_{i},{\tau }_{i}}{\delta }_{s_{j},{\tau }_{j}}{\delta }_{s'_{i},{\tau }'_{i}}{\delta }_{s'_{j},{\tau }'_{j}} \nonumber \\&{\times } {\left( {\prod _{k{\in }V{\setminus }\{i,j\}}}{\delta }_{{\tau }_{k},{\tau }'_{k}}\right) } {\langle }{\tau }_{1},{\tau }_{2},{\cdots },{\tau }_{|V|}|{\boldsymbol{P}}({\boldsymbol{D}},{\alpha },{\gamma },{\boldsymbol{{a}}}(+1),{\boldsymbol{{a}}}(-1),{\boldsymbol{C}}(+1),{\boldsymbol{C}}(-1)) |{\tau }'_{1},{\tau }'_{2},{\cdots },{\tau }'_{|V|}{\rangle } \nonumber \\&(s_{i}{\in }{\Omega },~s_{j}{\in }{\Omega },~s'_{i}{\in }{\Omega },~s'_{j}{\in }{\Omega },~i{\in }V,~j{\in }V,~i<j), \end{aligned}$$
(10.297)
$$\begin{aligned}&{\langle }s_{i}|{\boldsymbol{P_{i}}}({\alpha },{\gamma })|s'_{i}{\rangle } ={\langle }s_{i}|\mathrm{{Tr}}_{{\setminus }\{i\}}{\boldsymbol{P}}({\alpha },{\gamma })|s'_{i}{\rangle } \nonumber \\&\equiv {\sum _{{\tau }_{1}{\in }{\Omega }}}{\sum _{{\tau }_{2}{\in }{\Omega }}}{\cdots }{\sum _{{\tau }_{|V|}{\in }{\Omega }}} {\sum _{{\tau }'_{1}{\in }{\Omega }}}{\sum _{{\tau }'_{2}{\in }{\Omega }}}{\cdots }{\sum _{{\tau }'_{|V|}{\in }{\Omega }}} {\delta }_{s_{i},{\tau }_{i}}{\delta }_{s'_{i},{\tau }'_{i}} \nonumber \\&\qquad\quad{} {\times } {\left( {\prod _{k{\in }V{\setminus }\{i\}}}{\delta }_{{\tau }_{k},{\tau }'_{k}}\right) } {\langle }{\tau }_{1},{\tau }_{2},{\cdots },{\tau }_{|V|}|{\boldsymbol{P}}({\alpha },{\gamma }) |{\tau }'_{1},{\tau }'_{2},{\cdots },{\tau }'_{|V|}{\rangle } \nonumber \\&\qquad\quad{}(s_{i}{\in }{\Omega },~s'_{i}{\in }{\Omega },~i{\in }V), \end{aligned}$$
(10.298)
$$\begin{aligned} {\langle }s_{i},s_{j}|{\boldsymbol{P_{ij}}}({\alpha },{\gamma })|s'_{i},s'_{j}{\rangle }= & {} {\langle }s_{i},s_{j}|{\boldsymbol{P_{ji}}}({\alpha },{\gamma })|s'_{i},s'_{j}{\rangle } \nonumber \\= & {} {\langle }s_{i},s_{j}|\mathrm{{Tr}}_{{\setminus }\{i,j\}}{\boldsymbol{P}}({\alpha },{\gamma })|s'_{i},s'_{j}{\rangle } \nonumber \\\equiv & {} {\sum _{{\tau }_{1}{\in }{\Omega }}}{\sum _{{\tau }_{2}{\in }{\Omega }}}{\cdots }{\sum _{{\tau }_{|V|}{\in }{\Omega }}} {\sum _{{\tau }'_{1}{\in }{\Omega }}}{\sum _{{\tau }'_{2}{\in }{\Omega }}}{\cdots }{\sum _{{\tau }'_{|V|}{\in }{\Omega }}} {\delta }_{s_{i},{\tau }_{i}}{\delta }_{s_{j},{\tau }_{j}}{\delta }_{s'_{i},{\tau }'_{i}}{\delta }_{s'_{j},{\tau }'_{j}} \nonumber \\&\qquad\quad{} {\times } {\left( {\prod _{k{\in }V{\setminus }\{i,j\}}}{\delta }_{{\tau }_{k},{\tau }'_{k}}\right) } {\langle }{\tau }_{1},{\tau }_{2},{\cdots },{\tau }_{|V|}|{\boldsymbol{P}}({\alpha },{\gamma }) |{\tau }'_{1},{\tau }'_{2},{\cdots },{\tau }'_{|V|}{\rangle } \nonumber \\&\qquad\quad{}(s_{i}{\in }{\Omega }~s_{j}{\in }{\Omega },~s'_{i}{\in }{\Omega },~s'_{j}{\in }{\Omega },~i{\in }V,~j{\in }V,~i<j). \end{aligned}$$
(10.299)

5 Quantum Statistical Mechanical Informatics

This section explains quantum graphical modeling based on quantum mechanical extensions of statistical mechanical informatics, that is, quantum statistical mechanical informatics and, in particular, advanced quantum mean-field methods. Fundamental frameworks and recent developments are explored in textbooks on statistical mechanics [37, 86]. In applications of quantum annealing to massive optimization problems, the transverse Ising model is an important quantum probabilistic graphical model [83, 84], and it is known that the density matrices, for example, in Eqs. (10.263), (10.265), and (10.266), in some familiar quantum statistical machine learning systems can be reduced to transverse Ising models.

In quantum statistical mechanical informatics, one of the most important schemes is the Suzuki-Trotter decomposition [87, 88]. It has been used to realize quantum Monte Carlo methods by mapping d-dimensional density matrices to corresponding \((d+1)\)-dimensional probability distributions [89]. Recently, some quantum annealing schemes have been realized in actual quantum computers, for example, the D-Wave machine.
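The idea behind the Suzuki-Trotter decomposition can be checked numerically: for non-commuting Hermitian A and B, \(\exp(A+B)\) is approximated by \((\exp(A/n)\exp(B/n))^{n}\), with an error that shrinks as n grows. In the following sketch, A and B are toy two-spin transverse-Ising-like terms with made-up couplings, and the helper `expmh` is introduced only for this illustration.

```python
import numpy as np

sz = np.array([[1.0, 0.0], [0.0, -1.0]])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])

def expmh(M):
    """Matrix exponential of a real symmetric matrix via eigendecomposition."""
    w, v = np.linalg.eigh(M)
    return (v * np.exp(w)) @ v.T

A = -0.7 * np.kron(sz, sz)                                    # diagonal part
B = -0.4 * (np.kron(sx, np.eye(2)) + np.kron(np.eye(2), sx))  # transverse part

exact = expmh(A + B)

errors = []
for n in (1, 4, 16):
    # First-order Trotter step: exp(A/n) exp(B/n), repeated n times.
    step = expmh(A / n) @ expmh(B / n)
    approx = np.linalg.matrix_power(step, n)
    errors.append(np.linalg.norm(approx - exact))

# Since [A, B] != 0, the error is nonzero but decreases roughly like 1/n.
assert errors[0] > errors[1] > errors[2]
```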

In the first part of this section, we explain some basic frameworks in advanced quantum mean-field methods for realizing familiar quantum statistical machine learning systems for the transverse Ising model, including conventional frameworks of quantum belief propagation. In the second part, we propose a quantum adaptive Thouless-Anderson-Palmer (TAP) method and a new approach using the momentum space renormalization group method to realize coarse graining for the transverse Ising model not only for regular graphs but also for random graphs. In the third part, we introduce the Suzuki-Trotter decomposition [87, 88], show the basic scheme for mapping a d-dimensional transverse Ising model to a \((d+1)\)-dimensional Ising model, and apply the scheme to the message passing rules of the conventional quantum belief propagation.

5.1 Advanced Mean-Field Methods for the Transverse Ising Model

This section explores the detailed derivation of the deterministic equations in both the quantum mean-field method and the quantum loopy belief propagation method for the transverse Ising model [83, 84]. Note that the present frameworks of the quantum mean-field method and the quantum loopy belief propagation method are constructed in real space, while other familiar frameworks in quantum statistical mechanics, such as spin wave theory, are constructed in momentum space.

For a graph (VE) with a set of nodes V and set of edges E, we consider a density matrix \({\boldsymbol{P}}\) as

$$\begin{aligned} {\boldsymbol{P}}= {\frac{ {\exp }{\left( -{\frac{1}{k_\mathrm{{B}}T}} {\left( {\displaystyle { {\frac{1}{2}}J{\sum _{\{i,j\}{\in }E}} {\left( {\boldsymbol{{\sigma }_{i}^{z}}}-{\boldsymbol{{\sigma }_{j}^{z}}}\right) }^{2} +{\frac{1}{2}}h{\sum _{i{\in }V}}{\left( {\boldsymbol{{\sigma }_{i}^{z}}} - d_{i}{\boldsymbol{I^{(2^{|V|})}}} \right) }^{2} -{\Gamma }{\sum _{i{\in }V}}{\boldsymbol{{\sigma }_{i}^{x}}} }} \right) } \right) } }{ \mathrm{{Tr}}{\left[ {\exp }{\left( -{\frac{1}{k_\mathrm{{B}}T}} {\left( {\displaystyle { {\frac{1}{2}}J{\sum _{\{i,j\}{\in }E}} {\left( {\boldsymbol{{\sigma }_{i}^{z}}}-{\boldsymbol{{\sigma }_{j}^{z}}}\right) }^{2} +{\frac{1}{2}}h{\sum _{i{\in }V}}{\left( {\boldsymbol{{\sigma }_{i}^{z}}} - d_{i}{\boldsymbol{I^{(2^{|V|})}}} \right) }^{2} -{\Gamma }{\sum _{i{\in }V}}{\boldsymbol{{\sigma }_{i}^{x}}} }} \right) } \right) } \right] } } }. \nonumber \\ \end{aligned}$$
(10.300)

Because \({\boldsymbol{{\sigma }_{i}^{z}}}{\boldsymbol{{\sigma }_{i}^{z}}}={\boldsymbol{I^{(2^{|V|})}}}\), the density matrix in Eq. (10.300) can be reduced to Eqs. (10.226) and (10.227) with

$$\begin{aligned} {\boldsymbol{H}}=-J{\sum _{{\{i,j}\}{\in }E}}{\boldsymbol{{\sigma }_{i}^{z}}}{\boldsymbol{{\sigma }_{j}^{z}}}-h{\sum _{i{\in }V}}d_{i}{\boldsymbol{{\sigma }_{i}^{z}}}-{\Gamma }{\sum _{i{\in }V}}{\boldsymbol{{\sigma }_{i}^{x}}}. \end{aligned}$$
(10.301)

Here, all the nodes j connected with the node i by an edge \(\{i,j\}\) are referred to as neighboring nodes of the node i, and the set of all neighboring nodes of the node i is denoted by the notation \({\partial }i\). The quantum probabilistic graphical model in Eqs. (10.300) and (10.301) is referred to as the Transverse Ising Model [83, 84].
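The Hamiltonian of Eq. (10.301) can be built explicitly with Kronecker products for a small graph. The chain, the data values \(d_{i}\), and the couplings J, h, \({\Gamma }\) in the following sketch are toy assumptions chosen only for illustration.

```python
import numpy as np
from functools import reduce

sz = np.array([[1.0, 0.0], [0.0, -1.0]])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])

def site_op(op, i, n):
    """Embed a single-site operator at position i of an n-site system."""
    factors = [np.eye(2)] * n
    factors[i] = op
    return reduce(np.kron, factors)

n = 3
edges = [(0, 1), (1, 2)]            # a 3-node chain
d = np.array([1.0, -1.0, 1.0])      # toy data attached to the nodes
J, h, Gamma = 1.0, 0.5, 0.3

# H = -J sum sz_i sz_j - h sum d_i sz_i - Gamma sum sx_i, cf. Eq. (10.301).
H = (-J * sum(site_op(sz, i, n) @ site_op(sz, j, n) for i, j in edges)
     - h * sum(d[i] * site_op(sz, i, n) for i in range(n))
     - Gamma * sum(site_op(sx, i, n) for i in range(n)))

# H is real symmetric (Hermitian), so exp(-H/(k_B T))/Tr[...] is a
# well-defined density matrix.
assert np.allclose(H, H.T)
```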

First, we explain the conventional quantum mean-field method for the transverse Ising model. We introduce a \(2^{|V|}{\times }2^{|V|}\) trial density matrix \({\boldsymbol{R}}\) and its \(2{\times }2\) trial reduced density matrices \({\boldsymbol{R_{i}}}\) for each node \(i({\in }V)\) defined by

$$\begin{aligned} {\boldsymbol{R_{i}}}=\mathrm{{Tr}}_{{\setminus }i}{\boldsymbol{R}} = {\left( \begin{array}{ccc} {\langle }+1|{\boldsymbol{R_{i}}}|+1{\rangle } &{} {\langle }+1|{\boldsymbol{R_{i}}}|-1{\rangle } \\ {\langle }-1|{\boldsymbol{R_{i}}}|+1{\rangle } &{} {\langle }-1|{\boldsymbol{R_{i}}}|-1{\rangle } \end{array} \right) }, \end{aligned}$$
(10.302)

where

$$\begin{aligned} {\langle }s_{i}|{\boldsymbol{R_{i}}}|s'_{i}{\rangle }= & {} {\langle }s_{i}|\mathrm{{Tr}}_{{\setminus }i}{\boldsymbol{R}}|s'_{i}{\rangle } \nonumber \\\equiv & {} {\sum _{{\tau }_{1}{\in }{\Omega }}}{\sum _{{\tau }_{2}{\in }{\Omega }}}{\cdots }{\sum _{{\tau }_{|V|}{\in }{\Omega }}} {\sum _{{\tau }'_{1}{\in }{\Omega }}}{\sum _{{\tau }'_{2}{\in }{\Omega }}}{\cdots }{\sum _{{\tau }'_{|V|}{\in }{\Omega }}} {\delta }_{s_{i},{\tau }_{i}}{\delta }_{s'_{i},{\tau }'_{i}}{\left( {\prod _{j{\in }V{\setminus }\{i\}}}{\delta }_{{\tau }_{j},{\tau }'_{j}}\right) } {\langle }{\tau }_{1},{\tau }_{2},{\cdots },{\tau }_{|V|}|{\boldsymbol{R}} |{\tau }'_{1},{\tau }'_{2},{\cdots },{\tau }'_{|V|}{\rangle } \nonumber \\&\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad{}(s_{i}{\in }{\Omega },~s'_{i}{\in }{\Omega }.~i{\in }V). \end{aligned}$$
(10.303)

By using Eq. (10.303), the average \(\mathrm{{Tr}}({\boldsymbol{{\sigma }_{i}^{x}}}{\boldsymbol{R}})\) can be expressed in terms of the reduced density matrix \({\boldsymbol{R_{i}}}\) as follows:

$$\begin{aligned} \mathrm{{Tr}}({\boldsymbol{{\sigma }_{i}^{x}}}{\boldsymbol{R}})= & {} {\sum _{s_{1}{\in }{\Omega }}}{\sum _{s_{2}{\in }{\Omega }}}{\cdots }{\sum _{s_{|V|}{\in }{\Omega }}} {\sum _{s'_{1}{\in }{\Omega }}}{\sum _{s'_{2}{\in }{\Omega }}}{\cdots }{\sum _{s'_{|V|}{\in }{\Omega }}} {\langle }s_{1},s_{2},{\cdots },s_{|V|}| {\boldsymbol{{\sigma }_{i}^{x}}} |s'_{1},s'_{2},{\cdots },s'_{|V|}{\rangle } \nonumber \\&\qquad\qquad\qquad{}{\times } {\langle }s'_{1},s'_{2},{\cdots },s'_{|V|}|{\boldsymbol{R}} |s_{1},s_{2},{\cdots },s_{|V|}{\rangle } \nonumber \\= & {} {\sum _{s_{1}{\in }{\Omega }}}{\sum _{s_{2}{\in }{\Omega }}}{\cdots }{\sum _{s_{|V|}{\in }{\Omega }}} {\sum _{s'_{1}{\in }{\Omega }}}{\sum _{s'_{2}{\in }{\Omega }}}{\cdots }{\sum _{s'_{|V|}{\in }{\Omega }}} {\left( {\prod _{k{\in }V{\setminus }\{i\}}}{\delta }_{s_{k},s'_{k}}\right) } \nonumber \\&\qquad\qquad\qquad{}{\times } {\langle }s_{i}|{\boldsymbol{{\sigma }^{x}}}|s'_{i}{\rangle } {\langle }s'_{1},s'_{2},{\cdots },s'_{|V|}|{\boldsymbol{R}} |s_{1},s_{2},{\cdots },s_{|V|}{\rangle } \nonumber \\= & {} {\sum _{s_{i}{\in }{\Omega }}} {\sum _{s'_{i}{\in }{\Omega }}} {\langle }s_{i}|{\boldsymbol{{\sigma }^{x}}}|s'_{i}{\rangle } {\sum _{{\tau }_{1}{\in }{\Omega }}}{\sum _{{\tau }_{2}{\in }{\Omega }}}{\cdots }{\sum _{{\tau }_{|V|}{\in }{\Omega }}} {\sum _{{\tau }'_{1}{\in }{\Omega }}}{\sum _{{\tau }'_{2}{\in }{\Omega }}}{\cdots }{\sum _{{\tau }'_{|V|}{\in }{\Omega }}} {\delta }_{s_{i},{\tau }_{i}}{\delta }_{s'_{i},{\tau }'_{i}} \nonumber \\&\qquad\qquad\qquad{}{\times } {\left( {\prod _{k{\in }V{\setminus }\{i\}}}{\delta }_{{\tau }_{k},{\tau }'_{k}}\right) } {\langle }{\tau }'_{1},{\tau }'_{2},{\cdots },{\tau }'_{|V|}|{\boldsymbol{R}} |{\tau }_{1},{\tau }_{2},{\cdots },{\tau }_{|V|}{\rangle } \nonumber \\= & {} {\sum _{s_{i}{\in }{\Omega }}} {\sum _{s'_{i}{\in }{\Omega }}} {\langle }s_{i}|{\boldsymbol{{\sigma }^{x}}}|s'_{i}{\rangle } {\langle }s'_{i}|{\boldsymbol{R_{i}}}|s_{i}{\rangle } \nonumber \\= & {} \mathrm{{Tr}}({\boldsymbol{{\sigma }^{x}}}{\boldsymbol{R_{i}}}). \end{aligned}$$
(10.304)

By similar arguments to those for Eq. (10.304), we derive

$$\begin{aligned} \mathrm{{Tr}}{\left( {\boldsymbol{{\sigma }^{z}_{i}}}{\boldsymbol{R}}\right) } = \mathrm{{Tr}}{\left( {\boldsymbol{{\sigma }^{z}}}{\boldsymbol{R_{i}}}\right) }. \end{aligned}$$
(10.305)

Now, we assume that the trial density matrix \({\boldsymbol{R}}\) is expressed as

$$\begin{aligned} {\boldsymbol{R}} = {\boldsymbol{R_{1}}}{\otimes }{\boldsymbol{R_{2}}}{\otimes }{\cdots }{\otimes }{\boldsymbol{R_{|V|}}}. \end{aligned}$$
(10.306)

In this case, the average \(\mathrm{{Tr}}({\boldsymbol{{\sigma }_{i}^{z}}}{\boldsymbol{{\sigma }_{j}^{z}}}{\boldsymbol{R}})\) and the entropy \(-k_\mathrm{{B}}\mathrm{{Tr}}{\boldsymbol{R}}{\ln }{\boldsymbol{R}}\) can be expressed as

$$\begin{aligned} \mathrm{{Tr}}({\boldsymbol{{\sigma }_{i}^{z}}}{\boldsymbol{{\sigma }_{j}^{z}}}{\boldsymbol{R}})= & {} \mathrm{{Tr}}{\left( {\boldsymbol{{\sigma }_{i}^{z}}}{\boldsymbol{{\sigma }_{j}^{z}}}{\left( {\boldsymbol{R_{1}}}{\otimes }{\boldsymbol{R_{2}}}{\otimes }{\cdots }{\otimes }{\boldsymbol{R_{|V|}}}\right) }\right) } \nonumber \\= & {} {\sum _{s_{1}{\in }{\Omega }}}{\sum _{s_{2}{\in }{\Omega }}}{\cdots }{\sum _{s_{|V|}{\in }{\Omega }}} {\sum _{s'_{1}{\in }{\Omega }}}{\sum _{s'_{2}{\in }{\Omega }}}{\cdots }{\sum _{s'_{|V|}{\in }{\Omega }}} {\sum _{s''_{1}{\in }{\Omega }}}{\sum _{s''_{2}{\in }{\Omega }}}{\cdots }{\sum _{s''_{|V|}{\in }{\Omega }}} {\left( {\prod _{k{\in }V{\setminus }\{i\}}}{\delta }_{s_{k},s'_{k}}\right) } {\langle }s_{i}|{\boldsymbol{{\sigma }^{z}}}|s'_{i}{\rangle } \nonumber \\&{\times } {\left( {\prod _{l{\in }V{\setminus }\{j\}}}{\delta }_{s'_{l},s''_{l}}\right) } {\langle }s'_{j}|{\boldsymbol{{\sigma }^{z}}}|s''_{j}{\rangle } {\langle }s''_{1}|{\boldsymbol{R_{1}}}|s_{1}{\rangle } {\langle }s''_{2}|{\boldsymbol{R_{2}}}|s_{2}{\rangle } {\times } {\cdots } {\times } {\langle }s''_{|V|}|{\boldsymbol{R_{|V|}}}|s_{|V|}{\rangle } \nonumber \\= & {} {\left( {\sum _{s_{i}{\in }{\Omega }}} {\sum _{s'_{i}{\in }{\Omega }}} {\langle }s_{i}|{\boldsymbol{{\sigma }^{z}}}|s'_{i}{\rangle } {\langle }s'_{i}|{\boldsymbol{R_{i}}}|s_{i}{\rangle } \right) } \nonumber \\&{\times } {\left( {\sum _{s_{j}{\in }{\Omega }}} {\sum _{s''_{j}{\in }{\Omega }}} {\langle }s_{j}|{\boldsymbol{{\sigma }^{z}}}|s''_{j}{\rangle } {\langle }s''_{j}|{\boldsymbol{R_{j}}}|s_{j}{\rangle } \right) } {\left( {\prod _{k{\in }V{\setminus }\{i,j\}}} {\left( {\sum _{s'_{k}{\in }{\Omega }}} {\sum _{s''_{k}{\in }{\Omega }}} {\delta }_{s_{k},s'_{k}}{\delta }_{s'_{k},s''_{k}}\right) } {\langle }s''_{k}|{\boldsymbol{R_{k}}}|s_{k}{\rangle } \right) } \nonumber \\= & {} {\big (} \mathrm{{Tr}}{\left( {\boldsymbol{{\sigma }^{z}}}{\boldsymbol{R_{i}}}\right) }{\big )} {\big (} 
\mathrm{{Tr}}{\left( {\boldsymbol{{\sigma }^{z}}}{\boldsymbol{R_{j}}}\right) }{\big )} {\left( {\prod _{k{\in }V{\setminus }\{i,j\}}} {\left( \mathrm{{Tr}}{\left( {\boldsymbol{R_{k}}}\right) }\right) }\right) } \nonumber \\= & {} {\big (} \mathrm{{Tr}}{\left( {\boldsymbol{{\sigma }^{z}}}{\boldsymbol{R_{i}}}\right) }{\big )} {\big (} \mathrm{{Tr}}{\left( {\boldsymbol{{\sigma }^{z}}}{\boldsymbol{R_{j}}}\right) }{\big )}, \end{aligned}$$
(10.307)
$$\begin{aligned} -k_\mathrm{{B}}\mathrm{{Tr}}{\left( {\boldsymbol{R}}{\ln }{\left( {\boldsymbol{R}}\right) }\right) }= & {} -k_\mathrm{{B}}\mathrm{{Tr}}{\left( {\left( {\boldsymbol{R_{1}}}{\otimes }{\boldsymbol{R_{2}}}{\otimes }{\cdots }{\otimes }{\boldsymbol{R_{|V|}}}\right) } {\ln } {\left( {\boldsymbol{R_{1}}}{\otimes }{\boldsymbol{R_{2}}}{\otimes }{\cdots }{\otimes }{\boldsymbol{R_{|V|}}}\right) } \right) } \nonumber \\= & {} -k_\mathrm{{B}}{\sum _{i=1}^{N}} \mathrm{{Tr}}{\left( {\left( {\boldsymbol{R_{1}}}{\otimes }{\boldsymbol{R_{2}}}{\otimes }{\cdots }{\otimes }{\boldsymbol{R_{|V|}}}\right) } {\left( {\boldsymbol{I}}^{(i-1)}{\otimes } {\ln } {\left( {\boldsymbol{R_{i}}}\right) } {\otimes }{\boldsymbol{I}}^{(N-i)} \right) } \right) } \nonumber \\= & {} -k_\mathrm{{B}}{\sum _{i=1}^{N}} {\sum _{s_{1}{\in }{\Omega }}}{\sum _{s_{2}{\in }{\Omega }}}{\cdots }{\sum _{s_{|V|}{\in }{\Omega }}} {\sum _{s'_{1}{\in }{\Omega }}}{\sum _{s'_{2}{\in }{\Omega }}}{\cdots }{\sum _{s'_{|V|}{\in }{\Omega }}} {\langle }s_{1}|{\boldsymbol{R_{1}}}|s'_{1}{\rangle } {\langle }s_{2}|{\boldsymbol{R_{2}}}|s'_{2}{\rangle } {\times } {\cdots } {\times } {\langle }s_{|V|}|{\boldsymbol{R_{|V|}}}|s'_{|V|}{\rangle } \nonumber \\&{\times } {\left( {\prod _{k{\in }V{\setminus }\{i\}}}{\delta }_{s_{k},s'_{k}}\right) } {\langle }s'_{i}|{\ln }({\boldsymbol{R_{i}}})|s_{i}{\rangle } \nonumber \\= & {} -k_\mathrm{{B}}{\sum _{i=1}^{N}} {\left( {\sum _{s_{i}{\in }{\Omega }}} {\sum _{s'_{i}{\in }{\Omega }}} {\langle }s_{i}|{\boldsymbol{R_{i}}}|s'_{i}{\rangle } {\langle }s'_{i}|{\ln }({\boldsymbol{R_{i}}})|s_{i}{\rangle } \right) } {\left( {\prod _{k{\in }V{\setminus }\{i\}}} {\left( {\sum _{s_{k}{\in }{\Omega }}} {\sum _{s'_{k}{\in }{\Omega }}} {\delta }_{s_{k},s'_{k}} {\langle }s_{k}|{\boldsymbol{R_{k}}}|s'_{k}{\rangle } \right) } \right) } \nonumber \\= & {} -k_\mathrm{{B}}{\sum _{i=1}^{N}} {\left( \mathrm{{Tr}} {\big (} {\boldsymbol{R_{i}}}{\ln }({\boldsymbol{R_{i}}}) {\big )} \right) } {\left( {\prod _{k{\in }V{\setminus }\{i\}}} \mathrm{{Tr}}({\boldsymbol{R_{k}}}) \right) } \nonumber \\= & {} -k_\mathrm{{B}}{\sum _{i=1}^{N}} \mathrm{{Tr}} {\big (} {\boldsymbol{R_{i}}}{\ln }({\boldsymbol{R_{i}}}) {\big )}. \end{aligned}$$
(10.308)
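The additivity of the entropy in Eq. (10.308) for a product trial density matrix can be checked numerically on a small example. The following sketch is purely illustrative (the single-site density matrices are arbitrary valid choices, and \(k_\mathrm{{B}}\) is set to 1):

```python
import numpy as np
from scipy.linalg import logm

# Illustrative single-site density matrices (any positive matrices
# with unit trace work); k_B = 1 here.
R1 = np.array([[0.6, 0.2], [0.2, 0.4]])
R2 = np.diag([0.8, 0.2])
R = np.kron(R1, R2)          # product state R = R1 (x) R2, as in Eq. (10.306)

def entropy(rho):
    # von Neumann entropy -Tr(rho ln rho)
    return -np.trace(rho @ logm(rho)).real

# Entropy of the product state equals the sum of single-site entropies,
# which is the content of Eq. (10.308).
assert np.isclose(entropy(R), entropy(R1) + entropy(R2))
```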

The free energy functional can be reduced to

$$\begin{aligned} \mathcal{{F}}[{\boldsymbol{R}}] =\mathcal{{F}}_\mathrm{{MF}}[{\boldsymbol{R_{1}}},{\boldsymbol{R_{2}}},{\cdots },{\boldsymbol{R_{|V|}}}]\equiv & {} -J{\sum _{\{i,j\}{\in }E}}{\big (}\mathrm{{Tr}}({\boldsymbol{{\sigma }^{z}}}{\boldsymbol{R_{i}}}){\big )}{\big (}\mathrm{{Tr}}({\boldsymbol{{\sigma }^{z}}}{\boldsymbol{R_{j}}}){\big )} -h{\sum _{i{\in }V}}d_{i}\mathrm{{Tr}}({\boldsymbol{{\sigma }^{z}}}{\boldsymbol{R_{i}}}) \nonumber \\&-{\Gamma }{\sum _{i{\in }V}}\mathrm{{Tr}}({\boldsymbol{{\sigma }^{x}}}{\boldsymbol{R_{i}}}) +k_\mathrm{{B}}T{\sum _{i{\in }V}}\mathrm{{Tr}}({\boldsymbol{R_{i}}}{\ln }({\boldsymbol{R}}_{i})). \end{aligned}$$
(10.309)

We define the optimal reduced density matrix \({\boldsymbol{{\widehat{R}}_{i}}}\) for each node \(i({\in }V)\) by

$$\begin{aligned} {\boldsymbol{{\widehat{R}_{i}}}}= & {} {\arg }~{\mathop {{\mathrm{extremum}}}\limits _{{\boldsymbol{R_{i}}}}}{\Big \{} \mathcal{{F}}_\mathrm{{MF}}{\left[ {\boldsymbol{{\widehat{R}}_{1}}},{\boldsymbol{{\widehat{R}}_{2}}},{\cdots },{\boldsymbol{{\widehat{R}}_{i-1}}}, {\boldsymbol{R_{i}}},{\boldsymbol{{\widehat{R}}_{i+1}}},{\boldsymbol{{\widehat{R}}_{i+2}}},{\cdots },{\boldsymbol{{\widehat{R}}_{|V|}}}\right] } {\Big |} \mathrm{{Tr}}{\boldsymbol{R_{i}}}=1 {\Big \}}~(i{\in }V). \nonumber \\&\end{aligned}$$
(10.310)

The simultaneous self-consistent equations for reduced density matrices are expressed as

$$\begin{aligned} {\boldsymbol{{\widehat{R}_{i}}}}={\frac{1}{Z_{i}}}{\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\left( {\left( J{\sum _{j{\in }{\partial }i}}{\big (}\mathrm{{Tr}}({\boldsymbol{{\sigma }^{z}}}{\boldsymbol{{\widehat{R}}_{j}}}){\big )}+hd_{i}\right) }{\boldsymbol{{\sigma }^{z}}} +{\Gamma }{\boldsymbol{{\sigma }^{x}}} \right) } \right) }, \end{aligned}$$
(10.311)
$$\begin{aligned} Z_{i} \equiv \mathrm{{Tr}}{\left[ {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\left( {\left( J{\sum _{j{\in }{\partial }i}}{\big (}\mathrm{{Tr}}({\boldsymbol{{\sigma }^{z}}}{\boldsymbol{{\widehat{R}}_{j}}}){\big )}+hd_{i}\right) }{\boldsymbol{{\sigma }^{z}}} +{\Gamma }{\boldsymbol{{\sigma }^{x}}} \right) } \right) } \right] }. \end{aligned}$$
(10.312)
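For a single site, Eqs. (10.311) and (10.312) can be evaluated directly by matrix exponentiation, and the resulting magnetization has a closed form (see Eq. (10.313) below). The sketch uses illustrative values: \(a\) stands for \({\big (}J{\sum }_{j{\in }{\partial }i}{\widehat{m}}_{j}^{z}+hd_{i}{\big )}/k_\mathrm{{B}}T\) and \(b\) for \({\Gamma }/k_\mathrm{{B}}T\):

```python
import numpy as np
from scipy.linalg import expm

sz = np.array([[1.0, 0.0], [0.0, -1.0]])   # Pauli sigma^z
sx = np.array([[0.0, 1.0], [1.0, 0.0]])    # Pauli sigma^x

# Illustrative effective fields: a = (J sum_j m_j + h d_i)/k_B T, b = Gamma/k_B T
a, b = 0.7, 0.4
H = a * sz + b * sx
R_hat = expm(H) / np.trace(expm(H))        # normalized as in Eqs. (10.311)-(10.312)

mz = np.trace(sz @ R_hat).real
r = np.hypot(a, b)
# exp(a sz + b sx) = cosh(r) I + sinh(r)(a sz + b sx)/r, hence the tanh form
assert np.isclose(mz, (a / r) * np.tanh(r))
```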

From Eq. (10.311), we can derive the following simultaneous self-consistent equations for the magnetizations \({\widehat{m}}_{i}^{z}\equiv \mathrm{{Tr}}({\boldsymbol{{\sigma }^{z}}}{\boldsymbol{{\widehat{R}}_{i}}})\) (\(i{\in }V\)) and \({\widehat{m}}_{i}^{x}\equiv \mathrm{{Tr}}({\boldsymbol{{\sigma }^{x}}}{\boldsymbol{{\widehat{R}}_{i}}})\) (\(i{\in }V\)):

$$\begin{aligned} {\widehat{m}}_{i}^{z}&= {\frac{ {\displaystyle { {\frac{J}{k_\mathrm{{B}}T}}{\sum _{j{\in }{\partial }i}}{\widehat{m}}_{j}^{z}+{\frac{h}{k_\mathrm{{B}}T}}d_{i} }} }{ {\displaystyle { {\sqrt{ {\left( {\frac{J}{k_\mathrm{{B}}T}}{\sum _{j{\in }{\partial }i}}{\widehat{m}}_{j}^{z}+{\frac{h}{k_\mathrm{{B}}T}}d_{i}\right) }^{2}+ {\left( {\frac{{\Gamma }}{k_\mathrm{{B}}T}}\right) }^{2} }} }} }} \nonumber \\&\times {\tanh }{\left( {\sqrt{ {\left( {\frac{J}{k_\mathrm{{B}}T}}{\sum _{j{\in }{\partial }i}}{\widehat{m}}_{j}^{z}+{\frac{h}{k_\mathrm{{B}}T}}d_{i}\right) }^{2} +{\left( {\frac{{\Gamma }}{k_\mathrm{{B}}T}}\right) }^{2} }} \right) }, \nonumber \\ \end{aligned}$$
(10.313)
$$\begin{aligned} {\widehat{m}}_{i}^{x}&= {\frac{ {\displaystyle { {\frac{{\Gamma }}{k_\mathrm{{B}}T}} }} }{ {\displaystyle { {\sqrt{ {\left( {\frac{J}{k_\mathrm{{B}}T}}{\sum _{j{\in }{\partial }i}}{\widehat{m}}_{j}^{z}+{\frac{h}{k_\mathrm{{B}}T}}d_{i}\right) }^{2}+ {\left( {\frac{{\Gamma }}{k_\mathrm{{B}}T}}\right) }^{2} }} }} }} \nonumber \\&\times {\tanh }{\left( {\sqrt{ {\left( {\frac{J}{k_\mathrm{{B}}T}}{\sum _{j{\in }{\partial }i}}{\widehat{m}}_{j}^{z}+{\frac{h}{k_\mathrm{{B}}T}}d_{i}\right) }^{2} +{\left( {\frac{{\Gamma }}{k_\mathrm{{B}}T}}\right) }^{2} }} \right) }. \nonumber \\ \end{aligned}$$
(10.314)
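In practice, the simultaneous self-consistent equations (10.313) and (10.314) can be solved by damped fixed-point iteration. The following is a minimal sketch on a small chain; the graph, couplings \(J,h,{\Gamma }\), temperature, and data vector \(d\) are illustrative assumptions, not values from the text:

```python
import numpy as np

# Illustrative 4-node chain with assumed parameters (k_B T = 1).
J, h, Gamma, kBT = 1.0, 0.5, 0.8, 1.0
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
d = np.array([1.0, -1.0, 1.0, 1.0])

def update(mz):
    # One sweep of Eqs. (10.313)-(10.314).
    new_z, new_x = np.empty(4), np.empty(4)
    for i, nb in neighbors.items():
        u = (J * sum(mz[j] for j in nb) + h * d[i]) / kBT
        r = np.hypot(u, Gamma / kBT)
        new_z[i] = (u / r) * np.tanh(r)
        new_x[i] = (Gamma / (kBT * r)) * np.tanh(r)
    return new_z, new_x

mz = np.zeros(4)
for _ in range(500):
    mz = 0.5 * mz + 0.5 * update(mz)[0]   # damping for stability
mz, mx = update(mz)
# At the fixed point, (mz_i, mx_i) lies inside the unit disk: |m_i| = tanh(r) < 1
assert np.all(np.hypot(mz, mx) <= 1.0)
```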

The mean-field free energy \(\mathcal{{F}}_\mathrm{{MF}}{\left[ {\boldsymbol{{\widehat{R}}_{1}}},{\boldsymbol{{\widehat{R}}_{2}}},{\cdots },{\boldsymbol{{\widehat{R}}_{|V|}}}\right] }\) of the present system is expressed as

$$\begin{aligned} \mathcal{{F}}_\mathrm{{MF}}{\left[ {\boldsymbol{{\widehat{R}}_{1}}},{\boldsymbol{{\widehat{R}}_{2}}},{\cdots },{\boldsymbol{{\widehat{R}}_{|V|}}}\right] }= & {} {\sum _{i{\in }V}}{\big (}-k_\mathrm{{B}}T{\ln }{\left( Z_{i}\right) }{\big )} \nonumber \\= & {} -k_\mathrm{{B}}T{\sum _{i{\in }V}}{\ln }{\left( \mathrm{{Tr}}{\left[ {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\left( {\left( J{\sum _{j{\in }{\partial }i}}{\big (}\mathrm{{Tr}}({\boldsymbol{{\sigma }^{z}}}{\boldsymbol{{\widehat{R}}_{j}}}){\big )}+hd_{i}\right) }{\boldsymbol{{\sigma }^{z}}} +{\Gamma }{\boldsymbol{{\sigma }^{x}}} \right) } \right) } \right] } \right) } \nonumber \\= & {} -k_\mathrm{{B}}T{\sum _{i{\in }V}}{\ln }{\left( 2{\cosh }{\left( {\sqrt{ {\left( {\frac{J}{k_\mathrm{{B}}T}}{\sum _{j{\in }{\partial }i}}{\widehat{m}}_{j}^{z}+{\frac{h}{k_\mathrm{{B}}T}}d_{i}\right) }^{2} +{\left( {\frac{{\Gamma }}{k_\mathrm{{B}}T}}\right) }^{2}}} \right) } \right) }. \end{aligned}$$
(10.315)

Next, we extend the above mean-field framework for the transverse Ising model to the quantum loopy belief propagation method based on the quantum cluster variation method in Ref. [90]. In addition to the \(2^{N}{\times }2^{N}\) trial density matrix \({\boldsymbol{R}}\) and its \(2{\times }2\) trial reduced density matrices \({\boldsymbol{R}}_{i}\) (\(i{\in }V\)), we introduce a \(4{\times }4\) trial reduced density matrix \({\boldsymbol{R}}_{ij}\) for each pair of nodes \(i\) and \(j\), defined by

$$\begin{aligned}&{\boldsymbol{R_{ij}}}={\boldsymbol{R_{ji}}}=\mathrm{{Tr}}_{{\setminus }\{i,j\}}{\boldsymbol{R}} \nonumber \\&{} = {\left( \begin{array}{ccccc} {\langle }+1,+1|{\boldsymbol{R_{ij}}}|+1,+1{\rangle } &{} {\langle }+1,+1|{\boldsymbol{R_{ij}}}|+1,-1{\rangle } &{} {\langle }+1,+1|{\boldsymbol{R_{ij}}}|-1,+1{\rangle } &{} {\langle }+1,+1|{\boldsymbol{R_{ij}}}|-1,-1{\rangle }\\ {\langle }+1,-1|{\boldsymbol{R_{ij}}}|+1,+1{\rangle } &{} {\langle }+1,-1|{\boldsymbol{R_{ij}}}|+1,-1{\rangle } &{} {\langle }+1,-1|{\boldsymbol{R_{ij}}}|-1,+1{\rangle } &{} {\langle }+1,-1|{\boldsymbol{R_{ij}}}|-1,-1{\rangle }\\ {\langle }-1,+1|{\boldsymbol{R_{ij}}}|+1,+1{\rangle } &{} {\langle }-1,+1|{\boldsymbol{R_{ij}}}|+1,-1{\rangle } &{} {\langle }-1,+1|{\boldsymbol{R_{ij}}}|-1,+1{\rangle } &{} {\langle }-1,+1|{\boldsymbol{R_{ij}}}|-1,-1{\rangle }\\ {\langle }-1,-1|{\boldsymbol{R_{ij}}}|+1,+1{\rangle } &{} {\langle }-1,-1|{\boldsymbol{R_{ij}}}|+1,-1{\rangle } &{} {\langle }-1,-1|{\boldsymbol{R_{ij}}}|-1,+1{\rangle } &{} {\langle }-1,-1|{\boldsymbol{R_{ij}}}|-1,-1{\rangle } \end{array} \right) } \nonumber \\&{}(i{\in }V,~j{\in }V,~i<j), \end{aligned}$$
(10.316)

where

$$\begin{aligned} {\langle }s_{i},s_{j}|{\boldsymbol{R_{ij}}}|s'_{i},s'_{j}{\rangle }= & {} {\langle }s_{i},s_{j}|{\boldsymbol{R_{ji}}}|s'_{i},s'_{j}{\rangle } \nonumber \\= & {} {\langle }s_{i},s_{j}|\mathrm{{Tr}}_{{\setminus }\{i,j\}}{\boldsymbol{R}}|s'_{i},s'_{j}{\rangle } \nonumber \\\equiv & {} {\sum _{{\tau }_{1}{\in }{\Omega }}}{\sum _{{\tau }_{2}{\in }{\Omega }}}{\cdots }{\sum _{{\tau }_{|V|}{\in }{\Omega }}} {\sum _{{\tau }'_{1}{\in }{\Omega }}}{\sum _{{\tau }'_{2}{\in }{\Omega }}}{\cdots }{\sum _{{\tau }'_{|V|}{\in }{\Omega }}} \nonumber \\&{\times } {\delta }_{s_{i},{\tau }_{i}}{\delta }_{s_{j},{\tau }_{j}}{\delta }_{s'_{i},{\tau }'_{i}}{\delta }_{s'_{j},{\tau }'_{j}} {\left( {\prod _{k{\in }V{\setminus }\{i,j\}}}{\delta }_{{\tau }_{k},{\tau }'_{k}}\right) } {\langle }{\tau }_{1},{\tau }_{2},{\cdots },{\tau }_{|V|}|{\boldsymbol{R}} |{\tau }'_{1},{\tau }'_{2},{\cdots },{\tau }'_{|V|}{\rangle } \nonumber \\&(s_{i}{\in }{\Omega },~s_{j}{\in }{\Omega },~s'_{i}{\in }{\Omega },~s'_{j}{\in }{\Omega },~i{\in }V,~j{\in }V,~i<j). \end{aligned}$$
(10.317)

By similar arguments to those for Eq. (10.304), we derive

$$\begin{aligned} \mathrm{{Tr}}{\left( {\boldsymbol{{\sigma }^{z}_{i}}}{\boldsymbol{{\sigma }^{z}_{j}}}{\boldsymbol{R}}\right) } = \mathrm{{Tr}}{\left( ({\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}})({\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}){\boldsymbol{R_{ij}}}\right) }. \end{aligned}$$
(10.318)
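Identities such as Eq. (10.318) can be verified numerically: tracing the full two-site operator against \({\boldsymbol{R}}\) equals tracing against the reduced matrix \({\boldsymbol{R_{ij}}}\) obtained by a partial trace. The three-qubit \({\boldsymbol{R}}\) below is a randomly generated, purely illustrative density matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
# Random 3-qubit density matrix (illustrative; any positive unit-trace R works).
A = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
R = A @ A.conj().T
R /= np.trace(R)

sz = np.diag([1.0, -1.0])
I = np.eye(2)

# sigma_i^z sigma_j^z acting on qubits i=0, j=1 of three
op_full = np.kron(np.kron(sz, sz), I)

# Partial trace over qubit 2: reshape into (2,2,2) x (2,2,2) and contract
# the last row index with the last column index.
R_ij = np.einsum('abkcdk->abcd', R.reshape(2, 2, 2, 2, 2, 2)).reshape(4, 4)

lhs = np.trace(op_full @ R).real
rhs = np.trace(np.kron(sz, I) @ np.kron(I, sz) @ R_ij).real
assert np.isclose(lhs, rhs)   # the content of Eq. (10.318)
```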

We now assume that the free energy functional can be expressed as

$$\begin{aligned} \mathcal{{F}}[{\boldsymbol{R}}]= & {} \mathcal{{F}}_\mathrm{{Bethe}}{\left[ {\left\{ {\boldsymbol{R_{i}}}{\big |}i{\in }V\right\} },{\left\{ {\boldsymbol{R_{\{i,j\}}}}{\big |}\{i,j\}{\in }E\right\} }\right] } \nonumber \\{\equiv }&-J{\sum _{\{i,j\}{\in }E}}\mathrm{{Tr}}{\left( ({\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}})({\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}){\boldsymbol{R}}_{\{i,j\}}\right) } \nonumber \\&-h{\sum _{i{\in }V}}d_{i}\mathrm{{Tr}}({\boldsymbol{{\sigma }^{z}}}{\boldsymbol{R_{i}}}) -{\Gamma }{\sum _{i{\in }V}}\mathrm{{Tr}}({\boldsymbol{{\sigma }^{x}}}{\boldsymbol{R_{i}}}) \nonumber \\&+k_\mathrm{{B}}T{\sum _{i{\in }V}}\mathrm{{Tr}}{\left( {\boldsymbol{R_{i}}}{\ln }({\boldsymbol{R}}_{i})\right) } \nonumber \\&+k_\mathrm{{B}}T{\sum _{\{i,j\}{\in }E}} {\left( \mathrm{{Tr}}{\left( {\boldsymbol{R}}_{\{i,j\}}{\ln }({\boldsymbol{R}}_{\{i,j\}})\right) } -\mathrm{{Tr}}{\left( {\boldsymbol{R_{i}}}{\ln }({\boldsymbol{R}}_{i})\right) }-\mathrm{{Tr}}{\left( {\boldsymbol{R_{j}}}{\ln }({\boldsymbol{R}}_{j})\right) }\right) } \nonumber \\= & {} -J{\sum _{\{i,j\}{\in }E}}\mathrm{{Tr}}{\left( ({\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}})({\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}){\boldsymbol{R}}_{\{i,j\}}\right) } \nonumber \\&-h{\sum _{i{\in }V}}d_{i}\mathrm{{Tr}}({\boldsymbol{{\sigma }^{z}}}{\boldsymbol{R_{i}}}) -{\Gamma }{\sum _{i{\in }V}}\mathrm{{Tr}}({\boldsymbol{{\sigma }^{x}}}{\boldsymbol{R_{i}}}) \nonumber \\&+k_\mathrm{{B}}T{\sum _{\{i,j\}{\in }E}} \mathrm{{Tr}}{\left( {\boldsymbol{R}}_{\{i,j\}}{\ln }({\boldsymbol{R}}_{\{i,j\}})\right) } +k_\mathrm{{B}}T{\sum _{i{\in }V}}{\left( 1-|{\partial }i|\right) }\mathrm{{Tr}}{\left( {\boldsymbol{R_{i}}}{\ln }({\boldsymbol{R}}_{i})\right) }. \nonumber \\&\end{aligned}$$
(10.319)

We define the reduced density matrices \({\boldsymbol{{\widehat{R}}_{i}}}\) for each node \(i({\in }V)\) and \({\boldsymbol{{\widehat{R}}_{\{i,j\}}}}\) for each edge \(\{i,j\}({\in }E)\) by

$$\begin{aligned} {\boldsymbol{{\widehat{R}_{k}}}}= & {} {\arg }~{\mathop {{\mathrm{extremum}}}\limits _{{\boldsymbol{R_{k}}}}}{\Big \{} \mathcal{{F}}_\mathrm{{Bethe}}{\left[ {\boldsymbol{R}}_{k},{\left\{ {\boldsymbol{{\widehat{R}}_{i}}}{\big |}i{\in }V{\setminus }\{k\}\right\} }, {\left\{ {\boldsymbol{{\widehat{R}}_{\{i,j\}}}} {\big |}\{i,j\}{\in }E \right\} }\right] } {\Big |} \mathrm{{Tr}}{\boldsymbol{R_{k}}}=1,~ {\boldsymbol{R_{k}}}=\mathrm{{Tr}}_{{\setminus }k}{\boldsymbol{{\widehat{R}}_{\{k,j\}}}}~(j{\in }{\partial }k) {\Big \}} \nonumber \\&\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad{}(k{\in }V). \end{aligned}$$
(10.320)
$$\begin{aligned} {\boldsymbol{{\widehat{R}_{\{k,l\}}}}}= & {} {\arg }~{\mathop {{\mathrm{extremum}}}\limits _{{\boldsymbol{R_{\{k,l\}}}}}}{\Big \{} \mathcal{{F}}_\mathrm{{Bethe}}{\left[ {\boldsymbol{R}}_{\{k,l\}},{\left\{ {\boldsymbol{{\widehat{R}}_{i}}}{\big |}i{\in }V\right\} }, {\left\{ {\boldsymbol{{\widehat{R}}_{\{i,j\}}}} {\big |}\{i,j\}{\in }E{\setminus }\{k,l\} \right\} }\right] } {\Big |} \nonumber \\&\qquad\qquad\qquad\qquad{} \mathrm{{Tr}}{\boldsymbol{R_{\{k,l\}}}}=1,~ {\boldsymbol{{\widehat{R}}_{k}}}=\mathrm{{Tr}}_{{\setminus }\{k,l\}}{\boldsymbol{R_{\{k,l\}}}},~ {\boldsymbol{{\widehat{R}}_{l}}}=\mathrm{{Tr}}_{{\setminus }\{k,l\}}{\boldsymbol{R_{\{k,l\}}}} {\Big \}} ~ (\{k,l\}{\in }E). \nonumber \\&\end{aligned}$$
(10.321)

To ensure the constraint conditions, we introduce the Lagrange multipliers as follows:

$$\begin{aligned}&\mathcal{{L}}{\left[ {\left\{ {\boldsymbol{R_{i}}}{\big |}i{\in }V\right\} },{\left\{ {\boldsymbol{R_{ij}}}{\big |}\{i,j\}{\in }E\right\} }\right] } = -J{\sum _{\{i,j\}{\in }E}}\mathrm{{Tr}}{\left( ({\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}})({\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}){\boldsymbol{R_{ij}}}\right) } \nonumber \\&-h{\sum _{i{\in }V}}d_{i}\mathrm{{Tr}}({\boldsymbol{{\sigma }^{z}}}{\boldsymbol{R_{i}}}) -{\Gamma }{\sum _{i{\in }V}}\mathrm{{Tr}}({\boldsymbol{{\sigma }^{x}}}{\boldsymbol{R_{i}}}) \nonumber \\&+k_\mathrm{{B}}T{\sum _{\{i,j\}{\in }E}} \mathrm{{Tr}}{\left( {\boldsymbol{R_{ij}}}{\ln }({\boldsymbol{R_{ij}}})\right) } +k_\mathrm{{B}}T{\sum _{i{\in }V}}{\left( 1-|{\partial }i|\right) }\mathrm{{Tr}}{\left( {\boldsymbol{R_{i}}}{\ln }({\boldsymbol{R_{i}}})\right) } \nonumber \\&-{\sum _{i{\in }V}}{\lambda }_{i}{\left( \mathrm{{Tr}}{\boldsymbol{R_{i}}}-1 \right) } -{\sum _{\{i,j\}{\in }E}}{\lambda }_{\{i,j\}}{\left( \mathrm{{Tr}}{\boldsymbol{R_{ij}}}-1 \right) } \nonumber \\&-{\sum _{\{i,j\}{\in }E}}\mathrm{{Tr}}{\boldsymbol{{\lambda }_{i,ij}}}{\left( {\boldsymbol{R}}_{i}-\mathrm{{Tr}}_{{\setminus }i}{\boldsymbol{R_{ij}}} \right) } -{\sum _{\{i,j\}{\in }E}}\mathrm{{Tr}}{\boldsymbol{{\lambda }_{j,ij}}}{\left( {\boldsymbol{R}}_{j}-\mathrm{{Tr}}_{{\setminus }j}{\boldsymbol{R_{ij}}} \right) } \nonumber \\= & {} {\sum _{\{i,j\}{\in }E}} \mathrm{{Tr}}{\Bigg (}{\boldsymbol{R_{ij}}} {\Big (}-J({\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}})({\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}) +k_\mathrm{{B}}T{\ln }({\boldsymbol{R_{ij}}})+{\boldsymbol{{\lambda }_{i,ij}}}{\otimes }{\boldsymbol{I}}+{\boldsymbol{I}}{\otimes }{\boldsymbol{{\lambda }_{j,ij}}} -{\lambda }_{\{i,j\}}({\boldsymbol{I}}{\otimes }{\boldsymbol{I}}){\Big )}{\Bigg )} \nonumber \\&+{\sum _{i{\in }V}}\mathrm{{Tr}}{\Bigg (}{\boldsymbol{R_{i}}}{\Big (} -hd_{i}{\boldsymbol{{\sigma }^{z}}} -{\Gamma }{\boldsymbol{{\sigma }^{x}}} +k_\mathrm{{B}}T{\big (}1-|{\partial }i|{\big )}{\ln }({\boldsymbol{R_{i}}}) -{\sum _{j{\in }{\partial }i}}{\boldsymbol{{\lambda }_{i,ij}}}-{\lambda }_{i}{\boldsymbol{I}}{\Big )} {\Bigg )} \nonumber \\&+{\sum _{i{\in }V}}{\lambda }_{i}+{\sum _{\{i,j\}{\in }E}}{\lambda }_{\{i,j\}}. \end{aligned}$$
(10.322)

Here we remark that \({\boldsymbol{{\lambda }_{i,ij}}}={\boldsymbol{{\lambda }_{i,ji}}}\) and \({\boldsymbol{{\lambda }_{j,ij}}}={\boldsymbol{{\lambda }_{j,ji}}}\) (\(\{i,j\}{\in }E\), \(i<j\)).

We define the reduced density matrix \({\boldsymbol{{\widehat{R}}_{i}}}\) for each node \(i({\in }V)\) and \({\boldsymbol{{\widehat{R}}_{ij}}}={\boldsymbol{{\widehat{R}}_{ji}}}\) for each edge \(\{i,j\}{\in }E\) by

$$\begin{aligned} {\boldsymbol{{\widehat{R}_{i}}}}= & {} {\arg }~{\mathop {{\mathrm{extremum}}}\limits _{{\boldsymbol{R_{i}}}}}{\Big \{}{\boldsymbol{R_{i}}} {\Big (} -hd_{i}{\boldsymbol{{\sigma }^{z}}} -{\Gamma }{\boldsymbol{{\sigma }^{x}}} -{\sum _{j{\in }{\partial }i}}{\boldsymbol{{\lambda }_{i,ij}}} +k_\mathrm{{B}}T{\left( 1-|{\partial }i|\right) }{\ln }({\boldsymbol{R_{i}}}) -{\lambda }_{i}{\boldsymbol{I}} {\Big )}{\Big \}} \nonumber \\&\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad{}(i{\in }V), \end{aligned}$$
(10.323)
$$\begin{aligned} {\boldsymbol{{\widehat{R}_{ij}}}}= & {} {\arg }~{\mathop {{\mathrm{extremum}}}\limits _{{\boldsymbol{R_{ij}}}}}{\Big \{}{\boldsymbol{R_{ij}}} {\Big (} -J({\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}})({\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}) +{\boldsymbol{{\lambda }_{i,ij}}}{\otimes }{\boldsymbol{I}}+{\boldsymbol{I}}{\otimes }{\boldsymbol{{\lambda }_{j,ij}}} +k_\mathrm{{B}}T{\ln }({\boldsymbol{R_{ij}}})-{\lambda }_{\{i,j\}}({\boldsymbol{I}}{\otimes }{\boldsymbol{I}}) {\Big )}{\Big \}} \nonumber \\&\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\quad{}(\{i,j\}{\in }E). \end{aligned}$$
(10.324)

The simultaneous self-consistent equations for reduced density matrices are expressed as

$$\begin{aligned} {\boldsymbol{{\widehat{R}_{i}}}}= & {} {\exp }{\left( -1+{\frac{{\lambda }_{i}}{k_\mathrm{{B}}T}} \right) } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\left( {\frac{1}{|{\partial }i|-1}}\right) } {\left( -hd_{i}{\boldsymbol{{\sigma }^{z}}} -{\Gamma }{\boldsymbol{{\sigma }^{x}}} -{\sum _{j{\in }{\partial }i}}{\boldsymbol{{\lambda }_{i,ij}}} \right) } \right) }, \nonumber \\&\end{aligned}$$
(10.325)
$$\begin{aligned} {\boldsymbol{{\widehat{R}_{ij}}}}= & {} {\exp }{\left( -1+{\frac{{\lambda }_{\{i,j\}}}{k_\mathrm{{B}}T}} \right) } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\left( J({\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}})({\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}) -{\boldsymbol{{\lambda }_{i,ij}}}{\otimes }{\boldsymbol{I}}-{\boldsymbol{I}}{\otimes }{\boldsymbol{{\lambda }_{j,ij}}} \right) } \right) }, \nonumber \\&\end{aligned}$$
(10.326)
$$\begin{aligned} {\exp }{\left( 1-{\frac{{\lambda }_{i}}{k_\mathrm{{B}}T}} \right) } = \mathrm{{Tr}}{\left[ {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\left( {\frac{1}{|{\partial }i|-1}}\right) } {\left( -hd_{i}{\boldsymbol{{\sigma }^{z}}} -{\Gamma }{\boldsymbol{{\sigma }^{x}}} -{\sum _{j{\in }{\partial }i}}{\boldsymbol{{\lambda }_{i,ij}}} \right) } \right) } \right] }, \nonumber \\&\end{aligned}$$
(10.327)
$$\begin{aligned} {\exp }{\left( 1-{\frac{{\lambda }_{\{i,j\}}}{k_\mathrm{{B}}T}} \right) } = \mathrm{{Tr}}{\left[ {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\left( J({\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}})({\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}) -{\boldsymbol{{\lambda }_{i,ij}}}{\otimes }{\boldsymbol{I}}-{\boldsymbol{I}}{\otimes }{\boldsymbol{{\lambda }_{j,ij}}} \right) } \right) } \right] }. \nonumber \\&\end{aligned}$$
(10.328)

By introducing the linear transformations

$$\begin{aligned} {\boldsymbol{{\lambda }_{i,ij}}} = {\boldsymbol{{\lambda }_{i,ji}}} =-hd_{i}{\boldsymbol{{\sigma }^{z}}}-{\Gamma }{\boldsymbol{{\sigma }^{x}}} -{\sum _{k{\in }{\partial }i{\setminus }\{j\}}}{\boldsymbol{{\lambda }_{k{\rightarrow }i}}}, \end{aligned}$$
(10.329)

Equations (10.325) and (10.326) can be rewritten as

$$\begin{aligned} {\boldsymbol{{\widehat{R}_{i}}}} = {\frac{1}{Z_{i}}} {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\left( hd_{i}{\boldsymbol{{\sigma }^{z}}} +{\Gamma }{\boldsymbol{{\sigma }^{x}}} +{\sum _{k{\in }{\partial }i}}{\boldsymbol{{\lambda }_{k{\rightarrow }i}}} \right) } \right) }, \end{aligned}$$
(10.330)
$$\begin{aligned} {\boldsymbol{{\widehat{R}_{ij}}}}= & {} {\frac{1}{Z_{\{i,j\}}}} {\exp }{\Bigg (}{\frac{1}{k_\mathrm{{B}}T}} {\Big (} J({\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}})({\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}) +h{\big (}d_{i}({\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}) +d_{j}({\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}){\big )} +{\Gamma }{\big (}{\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}}+{\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{x}}}{\big )} \nonumber \\&\qquad\qquad\qquad\qquad\quad{} +{\sum _{k{\in }{\partial }i{\setminus }\{j\}}}{\boldsymbol{{\lambda }_{k{\rightarrow }i}}}{\otimes }{\boldsymbol{I}} +{\sum _{l{\in }{\partial }j{\setminus }\{i\}}}{\boldsymbol{I}}{\otimes }{\boldsymbol{{\lambda }_{l{\rightarrow }j}}} {\Big )} {\Bigg )}, \end{aligned}$$
(10.331)
$$\begin{aligned} Z_{i} = \mathrm{{Tr}}{\left[ {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\left( hd_{i}{\boldsymbol{{\sigma }^{z}}} +{\Gamma }{\boldsymbol{{\sigma }^{x}}} +{\sum _{k{\in }{\partial }i}}{\boldsymbol{{\lambda }_{k{\rightarrow }i}}} \right) } \right) } \right] }, \end{aligned}$$
(10.332)
$$\begin{aligned} Z_{\{i,j\}}= & {} \mathrm{{Tr}}{\Bigg [} {\exp }{\Bigg (}{\frac{1}{k_\mathrm{{B}}T}} {\Big (} J({\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}})({\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}) +h{\big (}d_{i}({\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}) +d_{j}({\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}){\big )} +{\Gamma }{\big (}{\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}}+{\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{x}}}{\big )} \nonumber \\&\qquad\qquad\qquad\qquad\qquad{} +{\sum _{k{\in }{\partial }i{\setminus }\{j\}}}{\boldsymbol{{\lambda }_{k{\rightarrow }i}}}{\otimes }{\boldsymbol{I}} +{\sum _{l{\in }{\partial }j{\setminus }\{i\}}}{\boldsymbol{I}}{\otimes }{\boldsymbol{{\lambda }_{l{\rightarrow }j}}} {\Big )} {\Bigg )} {\Bigg ]}. \end{aligned}$$
(10.333)

Then, by substituting Eqs. (10.330) and (10.331) into

$$\begin{aligned} {\boldsymbol{{\widehat{R}}_{i}}}=\mathrm{{Tr}}_{{\setminus }i}{\boldsymbol{{\widehat{R}}_{ij}}}, \end{aligned}$$
(10.334)

we derive the following simultaneous self-consistent equations for the effective fields:

$$\begin{aligned}&{\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\boldsymbol{{\lambda }_{j{\rightarrow }i}}} + {\frac{1}{k_\mathrm{{B}}T}} {\left( hd_{i}{\boldsymbol{{\sigma }^{z}}} +{\Gamma }{\boldsymbol{{\sigma }^{x}}} +{\sum _{k{\in }{\partial }i{\setminus }\{j\}}}{\boldsymbol{{\lambda }_{k{\rightarrow }i}}} \right) } \right) } \nonumber \\&\qquad\quad{} = {\frac{Z_{i}}{Z_{\{i,j\}}}} \mathrm{{Tr}}_{{\setminus }i}{\Bigg [} {\exp }{\Bigg (}{\frac{1}{k_\mathrm{{B}}T}} {\Big (} J({\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}})({\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}) +{\boldsymbol{I}}{\otimes }{\Big (}hd_{j}{\boldsymbol{{\sigma }^{z}}} +{\Gamma }{\boldsymbol{{\sigma }^{x}}} +{\sum _{l{\in }{\partial }j{\setminus }\{i\}}}{\boldsymbol{{\lambda }_{l{\rightarrow }j}}} {\Big )} {\Big )} \nonumber \\&\qquad\qquad\qquad\qquad\qquad\qquad\qquad\quad{} + {\frac{1}{k_\mathrm{{B}}T}} {\Big (} hd_{i}{\boldsymbol{{\sigma }^{z}}} +{\Gamma }{\boldsymbol{{\sigma }^{x}}} +{\sum _{k{\in }{\partial }i{\setminus }\{j\}}}{\boldsymbol{{\lambda }_{k{\rightarrow }i}}} {\Big )}{\otimes }{\boldsymbol{I}} {\Bigg )} {\Bigg ]}, \nonumber \\&\end{aligned}$$
(10.335)

such that

$$\begin{aligned}&{\frac{1}{k_\mathrm{{B}}T}} {\boldsymbol{{\lambda }_{j{\rightarrow }i}}} = - {\frac{1}{k_\mathrm{{B}}T}} {\left( hd_{i}{\boldsymbol{{\sigma }^{z}}} +{\Gamma }{\boldsymbol{{\sigma }^{x}}} +{\sum _{k{\in }{\partial }i{\setminus }\{j\}}}{\boldsymbol{{\lambda }_{k{\rightarrow }i}}} \right) } \nonumber \\&\qquad\qquad{} + {\ln }{\Bigg (} {\frac{Z_{i}}{Z_{\{i,j\}}}} \mathrm{{Tr}}_{{\setminus }i}{\Bigg [} {\exp }{\Bigg (}{\frac{1}{k_\mathrm{{B}}T}} {\Big (} J({\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}})({\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}) +{\boldsymbol{I}}{\otimes }{\Big (}hd_{j}{\boldsymbol{{\sigma }^{z}}} +{\Gamma }{\boldsymbol{{\sigma }^{x}}} +{\sum _{l{\in }{\partial }j{\setminus }\{i\}}}{\boldsymbol{{\lambda }_{l{\rightarrow }j}}} {\Big )} {\Big )} \nonumber \\&\qquad\qquad\qquad\qquad\qquad\qquad{} + {\frac{1}{k_\mathrm{{B}}T}} {\Big (} hd_{i}{\boldsymbol{{\sigma }^{z}}} +{\Gamma }{\boldsymbol{{\sigma }^{x}}} +{\sum _{k{\in }{\partial }i{\setminus }\{j\}}}{\boldsymbol{{\lambda }_{k{\rightarrow }i}}} {\Big )}{\otimes }{\boldsymbol{I}} {\Bigg )} {\Bigg ]} {\Bigg )}. \nonumber \\&\end{aligned}$$
(10.336)
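The update (10.336) can be sketched numerically. In the minimal illustration below (all parameters are assumptions; \(k_\mathrm{{B}}T=1\), a three-node chain), the messages \({\boldsymbol{{\lambda }_{j{\rightarrow }i}}}\) are stored as real symmetric \(2{\times }2\) matrices, and multiples of the identity are projected out since they are absorbed by the normalizations \(Z_{i}\) and \(Z_{\{i,j\}}\):

```python
import numpy as np
from scipy.linalg import expm, logm

# Illustrative parameters (k_B T = 1); three-node chain 0-1-2.
J, h, Gamma, kBT = 1.0, 0.5, 0.8, 1.0
sz = np.diag([1.0, -1.0])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)
neighbors = {0: [1], 1: [0, 2], 2: [1]}
d = [1.0, -1.0, 1.0]
lam = {(j, i): np.zeros((2, 2)) for i in neighbors for j in neighbors[i]}

def cavity_field(i, j):
    # h d_i sigma^z + Gamma sigma^x + sum of messages into i, except from j
    f = h * d[i] * sz + Gamma * sx
    for k in neighbors[i]:
        if k != j:
            f = f + lam[(k, i)]
    return f

def traceless(M):
    # project out the identity component (absorbed by normalization)
    return M - 0.5 * np.trace(M) * I2

for _ in range(200):
    new = {}
    for (j, i) in lam:
        # pair exponent of Eq. (10.331), node i as the first tensor factor
        K = (J * np.kron(sz, sz)
             + np.kron(cavity_field(i, j), I2)
             + np.kron(I2, cavity_field(j, i))) / kBT
        M = np.einsum('akbk->ab', expm(K).reshape(2, 2, 2, 2))  # trace out j
        # message update in the spirit of Eq. (10.336)
        new[(j, i)] = traceless(kBT * logm(M).real - cavity_field(i, j))
    lam = {e: 0.5 * lam[e] + 0.5 * new[e] for e in lam}         # damped update
```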

Note that Eqs. (10.335) and (10.336) can be regarded as the conventional message-passing rules of quantum loopy belief propagation. The Bethe free energy \(\mathcal{{F}}_\mathrm{{Bethe}}{\left[ {\left\{ {\boldsymbol{{\widehat{R}}_{i}}}{\big |}i{\in }V\right\} },{\left\{ {\boldsymbol{{\widehat{R}}_{ij}}}{\big |}\{i,j\}{\in }E\right\} }\right] }\) of the present system is given by

$$\begin{aligned} & {} \mathcal{{F}}_\mathrm{{Bethe}}{\left[ {\left\{ {\boldsymbol{{\widehat{R}}_{i}}}{\big |}i{\in }V\right\} },{\left\{ {\boldsymbol{{\widehat{R}}_{ij}}}{\big |}\{i,j\}{\in }E\right\} }\right] }= {} {\sum _{i{\in }V}}{\big (}-k_\mathrm{{B}}T{\ln }{\left( Z_{i}\right) }{\big )} \nonumber \\&\qquad\qquad\quad{} +{\sum _{\{i,j\}{\in }E}}{\big (}-k_\mathrm{{B}}T{\ln }{\left( Z_{\{i,j\}}\right) }+k_\mathrm{{B}}T{\ln }(Z_{i})+k_\mathrm{{B}}T{\ln }(Z_{j}){\big )}. \end{aligned}$$
(10.337)

The conventional quantum message passing rules in Eqs. (10.335) and (10.336) reduce to Eqs. (10.95) and (10.96) for the case of \({\Gamma }=0\).

Because we have the orthogonality relations

$$\begin{aligned} {\left\{ \begin{array}{lllll} \mathrm{{Tr}}{\left[ {\boldsymbol{{\sigma }^{z}}}{\boldsymbol{{\sigma }^{z}}}\right] }=\mathrm{{Tr}}{\left[ {\boldsymbol{{\sigma }^{x}}}{\boldsymbol{{\sigma }^{x}}}\right] }=2, \\ \mathrm{{Tr}}{\left[ {\boldsymbol{{\sigma }^{z}}}{\boldsymbol{I}}\right] }=\mathrm{{Tr}}{\left[ {\boldsymbol{I}}{\boldsymbol{{\sigma }^{z}}}\right] } =\mathrm{{Tr}}{\left[ {\boldsymbol{{\sigma }^{x}}}{\boldsymbol{I}}\right] }=\mathrm{{Tr}}{\left[ {\boldsymbol{I}}{\boldsymbol{{\sigma }^{x}}}\right] }=0, \\ \mathrm{{Tr}}{\left[ {\boldsymbol{{\sigma }^{z}}}{\boldsymbol{{\sigma }^{x}}}\right] }=\mathrm{{Tr}}{\left[ {\boldsymbol{{\sigma }^{x}}}{\boldsymbol{{\sigma }^{z}}}\right] }=0, \\ \end{array} \right. } \end{aligned}$$
(10.338)
$$\begin{aligned} {\left\{ \begin{array}{lll} \mathrm{{Tr}}{\left[ {\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}\right) }{\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}\right) }\right] } = \mathrm{{Tr}}{\left[ {\left( {\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}}\right) }{\left( {\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}}\right) }\right] } = \mathrm{{Tr}}{\left[ {\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}\right) }{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}\right) }\right] } = \mathrm{{Tr}}{\left[ {\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{x}}}\right) }{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{x}}}\right) }\right] } =4, \\ \mathrm{{Tr}}{\left[ {\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}\right) }{\left( {\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}}\right) } \right] } = \mathrm{{Tr}}{\left[ {\left( {\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}}\right) }{\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}\right) } \right] } = \mathrm{{Tr}}{\left[ {\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{x}}}\right) }{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}\right) } \right] } = \mathrm{{Tr}}{\left[ {\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}\right) }{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{x}}}\right) } \right] } =0, \\ \mathrm{{Tr}}{\left[ {\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}\right) }{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}\right) } \right] } = \mathrm{{Tr}}{\left[ {\left( {\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}}\right) }{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{x}}}\right) } \right] } = \mathrm{{Tr}}{\left[ {\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}\right) }{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{x}}}\right) } \right] } = 
\mathrm{{Tr}}{\left[ {\left( {\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}}\right) }{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}\right) } \right] } =0, \\ \end{array} \right. } \nonumber \\ \end{aligned}$$
(10.339)

the reduced density matrices \({\boldsymbol{R_{i}}}\) and \({\boldsymbol{R_{ij}}}={\boldsymbol{R_{ji}}}\) admit the following orthonormal expansions:

$$\begin{aligned} {\boldsymbol{R_{i}}}= & {} {\frac{1}{2}}{\left( {\boldsymbol{I}}+m_{i}^{x}{\boldsymbol{{\sigma }^{x}}}+m_{i}^{z}{\boldsymbol{{\sigma }^{z}}} \right) },\end{aligned}$$
(10.340)
$$\begin{aligned} {\boldsymbol{R_{ij}}}= & {} {\boldsymbol{R_{ji}}} \nonumber \\= & {} {\frac{1}{4}}{\Bigg (} {\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) } +m_{i}^{x}{\left( {\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}}\right) } +m_{i}^{z}{\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}\right) } +m_{j}^{x}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{x}}}\right) } +m_{j}^{z}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}\right) } \nonumber \\&+c_{\{i,j\}}^{zz}{\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}\right) }{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}\right) } +c_{\{i,j\}}^{xz}{\left( {\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}}\right) }{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}\right) } \nonumber \\&+c_{\{i,j\}}^{zx}{\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}\right) }{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{x}}}\right) } +c_{\{i,j\}}^{xx}{\left( {\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}}\right) }{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{x}}}\right) } {\Bigg )}, \end{aligned}$$
(10.341)

where

$$\begin{aligned} {\left\{ \begin{array}{lll} m_{i}^{{\nu }} = \mathrm{{Tr}}{\left[ {\boldsymbol{{\sigma }^{{\nu }}}}{\boldsymbol{R_{i}}}\right] } = \mathrm{{Tr}}{\left[ {\big (}{\boldsymbol{{\sigma }^{{\nu }}}}{\otimes }{\boldsymbol{I}}{\big )}{\boldsymbol{R_{ij}}}\right] }, \\ m_{j}^{{\nu }'} = \mathrm{{Tr}}{\left[ {\boldsymbol{{\sigma }^{{\nu }'}}}{\boldsymbol{R_{j}}}\right] } = \mathrm{{Tr}}{\left[ {\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{{\nu }'}}}{\big )}{\boldsymbol{R_{ij}}}\right] }, \\ c_{\{i,j\}}^{{\nu },{\nu }'} = \mathrm{{Tr}}{\left[ {\big (}{\boldsymbol{{\sigma }^{{\nu }}}}{\otimes }{\boldsymbol{I}}{\big )}{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{{\nu }'}}}{\big )}{\boldsymbol{R_{ij}}}\right] }, \\ \end{array} \right. } ~{\left( \{i,j\}{\in }E,~i<j,~{\nu }{\in }\{x,z\},{\nu }'{\in }\{x,z\}\right) }. \nonumber \\ \end{aligned}$$
(10.342)
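The coefficients in Eqs. (10.340)–(10.342) can be checked numerically. The following sketch (assuming NumPy; the helper names `R_single` and `R_pair` are ours, not from the references) builds the reduced density matrices from given coefficients and recovers each coefficient through the trace formulas of Eq. (10.342):

```python
import numpy as np

# Pauli matrices and the 2x2 identity
I2 = np.eye(2)
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
sz = np.array([[1.0, 0.0], [0.0, -1.0]])

def R_single(mx, mz):
    """Single-site reduced density matrix, Eq. (10.340)."""
    return 0.5 * (I2 + mx * sx + mz * sz)

def R_pair(mi, mj, c):
    """Two-site reduced density matrix, Eq. (10.341); mi, mj map 'x'/'z' to
    magnetizations, c maps (nu, nu') to the correlations c^{nu nu'}."""
    pauli = {'x': sx, 'z': sz}
    R = np.kron(I2, I2)
    for nu in ('x', 'z'):
        R += mi[nu] * np.kron(pauli[nu], I2) + mj[nu] * np.kron(I2, pauli[nu])
    for (nu, nup), cval in c.items():
        # (sigma^nu x I)(I x sigma^nu') = sigma^nu x sigma^nu'
        R += cval * np.kron(pauli[nu], pauli[nup])
    return 0.25 * R

mi = {'x': 0.3, 'z': 0.5}
mj = {'x': -0.2, 'z': 0.1}
c = {('z', 'z'): 0.4, ('x', 'z'): 0.0, ('z', 'x'): 0.0, ('x', 'x'): 0.05}
Rij = R_pair(mi, mj, c)

# the trace formulas of Eq. (10.342) recover every coefficient
assert np.isclose(np.trace(Rij), 1.0)
assert np.isclose(np.trace(np.kron(sz, I2) @ Rij), mi['z'])
assert np.isclose(np.trace(np.kron(I2, sx) @ Rij), mj['x'])
assert np.isclose(np.trace(np.kron(sz, sz) @ Rij), c[('z', 'z')])
```

The orthogonality relations of Eq. (10.339) are what make each trace pick out a single coefficient, since all cross terms vanish.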

By using these orthonormal expansions of the reduced density matrices, the Bethe free energy functional in Eq. (10.319) can be rewritten as

$$\begin{aligned} \mathcal{{F}}[{\boldsymbol{R}}]= & {} \mathcal{{F}}_\mathrm{{Bethe}}{\left[ {\Big \{}m_{i}^{{\nu }}{\Big |}i{\in }V,{\nu }{\in }\{x,z\}{\Big \}}, {\Big \{}c_{\{i,j\}}^{{\nu },{\nu }'}{\Big |}\{i,j\}{\in }E,{\nu }{\in }\{x,z\},{\nu }'{\in }\{x,z\}{\Big \}}\right] } \nonumber \\&{\equiv } -J{\sum _{\{i,j\}{\in }E}}c_{\{i,j\}}^{z,z} -h{\sum _{i{\in }V}}d_{i}m_{i}^{z} -{\Gamma }{\sum _{i{\in }V}}m_{i}^{x} \nonumber \\&\qquad +k_\mathrm{{B}}T{\sum _{i{\in }V}}{\left( 1-|{\partial }i| \right) }\mathrm{{Tr}}{\left( {\boldsymbol{R_{i}}}{\ln }({\boldsymbol{R}}_{i})\right) } \nonumber \\&\qquad +k_\mathrm{{B}}T{\sum _{\{i,j\}{\in }E}} \mathrm{{Tr}}{\left( {\boldsymbol{R}}_{\{i,j\}}{\ln }({\boldsymbol{R}}_{\{i,j\}})\right) }. \end{aligned}$$
(10.343)
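As a concrete check of Eq. (10.343), the functional can be evaluated numerically on a ring graph, where \(|{\partial }i|=2\) for every site. The sketch below (assuming NumPy; the function names are ours, and for brevity it keeps only the \(c^{zz}\) correlations, setting the \(x\)-\(z\) cross terms to zero) reproduces the free-spin value \(-|V|k_\mathrm{{B}}T{\ln }2\) when all expansion coefficients vanish:

```python
import numpy as np

I2 = np.eye(2)
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
sz = np.array([[1.0, 0.0], [0.0, -1.0]])

def tr_r_ln_r(R):
    """Tr[R ln R] via the eigenvalues of the Hermitian matrix R."""
    lam = np.linalg.eigvalsh(R)
    lam = lam[lam > 1e-12]  # continuous extension: 0 ln 0 = 0
    return float(np.sum(lam * np.log(lam)))

def bethe_free_energy(m, c, d, J, h, Gamma, kT):
    """Eq. (10.343) on a ring of N sites (|partial i| = 2 everywhere).
    m: (N,2) array of (m^x, m^z); c: length-N array of c^{zz} on the
    bonds {i, i+1 mod N}; d: length-N data vector."""
    N = len(m)
    F = (-J * float(np.sum(c))
         - h * float(np.dot(d, m[:, 1]))
         - Gamma * float(np.sum(m[:, 0])))
    for i in range(N):
        Ri = 0.5 * (I2 + m[i, 0] * sx + m[i, 1] * sz)
        F += kT * (1 - 2) * tr_r_ln_r(Ri)        # site term, |partial i| = 2
        j = (i + 1) % N
        Rij = 0.25 * (np.kron(I2, I2)
                      + m[i, 0] * np.kron(sx, I2) + m[i, 1] * np.kron(sz, I2)
                      + m[j, 0] * np.kron(I2, sx) + m[j, 1] * np.kron(I2, sz)
                      + c[i] * np.kron(sz, sz))
    # edge entropy term
        F += kT * tr_r_ln_r(Rij)
    return F

# all coefficients zero: N independent free spins, F = -N kT ln 2
N = 8
F0 = bethe_free_energy(np.zeros((N, 2)), np.zeros(N), np.zeros(N),
                       J=1.0, h=1.0, Gamma=0.0, kT=1.0)
assert np.isclose(F0, -N * np.log(2.0))
```

The free-spin check works because each site contributes \(k_\mathrm{{B}}T(1-2)(-{\ln }2)\) and each edge \(k_\mathrm{{B}}T(-2{\ln }2)\), which sum to \(-Nk_\mathrm{{B}}T{\ln }2\).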

The extremum conditions

$$\begin{aligned}&{\frac{{\partial }}{{\partial }m_{k}^{{\nu }}}} \mathcal{{F}}_\mathrm{{Bethe}}{\left[ {\Big \{}m_{i}^{{\nu }}{\Big |}i{\in }V,{\nu }{\in }\{x,z\}{\Big \}}, {\Big \{}c_{\{i,j\}}^{{\nu }{\nu }'}{\Big |}\{i,j\}{\in }E,{\nu }{\in }\{x,z\},{\nu }'{\in }\{x,z\}{\Big \}}\right] } \nonumber \\&=0~(k{\in }V,~{\nu }{\in }\{x,z\}), \nonumber \\ \end{aligned}$$
(10.344)
$$\begin{aligned}&{\frac{{\partial }}{{\partial }c_{\{k,l\}}^{{\nu }{\nu }'}}} \mathcal{{F}}_\mathrm{{Bethe}}{\left[ {\Big \{}m_{i}^{{\nu }}{\Big |}i{\in }V,{\nu }{\in }\{x,z\}{\Big \}}, {\Big \{}c_{\{i,j\}}^{{\nu }{\nu }'}{\Big |}\{i,j\}{\in }E,{\nu }{\in }\{x,z\},{\nu }'{\in }\{x,z\}{\Big \}}\right] } \nonumber \\&=0~(\{k,l\}{\in }E,~{\nu }{\in }\{x,z\},~{\nu }'{\in }\{x,z\}), \nonumber \\ \end{aligned}$$
(10.345)

can be reduced to the following simultaneous equations:

$$\begin{aligned}&{\left\{ \begin{array}{lll} {\displaystyle { {\frac{h}{k_\mathrm{{B}}T}}d_{i} ={\frac{1}{2}}{\left( 1-|{\partial }i|\right) }\mathrm{{Tr}}{\left[ {\boldsymbol{{\sigma }^{z}}}{\ln }{\left( {\boldsymbol{{\widehat{R}}_{i}}} \right) }\right] } +{\frac{1}{4}}{\sum _{\{j{\in }{\partial }i,j>i\}}}\mathrm{{Tr}}{\left[ {\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}\right) }{\ln }{\big (}{\boldsymbol{{\widehat{R}}_{ij}}}{\big )}\right] } +{\frac{1}{4}}{\sum _{\{k{\in }{\partial }i,k<i\}}}\mathrm{{Tr}}{\left[ {\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}\right) }{\ln }{\big (}{\boldsymbol{{\widehat{R}}_{ki}}}{\big )}\right] } }}, \\ {\displaystyle { {\frac{{\Gamma }}{k_\mathrm{{B}}T}} ={\frac{1}{2}}{\left( 1-|{\partial }i|\right) }\mathrm{{Tr}}{\left[ {\boldsymbol{{\sigma }^{x}}}{\ln }{\left( {\boldsymbol{{\widehat{R}}_{i}}} \right) }\right] } +{\frac{1}{4}}{\sum _{\{j{\in }{\partial }i,j>i\}}}\mathrm{{Tr}}{\left[ {\left( {\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}}\right) }{\ln }{\big (}{\boldsymbol{{\widehat{R}}_{ij}}}{\big )}\right] } +{\frac{1}{4}}{\sum _{\{k{\in }{\partial }i,k<i\}}}\mathrm{{Tr}}{\left[ {\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{x}}}\right) }{\ln }{\big (}{\boldsymbol{{\widehat{R}}_{ki}}}{\big )}\right] } }}, \\ \end{array} \right. } ~(i{\in }V), \nonumber \\&\end{aligned}$$
(10.346)
$$\begin{aligned}&{\left\{ \begin{array}{lll} {\displaystyle { {\frac{J}{k_\mathrm{{B}}T}} ={\frac{1}{4}}\mathrm{{Tr}}{\left[ {\big (}{\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}{\big )}{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}{\big )}{\ln }{\big (}{\boldsymbol{{\widehat{R}}_{ij}}}{\big )}\right] } }}, \\ {\displaystyle { 0={\frac{1}{4}}\mathrm{{Tr}}{\left[ {\big (}{\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}{\big )}{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{x}}}{\big )}{\ln }{\big (}{\boldsymbol{{\widehat{R}}_{ij}}}{\big )}\right] } ={\frac{1}{4}}\mathrm{{Tr}}{\left[ {\big (}{\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}}{\big )}{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}{\big )}{\ln }{\big (}{\boldsymbol{{\widehat{R}}_{ij}}}{\big )}\right] } }}, \\ {\displaystyle { 0={\frac{1}{4}}\mathrm{{Tr}}{\left[ {\big (}{\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}}{\big )}{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{x}}}{\big )}{\ln }{\big (}{\boldsymbol{{\widehat{R}}_{ij}}}{\big )}\right] } }}, \\ \end{array} \right. } ~(\{i,j\}{\in }E), \nonumber \\&\end{aligned}$$
(10.347)

where

$$\begin{aligned} {\boldsymbol{{\widehat{R}}_{i}}}= & {} {\frac{1}{2}}{\left( {\boldsymbol{I}}+{\widehat{m}}_{i}^{x}{\boldsymbol{{\sigma }^{x}}}+{\widehat{m}}_{i}^{z}{\boldsymbol{{\sigma }^{z}}} \right) }~(i{\in }V), \end{aligned}$$
(10.348)
$$\begin{aligned} {\boldsymbol{{\widehat{R}}_{ij}}}= & {} {\boldsymbol{{\widehat{R}}_{ji}}} \nonumber \\= & {} {\frac{1}{4}}{\Bigg (} {\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) } +{\widehat{m}}_{i}^{x}{\left( {\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}}\right) } +{\widehat{m}}_{i}^{z}{\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}\right) } +{\widehat{m}}_{j}^{x}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{x}}}\right) } +{\widehat{m}}_{j}^{z}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}\right) } \nonumber \\&+{\widehat{c}}_{\{i,j\}}^{zz}{\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}\right) }{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}\right) } +{\widehat{c}}_{\{i,j\}}^{xz}{\left( {\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}}\right) }{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}\right) } \nonumber \\&+{\widehat{c}}_{\{i,j\}}^{zx}{\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}\right) }{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{x}}}\right) } +{\widehat{c}}_{\{i,j\}}^{xx}{\left( {\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}}\right) }{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{x}}}\right) } {\Bigg )}~(\{i,j\}{\in }E,~i<j). \nonumber \\&\end{aligned}$$
(10.349)

For \({\Gamma }=0\), Eq. (10.347) with Eqs. (10.348) and (10.349) reduces to Eqs. (10.107) and (10.108) with Eqs. (10.72) and (10.102).

Before closing the present subsection, we briefly mention another framework of advanced quantum mean-field methods. As noted above, such methods have also been formulated in momentum space. One familiar formulation is spin wave theory [91]; a general formulation of the quantum cluster variation method from the viewpoint of spin wave theory was proposed in Refs. [92, 93].

5.2 Real-Space Renormalization Group Method for the Transverse Ising Model

We now describe sublinear modeling in statistical machine learning by means of the real-space renormalization group method for the transverse Ising model in Eq. (10.301) on the ring graph \((V,E)\) of Eq. (10.176), for the case \(|V|=2^{L}\) and \(h=0\). The present scheme follows Refs. [37, 94]. Extensions of the present framework from the ring graph in Eq. (10.176) to higher-dimensional graphs, such as the torus graph, may be possible along the lines of Ref. [94].

The important part of the transverse Ising model in Eq. (10.301), \(-J{\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}} \right) } {\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}} \right) } -{\Gamma }{\left( {\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}} \right) }\), can be diagonalized as

$$\begin{aligned}&-J{\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}} \right) } {\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}} \right) } -{\Gamma }{\left( {\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}} \right) } = {\left( \begin{array}{cccc} -J &{} 0 &{} -{\Gamma } &{} 0 \\ 0 &{} J &{} 0 &{} -{\Gamma } \\ -{\Gamma } &{} 0 &{} J &{} 0 \\ 0 &{} -{\Gamma } &{} 0 &{} -J \end{array} \right) } \nonumber \\&= {\left( \begin{array}{cccc} {\sqrt{{\frac{1}{2}}{\left( 1+{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} &{} 0 &{} 0 &{} {\sqrt{{\frac{1}{2}}{\left( 1-{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} \\ 0 &{} {\sqrt{{\frac{1}{2}}{\left( 1-{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} &{} -{\sqrt{{\frac{1}{2}}{\left( 1+{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} &{} 0 \\ {\sqrt{{\frac{1}{2}}{\left( 1-{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} &{} 0 &{} 0 &{} -{\sqrt{{\frac{1}{2}}{\left( 1+{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} \\ 0 &{} {\sqrt{{\frac{1}{2}}{\left( 1+{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} &{} {\sqrt{{\frac{1}{2}}{\left( 1-{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} &{} 0 \end{array} \right) } \nonumber \\&{\times } {\left( \begin{array}{cccc} -{\sqrt{J^{2}+{\Gamma }^{2}}} &{} 0 &{} 0 &{} 0 \\ 0 &{} -{\sqrt{J^{2}+{\Gamma }^{2}}} &{} 0 &{} 0 \\ 0 &{} 0 &{} {\sqrt{J^{2}+{\Gamma }^{2}}} &{} 0 \\ 0 &{} 0 &{} 0 &{} {\sqrt{J^{2}+{\Gamma }^{2}}} \end{array} \right) } \nonumber \\&{\times } {\left( \begin{array}{cccc} {\sqrt{{\frac{1}{2}}{\left( 1+{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} &{} 0 &{} 0 &{} {\sqrt{{\frac{1}{2}}{\left( 1-{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} \\ 0 &{} {\sqrt{{\frac{1}{2}}{\left( 1-{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} &{} -{\sqrt{{\frac{1}{2}}{\left( 1+{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} &{} 0 \\ {\sqrt{{\frac{1}{2}}{\left( 1-{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} &{} 
0 &{} 0 &{} -{\sqrt{{\frac{1}{2}}{\left( 1+{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} \\ 0 &{} {\sqrt{{\frac{1}{2}}{\left( 1+{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} &{} {\sqrt{{\frac{1}{2}}{\left( 1-{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} &{} 0 \end{array} \right) }^\mathrm{{T}}. \nonumber \\&\end{aligned}$$
(10.350)

The eigenvalues \({\varepsilon }_{1}={\varepsilon }_{2}=-{\sqrt{J^{2}+{\Gamma }^{2}}}\) and \({\varepsilon }_{3}={\varepsilon }_{4}=+{\sqrt{J^{2}+{\Gamma }^{2}}}\) satisfy \({\varepsilon }_{1}={\varepsilon }_{2}<{\varepsilon }_{3}={\varepsilon }_{4}\), and the corresponding eigenvectors are given by

$$\begin{aligned} {\left\{ \begin{array}{ccc} |1{\rangle }= {\left( \begin{array}{cccc} {\sqrt{{\frac{1}{2}}{\left( 1+{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} \\ 0 \\ {\sqrt{{\frac{1}{2}}{\left( 1-{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} \\ 0 \end{array} \right) }, &{} |2{\rangle }= {\left( \begin{array}{cccc} 0 \\ +{\sqrt{{\frac{1}{2}}{\left( 1-{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} \\ 0 \\ +{\sqrt{{\frac{1}{2}}{\left( 1+{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} \end{array} \right) }, \\ |3{\rangle }= {\left( \begin{array}{cccc} 0 \\ -{\sqrt{{\frac{1}{2}}{\left( 1+{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} \\ 0 \\ +{\sqrt{{\frac{1}{2}}{\left( 1-{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} \end{array} \right) }, &{} |4{\rangle }= {\left( \begin{array}{cccc} {\sqrt{{\frac{1}{2}}{\left( 1-{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} \\ 0 \\ -{\sqrt{{\frac{1}{2}}{\left( 1+{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} \\ 0 \end{array} \right) }. \end{array} \right. } \end{aligned}$$
(10.351)
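The two-fold degenerate eigenvalue pair \({\mp }{\sqrt{J^{2}+{\Gamma }^{2}}}\) is easy to confirm numerically; the following sketch (assuming NumPy) builds the \(4{\times }4\) block of Eq. (10.350) and diagonalizes it:

```python
import numpy as np

I2 = np.eye(2)
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
sz = np.array([[1.0, 0.0], [0.0, -1.0]])

J, Gamma = 1.0, 0.7

# the two-site block -J (sigma^z x I)(I x sigma^z) - Gamma (sigma^x x I)
H2 = -J * np.kron(sz, sz) - Gamma * np.kron(sx, I2)

lam = np.sqrt(J**2 + Gamma**2)
eps = np.linalg.eigvalsh(H2)  # eigenvalues in ascending order

# doubly degenerate pair +/- sqrt(J^2 + Gamma^2), as in Eq. (10.351)
assert np.allclose(eps, [-lam, -lam, lam, lam])
```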

To realize the coarse graining of the present transverse Ising model at zero temperature \(T=0\) for the density matrix \({\boldsymbol{P}}\) in Eq. (10.300), we introduce the following projection operator:

$$\begin{aligned} {\mathbb {P}}_{i}^{(2^{L})} ={\underbrace{{\mathbb {P}}{\otimes }{\mathbb {P}}{\otimes }{\cdots }{\otimes }{\mathbb {P}}}_{2^{L-1}~{\mathbb {P}}'s}}, \end{aligned}$$
(10.352)

where

$$\begin{aligned} {\mathbb {P}}\equiv & {} {\left( \begin{array}{cc} {\langle }1| \\ {\langle }2| \end{array} \right) } {\left( |1{\rangle }{\langle }1|+|2{\rangle }{\langle }2| \right) } = {\left( \begin{array}{cc} {\langle }1| \\ {\langle }2| \end{array} \right) } \nonumber \\= & {} {\left( \begin{array}{cccc} {\sqrt{{\frac{1}{2}}{\left( 1+{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} &{} 0 &{} {\sqrt{{\frac{1}{2}}{\left( 1-{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} &{} 0 \\ 0 &{} +{\sqrt{{\frac{1}{2}}{\left( 1-{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} &{} 0 &{} {\sqrt{{\frac{1}{2}}{\left( 1+{\frac{J}{{\sqrt{J^{2}+{\Gamma }^{2}}}}}\right) }}} \\ \end{array} \right) }. \nonumber \\&\end{aligned}$$
(10.353)

Because it is valid that

$$\begin{aligned} {\mathbb {P}} {\left( \begin{array}{cccc} -J &{} 0 &{} -{\Gamma } &{} 0 \\ 0 &{} J &{} 0 &{} -{\Gamma } \\ -{\Gamma } &{} 0 &{} J &{} 0 \\ 0 &{} -{\Gamma } &{} 0 &{} -J \end{array} \right) } {\mathbb {P}}^\mathrm{{T}} = -{\sqrt{J^{2}+{\Gamma }^{2}}} {\left( \begin{array}{cc} {\langle }1|1{\rangle } &{} {\langle }1|2{\rangle } \\ {\langle }2|1{\rangle } &{} {\langle }2|2{\rangle } \end{array} \right) } = -{\sqrt{J^{2}+{\Gamma }^{2}}} {\boldsymbol{I}}, \end{aligned}$$
(10.354)
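The identity in Eq. (10.354), together with the matrix elements that generate the renormalized couplings appearing in Eqs. (10.356) and (10.357), can be verified directly. A minimal sketch (assuming NumPy; \(p\) and \(q\) denote the two distinct entries of \({\mathbb {P}}\) in Eq. (10.353)):

```python
import numpy as np

I2 = np.eye(2)
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
sz = np.array([[1.0, 0.0], [0.0, -1.0]])

J, Gamma = 1.0, 0.7
lam = np.sqrt(J**2 + Gamma**2)
p = np.sqrt(0.5 * (1.0 + J / lam))
q = np.sqrt(0.5 * (1.0 - J / lam))

# the 2x4 projector of Eq. (10.353), built from the rows <1| and <2|
P = np.array([[p, 0.0, q, 0.0],
              [0.0, q, 0.0, p]])

H2 = -J * np.kron(sz, sz) - Gamma * np.kron(sx, I2)

# Eq. (10.354): projecting the two-site block gives -sqrt(J^2+Gamma^2) I
assert np.allclose(P @ H2 @ P.T, -lam * I2)

# matrix elements behind the renormalized couplings J^2/lam and Gamma^2/lam:
# P (sigma^z x I) P^T = (J/lam) sigma^z,  P (I x sigma^x) P^T = (Gamma/lam) sigma^x
assert np.allclose(P @ np.kron(sz, I2) @ P.T, (J / lam) * sz)
assert np.allclose(P @ np.kron(I2, sx) @ P.T, (Gamma / lam) * sx)
```

The last two relations are the source of the factors \(J^{2}/{\sqrt{J^{2}+{\Gamma }^{2}}}\) and \({\Gamma }^{2}/{\sqrt{J^{2}+{\Gamma }^{2}}}\) in Eqs. (10.356) and (10.357).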

we can derive the following equalities

$$\begin{aligned}&{\mathbb {P}}_{i}^{(2^{L})} {\Big (} -J {\boldsymbol{{\sigma }^{z}_{2i-1}}}{\boldsymbol{{\sigma }^{z}_{2i}}} -{\Gamma } {\boldsymbol{{\sigma }^{x}_{2i-1}}} {\Big )} {{\mathbb {P}}_{i}^{(2^{L})}}^\mathrm{{T}} \nonumber \\&={\mathbb {P}}_{i}^{(2^{L})} {\Big (} {\overbrace{{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\big )}{\otimes }{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\big )}{\otimes }{\cdots }{\otimes }{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\big )}} ^{\mathrm{{Tensor~Products~of}}~(i-1)~\mathrm{{Matrices}}~{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }}} {\otimes } {\big (} -J{\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}} \right) } {\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}} \right) } -{\Gamma }{\left( {\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}} \right) } {\big )} \nonumber \\&\qquad{} {\otimes } {\underbrace{{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\big )}{\otimes }{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\big )}{\otimes }{\cdots }{\otimes }{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\big )}} _{\mathrm{{Tensor~Products~of}}~(2^{L-1}-i)~\mathrm{{Matrices}}~{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }}} {\Big )} {{\mathbb {P}}_{i}^{(2^{L})}}^\mathrm{{T}} \nonumber \\&= {\Big (} {\overbrace{ {\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) } {\otimes } {\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) } {\otimes }{\cdots }{\otimes }{\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) } }^{\mathrm{{Tensor~Products~of}}~(i-1)~\mathrm{{Matrices}}~{\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) }}} {\Big )} {\otimes } {\Big (} {\mathbb {P}} {\big (} -J{\left( 
{\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}} \right) } {\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}} \right) } -{\Gamma }{\left( {\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}} \right) } {\big )} {\mathbb {P}}^\mathrm{{T}} {\Big )} \nonumber \\&\qquad{} {\otimes } {\Big (} {\underbrace{ {\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) } {\otimes } {\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) } {\otimes }{\cdots }{\otimes }{\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) } }_{\mathrm{{Tensor~Products~of}}~(2^{L-1}-i)~\mathrm{{Matrices}}~{\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) }}} {\Big )} \nonumber \\&= {\Big (} {\overbrace{ {\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) } {\otimes } {\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) } {\otimes }{\cdots }{\otimes }{\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) } }^{\mathrm{{Tensor~Products~of}}~(i-1)~\mathrm{{Matrices}}~{\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) }}} {\Big )} {\otimes } {\Big (} {\mathbb {P}} {\big (} -J{\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}} \right) } {\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}} \right) } -{\Gamma }{\left( {\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}} \right) } {\big )} {\mathbb {P}}^\mathrm{{T}} {\Big )} \nonumber \\&\qquad{} {\otimes } {\Big (} {\underbrace{ {\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb 
{P}}^\mathrm{{T}}\right) } {\otimes } {\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) } {\otimes }{\cdots }{\otimes }{\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) } }_{\mathrm{{Tensor~Products~of}}~(2^{L-1}-i)~\mathrm{{Matrices}}~{\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) }}} {\Big )} \nonumber \\&= {\Big (} {\overbrace{ {\boldsymbol{I}} {\otimes } {\boldsymbol{I}} {\otimes }{\cdots }{\otimes }{\boldsymbol{I}} }^{(i-1)~{\boldsymbol{I}}\mathrm{{'s}}}}{\Big )} {\otimes } {\Big (} -{\sqrt{J^{2}+{\Gamma }^{2}}} {\boldsymbol{I}} {\Big )} {\otimes } {\Big (} {\underbrace{ {\boldsymbol{I}} {\otimes } {\boldsymbol{I}} {\otimes }{\cdots }{\otimes }{\boldsymbol{I}} }_{(2^{L-1}-i)~{\boldsymbol{I}}\mathrm{{'s}}}} {\Big )}, \end{aligned}$$
(10.355)
$$\begin{aligned}&{\mathbb {P}}_{i}^{(2^{L})} {\Big (} -J {\boldsymbol{{\sigma }^{z}_{2i}}}{\boldsymbol{{\sigma }^{z}_{2i+1}}} -{\Gamma } {\boldsymbol{{\sigma }^{x}_{2i}}} {\Big )} {{\mathbb {P}}_{i}^{(2^{L})}}^\mathrm{{T}} \nonumber \\&={\mathbb {P}}_{i}^{(2^{L})} {\Big (} {\overbrace{{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\big )}{\otimes }{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\big )}{\otimes }{\cdots }{\otimes }{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\big )}} ^{\mathrm{{Tensor~Products~of}}~(i-1)~\mathrm{{Matrices}}~{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }}} \nonumber \\&\qquad{} {\otimes } {\big (} -J{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{I}} \right) } {\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}} \right) } -{\Gamma }{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{I}} \right) } {\big )} \nonumber \\&\qquad{} {\otimes } {\underbrace{{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\big )}{\otimes }{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\big )}{\otimes }{\cdots }{\otimes }{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\big )}} _{\mathrm{{Tensor~Products~of}}~(2^{L-1}-i-1)~\mathrm{{Matrices}}~{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }}} {\Big )} {{\mathbb {P}}_{i}^{(2^{L})}}^\mathrm{{T}} \nonumber \\&= {\Big (} {\overbrace{ {\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) } {\otimes } {\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) } {\otimes }{\cdots }{\otimes }{\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) } }^{\mathrm{{Tensor~Products~of}}~(i-1)~\mathrm{{Matrices}}~{\left( 
{\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) }}} {\Big )} \nonumber \\&\qquad{} {\otimes } {\Bigg (} {\left( {\mathbb {P}}{\otimes }{\mathbb {P}}\right) } {\Big (} -J{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{I}} \right) } {\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}} \right) } -{\Gamma }{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}}{\otimes }{\boldsymbol{I}} \right) } {\Big )} {\left( {\mathbb {P}}^\mathrm{{T}}{\otimes }{\mathbb {P}}^\mathrm{{T}}\right) } {\Bigg )} \nonumber \end{aligned}$$
$$\begin{aligned}&\qquad{} {\otimes } {\Big (} {\underbrace{ {\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) } {\otimes } {\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) } {\otimes }{\cdots }{\otimes }{\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) } }_{\mathrm{{Tensor~Products~of}}~(2^{L-1}-i-1)~\mathrm{{Matrices}}~{\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) }}} {\Big )} \nonumber \\&\nonumber \\&= {\Big (} {\overbrace{ {\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) } {\otimes } {\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) } {\otimes }{\cdots }{\otimes }{\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) } }^{\mathrm{{Tensor~Products~of}}~(i-1)~\mathrm{{Matrices}}~{\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) }}} {\Big )} \nonumber \\&\qquad{} {\otimes } {\Bigg (} -J {\left( {\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) } {\otimes } {\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}} \right) }{\mathbb {P}}^\mathrm{{T}} \right) } \right) } {\left( {\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}} \right) }{\mathbb {P}}^\mathrm{{T}} \right) } {\otimes } {\left( {\mathbb {P}}{\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}} \right) }{\mathbb {P}}^\mathrm{{T}}\right) } \right) } -{\Gamma }{\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{x}}} \right) }{\mathbb {P}}^\mathrm{{T}} 
{\otimes }{\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}} \right) }{\mathbb {P}}^\mathrm{{T}} \right) } {\Bigg )} \nonumber \\&\qquad{} {\otimes } {\Big (} {\underbrace{ {\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) } {\otimes } {\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) } {\otimes }{\cdots }{\otimes }{\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) } }_{\mathrm{{Tensor~Products~of}}~(2^{L-1}-i-1)~\mathrm{{Matrices}}~{\left( {\mathbb {P}}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }{\mathbb {P}}^\mathrm{{T}}\right) }}} {\Big )} \nonumber \\&= {\Big (} {\overbrace{ {\boldsymbol{I}} {\otimes } {\boldsymbol{I}} {\otimes }{\cdots }{\otimes }{\boldsymbol{I}} }^{(i-1)~{\boldsymbol{I}}\mathrm{{'s}}}}{\Big )} {\otimes } {\Bigg (} -{\frac{J^{2}}{{\sqrt{J^{2}+{\Gamma }^{2}}}}} {\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}\right) } {\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}\right) } -{\frac{{\Gamma }^{2}}{{\sqrt{J^{2}+{\Gamma }^{2}}}}} {\left( {\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}}\right) } {\Bigg )} {\otimes } {\Big (} {\underbrace{ {\boldsymbol{I}} {\otimes } {\boldsymbol{I}} {\otimes }{\cdots }{\otimes }{\boldsymbol{I}} }_{(2^{L-1}-i-1)~{\boldsymbol{I}}\mathrm{{'s}}}} {\Big )}, \end{aligned}$$
(10.356)
$$\begin{aligned}&{\mathbb {P}}_{i}^{(2^{L})} {\Big (} -J {\boldsymbol{{\sigma }^{z}_{1}}}{\boldsymbol{{\sigma }^{z}_{2^{L}}}} -{\Gamma } {\boldsymbol{{\sigma }^{x}_{2^{L}}}} {\Big )} {{\mathbb {P}}_{i}^{(2^{L})}}^\mathrm{{T}} \nonumber \\&={\mathbb {P}}_{i}^{(2^{L})} {\Bigg (} -J{\Big (} {\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}} \right) } {\otimes } {\overbrace{{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\big )}{\otimes }{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\big )}{\otimes }{\cdots }{\otimes }{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\big )}} ^{\mathrm{{Tensor~Products~of}}~(2^{L-1}-2)~\mathrm{{Matrices}}~{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }}} {\otimes }{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}} \right) } {\Big )} \nonumber \\&\qquad\qquad\qquad{}{\times } {\Big (} {\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}} \right) } {\otimes } {\overbrace{{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\big )}{\otimes }{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\big )}{\otimes }{\cdots }{\otimes }{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\big )}} ^{\mathrm{{Tensor~Products~of}}~(2^{L-1}-2)~\mathrm{{Matrices}}~{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }}} {\otimes }{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}} \right) } {\Big )} \nonumber \\&\qquad\qquad{} -{\Gamma } {\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}} \right) } {\otimes } {\overbrace{{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\big )}{\otimes }{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\big )}{\otimes }{\cdots }{\otimes }{\big (}{\boldsymbol{I}}{\otimes }{\boldsymbol{I}}{\big )}} ^{\mathrm{{Tensor~Products~of}}~(2^{L-1}-2)~\mathrm{{Matrices}}~{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{I}}\right) }}} {\otimes }{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{x}}} \right) } {\Bigg )} {{\mathbb {P}}_{i}^{(2^{L})}}^\mathrm{{T}} \nonumber \\&= 
-{\frac{J^{2}}{{\sqrt{J^{2}+{\Gamma }^{2}}}}} {\Big (}{\boldsymbol{{\sigma }^{z}}}{\otimes }{\Big (} {\overbrace{ {\boldsymbol{I}} {\otimes } {\boldsymbol{I}} {\otimes }{\cdots }{\otimes }{\boldsymbol{I}} }^{(2^{L-1}-2)~{\boldsymbol{I}}\mathrm{{'s}}}}{\Big )} {\otimes }{\boldsymbol{I}}{\Big )} {\Big (}{\boldsymbol{I}}{\otimes }{\Big (} {\overbrace{ {\boldsymbol{I}} {\otimes } {\boldsymbol{I}} {\otimes }{\cdots }{\otimes }{\boldsymbol{I}} }^{(2^{L-1}-2)~{\boldsymbol{I}}\mathrm{{'s}}}}{\Big )} {\otimes }{\boldsymbol{{\sigma }^{z}}}{\Big )} \nonumber \\&\qquad\qquad{} -{\frac{{\Gamma }^{2}}{{\sqrt{J^{2}+{\Gamma }^{2}}}}} {\Big (}{\boldsymbol{I}}{\otimes } {\Big (} {\overbrace{ {\boldsymbol{I}} {\otimes } {\boldsymbol{I}} {\otimes }{\cdots }{\otimes }{\boldsymbol{I}} }^{(2^{L-1}-2)~{\boldsymbol{I}}\mathrm{{'s}}}}{\Big )} {\otimes }{\boldsymbol{{\sigma }^{x}}}{\Big )}. \end{aligned}$$
(10.357)

By using these equalities, the first step of the renormalized energy matrix \({\boldsymbol{H^{(2^{L-1})}}} \equiv {{\mathbb {P}}_{i}^{(2^{L})}} {\boldsymbol{H}} {{\mathbb {P}}_{i}^{(2^{L})}}^\mathrm{{T}}\) can be reduced as follows:

$$\begin{aligned} {\boldsymbol{H^{(2^{L-1})}}}\equiv & {} {{\mathbb {P}}_{i}^{(2^{L})}} {\boldsymbol{H}} {{\mathbb {P}}_{i}^{(2^{L})}}^\mathrm{{T}} \nonumber \\= & {} -2^{L-1}{\sqrt{J^{2}+{\Gamma }^{2}}} {\Big (} {\overbrace{ {\boldsymbol{I}} {\otimes } {\boldsymbol{I}} {\otimes }{\cdots }{\otimes }{\boldsymbol{I}} }^{(2^{L-1})~{\boldsymbol{I}}\mathrm{{'s}}}}{\Big )} \nonumber \\&- {\sum _{i=1}^{2^{L-1}-1}} {\Big (} {\overbrace{ {\boldsymbol{I}} {\otimes } {\boldsymbol{I}} {\otimes }{\cdots }{\otimes }{\boldsymbol{I}} }^{(i-1)~{\boldsymbol{I}}\mathrm{{'s}}}}{\Big )} {\otimes } {\Bigg (} {\frac{J^{2}}{{\sqrt{J^{2}+{\Gamma }^{2}}}}} {\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}\right) } {\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}\right) } +{\frac{{\Gamma }^{2}}{{\sqrt{J^{2}+{\Gamma }^{2}}}}} {\left( {\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}}\right) } {\Bigg )} {\otimes } {\Big (} {\underbrace{ {\boldsymbol{I}} {\otimes } {\boldsymbol{I}} {\otimes }{\cdots }{\otimes }{\boldsymbol{I}} }_{(2^{L-1}-i-1)~{\boldsymbol{I}}\mathrm{{'s}}}} {\Big )} \nonumber \\&\qquad\qquad{} -{\frac{J^{2}}{{\sqrt{J^{2}+{\Gamma }^{2}}}}} {\Big (}{\boldsymbol{{\sigma }^{z}}}{\otimes }{\Big (} {\overbrace{ {\boldsymbol{I}} {\otimes } {\boldsymbol{I}} {\otimes }{\cdots }{\otimes }{\boldsymbol{I}} }^{(2^{L-1}-2)~{\boldsymbol{I}}\mathrm{{'s}}}}{\Big )} {\otimes }{\boldsymbol{I}}{\Big )} {\Big (}{\boldsymbol{I}}{\otimes }{\Big (} {\overbrace{ {\boldsymbol{I}} {\otimes } {\boldsymbol{I}} {\otimes }{\cdots }{\otimes }{\boldsymbol{I}} }^{(2^{L-1}-2)~{\boldsymbol{I}}\mathrm{{'s}}}}{\Big )} {\otimes }{\boldsymbol{{\sigma }^{z}}}{\Big )} \nonumber \\&\qquad\qquad{} -{\frac{{\Gamma }^{2}}{{\sqrt{J^{2}+{\Gamma }^{2}}}}} {\Big (}{\boldsymbol{I}}{\otimes } {\Big (} {\overbrace{ {\boldsymbol{I}} {\otimes } {\boldsymbol{I}} {\otimes }{\cdots }{\otimes }{\boldsymbol{I}} }^{(2^{L-1}-2)~{\boldsymbol{I}}\mathrm{{'s}}}}{\Big )} {\otimes }{\boldsymbol{{\sigma }^{x}}}{\Big )}. \end{aligned}$$
(10.358)

By arguments similar to those above, the r-th step of the renormalized energy matrix \({\boldsymbol{H^{(2^{L-r})}}} \equiv {\left( {{\mathbb {P}}_{i}^{(2^{L-r+1})}} {{\mathbb {P}}_{i}^{(2^{L-r+2})}}{\cdots } {{\mathbb {P}}_{i}^{(2^{L})}}\right) } {\boldsymbol{H}} {\left( {{\mathbb {P}}_{i}^{(2^{L})}}^\mathrm{{T}} {\cdots } {{\mathbb {P}}_{i}^{(2^{L-r+2})}}^\mathrm{{T}} {{\mathbb {P}}_{i}^{(2^{L-r+1})}}^\mathrm{{T}} \right) }\) can be reduced to the following recursion formulas:

$$\begin{aligned} {\boldsymbol{H^{(2^{L-r})}}}\equiv & {} {{\mathbb {P}}_{i}^{(2^{L-r+1})}} {\boldsymbol{H^{(2^{L-r+1})}}} {{\mathbb {P}}_{i}^{(2^{L-r+1})}}^\mathrm{{T}} \nonumber \\= & {} 2^{L-r}{\varepsilon }_{1}^{(r)} {\Big (} {\overbrace{ {\boldsymbol{I}} {\otimes } {\boldsymbol{I}} {\otimes }{\cdots }{\otimes }{\boldsymbol{I}} }^{(2^{L-r})~{\boldsymbol{I}}\mathrm{{'s}}}}{\Big )} \nonumber \\&- {\sum _{i=1}^{2^{L-r}}} {\Big (} {\overbrace{ {\boldsymbol{I}} {\otimes } {\boldsymbol{I}} {\otimes }{\cdots }{\otimes }{\boldsymbol{I}} }^{(i-1)~{\boldsymbol{I}}\mathrm{{'s}}}}{\Big )} {\otimes } {\Bigg (} J^{(r)} {\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}\right) } {\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}\right) } +{\Gamma }^{(r)} {\left( {\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}}\right) } {\Bigg )} {\otimes } {\Big (} {\underbrace{ {\boldsymbol{I}} {\otimes } {\boldsymbol{I}} {\otimes }{\cdots }{\otimes }{\boldsymbol{I}} }_{(2^{L-r}-i-1)~{\boldsymbol{I}}\mathrm{{'s}}}} {\Big )} \nonumber \\&\qquad\qquad{} -J^{(r)} {\Big (}{\boldsymbol{{\sigma }^{z}}}{\otimes }{\Big (} {\overbrace{ {\boldsymbol{I}} {\otimes } {\boldsymbol{I}} {\otimes }{\cdots }{\otimes }{\boldsymbol{I}} }^{(2^{L-r}-2)~{\boldsymbol{I}}\mathrm{{'s}}}}{\Big )} {\otimes }{\boldsymbol{I}}{\Big )} {\Big (}{\boldsymbol{I}}{\otimes }{\Big (} {\overbrace{ {\boldsymbol{I}} {\otimes } {\boldsymbol{I}} {\otimes }{\cdots }{\otimes }{\boldsymbol{I}} }^{(2^{L-r}-2)~{\boldsymbol{I}}\mathrm{{'s}}}}{\Big )} {\otimes }{\boldsymbol{{\sigma }^{z}}}{\Big )} \nonumber \\&\qquad\qquad{} -{\Gamma }^{(r)} {\Big (}{\boldsymbol{I}}{\otimes } {\Big (} {\overbrace{ {\boldsymbol{I}} {\otimes } {\boldsymbol{I}} {\otimes }{\cdots }{\otimes }{\boldsymbol{I}} }^{(2^{L-r}-2)~{\boldsymbol{I}}\mathrm{{'s}}}}{\Big )} {\otimes }{\boldsymbol{{\sigma }^{x}}}{\Big )}, \end{aligned}$$
(10.359)

where

$$\begin{aligned} {\left\{ \begin{array}{ccccc} J^{(r)} &{} = &{} {\displaystyle {{\frac{{(J^{(r-1)}})^{2}}{{\sqrt{{(J^{(r-1)}})^{2}+({{\Gamma }^{(r-1)}})^{2}}}}}}} , \\ {\Gamma }^{(r)} &{} = &{} {\displaystyle {{\frac{({{\Gamma }^{(r-1)}})^{2}}{{\sqrt{({J^{(r-1)}})^{2}+({{\Gamma }^{(r-1)}})^{2}}}}}}}, \\ \end{array} \right. } \end{aligned}$$
(10.360)
$$\begin{aligned} {\varepsilon }_{1}^{(r)} = -{\sqrt{({J^{(r-1)}})^{2}+({{\Gamma }^{(r-1)}})^{2}}}, \end{aligned}$$
(10.361)
$$\begin{aligned} {\left\{ \begin{array}{ccccc} {\boldsymbol{H^{(2^{L})}}} &{} \equiv &{} {\boldsymbol{H}}, \\ J^{(0)} &{} \equiv &{} J, \\ {\Gamma }^{(0)} &{} \equiv &{} {\Gamma }. \end{array} \right. } \end{aligned}$$
(10.362)

The inverse of the real-space renormalization group is given by

$$\begin{aligned} {\left\{ \begin{array}{ccccc} J^{(r-1)} &{} = &{} {\displaystyle {{\sqrt{J^{(r)}{\left( J^{(r)}+{\Gamma }^{(r)} \right) }}}}}, \\ {\Gamma }^{(r-1)} &{} = &{} {\displaystyle {{\sqrt{{\Gamma }^{(r)}{\left( J^{(r)}+{\Gamma }^{(r)} \right) }}}}}, \\ \end{array} \right. } \end{aligned}$$
(10.363)

If the hyperparameters \(J^{(r)}\) and \({\Gamma }^{(r)}\) in the r-th renormalized energy matrix \({\boldsymbol{H^{(2^{L-r})}}}\) have been estimated from given data vectors by using the QEM algorithm for a renormalized density matrix on ring graphs \({\left( V^{(r)},E^{(r)}\right) }\), we can estimate the hyperparameters \(J^{(0)}=J\) and \({\Gamma }^{(0)}={\Gamma }\) of the transverse Ising model (10.301) on the ring graph E of Eq. (10.176) for the case of \(|V|=2^{L}\) and \(h=0\) by using the inverse transformation rule (10.363) of the real-space renormalization group procedure.
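The forward recursion (10.360) and the inverse transformation (10.363) are exact inverses of each other, which can be confirmed in a few lines of numerical code. The following is a minimal sketch; the function names and parameter values are ours, chosen only for illustration:

```python
import math

def rg_step(J, Gamma):
    # One real-space renormalization step, Eq. (10.360)
    r = math.sqrt(J**2 + Gamma**2)
    return J**2 / r, Gamma**2 / r

def rg_inverse(J, Gamma):
    # Inverse transformation, Eq. (10.363)
    s = J + Gamma
    return math.sqrt(J * s), math.sqrt(Gamma * s)

# Illustrative hyperparameter values
J0, Gamma0 = 1.0, 0.7
J1, Gamma1 = rg_step(J0, Gamma0)
J_back, Gamma_back = rg_inverse(J1, Gamma1)
```

Applying `rg_inverse` r times to hyperparameters estimated at the r-th renormalized level recovers \(J^{(0)}=J\) and \({\Gamma }^{(0)}={\Gamma }\), which is exactly how the estimates are mapped back in the procedure above.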

5.3 Sublinear Modeling Using a Quantum Adaptive TAP Approach and Momentum Space Renormalization Group in the Transverse Ising Model

This section proposes a novel scheme for the momentum space renormalization group approach within the adaptive Thouless-Anderson-Palmer (TAP) approach for the transverse Ising model on random graphs. The adaptive TAP approach is a familiar advanced mean-field method for probabilistic graphical models, and many extensions have been proposed [95,96,97,98]. Furthermore, sublinear modeling for the EM procedure in probabilistic graphical models has been realized by introducing momentum space renormalization group approaches [99, 100]. The method proposed in this section is formulated by combining the adaptive TAP approach with the momentum space renormalization group approach. Moreover, our method is applicable not only to regular graphs but also to random graphs.

The density matrix \({\boldsymbol{P}}\) in Eq. (10.300) can be rewritten as

$$\begin{aligned} {\boldsymbol{P}}= {\frac{ {\displaystyle { {\exp }{\left( -{\frac{J}{2k_\mathrm{{B}}T}}{\sum _{\{i,j\}{\in }E}} {\left( {\boldsymbol{{\sigma }_{i}^{z}}}-{\boldsymbol{{\sigma }_{j}^{z}}}\right) }^{2} -{\frac{h}{2k_\mathrm{{B}}T}}{\sum _{i{\in }V}}{\left( {\boldsymbol{{\sigma }_{i}^{z}}} - d_{i}{\boldsymbol{I^{(2^{|V|})}}} \right) }^{2} -{\frac{1}{2k_\mathrm{{B}}T}}{\sum _{i{\in }V}}{\left( {\boldsymbol{{\sigma }_{i}^{x}}} - {\Gamma }{\boldsymbol{I^{(2^{|V|})}}} \right) }^{2} \right) } }} }{ {\displaystyle { \mathrm{{Tr}}{\left[ {\exp }{\left( -{\frac{J}{2k_\mathrm{{B}}T}}{\sum _{\{i,j\}{\in }E}} {\left( {\boldsymbol{{\sigma }_{i}^{z}}}-{\boldsymbol{{\sigma }_{j}^{z}}}\right) }^{2} -{\frac{h}{2k_\mathrm{{B}}T}}{\sum _{i{\in }V}}{\left( {\boldsymbol{{\sigma }_{i}^{z}}} - d_{i}{\boldsymbol{I^{(2^{|V|})}}} \right) }^{2} -{\frac{1}{2k_\mathrm{{B}}T}}{\sum _{i{\in }V}}{\left( {\boldsymbol{{\sigma }_{i}^{x}}} - {\Gamma }{\boldsymbol{I^{(2^{|V|})}}} \right) }^{2} \right) } \right] } }} }}. \nonumber \\ \end{aligned}$$
(10.364)

The density matrix \({\boldsymbol{P}}\) satisfies the following minimization of the free energy functional:

$$\begin{aligned} {\boldsymbol{P}}={\arg }{\min _{{\boldsymbol{R}}}}{\left\{ \mathcal{{F}}{\big [}{\boldsymbol{R}}{\big ]}{\Big |}\mathrm{{Tr}}{\boldsymbol{R}}=1 \right\} }, \end{aligned}$$
(10.365)
$$\begin{aligned} \mathcal{{F}}{\big [}{\boldsymbol{R}}{\big ]}\equiv & {} {\frac{1}{2}}J{\sum _{\{i,j\}{\in }E}} \mathrm{{Tr}}{\left[ {\big (}{\boldsymbol{{\sigma }_{i}^{z}}}-{\boldsymbol{{\sigma }_{j}^{z}}}{\big )}^{2}{\boldsymbol{R}}\right] } +{\frac{1}{2}}h{\sum _{i{\in }V}} \mathrm{{Tr}}{\left[ {\big (}{\boldsymbol{{\sigma }_{i}^{z}}}-d_{i}{\boldsymbol{I^{(2^{|V|})}}}{\big )}^{2}{\boldsymbol{R}} \right] } \nonumber \\&+{\frac{1}{2}}{\sum _{i{\in }V}}\mathrm{{Tr}}{\left[ {\big (}{\boldsymbol{{\sigma }_{i}^{x}}}-{\Gamma }{\boldsymbol{I^{(2^{|V|})}}}{\big )}^{2}{\boldsymbol{R}} \right] } +k_\mathrm{{B}}T\mathrm{{Tr}}{\left[ {\boldsymbol{R}}{\ln }{\left( {\boldsymbol{R}}\right) }\right] }. \end{aligned}$$
(10.366)
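For intuition, the free energy functional (10.366) can be evaluated directly for a very small system, which is useful for checking an implementation. The following sketch takes \(|V|=2\) with a single edge and a maximally mixed trial density matrix; all parameter values are illustrative:

```python
import numpy as np

kB_T, J, h, Gamma = 1.0, 1.0, 0.5, 0.7
d = np.array([1.0, -1.0])            # data vector for |V| = 2, edge set E = {{1,2}}

I2 = np.eye(2)
sz = np.diag([1.0, -1.0])            # Pauli sigma^z
sx = np.array([[0.0, 1.0], [1.0, 0.0]])  # Pauli sigma^x
sz1, sz2 = np.kron(sz, I2), np.kron(I2, sz)
sx1, sx2 = np.kron(sx, I2), np.kron(I2, sx)
I4 = np.eye(4)

R = I4 / 4.0                         # maximally mixed trial density matrix

def entropy_term(R):
    # k_B T Tr[R ln R] via the eigenvalues of R
    w = np.linalg.eigvalsh(R)
    w = w[w > 1e-15]
    return kB_T * np.sum(w * np.log(w))

F = (0.5 * J * np.trace((sz1 - sz2) @ (sz1 - sz2) @ R)
     + 0.5 * h * sum(np.trace((szi - di * I4) @ (szi - di * I4) @ R)
                     for szi, di in zip((sz1, sz2), d))
     + 0.5 * sum(np.trace((sxi - Gamma * I4) @ (sxi - Gamma * I4) @ R)
                 for sxi in (sx1, sx2))
     + entropy_term(R))
```

For this trial state each trace can also be computed by hand, so the value of `F` serves as a check on the four terms of Eq. (10.366).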

Because all the off-diagonal elements of \({\big (}{\boldsymbol{{\sigma }_{i}^{z}}}-{\boldsymbol{{\sigma }_{j}^{z}}}{\big )}^{2}\) are zero, we have

$$\begin{aligned} \mathcal{{F}}{\big [}{\boldsymbol{R}}{\big ]}= & {} {\frac{1}{2}}J{\sum _{\{i,j\}{\in }E}} {\sum _{s_{1}{\in }{\Omega }}}{\sum _{s_{2}{\in }{\Omega }}}{\cdots }{\sum _{s_{|V|}{\in }{\Omega }}} {\langle }s_{1},s_{2},{\cdots },s_{|V|}|{\big (}{\boldsymbol{{\sigma }_{i}^{z}}}-{\boldsymbol{{\sigma }_{j}^{z}}}{\big )}^{2} |s_{1},s_{2},{\cdots },s_{|V|}{\rangle } \nonumber \\&{} {\times } {\langle }s_{1},s_{2},{\cdots },s_{|V|}| {\boldsymbol{R}}|s_{1},s_{2},{\cdots },s_{|V|}{\rangle } \nonumber \\&+{\frac{1}{2}}h{\sum _{i{\in }V}} {\sum _{s_{1}{\in }{\Omega }}}{\sum _{s_{2}{\in }{\Omega }}}{\cdots }{\sum _{s_{|V|}{\in }{\Omega }}} {\langle }s_{1},s_{2},{\cdots },s_{|V|}|{\big (}{\boldsymbol{{\sigma }_{i}^{z}}}-d_{i}{\boldsymbol{I^{(2^{|V|})}}}{\big )}^{2} |s_{1},s_{2},{\cdots },s_{|V|}{\rangle } \nonumber \\&{} {\times } {\langle }s_{1},s_{2},{\cdots },s_{|V|}| {\boldsymbol{R}}|s_{1},s_{2},{\cdots },s_{|V|}{\rangle } \nonumber \\&+{\frac{1}{2}}{\sum _{i{\in }V}}\mathrm{{Tr}}{\left[ {\big (}{\boldsymbol{{\sigma }_{i}^{x}}}-{\Gamma }{\boldsymbol{I^{(2^{|V|})}}}{\big )}^{2}{\boldsymbol{R}} \right] } +k_\mathrm{{B}}T\mathrm{{Tr}}{\left[ {\boldsymbol{R}}{\ln }{\left( {\boldsymbol{R}}\right) }\right] } \nonumber \\= & {} {\frac{1}{2}}J{\sum _{\{i,j\}{\in }E}} {\sum _{s_{1}{\in }{\Omega }}}{\sum _{s_{2}{\in }{\Omega }}}{\cdots }{\sum _{s_{|V|}{\in }{\Omega }}} {\left( s_{i} - s_{j} \right) }^{2} {\langle }s_{1},s_{2},{\cdots },s_{|V|}| {\boldsymbol{R}}|s_{1},s_{2},{\cdots },s_{|V|}{\rangle } \nonumber \\&+{\frac{1}{2}}h{\sum _{i{\in }V}} {\sum _{s_{1}{\in }{\Omega }}}{\sum _{s_{2}{\in }{\Omega }}}{\cdots }{\sum _{s_{|V|}{\in }{\Omega }}} {\left( s_{i} - d_{i} \right) }^{2} {\langle }s_{1},s_{2},{\cdots },s_{|V|}| {\boldsymbol{R}}|s_{1},s_{2},{\cdots },s_{|V|}{\rangle } \nonumber \\&+{\frac{1}{2}}{\sum _{i{\in }V}}\mathrm{{Tr}}{\left[ {\big (}{\boldsymbol{{\sigma }_{i}^{x}}}-{\Gamma }{\boldsymbol{I^{(2^{|V|})}}}{\big )}^{2}{\boldsymbol{R}} \right] } 
+k_\mathrm{{B}}T\mathrm{{Tr}}{\left[ {\boldsymbol{R}}{\ln }{\left( {\boldsymbol{R}}\right) }\right] }. \end{aligned}$$
(10.367)
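The diagonality claim used in passing from Eq. (10.366) to Eq. (10.367) is easy to verify explicitly; a minimal numerical check for two spins:

```python
import numpy as np

I2 = np.eye(2)
sz = np.diag([1.0, -1.0])
# sigma^z_1 and sigma^z_2 acting on the two-spin Hilbert space
sz1, sz2 = np.kron(sz, I2), np.kron(I2, sz)

D = (sz1 - sz2) @ (sz1 - sz2)
offdiag = D - np.diag(np.diag(D))
# D is diagonal, with entries (s_1 - s_2)^2 for (s_1, s_2) running over {+1, -1}^2
```

Since every off-diagonal element of `D` vanishes, \(\mathrm{{Tr}}[{\big (}{\boldsymbol{{\sigma }_{i}^{z}}}-{\boldsymbol{{\sigma }_{j}^{z}}}{\big )}^{2}{\boldsymbol{R}}]\) probes only the diagonal elements of \({\boldsymbol{R}}\), as used in Eq. (10.367).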

By introducing the reduced density matrix \({\boldsymbol{R_{i}}}\) in Eq. (10.275) and

$$\begin{aligned} {\rho }{\left( s_{1},s_{2},{\cdots },s_{|V|}\right) } \equiv {\langle }s_{1},s_{2},{\cdots },s_{|V|}| {\boldsymbol{R}}|s_{1},s_{2},{\cdots },s_{|V|}{\rangle },~{\left( (s_{1},s_{2},{\cdots },s_{|V|}){\in }{\Omega }^{|V|}\right) }, \nonumber \\ \end{aligned}$$
(10.368)

and by extending \({\rho }{\left( s_{1},s_{2},{\cdots },s_{|V|}\right) }\) to

$$\begin{aligned} {\rho }({\boldsymbol{{\phi }}}) = {\rho }({\phi }_{1},{\phi }_{2},{\cdots },{\phi }_{|V|}),~{\left( {\boldsymbol{{\phi }}} = ({\phi }_{1},{\phi }_{2},{\cdots },{\phi }_{|V|}) {\in } (-{\infty },+{\infty })^{|V|}\right) }, \end{aligned}$$
(10.369)

the free energy functional can be expressed as

$$\begin{aligned} \mathcal{{F}}{\big [}{\boldsymbol{R}}{\big ]}= & {} {\frac{1}{2}}J{\sum _{\{i,j\}{\in }E}} {\int _{-{\infty }}^{+{\infty }}}{\int _{-{\infty }}^{+{\infty }}}{\cdots }{\int _{-{\infty }}^{+{\infty }}} {\left( {\prod _{k{\in }V}} {\big (}{\delta }{\left( {\phi }_{k}-1 \right) } + {\delta }{\left( {\phi }_{k}+1 \right) }{\big )} \right) } {\big (}{\phi }_{i}-{\phi }_{j}{\big )}^{2}{\rho }({\boldsymbol{{\phi }}}) d{\phi }_{1}d{\phi }_{2}{\cdots }d{\phi }_{|V|} \nonumber \\&+{\frac{1}{2}}h{\sum _{i{\in }V}} {\int _{-{\infty }}^{+{\infty }}}{\int _{-{\infty }}^{+{\infty }}}{\cdots }{\int _{-{\infty }}^{+{\infty }}} {\left( {\prod _{k{\in }V}} {\big (}{\delta }{\left( {\phi }_{k}-1 \right) } + {\delta }{\left( {\phi }_{k}+1 \right) }{\big )} \right) } {\big (}{\phi }_{i}-d_{i}{\big )}^{2}{\rho }({\boldsymbol{{\phi }}})d{\phi }_{1}d{\phi }_{2}{\cdots }d{\phi }_{|V|} \nonumber \\&+{\frac{1}{2}}{\sum _{i{\in }V}}\mathrm{{Tr}}{\left[ {\big (}{\boldsymbol{{\sigma }_{i}^{x}}}-{\Gamma }{\boldsymbol{I^{(2^{|V|})}}}{\big )}^{2}{\boldsymbol{R_{i}}} \right] } +k_\mathrm{{B}}T\mathrm{{Tr}}{\left[ {\boldsymbol{R}}{\ln }{\left( {\boldsymbol{R}}\right) }\right] }. \end{aligned}$$
(10.370)

Now we consider the following approximate free energy:

$$\begin{aligned} \mathcal{{F}}_\mathrm{{Adaptive~TAP}}{\big [}{\rho },\{{\boldsymbol{R_{i}}},{\rho }_{i}| i{\in }V\}{\big ]}\equiv & {} {\frac{1}{2}}J{\sum _{\{i,j\}{\in }E}} {\int _{-{\infty }}^{+{\infty }}}{\int _{-{\infty }}^{+{\infty }}}{\cdots }{\int _{-{\infty }}^{+{\infty }}} {\big (}{\phi }_{i}-{\phi }_{j}{\big )}^{2}{\rho }({\boldsymbol{{\phi }}}) d{\phi }_{1}d{\phi }_{2}{\cdots }d{\phi }_{|V|} \nonumber \\&+{\frac{1}{2}}h{\sum _{i{\in }V}} {\int _{-{\infty }}^{+{\infty }}}{\int _{-{\infty }}^{+{\infty }}}{\cdots }{\int _{-{\infty }}^{+{\infty }}} {\big (}{\phi }_{i}-d_{i}{\big )}^{2}{\rho }({\boldsymbol{{\phi }}})d{\phi }_{1}d{\phi }_{2}{\cdots }d{\phi }_{|V|} \nonumber \\&+{\frac{1}{2}}{\sum _{i{\in }V}}\mathrm{{Tr}}{\left[ {\big (}{\boldsymbol{{\sigma }_{i}^{x}}}-{\Gamma }{\boldsymbol{I^{(2^{|V|})}}}{\big )}^{2}{\boldsymbol{R_{i}}} \right] } \nonumber \\&+k_\mathrm{{B}}T {\int _{-{\infty }}^{+{\infty }}}{\int _{-{\infty }}^{+{\infty }}}{\cdots }{\int _{-{\infty }}^{+{\infty }}}{\rho }({\boldsymbol{{\phi }}}) {\ln }{\big (}{\rho }({\boldsymbol{{\phi }}}){\big )} d{\phi }_{1}d{\phi }_{2}{\cdots }d{\phi }_{|V|} \nonumber \\&+k_\mathrm{{B}}T{\sum _{i{\in }V}}{\Big (} \mathrm{{Tr}}{\boldsymbol{R}}_{i}{\ln }{\boldsymbol{R}}_{i} -{\int _{-{\infty }}^{+{\infty }}}{\rho }_{i}({\phi }_{i}) {\ln }{\big (}{\rho }_{i}({\phi }_{i}){\big )}d{\phi }_{i}{\Big )}, \end{aligned}$$
(10.371)

where

$$\begin{aligned} {\rho }_{i}{\left( {\phi }_{i}\right) }\equiv & {} {\int _{-{\infty }}^{+{\infty }}} {\int _{-{\infty }}^{+{\infty }}} {\cdots } {\int _{-{\infty }}^{+{\infty }}} {\delta }{\left( {\phi }_{i}-{\phi }'_{i} \right) } {\rho }{\left( {\phi }'_{1},{\phi }'_{2},{\cdots },{\phi }'_{|V|} \right) } d{\phi }'_{1} d{\phi }'_{2} {\cdots } d{\phi }'_{|V|} \nonumber \\&{} ~{\left( i{\in }V,{\phi }_{i}{\in }(-{\infty },+{\infty })\right) }. \end{aligned}$$
(10.372)

The reduced density matrix \({\boldsymbol{R}}_{i}\) and the marginal probability density functions \({\rho }_{i}({\phi }_{i})\) and \({\rho }({\boldsymbol{{\phi }}})\) need to satisfy the consistencies

$$\begin{aligned} \left\{ \begin{array}{llll} {\displaystyle {{\int _{-{\infty }}^{+{\infty }}}{\int _{-{\infty }}^{+{\infty }}}{\cdots }{\int _{-{\infty }}^{+{\infty }}}}} {\phi }_{i}{\rho }({\boldsymbol{{\phi }}})d{\phi }_{1}d{\phi }_{2}{\cdots }d{\phi }_{|V|} ={\displaystyle {\int _{-{\infty }}^{+{\infty }}}} {\phi }_{i}{\rho }_{i}({\phi }_{i})d{\phi }_{i} =\mathrm{{Tr}}{\boldsymbol{{\sigma }^{z}}}{\boldsymbol{R_{i}}}~(i{\in }V),\\ {\displaystyle {{\sum _{i{\in }V}}}}{\displaystyle {{\int _{-{\infty }}^{+{\infty }}}{\int _{-{\infty }}^{+{\infty }}}{\cdots }{\int _{-{\infty }}^{+{\infty }}}}} {{\phi }_{i}}^{2}{\rho }({\boldsymbol{{\phi }}})d{\phi }_{1}d{\phi }_{2}{\cdots }d{\phi }_{|V|} ={\displaystyle {{\sum _{i{\in }V}}}}{\displaystyle {\int _{-{\infty }}^{+{\infty }}}} {{\phi }_{i}}^{2}{\rho }_{i}({\phi }_{i})d{\phi }_{i}=1,\\ \end{array} \right. \end{aligned}$$
(10.373)

and the normalizations

$$\begin{aligned} \left\{ \begin{array}{llll} {\displaystyle {{\int _{-{\infty }}^{+{\infty }}}{\int _{-{\infty }}^{+{\infty }}}{\cdots }{\int _{-{\infty }}^{+{\infty }}}}} {\rho }({\boldsymbol{{\phi }}})d{\phi }_{1}d{\phi }_{2}{\cdots }d{\phi }_{|V|}=1,\\ {\displaystyle {\int _{-{\infty }}^{+{\infty }}}} {\rho }_{i}({\phi }_{i})d{\phi }_{i}=1~(i{\in }V),\\ \mathrm{{Tr}}{\boldsymbol{R}}_{i}=1~(i{\in }V). \end{array} \right. \end{aligned}$$
(10.374)

\({\boldsymbol{R_{i}}}\), \({\rho }_{i}({\phi }_{i})\), and \({\rho }({\boldsymbol{{\phi }}})\) are determined so as to minimize the above approximate free energy \(\mathcal{{F}}_\mathrm{{Adaptive~TAP}}{\big [}{\rho },\{{\boldsymbol{R_{i}}},{\rho }_{i}|i{\in }V\}{\big ]}\) under the constraint conditions in Eqs. (10.373) and (10.374). We introduce Lagrange multipliers \({\boldsymbol{f}} = {\left( \begin{array}{ccccc} f_{1} \\ f_{2} \\ {\vdots } \\ f_{|V|} \end{array} \right) }\) and \({\boldsymbol{g}} = {\left( \begin{array}{ccccc} g_{1} \\ g_{2} \\ {\vdots } \\ g_{|V|} \end{array} \right) }\), D, L, \({\lambda }\), and \({\lambda }_{i}\) to ensure the constraint conditions in Eqs. (10.373) and (10.374) as follows:

$$\begin{aligned}&\mathcal{{L}}_\mathrm{{Adaptive~TAP}}{\big [}{\rho },\{R_{i},{\rho }_{i}| i{\in }V\}{\big ]} \nonumber \\&\equiv \mathcal{{F}}_\mathrm{{Adaptive~TAP}}{\big [}{\rho },\{R_{i},{\rho }_{i}| i{\in }V\}{\big ]} \nonumber \\&-{\sum _{i{\in }V}}g_{i} {\left( {\int _{-{\infty }}^{+{\infty }}}{\int _{-{\infty }}^{+{\infty }}}{\cdots }{\int _{-{\infty }}^{+{\infty }}} {\phi }_{i}{\rho }({\boldsymbol{{\phi }}})d{\phi }_{1}d{\phi }_{2}{\cdots }d{\phi }_{|V|} -{\int _{-{\infty }}^{+{\infty }}}{\phi }_{i}{\rho }_{i}({\phi }_{i})d{\phi }_{i}\right) } \nonumber \\&-{\sum _{i{\in }V}}f_{i} {\left( {\int _{-{\infty }}^{+{\infty }}}{\int _{-{\infty }}^{+{\infty }}}{\cdots }{\int _{-{\infty }}^{+{\infty }}} {\phi }_{i}{\rho }({\boldsymbol{{\phi }}})d{\phi }_{1}d{\phi }_{2}{\cdots }d{\phi }_{|V|} -\mathrm{{Tr}}{\boldsymbol{{\sigma }^{z}}}{\boldsymbol{R_{i}}} \right) } \nonumber \\&-D{\left( {\sum _{i{\in }V}} {\int _{-{\infty }}^{+{\infty }}}{\int _{-{\infty }}^{+{\infty }}}{\cdots }{\int _{-{\infty }}^{+{\infty }}} {{\phi }_{i}}^{2}{\rho }({\boldsymbol{{\phi }}})d{\phi }_{1}d{\phi }_{2}{\cdots }d{\phi }_{|V|}-1\right) } \nonumber \\&-L{\left( {\sum _{i{\in }V}} {\int _{-{\infty }}^{+{\infty }}} {{\phi }_{i}}^{2}{\rho }_{i}({\phi }_{i})d{\phi }_{i}-1\right) } \nonumber \\&-{\lambda }{\left( {\int _{-{\infty }}^{+{\infty }}}{\int _{-{\infty }}^{+{\infty }}}{\cdots }{\int _{-{\infty }}^{+{\infty }}} {\rho }({\boldsymbol{{\phi }}})d{\boldsymbol{{\phi }}}-1\right) } \nonumber \\&- {\sum _{i{\in }V}} {\lambda }_{i}{\left( {\int _{-{\infty }}^{+{\infty }}} {\rho }_{i}({\phi }_{i})d{\phi }_{i}-1\right) }. \end{aligned}$$
(10.375)

By taking the first variation of the Lagrangian \(\mathcal{{L}}_\mathrm{{Adaptive~TAP}}{\big [}{\rho },\{{\boldsymbol{R_{i}}},{\rho }_{i}|i{\in }V\}{\big ]}\) with respect to the marginals, we can derive the approximate expressions of \({\boldsymbol{{\widehat{R}}_{i}}}\), \({\widehat{{\rho }}}_{i}({\phi }_{i})\), and \({\widehat{{\rho }}}({\boldsymbol{{\phi }}})\) as follows:

$$\begin{aligned} {\boldsymbol{{\widehat{R}}_{i}}}= & {} {\frac{ {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}}{\Big (}f_{i}{\boldsymbol{{\sigma }^{z}}}+{\Gamma }{\boldsymbol{{\sigma }^{x}}}{\Big )}\right) } }{ \mathrm{{Tr}} {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}}{\Big (}f_{i}{\boldsymbol{{\sigma }^{z}}}+{\Gamma }{\boldsymbol{{\sigma }^{x}}}{\Big )}\right) } } } {} (i{\in }V), \end{aligned}$$
(10.376)
$$\begin{aligned}&{\widehat{{\rho }}}({\boldsymbol{{\phi }}}) \nonumber \\&= {\frac{{\displaystyle { {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}}{\Big (} -{\frac{1}{2}}D{\sum _{i{\in }V}}{{\phi }_{i}}^{2} +{\sum _{i{\in }V}}{\big (}f_{i}+g_{i}{\big )}{\phi }_{i} -{\frac{1}{2}}h{\sum _{i{\in }V}}{\big (}{\phi }_{i}-d_{i}{\big )}^{2} -{\frac{1}{2}}J{\sum _{\{i,j\}{\in }E}}{\big (}{\phi }_{i}-{\phi }_{j}{\big )}^{2} {\Big )}\right) } }} }{ {\displaystyle {{\int _{-{\infty }}^{+{\infty }}}{\int _{-{\infty }}^{+{\infty }}}{\cdots }{\int _{-{\infty }}^{+{\infty }}}}} {\displaystyle { {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}}{\Big (} -{\frac{1}{2}}D{\sum _{i{\in }V}}{{\phi }_{i}}^{2} +{\sum _{i{\in }V}}{\big (}f_{i}+g_{i}{\big )}{\phi }_{i} -{\frac{1}{2}}h{\sum _{i{\in }V}}{\big (}{\phi }_{i}-d_{i}{\big )}^{2} -{\frac{1}{2}}J{\sum _{\{i,j\}{\in }E}}{\big (}{\phi }_{i}-{\phi }_{j}{\big )}^{2} {\Big )}\right) } }} d{\phi }_{1}d{\phi }_{2}{\cdots }d{\phi }_{|V|} } }, \nonumber \\&\end{aligned}$$
(10.377)
$$\begin{aligned} {\widehat{{\rho }}}_{i}({\phi }_{i}) = {\frac{{\displaystyle { {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}}{\Big (} -{\frac{1}{2}}L{{\phi }_{i}}^{2} +g_{i}{\phi }_{i} -{\frac{1}{2}}h{\big (}{\phi }_{i}-d_{i}{\big )}^{2} {\Big )}\right) } }} }{ {\displaystyle {{\int _{-{\infty }}^{+{\infty }}}}} {\displaystyle { {\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}}{\Big (} -{\frac{1}{2}}L{{\phi }_{i}}^{2} +g_{i}{\phi }_{i} -{\frac{1}{2}}h{\big (}{\phi }_{i}-d_{i}{\big )}^{2} {\Big )}\right) } }} d{\phi }_{i} } } \end{aligned}$$
(10.378)

Here \({\boldsymbol{C}}\) denotes the \(|V|{\times }|V|\) matrix whose \((i,j)\) elements are defined by

$$\begin{aligned} C_{ij} \equiv \left\{ \begin{array}{llll} |{\partial }i| &{} (i=j), \\ -1 &{} (\{i,j\}{\in }E), \\ 0 &{} (\mathrm{{otherwise}}), \end{array} \right. \end{aligned}$$
(10.379)

for any nodes \(i{\in }V\) and \(j{\in }V\). Equations (10.376), (10.377), and (10.378) can be rewritten as

$$\begin{aligned} {\boldsymbol{{\widehat{R}}_{i}}}= & {} {\frac{1}{ 2{\cosh } {\left( {\frac{1}{k_\mathrm{{B}}T}}{\sqrt{{f_{i}}^{2}+{\Gamma }^{2}}}\right) } }} {\frac{f_{i}+{\sqrt{{f_{i}}^{2}+{\Gamma }^{2}}}}{2{\sqrt{{f_{i}}^{2}+{\Gamma }^{2}}}}} {\left( \begin{array}{ccc} 1 &{} -{\frac{{\Gamma }}{f_{i}+{\sqrt{{f_{i}}^{2}+{\Gamma }^{2}}}}} \\ +{\frac{{\Gamma }}{f_{i}+{\sqrt{{f_{i}}^{2}+{\Gamma }^{2}}}}} &{} 1 \end{array} \right) } \nonumber \\&{\times } {\left( \begin{array}{ccc} {\exp }{\left( +{\frac{1}{k_\mathrm{{B}}T}}{\sqrt{{f_{i}}^{2}+{\Gamma }^{2}}}\right) } &{} 0 \\ 0 &{} {\exp }{\left( -{\frac{1}{k_\mathrm{{B}}T}}{\sqrt{{f_{i}}^{2}+{\Gamma }^{2}}}\right) } \end{array} \right) } {\left( \begin{array}{ccc} 1 &{} +{\frac{{\Gamma }}{f_{i}+{\sqrt{{f_{i}}^{2}+{\Gamma }^{2}}}}} \\ -{\frac{{\Gamma }}{f_{i}+{\sqrt{{f_{i}}^{2}+{\Gamma }^{2}}}}} &{} 1 \end{array} \right) }. \nonumber \\&\end{aligned}$$
(10.380)
$$\begin{aligned} {\widehat{{\rho }}}({\boldsymbol{{\phi }}})= & {} {\sqrt{ {\frac{{\det }{\big (} (h+D){\boldsymbol{I^{|V|}}}+J{\boldsymbol{C}} {\big )}}{(2{\pi })^{|V|}}} }} \nonumber \\&{\times } {\exp }{\Bigg (}-{\frac{1}{2k_\mathrm{{B}}T}} {\Big (}{\boldsymbol{{\phi }}} - {\big (}(h+D){\boldsymbol{I^{|V|}}}+J{\boldsymbol{C}}{\big )}^{-1}{\left( {\boldsymbol{f}}+{\boldsymbol{g}}+h{\boldsymbol{d}}\right) }{\Big )}^\mathrm{{T}} {\Big (}(h+D){\boldsymbol{I^{|V|}}}+J{\boldsymbol{C}}{\Big )} \nonumber \\&\qquad\qquad\qquad\qquad{}{\times } {\Big (}{\boldsymbol{{\phi }}} - {\big (}(h+D){\boldsymbol{I^{|V|}}}+J{\boldsymbol{C}}{\big )}^{-1}{\left( {\boldsymbol{f}}+{\boldsymbol{g}}+h{\boldsymbol{d}}\right) }{\Big )} {\Bigg )}, \end{aligned}$$
(10.381)
$$\begin{aligned} {\widehat{{\rho }}}_{i}({\phi }_{i})= & {} {\sqrt{{\frac{h+L}{2{\pi }}}}} {\exp }{\left( -{\frac{1}{2k_\mathrm{{B}}T}}{\left( h+L \right) } {\left( {\phi }_{i}- {\frac{g_{i}+hd_{i}}{h+L}}\right) }^{2} \right) } ~(i{\in }V). \end{aligned}$$
(10.382)

The Lagrange multipliers \({\boldsymbol{f}}\), \({\boldsymbol{g}}\), L, and D are often referred to as the effective fields and are determined so as to satisfy the consistencies in Eq. (10.373), which reduce to the following simultaneous equations:

$$\begin{aligned} {\boldsymbol{g}}+h{\boldsymbol{d}} ={\big (}h+L{\big )}{\Big (}{\big (}D-L{\big )}{\boldsymbol{I^{(|V|)}}}+J{\boldsymbol{C}}{\Big )}^{-1}{\boldsymbol{f}}, \end{aligned}$$
(10.383)
$$\begin{aligned} {\boldsymbol{f}}+{\boldsymbol{g}}+h{\boldsymbol{d}}= & {} {\Big (}{\big (}D-L{\big )}{\boldsymbol{I^{(|V|)}}}+J{\boldsymbol{C}}{\Big )}^{-1} {\left( \begin{array}{ccc} {\frac{f_{1}}{{\sqrt{{f_{1}}^{2}+{\Gamma }^{2}}}}} {\tanh }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\sqrt{{f_{1}}^{2}+{\Gamma }^{2}}} \right) } \\ {\frac{f_{2}}{{\sqrt{{f_{2}}^{2}+{\Gamma }^{2}}}}} {\tanh }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\sqrt{{f_{2}}^{2}+{\Gamma }^{2}}} \right) } \\ {\vdots } \\ {\frac{f_{|V|}}{{\sqrt{{f_{|V|}}^{2}+{\Gamma }^{2}}}}} {\tanh }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\sqrt{{f_{|V|}}^{2}+{\Gamma }^{2}}} \right) } \\ \end{array} \right) }, \nonumber \\&\end{aligned}$$
(10.384)
$$\begin{aligned} L=-h+{\frac{1}{2}}+{\sqrt{ {\frac{1}{4}}+{\frac{1}{|V|}}{\left( {\boldsymbol{f}}+{\boldsymbol{g}}+h{\boldsymbol{d}}\right) }^\mathrm{{T}} {\left( {\boldsymbol{f}}+{\boldsymbol{g}}+h{\boldsymbol{d}}\right) } }}, \end{aligned}$$
(10.385)
$$\begin{aligned} {\frac{1}{ {\frac{1}{2}}+{\sqrt{ {\frac{1}{4}}+{\frac{1}{|V|}}{\left( {\boldsymbol{f}}+{\boldsymbol{g}}+h{\boldsymbol{d}}\right) }^\mathrm{{T}} {\left( {\boldsymbol{f}}+{\boldsymbol{g}}+h{\boldsymbol{d}}\right) } }} }} ={\frac{1}{|V|}}\mathrm{{Tr}}{\left( {\big (}(h+D){\boldsymbol{I^{|V|}}}+J{\boldsymbol{C}}{\big )}^{-1}\right) }. \nonumber \\ \end{aligned}$$
(10.386)
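The components of the vector on the right-hand side of Eq. (10.384) are the single-site magnetizations \(\mathrm{{Tr}}{\boldsymbol{{\sigma }^{z}}}{\boldsymbol{{\widehat{R}}_{i}}}\) obtained from Eq. (10.376). This closed form can be checked against a direct matrix exponential; a sketch for one site, with illustrative values of \(f_{i}\) and \({\Gamma }\):

```python
import numpy as np

kB_T, Gamma, f_i = 1.0, 0.7, 0.4     # illustrative effective field and transverse field

sz = np.diag([1.0, -1.0])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])

# R_i of Eq. (10.376): matrix exponential via eigendecomposition of the symmetric 2x2 matrix
H = f_i * sz + Gamma * sx
w, U = np.linalg.eigh(H)
R_i = U @ np.diag(np.exp(w / kB_T)) @ U.T
R_i /= np.trace(R_i)

m_numeric = np.trace(sz @ R_i)
# Closed form appearing in Eq. (10.384)
r = np.sqrt(f_i**2 + Gamma**2)
m_closed = (f_i / r) * np.tanh(r / kB_T)
```

The agreement of `m_numeric` and `m_closed` reflects the eigenvalues \({\pm }{\sqrt{{f_{i}}^{2}+{\Gamma }^{2}}}\) of \(f_{i}{\boldsymbol{{\sigma }^{z}}}+{\Gamma }{\boldsymbol{{\sigma }^{x}}}\), as in the spectral decomposition (10.380).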

The real symmetric matrix \({\boldsymbol{C}}\) is diagonalized as

$$\begin{aligned} {\boldsymbol{C}}={\boldsymbol{U}}{\boldsymbol{\Lambda }}{\boldsymbol{U}}^{-1}, \end{aligned}$$
(10.387)
$$\begin{aligned} {\boldsymbol{\Lambda }} \equiv {\left( \begin{array}{ccccccccc} {\lambda }_{1} &{} 0 &{} 0 &{} {\cdots } &{} 0 \\ 0 &{} {\lambda }_{2} &{} 0 &{} {\cdots } &{} 0 \\ 0 &{} 0 &{} {\lambda }_{3} &{} {\cdots } &{} 0 \\ {\vdots } &{} {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ 0 &{} 0 &{} 0 &{} {\cdots } &{} {\lambda }_{|V|} \end{array} \right) }, \end{aligned}$$
(10.388)

where \({\lambda }_{1}\ge {\lambda }_{2}\ge {\lambda }_{3}\ge {\cdots } \ge {\lambda }_{|V|}\). All the eigenvalues \({\lambda }_{1},{\lambda }_{2},{\cdots },{\lambda }_{|V|}\) are always real numbers. For the eigenvector \({\boldsymbol{u}}_{i}= {\left( \begin{array}{ccccccccc} U_{1i} \\ U_{2i} \\ {\vdots } \\ U_{|V|i} \end{array} \right) }\) corresponding to the eigenvalue \({\lambda }_{i}\) such that \({\boldsymbol{C}}{\boldsymbol{u}}_{i}={\lambda }_{i}{\boldsymbol{u}}_{i}\), for every \(i{\in }\{1,2,3,{\cdots },|V|\}\), the matrix \({\boldsymbol{U}}\) is defined by

$$\begin{aligned} {\boldsymbol{U}} \equiv {\left( {\boldsymbol{u}}_{1},{\boldsymbol{u}}_{2},{\boldsymbol{u}}_{3},{\cdots },{\boldsymbol{u}}_{|V|} \right) } = {\left( \begin{array}{ccccccccc} U_{11} &{} U_{12} &{} U_{13} &{} {\cdots } &{} U_{1|V|} \\ U_{21} &{} U_{22} &{} U_{23} &{} {\cdots } &{} U_{2|V|} \\ U_{31} &{} U_{32} &{} U_{33} &{} {\cdots } &{} U_{3|V|} \\ {\vdots } &{} {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ U_{|V|1} &{} U_{|V|2} &{} U_{|V|3} &{} {\cdots } &{} U_{|V||V|} \end{array} \right) }. \end{aligned}$$
(10.389)
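For a concrete graph, the matrix \({\boldsymbol{C}}\) of Eq. (10.379) and the diagonalization (10.387)-(10.389) can be reproduced with a standard symmetric eigensolver. A sketch for a small ring graph follows; note that `numpy.linalg.eigh` returns the eigenvalues in ascending rather than descending order:

```python
import numpy as np

N = 6                                 # ring graph with |V| = N nodes
E = [(i, (i + 1) % N) for i in range(N)]

# Eq. (10.379): node degrees on the diagonal, -1 for each edge {i, j}
C = np.zeros((N, N))
for i, j in E:
    C[i, i] += 1.0
    C[j, j] += 1.0
    C[i, j] = C[j, i] = -1.0

lam, U = np.linalg.eigh(C)            # real eigenvalues, orthogonal eigenvector matrix
```

`U` satisfies \({\boldsymbol{U}}^{-1}={\boldsymbol{U}}^\mathrm{{T}}\) up to numerical precision, and the eigenvalues are nonnegative since \({\boldsymbol{C}}\) is a graph Laplacian.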

It is known that \({\boldsymbol{U}}\) is a unitary matrix that satisfies \({\boldsymbol{U}}^{-1}={\boldsymbol{U}}^\mathrm{{T}}\) for the real symmetric matrix \({\boldsymbol{C}}\). By using the diagonal matrix \({\boldsymbol{\Lambda }}\) and the unitary matrix \({\boldsymbol{U}}\), the density matrix \({\boldsymbol{P}}\) in Eq. (10.364) can be represented as follows:

$$\begin{aligned} {\boldsymbol{P}}= & {} {\frac{ {\displaystyle { {\exp }{\left( -{\frac{1}{2k_\mathrm{{B}}T}} {\boldsymbol{{\zeta }^\mathrm{{T}}}} {\left( {\big (}h{\boldsymbol{I^{(|V|)}}}+J{\boldsymbol{{\Lambda }}}{\big )}{\otimes }{\boldsymbol{I^{(2^{|V|})}}}\right) } {\boldsymbol{{\zeta }}} -{\frac{1}{2k_\mathrm{{B}}T}} {\boldsymbol{{\xi }^\mathrm{{T}}}} {\boldsymbol{{\xi }}} \right) } }} }{ {\displaystyle { \mathrm{{Tr}} {\left[ {\exp }{\left( -{\frac{1}{2k_\mathrm{{B}}T}} {\boldsymbol{{\zeta }^\mathrm{{T}}}} {\left( {\big (}h{\boldsymbol{I^{(|V|)}}}+J{\boldsymbol{{\Lambda }}}{\big )}{\otimes }{\boldsymbol{I^{(2^{|V|})}}}\right) } {\boldsymbol{{\zeta }}} -{\frac{1}{2k_\mathrm{{B}}T}} {\boldsymbol{{\xi }^\mathrm{{T}}}} {\boldsymbol{{\xi }}} \right) } \right] } }} }}, \end{aligned}$$
(10.390)

where

$$\begin{aligned} {\boldsymbol{{\zeta }}} = {\left( \begin{array}{ccc} {\boldsymbol{{\zeta }_{1}}} \\ {\boldsymbol{{\zeta }_{2}}} \\ {\vdots } \\ {\boldsymbol{{\zeta }_{|V|}}} \\ \end{array} \right) } \equiv {\left( {\boldsymbol{U^\mathrm{{T}}}}{\otimes }{\boldsymbol{I^{(2^{|V|})}}}\right) } {\left( \begin{array}{ccc} {\boldsymbol{{\sigma }_{1}^{z}}} \\ {\boldsymbol{{\sigma }_{2}^{z}}} \\ {\vdots } \\ {\boldsymbol{{\sigma }_{|V|}^{z}}} \\ \end{array} \right) } - {\left( {\big (}h{\boldsymbol{I^{(|V|)}}}+J{\boldsymbol{{\Lambda }}}{\big )}^{-1} {\boldsymbol{U^\mathrm{{T}}}} {\left( \begin{array}{ccc} d_{1} \\ d_{2} \\ {\vdots } \\ d_{|V|} \end{array} \right) } \right) } {\otimes }{\boldsymbol{I^{(2^{|V|})}}}, \nonumber \\ \end{aligned}$$
(10.391)
$$\begin{aligned} {\boldsymbol{{\xi }}} = {\left( \begin{array}{ccc} {\boldsymbol{{\xi }_{1}}} \\ {\boldsymbol{{\xi }_{2}}} \\ {\vdots } \\ {\boldsymbol{{\xi }_{|V|}}} \\ \end{array} \right) } \equiv {\left( {\boldsymbol{U^\mathrm{{T}}}}{\otimes }{\boldsymbol{I^{(2^{|V|})}}}\right) } {\left( \begin{array}{ccc} {\boldsymbol{{\sigma }_{1}^{x}}} \\ {\boldsymbol{{\sigma }_{2}^{x}}} \\ {\vdots } \\ {\boldsymbol{{\sigma }_{|V|}^{x}}} \\ \end{array} \right) } - {\Gamma } {\left( \begin{array}{ccc} {\boldsymbol{I^{(2^{|V|})}}} \\ {\boldsymbol{I^{(2^{|V|})}}} \\ {\vdots } \\ {\boldsymbol{I^{(2^{|V|})}}} \\ \end{array} \right) }. \end{aligned}$$
(10.392)
Fig. 10.17 Momentum space renormalization group for graphical models on random graphs

By using the Gram-Schmidt orthonormalization in the framework of Fig. 10.17, we introduce a new unitary matrix

$$\begin{aligned} {\boldsymbol{{\widetilde{U}}}} = {\left( \begin{array}{ccccccccc} {\widetilde{U}}_{11} &{} {\widetilde{U}}_{12} &{} {\widetilde{U}}_{13} &{} {\cdots } &{} {\widetilde{U}}_{1|{\widetilde{V}}|} \\ {\widetilde{U}}_{21} &{} {\widetilde{U}}_{22} &{} {\widetilde{U}}_{23} &{} {\cdots } &{} {\widetilde{U}}_{2|{\widetilde{V}}|} \\ {\widetilde{U}}_{31} &{} {\widetilde{U}}_{32} &{} {\widetilde{U}}_{33} &{} {\cdots } &{} {\widetilde{U}}_{3|{\widetilde{V}}|} \\ {\vdots } &{} {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ {\widetilde{U}}_{|{\widetilde{V}}|1} &{} {\widetilde{U}}_{|{\widetilde{V}}|2} &{} {\widetilde{U}}_{|{\widetilde{V}}|3} &{} {\cdots } &{} {\widetilde{U}}_{|{\widetilde{V}}||{\widetilde{V}}|} \end{array} \right) } \equiv {\left( {\boldsymbol{{\widetilde{u}}_{1}}},{\boldsymbol{{\widetilde{u}}_{2}}},{\boldsymbol{{\widetilde{u}}_{3}}},{\cdots },{\boldsymbol{{\widetilde{u}}_{|{\widetilde{V}}|}}} \right) }, \end{aligned}$$
(10.393)

where

$$\begin{aligned} {\boldsymbol{v}}_{1}= {\left( \begin{array}{ccccccccc} U_{11} \\ U_{21} \\ {\vdots } \\ U_{|{\widetilde{V}}|1} \end{array} \right) }, {\boldsymbol{v}}_{2}= {\left( \begin{array}{ccccccccc} U_{12} \\ U_{22} \\ {\vdots } \\ U_{|{\widetilde{V}}|2} \end{array} \right) },{\cdots },{\boldsymbol{v}}_{|{\widetilde{V}}|}= {\left( \begin{array}{ccccccccc} U_{1|{\widetilde{V}}|} \\ U_{2|{\widetilde{V}}|} \\ {\vdots } \\ U_{|{\widetilde{V}}||{\widetilde{V}}|} \end{array} \right) }, \end{aligned}$$
(10.394)
$$\begin{aligned} \left\{ \begin{array}{llllll} {\boldsymbol{u'_{1}}}&{}=&{}{\boldsymbol{v_{1}}}, &{} {\boldsymbol{{\widetilde{u}}_{1}}} = {\frac{{\boldsymbol{u'_{1}}}}{{\sqrt{{\boldsymbol{{u'_{1}}^\mathrm{{T}}}}{\boldsymbol{u'_{1}}}}}}}, \\ {\boldsymbol{u'_{2}}}&{}=&{}{\boldsymbol{v_{2}}} - {\frac{ {\boldsymbol{{u'_{1}}^\mathrm{{T}}}} {\boldsymbol{v_{2}}} }{ {\boldsymbol{{u'_{1}}^\mathrm{{T}}}}{\boldsymbol{{u'_{1}}}} }}{\boldsymbol{{u'_{1}}}}, &{} {\boldsymbol{{\widetilde{u}}_{2}}} = {\frac{{\boldsymbol{u'_{2}}}}{{\sqrt{{\boldsymbol{{u'_{2}}^\mathrm{{T}}}}{\boldsymbol{u'_{2}}}}}}}, \\ {\boldsymbol{u'_{3}}} &{}=&{}{\boldsymbol{v_{3}}} - {\frac{ {\boldsymbol{{u'_{1}}^\mathrm{{T}}}} {\boldsymbol{v_{3}}} }{ {\boldsymbol{{u'_{1}}^\mathrm{{T}}}}{\boldsymbol{{u'_{1}}}} }}{\boldsymbol{{u'_{1}}}} - {\frac{ {\boldsymbol{{u'_{2}}^\mathrm{{T}}}} {\boldsymbol{v_{3}}} }{ {\boldsymbol{{u'_{2}}^\mathrm{{T}}}}{\boldsymbol{{u'_{2}}}} }}{\boldsymbol{{u'_{2}}}}, &{} {\boldsymbol{{\widetilde{u}}_{3}}} = {\frac{{\boldsymbol{u'_{3}}}}{{\sqrt{{\boldsymbol{{u'_{3}}^\mathrm{{T}}}}{\boldsymbol{u'_{3}}}}}}}, \\ &{}{\vdots }&{} \\ {\boldsymbol{u'_{|{\widetilde{V}}|}}} &{}=&{}{\boldsymbol{v_{|{\widetilde{V}}|}}} - {\frac{ {\boldsymbol{{u'_{1}}^\mathrm{{T}}}} {\boldsymbol{v_{|{\widetilde{V}}|}}} }{ {\boldsymbol{{u'_{1}}^\mathrm{{T}}}}{\boldsymbol{{u'_{1}}}} }}{\boldsymbol{{u'_{1}}}} - {\frac{ {\boldsymbol{{u'_{2}}^\mathrm{{T}}}} {\boldsymbol{v_{|{\widetilde{V}}|}}} }{ {\boldsymbol{{u'_{2}}^\mathrm{{T}}}}{\boldsymbol{{u'_{2}}}} }}{\boldsymbol{{u'_{2}}}} -{\cdots }- {\frac{ {\boldsymbol{{u'_{|{\widetilde{V}}|-1}}^\mathrm{{T}}}} {\boldsymbol{v_{|{\widetilde{V}}|}}} }{ {\boldsymbol{{u'_{|{\widetilde{V}}|-1}}^\mathrm{{T}}}}{\boldsymbol{{u'_{|{\widetilde{V}}|-1}}}} }}{\boldsymbol{{u'_{|{\widetilde{V}}|-1}}}}, &{} {\boldsymbol{{\widetilde{u}}_{|{\widetilde{V}}|}}} = {\frac{{\boldsymbol{u'_{|{\widetilde{V}}|}}}}{{\sqrt{{\boldsymbol{{u'_{|{\widetilde{V}}|}}^\mathrm{{T}}}}{\boldsymbol{u'_{|{\widetilde{V}}|}}}}}}}. \\ \end{array} \right. \end{aligned}$$
(10.395)
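Equation (10.395) is the classical Gram-Schmidt procedure; a compact numerical sketch, in which seeded random vectors stand in for the \({\boldsymbol{v}}_{k}\):

```python
import numpy as np

rng = np.random.default_rng(0)
V = [rng.standard_normal(5) for _ in range(5)]   # stand-ins for v_1, ..., v_{|V~|}

# Gram-Schmidt orthonormalization, Eq. (10.395) (modified variant: each projection
# is subtracted from the running residual, which is numerically more stable)
U_tilde = []
for v in V:
    u = v.copy()
    for w in U_tilde:
        u -= (w @ u) * w             # remove the component along an earlier u~
    U_tilde.append(u / np.sqrt(u @ u))

Q = np.column_stack(U_tilde)         # the columns form the unitary matrix U~
```

The resulting columns are orthonormal, so `Q` plays the role of \({\boldsymbol{{\widetilde{U}}}}\) in Eq. (10.393).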

By using the new unitary matrix \({\boldsymbol{{\widetilde{U}}}}\) and the diagonal matrix \({\boldsymbol{{\widetilde{\Lambda }}}}\)

$$\begin{aligned} {\boldsymbol{{\widetilde{\Lambda }}}} \equiv {\left( \begin{array}{ccccccccc} {\lambda }_{1} &{} 0 &{} 0 &{} {\cdots } &{} 0 \\ 0 &{} {\lambda }_{2} &{} 0 &{} {\cdots } &{} 0 \\ 0 &{} 0 &{} {\lambda }_{3} &{} {\cdots } &{} 0 \\ {\vdots } &{} {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ 0 &{} 0 &{} 0 &{} {\cdots } &{} {\lambda }_{|{\widetilde{V}}|} \end{array} \right) }, \end{aligned}$$
(10.396)

we introduce a renormalized density matrix \({\boldsymbol{{\widetilde{P}}}}\) from the standpoint of the momentum space renormalization group for general graphs as

$$\begin{aligned} {\boldsymbol{{\widetilde{P}}}}\equiv & {} {\frac{ {\displaystyle { {\exp }{\left( -{\frac{1}{2k_\mathrm{{B}}T}} {\boldsymbol{{\widetilde{{\zeta }}}^\mathrm{{T}}}} {\left( {\big (}h{\boldsymbol{I^{(|{\widetilde{V}}|)}}}+J{\boldsymbol{{\widetilde{{\Lambda }}}}}{\big )}{\otimes }{\boldsymbol{I^{(2^{|{\widetilde{V}}|})}}}\right) } {\boldsymbol{{\widetilde{{\zeta }}}}} -{\frac{1}{2k_\mathrm{{B}}T}} {\boldsymbol{{\widetilde{{\xi }}}^\mathrm{{T}}}} {\boldsymbol{{\widetilde{{\xi }}}}} \right) } }} }{ {\displaystyle { \mathrm{{Tr}} {\left[ {\exp }{\left( -{\frac{1}{2k_\mathrm{{B}}T}} {\boldsymbol{{\widetilde{{\zeta }}}^\mathrm{{T}}}} {\left( {\big (}h{\boldsymbol{I^{(|{\widetilde{V}}|)}}}+J{\boldsymbol{{\widetilde{{\Lambda }}}}}{\big )}{\otimes }{\boldsymbol{I^{(2^{|{\widetilde{V}}|})}}}\right) } {\boldsymbol{{\widetilde{{\zeta }}}}} -{\frac{1}{2k_\mathrm{{B}}T}} {\boldsymbol{{\widetilde{{\xi }}}^\mathrm{{T}}}} {\boldsymbol{{\widetilde{{\xi }}}}} \right) } \right] } }} }}, \end{aligned}$$
(10.397)

where

$$\begin{aligned} {\boldsymbol{{\widetilde{{\zeta }}}}} = {\left( \begin{array}{ccc} {\boldsymbol{{\widetilde{{\zeta }}}_{1}}} \\ {\boldsymbol{{\widetilde{{\zeta }}}_{2}}} \\ {\vdots } \\ {\boldsymbol{{\widetilde{{\zeta }}}_{|{\widetilde{V}}|}}} \\ \end{array} \right) } \equiv {\left( {\boldsymbol{{\widetilde{U}}^\mathrm{{T}}}}{\otimes }{\boldsymbol{I^{(2^{|{\widetilde{V}}|})}}}\right) } {\left( \begin{array}{ccc} {\boldsymbol{{\sigma }_{1}^{z}}} \\ {\boldsymbol{{\sigma }_{2}^{z}}} \\ {\vdots } \\ {\boldsymbol{{\sigma }_{|{\widetilde{V}}|}^{z}}} \\ \end{array} \right) } - {\left( {\big (}h{\boldsymbol{I^{(|{\widetilde{V}}|)}}}+J{\boldsymbol{{\widetilde{{\Lambda }}}}}{\big )}^{-1} {\boldsymbol{{{\widetilde{U}}^\mathrm{{T}}}}} {\left( \begin{array}{ccc} {\widetilde{d}}_{1} \\ {\widetilde{d}}_{2} \\ {\vdots } \\ {\widetilde{d}}_{|{\tilde{V}}|} \end{array} \right) } \right) } {\otimes }{\boldsymbol{I^{(2^{|{\widetilde{V}}|})}}}, \nonumber \\ \end{aligned}$$
(10.398)
$$\begin{aligned} {\boldsymbol{{\widetilde{{\xi }}}}} = {\left( \begin{array}{ccc} {\boldsymbol{{\widetilde{{\xi }}}_{1}}} \\ {\boldsymbol{{\widetilde{{\xi }}}_{2}}} \\ {\vdots } \\ {\boldsymbol{{\widetilde{{\xi }}}_{|{\widetilde{V}}|}}} \\ \end{array} \right) } \equiv {\left( {\boldsymbol{U^\mathrm{{T}}}}{\otimes }{\boldsymbol{I^{(2^{|{\widetilde{V}}|})}}}\right) } {\left( \begin{array}{ccc} {\boldsymbol{{\sigma }_{1}^{x}}} \\ {\boldsymbol{{\sigma }_{2}^{x}}} \\ {\vdots } \\ {\boldsymbol{{\sigma }_{|{\widetilde{V}}|}^{x}}} \\ \end{array} \right) } - {\gamma } {\left( \begin{array}{ccc} {\boldsymbol{I^{(2^{|{\widetilde{V}}|}|{\widetilde{V}}|)}}} \\ {\boldsymbol{I^{(2^{|{\widetilde{V}}|}|{\widetilde{V}}|)}}} \\ {\vdots } \\ {\boldsymbol{I^{(2^{|{\widetilde{V}}|}|{\widetilde{V}}|)}}} \\ \end{array} \right) }, \end{aligned}$$
(10.399)
$$\begin{aligned} {\left( \begin{array}{ccc} {\widetilde{d}}_{1} \\ {\widetilde{d}}_{2} \\ {\vdots } \\ {\widetilde{d}}_{|{\tilde{V}}|} \end{array} \right) }= & {} {\left( \begin{array}{ccccccccc} {\widetilde{U}}_{11} &{} {\widetilde{U}}_{12} &{} {\widetilde{U}}_{13} &{} {\cdots } &{} {\widetilde{U}}_{1|{\widetilde{V}}|} \\ {\widetilde{U}}_{21} &{} {\widetilde{U}}_{22} &{} {\widetilde{U}}_{23} &{} {\cdots } &{} {\widetilde{U}}_{2|{\widetilde{V}}|} \\ {\widetilde{U}}_{31} &{} {\widetilde{U}}_{32} &{} {\widetilde{U}}_{33} &{} {\cdots } &{} {\widetilde{U}}_{3|{\widetilde{V}}|} \\ {\vdots } &{} {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ {\widetilde{U}}_{|{\widetilde{V}}|1} &{} {\widetilde{U}}_{|{\widetilde{V}}|2} &{} {\widetilde{U}}_{|{\widetilde{V}}|3} &{} {\cdots } &{} {\widetilde{U}}_{|{\widetilde{V}}||{\widetilde{V}}|} \end{array} \right) } {\left( \begin{array}{ccccccccc} U_{11} &{} U_{21} &{} U_{31} &{} {\cdots } &{} U_{|V|1} \\ U_{12} &{} U_{22} &{} U_{32} &{} {\cdots } &{} U_{|V|2} \\ U_{13} &{} U_{23} &{} U_{33} &{} {\cdots } &{} U_{|V|3} \\ {\vdots } &{} {\vdots } &{} {\vdots } &{} {\ddots } &{} {\vdots } \\ U_{1|{\widetilde{V}}|} &{} U_{2|{\widetilde{V}}|} &{} U_{3|{\widetilde{V}}|} &{} {\cdots } &{} U_{|V||{\widetilde{V}}|} \end{array} \right) } {\left( \begin{array}{ccc} d_{1} \\ d_{2} \\ d_{3} \\ {\vdots } \\ d_{|V|} \end{array} \right) }. \nonumber \\&\end{aligned}$$
(10.400)

For this density matrix \({\boldsymbol{{\widetilde{P}}}}\) in Eq. (10.397), we can formulate the approximate reduced density matrix \({\boldsymbol{{\widehat{{\widetilde{R}}}}_{i}}}\) and the approximate Gaussian marginal probability density functions \({\widehat{{\widetilde{{\rho }}}}}({\boldsymbol{{\widetilde{{\phi }}}}})={\widehat{{\widetilde{{\rho }}}}}({\phi }_{1},{\phi }_{2},{\cdots },{\phi }_{|{\widetilde{V}}|})\) and \({\widehat{{\widetilde{{\rho }}}}}_{i}({\phi }_{i})\) for the corresponding quantum adaptive TAP approximation as follows:

$$\begin{aligned} {\boldsymbol{{\widehat{{\widetilde{R}}}}_{i}}}= & {} {\frac{1}{ 2{\cosh } {\left( {\frac{1}{k_\mathrm{{B}}T}}{\sqrt{{{\tilde{f}}_{i}}^{2}+{\Gamma }^{2}}}\right) } }} {\frac{{\widetilde{f}}_{i}+{\sqrt{{{\widetilde{f}}_{i}}^{2}+{\Gamma }^{2}}}}{2{\sqrt{{{\widetilde{f}}_{i}}^{2}+{\Gamma }^{2}}}}} {\left( \begin{array}{ccc} 1 &{} -{\frac{{\Gamma }}{{\widetilde{f}}_{i}+{\sqrt{{{\widetilde{f}}_{i}}^{2}+{\Gamma }^{2}}}}} \\ +{\frac{{\Gamma }}{{\widetilde{f}}_{i}+{\sqrt{{{\widetilde{f}}_{i}}^{2}+{\Gamma }^{2}}}}} &{} 1 \end{array} \right) } \nonumber \\&{\times } {\left( \begin{array}{ccc} {\exp }{\left( +{\frac{1}{k_\mathrm{{B}}T}}{\sqrt{{{\widetilde{f}}_{i}}^{2}+{\Gamma }^{2}}}\right) } &{} 0 \\ 0 &{} {\exp }{\left( -{\frac{1}{k_\mathrm{{B}}T}}{\sqrt{{{\widetilde{f}}_{i}}^{2}+{\Gamma }^{2}}}\right) } \end{array} \right) } {\left( \begin{array}{ccc} 1 &{} +{\frac{{\Gamma }}{{\widetilde{f}}_{i}+{\sqrt{{{\widetilde{f}}_{i}}^{2}+{\Gamma }^{2}}}}} \\ -{\frac{{\Gamma }}{{\widetilde{f}}_{i}+{\sqrt{{{\widetilde{f}}_{i}}^{2}+{\Gamma }^{2}}}}} &{} 1 \end{array} \right) }. \nonumber \\ \end{aligned}$$
(10.401)
$$\begin{aligned} {\widehat{{\widetilde{{\rho }}}}}({\phi }_{1},{\phi }_{2},{\cdots },{\phi }_{|{\widetilde{V}}|})= & {} {\sqrt{ {\frac{{\det }{\left( {\left( h+{\widetilde{D}}\right) }{\boldsymbol{I^{(|{\widetilde{V}}|)}}}+J{\boldsymbol{{\widetilde{{\Lambda }}}}} \right) }}{{\left( 2{\pi }k_\mathrm{{B}}T\right) }^{|{\widetilde{V}}|}}} }} \nonumber \\&{\times } {\exp }{\Bigg (}-{\frac{1}{2k_\mathrm{{B}}T}} {\left( {\boldsymbol{{\widetilde{{\phi }}}}} - {\boldsymbol{{\widetilde{U}}}} {\left( {\left( h+{\widetilde{D}}\right) }{\boldsymbol{I^{(|{\widetilde{V}}|)}}}+J{\boldsymbol{{\widetilde{{\Lambda }}}}}\right) }^{-1} {\boldsymbol{{\widetilde{U}}^\mathrm{{T}}}} {\left( {\boldsymbol{{\widetilde{f}}}}+{\boldsymbol{{\widetilde{g}}}}+h{\boldsymbol{{\widetilde{d}}}}\right) }\right) }^\mathrm{{T}} \nonumber \\&\qquad\qquad\qquad{}{\times } {\boldsymbol{{\widetilde{U}}}} {\left( {\left( h+{\widetilde{D}}\right) }{\boldsymbol{I^{(|{\widetilde{V}}|)}}}+J{\boldsymbol{{\widetilde{{\Lambda }}}}} \right) } {\boldsymbol{{\widetilde{U}}^\mathrm{{T}}}} \nonumber \\&\qquad\qquad\qquad{}{\times } {\left( {\boldsymbol{{\widetilde{{\phi }}}}} - {\boldsymbol{{\widetilde{U}}}} {\left( {\left( h+{\widetilde{D}}\right) }{\boldsymbol{I^{(|{\widetilde{V}}|)}}}+J{\boldsymbol{{\widetilde{{\Lambda }}}}}\right) }^{-1} {\boldsymbol{{\widetilde{U}}^\mathrm{{T}}}} {\left( {\boldsymbol{{\widetilde{f}}}}+{\boldsymbol{{\widetilde{g}}}}+h{\boldsymbol{{\widetilde{d}}}}\right) }\right) } {\Bigg )}, \nonumber \\&\end{aligned}$$
(10.402)
$$\begin{aligned} {\widehat{{\widetilde{{\rho }}}}}_{i}({\phi }_{i})= & {} {\sqrt{{\frac{h+{\widetilde{L}}}{2{\pi }k_\mathrm{{B}}T}}}} {\exp }{\left( -{\frac{1}{2k_\mathrm{{B}}T}}{\left( h+{\widetilde{L}} \right) } {\left( {\phi }_{i}- {\frac{{\widetilde{g}}_{i}+h{\widetilde{d}}_{i}}{h+{\widetilde{L}}}}\right) }^{2} \right) } ~(i{\in }{\widetilde{V}}). \end{aligned}$$
(10.403)

The reduced density matrix \({\boldsymbol{{\widehat{{\widetilde{R}}}}_{i}}}\) and the marginal probability density functions \({\widehat{{\widetilde{{\rho }}}}}_{i}({\phi }_{i})\) and \({\widehat{{\widetilde{{\rho }}}}}{\left( {\phi }_{1},{\phi }_{2},{\cdots },{\phi }_{|{\widetilde{V}}|}\right) }\) need to satisfy the consistencies

$$\begin{aligned} \left\{ \begin{array}{llll} {\displaystyle {{\int _{-{\infty }}^{+{\infty }}}{\int _{-{\infty }}^{+{\infty }}}{\cdots }{\int _{-{\infty }}^{+{\infty }}}}} {\phi }_{i}{\widehat{{\widetilde{{\rho }}}}}{\left( {\phi }_{1},{\phi }_{2},{\cdots },{\phi }_{|{\widetilde{V}}|}\right) }d{\phi }_{1}d{\phi }_{2}{\cdots }d{\phi }_{|{\widetilde{V}}|} ={\displaystyle {\int _{-{\infty }}^{+{\infty }}}} {\phi }_{i}{\widehat{{\widetilde{{\rho }}}}}_{i}{\left( {\phi }_{i}\right) }d{\phi }_{i} =\mathrm{{Tr}}{\boldsymbol{{\sigma }^{z}}}{\boldsymbol{{\widehat{{\widetilde{R}}}}_{i}}}~{\left( i{\in }{\widetilde{V}}\right) },\\ {\displaystyle {{\sum _{i{\in }{\widetilde{V}}}}}}{\displaystyle {{\int _{-{\infty }}^{+{\infty }}}{\int _{-{\infty }}^{+{\infty }}}{\cdots }{\int _{-{\infty }}^{+{\infty }}}}} {{\phi }_{i}}^{2}{\widehat{{\widetilde{{\rho }}}}}{\left( {\phi }_{1},{\phi }_{2},{\cdots },{\phi }_{|{\widetilde{V}}|}\right) }d{\phi }_{1}d{\phi }_{2}{\cdots }d{\phi }_{|{\widetilde{V}}|} ={\displaystyle {{\sum _{i{\in }{\widetilde{V}}}}}}{\displaystyle {\int _{-{\infty }}^{+{\infty }}}} {{\phi }_{i}}^{2}{\widehat{{\widetilde{{\rho }}}}}_{i}{\left( {\phi }_{i}\right) }d{\phi }_{i}=1.\\ \end{array} \right. \nonumber \\ \end{aligned}$$
(10.404)

The Lagrange multipliers \({\boldsymbol{{\widetilde{f}}}}\), \({\boldsymbol{{\widetilde{g}}}}\), \({\widetilde{L}}\), and \({\widetilde{D}}\) are determined so as to satisfy the consistencies in Eq. (10.404), which reduce to the following simultaneous equations:

$$\begin{aligned} {\boldsymbol{{\widetilde{g}}}}+h{\boldsymbol{{\widetilde{d}}}} ={\big (}h+{\widetilde{L}}{\big )} {\boldsymbol{{\widetilde{U}}}} {\left( {\big (}{\widetilde{D}}-{\widetilde{L}}{\big )}{\boldsymbol{I^{(|{\widetilde{V}}|)}}}+J{\boldsymbol{{\widetilde{\Lambda }}}} \right) }^{-1} {\boldsymbol{{\widetilde{U}}^\mathrm{{T}}}} {\boldsymbol{{\widetilde{f}}}}, \end{aligned}$$
(10.405)
$$\begin{aligned} {\boldsymbol{{\widetilde{f}}}}+{\boldsymbol{{\widetilde{g}}}}+h{\boldsymbol{{\widetilde{d}}}} = {\boldsymbol{{\widetilde{U}}}} {\left( {\big (}{\widetilde{D}}-{\widetilde{L}}{\big )}{\boldsymbol{I^{(|{\widetilde{V}}|)}}}+J{\boldsymbol{{\widetilde{\Lambda }}}} \right) }^{-1} {\boldsymbol{{\widetilde{U}}^\mathrm{{T}}}} {\left( \begin{array}{ccc} {\frac{{\widetilde{f}}_{1}}{{\sqrt{{{\widetilde{f}}_{1}}^{2}+{\Gamma }^{2}}}}} {\tanh }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\sqrt{{{\widetilde{f}}_{1}}^{2}+{\Gamma }^{2}}} \right) } \\ {\frac{{\widetilde{f}}_{2}}{{\sqrt{{{\widetilde{f}}_{2}}^{2}+{\Gamma }^{2}}}}} {\tanh }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\sqrt{{{\widetilde{f}}_{2}}^{2}+{\Gamma }^{2}}} \right) } \\ {\vdots } \\ {\frac{{\widetilde{f}}_{|{\widetilde{V}}|}}{{\sqrt{{{\widetilde{f}}_{|{\widetilde{V}}|}}^{2}+{\Gamma }^{2}}}}} {\tanh }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\sqrt{{{\widetilde{f}}_{|{\widetilde{V}}|}}^{2}+{\Gamma }^{2}}} \right) } \\ \end{array} \right) }, \nonumber \\ \end{aligned}$$
(10.406)
$$\begin{aligned} {\widetilde{L}}=-h+{\frac{1}{2}}+{\sqrt{ {\frac{1}{4}}+{\frac{1}{|{\widetilde{V}}|}}{\left( {\boldsymbol{{\widetilde{f}}}}+{\boldsymbol{{\widetilde{g}}}}+h{\boldsymbol{{\widetilde{d}}}}\right) }^\mathrm{{T}} {\left( {\boldsymbol{{\widetilde{f}}}}+{\boldsymbol{{\widetilde{g}}}}+h{\boldsymbol{{\widetilde{d}}}}\right) } }}, \end{aligned}$$
(10.407)
$$\begin{aligned} {\frac{1}{ {\frac{1}{2}}+{\sqrt{ {\frac{1}{4}}+{\frac{1}{|{\widetilde{V}}|}}{\left( {\boldsymbol{{\widetilde{f}}}}+{\boldsymbol{{\widetilde{g}}}}+h{\boldsymbol{{\widetilde{d}}}}\right) }^\mathrm{{T}} {\left( {\boldsymbol{{\widetilde{f}}}}+{\boldsymbol{{\widetilde{g}}}}+h{\boldsymbol{{\widetilde{d}}}}\right) } }} }} ={\frac{1}{|{\widetilde{V}}|}}\mathrm{{Tr}}{\left[ {\left( {\big (}{\widetilde{D}}-{\widetilde{L}}{\big )}{\boldsymbol{I^{(|{\widetilde{V}}|)}}}+J{\boldsymbol{{\widetilde{\Lambda }}}} \right) }^{-1} \right] }. \nonumber \\ \end{aligned}$$
(10.408)

5.4 Suzuki-Trotter Decomposition in the Transverse Ising Model

In this section, we review the Suzuki-Trotter formulas and the extensions of conventional quantum loopy belief propagation that use them. In quantum probabilistic graphical models, the state space is defined by all the eigenvectors of the density matrix \({\boldsymbol{R}}\), and the probability of each eigenvector is given by the corresponding eigenvalue, as mentioned in Sect. 10.4.2. To compute some statistical quantities by the Monte Carlo method, it is necessary to diagonalize the energy matrix \({\boldsymbol{H}}\), which is a massive computation. Instead of such a scheme, quantum Monte Carlo methods based on the Suzuki-Trotter formulas were proposed [89]. One important part of the quantum Monte Carlo method is the mapping from a quantum probabilistic graphical model to a conventional (classical) probabilistic graphical model by introducing the techniques of Suzuki-Trotter decompositions. It is known that some statistical quantities for conventional (classical) probabilistic graphical models can be computed by MCMC methods. This is the basic idea behind quantum Monte Carlo methods. Let us first review the Suzuki-Trotter formulas and explicitly give a detailed scheme of Suzuki-Trotter decompositions for the transverse Ising model in Eqs. (10.226) and (10.227) with Eq. (10.301).

From the definition of the exponential function for square matrices, we have

$$\begin{aligned} {\exp }{\left( x({\boldsymbol{A}}+{\boldsymbol{B}})\right) } ={\boldsymbol{I}}+x{\left( {\boldsymbol{A}}+{\boldsymbol{B}}\right) }+{\frac{1}{2}}x^{2}{\left( {\boldsymbol{A}}+{\boldsymbol{B}}\right) }^{2}+\mathcal{{O}}(x^{3}) ~(x{\rightarrow }+0), \end{aligned}$$
(10.409)
$$\begin{aligned} {\exp }{\left( x{\boldsymbol{A}}\right) } ={\boldsymbol{I}}+x{\boldsymbol{A}}+{\frac{1}{2}}x^{2}{\boldsymbol{A}}^{2}+\mathcal{{O}}(x^{3}) ~(x{\rightarrow }+0), \end{aligned}$$
(10.410)
$$\begin{aligned} {\exp }{\left( x{\boldsymbol{B}}\right) } ={\boldsymbol{I}}+x{\boldsymbol{B}}+{\frac{1}{2}}x^{2}{\boldsymbol{B}}^{2}+\mathcal{{O}}(x^{3}) ~(x{\rightarrow }+0). \end{aligned}$$
(10.411)

From these equalities, the following formula can be confirmed:

$$\begin{aligned} {\exp }{\left( x({\boldsymbol{A}}+{\boldsymbol{B}})\right) } ={\exp }{\left( x{\boldsymbol{A}}\right) }{\exp }{\left( x{\boldsymbol{B}}\right) }+\mathcal{{O}}(x^{2}) ~(x{\rightarrow }+0). \end{aligned}$$
(10.412)

Moreover, we have

$$\begin{aligned} {\exp }{\left( x{\boldsymbol{A}}\right) } ={\left[ {\exp }{\left( {\frac{x}{M}}{\boldsymbol{A}}\right) }\right] }^{M} +\mathcal{{O}}{\left( {\frac{x^{2}}{M}}\right) } ~(x^{2} \ll M), \end{aligned}$$
(10.413)
$$\begin{aligned} {\exp }{\left( x({\boldsymbol{A}}+{\boldsymbol{B}})\right) } ={\left[ {\exp }{\left( {\frac{x}{M}}{\boldsymbol{A}}\right) }{\exp }{\left( {\frac{x}{M}}{\boldsymbol{B}}\right) }\right] }^{M} +\mathcal{{O}}{\left( {\frac{x^{2}}{M}}\right) } ~(x^{2} \ll M). \end{aligned}$$
(10.414)
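The convergence in Eq. (10.414) is easy to observe numerically for small non-commuting matrices. A minimal sketch (NumPy-based; the Pauli-matrix example and the `expm_sym` helper are our own choices for illustration):

```python
import numpy as np

def expm_sym(H):
    """exp(H) for a real symmetric matrix H, via eigendecomposition."""
    w, v = np.linalg.eigh(H)
    return (v * np.exp(w)) @ v.T

# Two non-commuting generators: A = x*sigma^z, B = x*sigma^x
sz = np.array([[1.0, 0.0], [0.0, -1.0]])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
x = 1.0
A, B = x * sz, x * sx

exact = expm_sym(A + B)
errors = {}
for M in (1, 10, 100, 1000):
    step = expm_sym(A / M) @ expm_sym(B / M)
    errors[M] = np.linalg.norm(np.linalg.matrix_power(step, M) - exact)
    print(M, errors[M])  # the error shrinks roughly like 1/M
```

Because \({\boldsymbol{A}}\) and \({\boldsymbol{B}}\) do not commute, the product formula is not exact at any finite \(M\), but the error decreases as the Trotter number grows, which is the property exploited below.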

Generally, for a graph \((V,E)\) with the set of nodes \(V=\{1,2,{\cdots },N\}\) and the set of edges \(E={\big \{}\{i,j\}{\big \}}\), we have

$$\begin{aligned} {\exp }{\left( x{\sum _{\{i,j\}{\in }E}}{\boldsymbol{A_{\{i,j\}}}}\right) } ={\left[ {\prod _{\{i,j\}{\in }E}}{\exp }{\left( {\frac{x}{M}}{\boldsymbol{A_{\{i,j\}}}}\right) }\right] }^{M} +\mathcal{{O}}{\left( {\frac{x^{2}}{M}}\right) } ~(x^{2} \ll M), \end{aligned}$$
(10.415)
$$\begin{aligned}&{\exp }{\left( x{\sum _{\{i,j\}{\in }E}}{\boldsymbol{A_{\{i,j\}}}}+x{\sum _{\{i,j\}{\in }E}}{\boldsymbol{B_{\{i,j\}}}}\right) } \nonumber \\&={\left[ {\left( {\prod _{\{i,j\}{\in }E}}{\exp }{\left( {\frac{x}{M}}{\boldsymbol{A_{\{i,j\}}}}\right) }\right) } {\left( {\prod _{\{i,j\}{\in }E}}{\exp }{\left( {\frac{x}{M}}{\boldsymbol{B_{\{i,j\}}}}\right) }\right) } \right] }^{M} +\mathcal{{O}}{\left( {\frac{x^{2}}{M}}\right) } ~(x^{2}\ll M). \nonumber \\&\end{aligned}$$
(10.416)

These are referred to as the Suzuki-Trotter decomposition [87, 88].

For the case of \(N=2\), we consider an energy matrix \({\boldsymbol{H}}\) defined by

$$\begin{aligned} {\boldsymbol{H}}=-J{\boldsymbol{{\sigma }_{1}^{z}}}{\boldsymbol{{\sigma }_{2}^{z}}}-h_{1}{\boldsymbol{{\sigma }_{1}^{z}}}-h_{2}{\boldsymbol{{\sigma }_{2}^{z}}}-{\Gamma }{\boldsymbol{{\sigma }_{1}^{x}}}-{\Gamma }{\boldsymbol{{\sigma }_{2}^{x}}}. \end{aligned}$$
(10.417)

It is referred to as a quantum transverse Ising model on a chain \({\big (}V={\big \{}1,2{\big \}},E={\big \{}\{1,2\}{\big \}}{\big )}\) with two nodes and one edge. By using the above Suzuki-Trotter formula, we have

$$\begin{aligned}&\qquad{\langle }s_{1,1},s_{2,1}|{\exp }{\left( -{\frac{1}{k_\mathrm{{B}}T}}{\boldsymbol{H}}\right) }|s'_{1,1},s'_{2,1}{\rangle } \nonumber \\&{}={\lim _{M{\rightarrow }+{\infty }}}{\langle }s_{1,1},s_{2,1}|{\left[ {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\left( J{\boldsymbol{{\sigma }_{1}^{z}}}{\boldsymbol{{\sigma }_{2}^{z}}}+h_{1}{\boldsymbol{{\sigma }_{1}^{z}}}+h_{2}{\boldsymbol{{\sigma }_{2}^{z}}} \right) }\right) } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\left( {\Gamma }{\boldsymbol{{\sigma }_{1}^{x}}}+{\Gamma }{\boldsymbol{{\sigma }_{2}^{x}}} \right) }\right) } \right] }^{M}|s'_{1,1},s'_{2,1}{\rangle } \nonumber \\&{}={\lim _{M{\rightarrow }+{\infty }}} {\sum _{{\tau }_{1,1}{\in }{\Omega }}}{\sum _{{\tau }_{2,1}{\in }{\Omega }}} {\sum _{s_{1,2}{\in }{\Omega }}}{\sum _{s_{2,2}{\in }{\Omega }}} {\sum _{{\tau }_{1,2}{\in }{\Omega }}}{\sum _{{\tau }_{2,2}{\in }{\Omega }}} {\cdots } {\sum _{s_{1,M}{\in }{\Omega }}}{\sum _{s_{2,M}{\in }{\Omega }}} {\delta }_{s_{1,M+1},s'_{1,1}} {\delta }_{s_{2,M+1},s'_{2,1}} \nonumber \\&\qquad\quad{\times } {\prod _{m=1}^{M}} {\Bigg (} {\langle }s_{1,m},s_{2,m}| {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\left( J{\boldsymbol{{\sigma }_{1}^{z}}}{\boldsymbol{{\sigma }_{2}^{z}}}+h_{1}{\boldsymbol{{\sigma }_{1}^{z}}}+h_{2}{\boldsymbol{{\sigma }_{2}^{z}}} \right) }\right) } |{\tau }_{1,m},{\tau }_{2,m}{\rangle } \nonumber \\&\qquad\qquad\qquad\quad{}{\times } {\langle }{\tau }_{1,m},{\tau }_{2,m}| {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\left( {\Gamma }{\boldsymbol{{\sigma }_{1}^{x}}}+{\Gamma }{\boldsymbol{{\sigma }_{2}^{x}}} \right) }\right) } |s_{1,m+1},s_{2,m+1}{\rangle } {\Bigg )}. \end{aligned}$$
(10.418)

Note that

$$\begin{aligned} {\Gamma }{\boldsymbol{{\sigma }_{1}^{x}}}+{\Gamma }{\boldsymbol{{\sigma }_{2}^{x}}} ={\Gamma } {\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}}+{\Gamma }{\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{x}}}. \end{aligned}$$
(10.419)

By using Eq. (10.253), Eq. (10.418) can be rewritten as

$$\begin{aligned}&{\langle }s_{1,1},s_{2,1}|{\exp }{\left( -{\frac{1}{k_\mathrm{{B}}T}}{\boldsymbol{H}}\right) }|s'_{1,1},s'_{2,1}{\rangle } \nonumber \\&={\lim _{M{\rightarrow }+{\infty }}} {\sum _{s_{1,2}{\in }{\Omega }}}{\sum _{s_{2,2}{\in }{\Omega }}} {\cdots } {\sum _{s_{1,M}{\in }{\Omega }}}{\sum _{s_{2,M}{\in }{\Omega }}} {\delta }_{s_{1,M+1},s'_{1,1}} {\delta }_{s_{2,M+1},s'_{2,1}} \nonumber \\&\qquad{}{\times } {\prod _{m=1}^{M}} {\Bigg (} {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\left( Js_{1,m}s_{2,m}+h_{1}s_{1,m}+h_{2}s_{2,m}\right) }\right) } \nonumber \\&\qquad\quad{}{\times } {\langle }s_{1,m},s_{2,m}| {\left( {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma }{\boldsymbol{{\sigma }^{x}}}\right) }{\otimes }{\boldsymbol{I}}\right) } {\left( {\boldsymbol{I}}{\otimes }{\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma }{\boldsymbol{{\sigma }^{x}}} \right) }\right) } |s_{1,m+1},s_{2,m+1}{\rangle } {\Bigg )}. \nonumber \\&\end{aligned}$$
(10.420)

Moreover, by the definition of the tensor product for \({\boldsymbol{A}}{\otimes }{\boldsymbol{I}}\) and \({\boldsymbol{I}}{\otimes }{\boldsymbol{A}}\) for any matrix \({\boldsymbol{A}}\) in terms of Eq. (10.238), we have

$$\begin{aligned}&{\langle }s_{1,m},s_{2,m}| {\left( {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma }{\boldsymbol{{\sigma }^{x}}}\right) }{\otimes }{\boldsymbol{I}}\right) } {\left( {\boldsymbol{I}}{\otimes }{\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma }{\boldsymbol{{\sigma }^{x}}} \right) }\right) } |s_{1,m+1},s_{2,m+1}{\rangle } \nonumber \\&\qquad{} = {\langle }s_{1,m}| {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma }{\boldsymbol{{\sigma }^{x}}}\right) } |s_{1,m+1}{\rangle } {\langle }s_{2,m}| {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma }{\boldsymbol{{\sigma }^{x}}} \right) } |s_{2,m+1}{\rangle } \nonumber \\&\qquad{} = {\langle }s_{1,m}| {\left( \begin{array}{ccc} {\cosh }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma } \right) } &{} {\sinh }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma } \right) } \\ {\sinh }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma } \right) } &{} {\cosh }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma } \right) } \end{array} \right) } |s_{1,m+1}{\rangle } \nonumber \\&\qquad\qquad{}{\times } {\langle }s_{2,m}| {\left( \begin{array}{ccc} {\cosh }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma } \right) } &{} {\sinh }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma } \right) } \\ {\sinh }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma } \right) } &{} {\cosh }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma } \right) } \end{array} \right) } |s_{2,m+1}{\rangle }. \end{aligned}$$
(10.421)
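The \(2{\times }2\) matrix of hyperbolic functions appearing here is simply \({\exp }(a{\boldsymbol{{\sigma }^{x}}})={\cosh }(a){\boldsymbol{I}}+{\sinh }(a){\boldsymbol{{\sigma }^{x}}}\) with \(a={\Gamma }/(k_\mathrm{{B}}TM)\), which can be confirmed numerically (a minimal sketch; the value of `a` is an arbitrary choice of ours):

```python
import numpy as np

sx = np.array([[0.0, 1.0], [1.0, 0.0]])
a = 0.37  # stands in for Gamma/(k_B T M); any value works

# exp(a*sigma^x) via eigendecomposition
w, v = np.linalg.eigh(a * sx)
lhs = (v * np.exp(w)) @ v.T

# cosh/sinh form used in Eq. (10.421)
rhs = np.array([[np.cosh(a), np.sinh(a)],
                [np.sinh(a), np.cosh(a)]])
print(np.allclose(lhs, rhs))  # True
```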

Equation (10.420) can be rewritten in terms of the two-dimensional representation as

$$\begin{aligned}&{\langle }s_{1,1},s_{2,1}|{\exp }{\left( -{\frac{1}{k_\mathrm{{B}}T}}{\boldsymbol{H}}\right) }|s'_{1,1},s'_{2,1}{\rangle } \nonumber \\&={\lim _{M{\rightarrow }+{\infty }}} {\sum _{s_{1,2}{\in }{\Omega }}}{\sum _{s_{2,2}{\in }{\Omega }}} {\cdots } {\sum _{s_{1,M}{\in }{\Omega }}}{\sum _{s_{2,M}{\in }{\Omega }}} {\delta }_{s_{1,M+1},s'_{1,1}} {\delta }_{s_{2,M+1},s'_{2,1}} \nonumber \\&\qquad\quad{}{\times } {\prod _{m=1}^{M}} {\Bigg (} {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\left( Js_{1,m}s_{2,m}+h_{1}s_{1,m}+h_{2}s_{2,m}\right) }\right) } \nonumber \\&\qquad\qquad\quad{}{\times } {\langle }s_{1,m}| {\left( \begin{array}{ccc} {\cosh }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma } \right) } &{} {\sinh }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma } \right) } \\ {\sinh }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma } \right) } &{} {\cosh }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma } \right) } \end{array} \right) } |s_{1,m+1}{\rangle } \nonumber \\&\qquad\qquad\quad{}{\times } {\langle }s_{2,m}| {\left( \begin{array}{ccc} {\cosh }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma } \right) } &{} {\sinh }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma } \right) } \\ {\sinh }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma } \right) } &{} {\cosh }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma } \right) } \end{array} \right) } |s_{2,m+1}{\rangle } {\Bigg )}. \end{aligned}$$
(10.422)

Eventually, the density matrix \({\boldsymbol{P}}\) of the transverse Ising model for two nodes in Eq. (10.417), defined by

$$\begin{aligned} {\boldsymbol{P}} \equiv {\frac{{\exp }{\left( -{\frac{1}{k_\mathrm{{B}}T}}{\boldsymbol{H}} \right) }}{\mathrm{{Tr}}{\left[ {\exp }{\left( -{\frac{1}{k_\mathrm{{B}}T}}{\boldsymbol{H}} \right) } \right] }}}, \end{aligned}$$
(10.423)

can be reduced to the probability distribution \(P^{(M)}({\boldsymbol{s_{1}}},{\boldsymbol{s_{2}}},{\boldsymbol{s_{3}}},{\cdots },{\boldsymbol{s_{M}}},{\boldsymbol{s_{M+1}}})\) for \({\boldsymbol{s_{m}}}= {\left( \begin{array}{ccc} s_{1,m} \\ s_{2,m} \\ \end{array} \right) }\) (\(m=1,2,{\cdots },M+1\)) on the \(2{\times }(M+1)\) ladder graph as follows:

$$\begin{aligned}&{\langle }s_{1,1},s_{2,1}|{\boldsymbol{P}}|s'_{1,1},s'_{2,1}{\rangle } \nonumber \\&={\lim _{M{\rightarrow }+{\infty }}} {\sum _{{\boldsymbol{s_{1}}}{\in }{\Omega }^{2}}} {\sum _{{\boldsymbol{s_{2}}}{\in }{\Omega }^{2}}} {\cdots } {\sum _{{\boldsymbol{s_{M}}}{\in }{\Omega }^{2}}} {\sum _{{\boldsymbol{s_{M+1}}}{\in }{\Omega }^{2}}} {\delta }_{s_{1,M+1},s'_{1,1}}{\delta }_{s_{2,M+1},s'_{2,1}} P^{(M)}({\boldsymbol{s_{1}}},{\boldsymbol{s_{2}}},{\boldsymbol{s_{3}}},{\cdots },{\boldsymbol{s_{M}}},{\boldsymbol{s_{M+1}}}), \nonumber \\&\end{aligned}$$
(10.424)

where

$$\begin{aligned}&P^{(M)}({\boldsymbol{s_{1}}},{\boldsymbol{s_{2}}},{\boldsymbol{s_{3}}},{\cdots },{\boldsymbol{s_{M}}},{\boldsymbol{s_{M+1}}}) \nonumber \\&\qquad\quad{} \equiv {\frac{1}{Z^{(M)}}} {\prod _{m=1}^{M}} {\Bigg (} {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\left( Js_{1,m}s_{2,m}+h_{1}s_{1,m}+h_{2}s_{2,m}\right) }\right) } \nonumber \\&\qquad\qquad\qquad{}{\times } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\left( K{\left( MT,{\Gamma } \right) }s_{1,m}s_{1,m+1}+K{\left( MT,{\Gamma } \right) }s_{2,m}s_{2,m+1}\right) } \right) } {\Bigg )}, \nonumber \\&\end{aligned}$$
(10.425)
$$\begin{aligned} Z^{(M)}\equiv & {} {\sum _{s_{1,1}{\in }{\Omega }}}{\sum _{s_{2,1}{\in }{\Omega }}} {\sum _{s_{1,2}{\in }{\Omega }}}{\sum _{s_{2,2}{\in }{\Omega }}} {\cdots } {\sum _{s_{1,M}{\in }{\Omega }}}{\sum _{s_{2,M}{\in }{\Omega }}} {\delta }_{s_{1,M+1},s_{1,1}} {\delta }_{s_{2,M+1},s_{2,1}} \nonumber \\&\quad{}{\times } {\prod _{m=1}^{M}} {\Bigg (} {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\left( Js_{1,m}s_{2,m}+h_{1}s_{1,m}+h_{2}s_{2,m}\right) }\right) } \nonumber \\&\qquad{}{\times } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\left( K{\left( MT,{\Gamma } \right) }s_{1,m}s_{1,m+1}+K{\left( MT,{\Gamma } \right) }s_{2,m}s_{2,m+1}\right) } \right) } {\Bigg )}, \nonumber \\&\end{aligned}$$
(10.426)
$$\begin{aligned} K{\left( T,{\Gamma } \right) } \equiv k_\mathrm{{B}}T {\ln }{\left( {\sqrt{{\frac{{\cosh }{\left( {\frac{1}{k_\mathrm{{B}}T}}{\Gamma } \right) }}{{\sinh }{\left( {\frac{1}{k_\mathrm{{B}}T}}{\Gamma } \right) } }}}} \right) }. \end{aligned}$$
(10.427)
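The role of \(K(T,{\Gamma })\) is to rewrite the transverse-field matrix elements as classical Boltzmann factors: \({\langle }s|{\exp }(a{\boldsymbol{{\sigma }^{x}}})|s'{\rangle }=C\,e^{Bss'}\) with \(B=K(MT,{\Gamma })/(k_\mathrm{{B}}TM)\) and an \(s\)-independent prefactor \(C\) that is absorbed into \(Z^{(M)}\). A short numerical check (the parameter values and the name `Ktrot` are arbitrary choices of ours):

```python
import numpy as np

kB = 1.0  # Boltzmann constant in natural units

def K(T, Gamma):
    """Effective coupling of Eq. (10.427)."""
    a = Gamma / (kB * T)
    return kB * T * np.log(np.sqrt(np.cosh(a) / np.sinh(a)))

T, Gamma, M = 0.8, 0.6, 16           # arbitrary test values
a = Gamma / (kB * T * M)             # argument of exp(a*sigma^x) per slice
B = K(M * T, Gamma) / (kB * T * M)   # exponent per Trotter slice, Eq. (10.425)
C = np.sqrt(np.cosh(a) * np.sinh(a)) # spin-independent prefactor

for s in (+1, -1):
    for sp in (+1, -1):
        elem = np.cosh(a) if s == sp else np.sinh(a)  # <s|exp(a*sigma^x)|s'>
        print(s, sp, abs(elem - C * np.exp(B * s * sp)) < 1e-12)
```

Solving \(Ce^{B}={\cosh }(a)\) and \(Ce^{-B}={\sinh }(a)\) for \(B\) is exactly how the logarithm of the square root of the ratio in Eq. (10.427) arises.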

The density matrix \({\boldsymbol{P}}\) in Eq. (10.423) of the transverse Ising model for |V| nodes \(V=\{1,2,{\cdots },|V|\}\), which is given by Eqs. (10.226) and (10.227) with Eq. (10.301), can be reduced to the probability distribution \(P^{(M)}({\boldsymbol{s_{1}}},{\boldsymbol{s_{2}}},{\boldsymbol{s_{3}}},{\cdots },{\boldsymbol{s_{M}}},{\boldsymbol{s_{M+1}}})\) for \({\boldsymbol{s_{m}}}= {\left( \begin{array}{ccc} s_{1,m} \\ s_{2,m} \\ {\vdots } \\ s_{|V|,m} \\ \end{array} \right) }\) (\(m=1,2,{\cdots },M+1\)) on the \(|V|{\times }(M+1)\) ladder graph as follows:

$$\begin{aligned}&{\langle }s_{1,1},s_{2,1},{\cdots },s_{|V|,1}|{\boldsymbol{P}}|s'_{1,1},s'_{2,1},{\cdots },s'_{|V|,1}{\rangle } \nonumber \\&={\lim _{M{\rightarrow }+{\infty }}} {\sum _{{\boldsymbol{s_{1}}}{\in }{\Omega }^{|V|}}} {\sum _{{\boldsymbol{s_{2}}}{\in }{\Omega }^{|V|}}} {\cdots } {\sum _{{\boldsymbol{s_{M}}}{\in }{\Omega }^{|V|}}} {\sum _{{\boldsymbol{s_{M+1}}}{\in }{\Omega }^{|V|}}} {\left( {\prod _{i{\in }V}} {\delta }_{s_{i,M+1},s'_{i,1}} \right) } P^{(M)}({\boldsymbol{s_{1}}},{\boldsymbol{s_{2}}},{\boldsymbol{s_{3}}},{\cdots },{\boldsymbol{s_{M}}},{\boldsymbol{s_{M+1}}}), \nonumber \\&\end{aligned}$$
(10.428)

where

$$\begin{aligned}&P^{(M)}({\boldsymbol{s_{1}}},{\boldsymbol{s_{2}}},{\boldsymbol{s_{3}}},{\cdots },{\boldsymbol{s_{M}}},{\boldsymbol{s_{M+1}}}) \nonumber \\&\quad{} \equiv {\frac{1}{Z^{(M)}}} {\prod _{m=1}^{M}} {\left( {\prod _{\{i,j\}{\in }E}} {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}} Js_{i,m}s_{j,m} \right) } \right) } \nonumber \\&\qquad\qquad\qquad{}{\times } {\left( {\prod _{i{\in }V}} {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}K{\left( MT,{\Gamma } \right) }s_{i,m}s_{i,m+1} \right) } \right) } {\left( {\prod _{i{\in }V}} {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}} h_{i}s_{i,m} \right) } \right) }, \nonumber \\&\end{aligned}$$
(10.429)
$$\begin{aligned} Z^{(M)}\equiv & {} {\sum _{{\boldsymbol{s_{1}}}{\in }{\Omega }^{|V|}}} {\sum _{{\boldsymbol{s_{2}}}{\in }{\Omega }^{|V|}}} {\cdots } {\sum _{{\boldsymbol{s_{M}}}{\in }{\Omega }^{|V|}}} {\left( {\prod _{i{\in }V}} {\delta }_{s_{i,M+1},s_{i,1}} \right) } \nonumber \\&\qquad\qquad{}{\times } {\prod _{m=1}^{M}} {\left( {\prod _{\{i,j\}{\in }E}} {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}} Js_{i,m}s_{j,m} \right) } \right) } \nonumber \\&\qquad\qquad\quad{}{\times } {\left( {\prod _{i{\in }V}} {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}K{\left( MT,{\Gamma } \right) }s_{i,m}s_{i,m+1} \right) } \right) } {\left( {\prod _{i{\in }V}} {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}} h_{i}s_{i,m} \right) } \right) }. \nonumber \\&\end{aligned}$$
(10.430)
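As a concrete check of this quantum-to-classical mapping, the following sketch compares \({\langle }{\sigma }_{1}^{z}{\rangle }\) computed by exact diagonalization of the two-node Hamiltonian in Eq. (10.417) with the corresponding average of \(s_{1,1}\) under the classical ladder distribution of Eq. (10.425), summed exhaustively for a small Trotter number \(M\) (all parameter values here are illustrative choices of ours, not taken from the text):

```python
import math
import numpy as np
from itertools import product

kB = T = 1.0
J, h1, h2, Gamma, M = 0.5, 0.3, -0.2, 0.7, 8  # illustrative parameters

# --- quantum side: exact density matrix of Eq. (10.417) on two spins ---
sz = np.array([[1.0, 0.0], [0.0, -1.0]])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)
H = (-J * np.kron(sz, sz) - h1 * np.kron(sz, I2) - h2 * np.kron(I2, sz)
     - Gamma * np.kron(sx, I2) - Gamma * np.kron(I2, sx))
w, v = np.linalg.eigh(H)
rho = (v * np.exp(-w / (kB * T))) @ v.T
rho /= np.trace(rho)
m_exact = float(np.trace(np.kron(sz, I2) @ rho))  # <sigma_1^z>

# --- classical side: exhaustive sum over the 2 x M ladder of Eq. (10.425) ---
a = Gamma / (kB * T * M)
Ktrot = math.log(math.sqrt(math.cosh(a) / math.sinh(a)))  # K(MT,Gamma)/(kB*T*M)
num = den = 0.0
for s in product((-1, 1), repeat=2 * M):
    s1, s2 = s[:M], s[M:]
    E = 0.0
    for m in range(M):
        mp = (m + 1) % M  # periodic in the Trotter direction: s_{M+1} = s_1
        E += (J * s1[m] * s2[m] + h1 * s1[m] + h2 * s2[m]) / (kB * T * M)
        E += Ktrot * (s1[m] * s1[mp] + s2[m] * s2[mp])
    wgt = math.exp(E)
    den += wgt
    num += s1[0] * wgt
m_trotter = num / den
print(m_exact, m_trotter)
```

The residual difference between the two averages is the Trotter discretization error, which vanishes as \(M{\rightarrow }+{\infty }\); in actual quantum Monte Carlo methods the exhaustive sum is replaced by MCMC sampling of the ladder model.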

The dynamics of quantum Monte Carlo methods based on Suzuki-Trotter decompositions have been analyzed by using Glauber dynamics [101, 102] and Langevin dynamics [103, 104]. Recently, these analyses have been applied to some statistical machine learning systems with quantum annealing [105, 106]. Some statistical analyses of quantum Monte Carlo methods for statistical inferences based on Suzuki-Trotter decompositions [87, 88] are shown in Chaps. 12 and 13 of Part III of this book.

We now try to construct a modification of the conventional quantum message passing rule in Eq. (10.335) for the transverse Ising model in Eqs. (10.226) and (10.227) with Eq. (10.301) by imposing the assumption that all off-diagonal elements of \({\boldsymbol{{\lambda }_{j{\rightarrow }i}}}\) and \({\boldsymbol{{\lambda }_{i{\rightarrow }j}}}\) for any edge \(\{i,j\}({\in }E)\) are zero. By using the Suzuki-Trotter formulas in Eqs. (10.415)–(10.416), Eq. (10.335) can be represented as follows:

$$\begin{aligned}&{\lim _{M{\rightarrow }+{\infty }}} {\left[ {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}} {\Gamma }{\boldsymbol{{\sigma }^{x}}} \right) } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}} hd_{i}{\boldsymbol{{\sigma }^{z}}} \right) } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}} {\boldsymbol{{\lambda }_{j{\rightarrow }i}}} \right) } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}} {\sum _{k{\in }{\partial }i{\setminus }\{j\}}}{\boldsymbol{{\lambda }_{k{\rightarrow }i}}} \right) } \right] }^{M} \nonumber \\&= {\frac{Z_{i}}{Z_{\{i,j\}}}} {\lim _{M{\rightarrow }+{\infty }}} \mathrm{{Tr}}_{{\setminus }i}{\Bigg [} {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma } {\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}} \right) } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}} hd_{i}{\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}\right) } \right) } \nonumber \\&{\times } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}} J{\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}}\right) }{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}\right) } \right) } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}} hd_{j}{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}}\right) } \right) } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma } {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{x}}} \right) } \nonumber \\&{\times } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}} {\boldsymbol{I}}{\otimes } {\left( {\sum _{l{\in }{\partial }j{\setminus }\{i\}}}{\boldsymbol{{\lambda }_{l{\rightarrow }j}}} \right) } \right) } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}} {\left( {\sum _{k{\in }{\partial }i{\setminus }\{j\}}}{\boldsymbol{{\lambda }_{k{\rightarrow }i}}} \right) } {\otimes }{\boldsymbol{I}} \right) } {\Bigg ]}^{M}, \end{aligned}$$
(10.431)

such that,

$$\begin{aligned}&{\lim _{M{\rightarrow }+{\infty }}} {\sum _{s_{i,2}{\in }{\Omega }}} {\cdots } {\sum _{s_{i,M}{\in }{\Omega }}} {\left( {\prod _{m=1}^{M}} {\langle }s_{i,m}| {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}} {\sum _{k{\in }{\partial }i{\setminus }\{j\}}}{\boldsymbol{{\lambda }_{k{\rightarrow }i}}} \right) } |s_{i,m}{\rangle } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}} hd_{i}s_{i,m} \right) } \right) } \nonumber \\&{\times } {\delta }_{s'_{i,1},s_{i,M+1}} {\prod _{m=1}^{M}}{\left( {\langle }s_{i,m}| {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\boldsymbol{{\lambda }_{j{\rightarrow }i}}} \right) } |s_{i,m}{\rangle } {\langle }s_{i,m}| {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}} {\Gamma }{\boldsymbol{{\sigma }^{x}}} \right) } |s_{i,m+1}{\rangle } \right) } \nonumber \\&= {\lim _{M{\rightarrow }+{\infty }}} {\sum _{s_{i,2}{\in }{\Omega }}} {\cdots } {\sum _{s_{i,M}{\in }{\Omega }}} {\left( {\prod _{m=1}^{M}} {\langle }s_{i,m}| {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}} {\sum _{k{\in }{\partial }i{\setminus }\{j\}}}{\boldsymbol{{\lambda }_{k{\rightarrow }i}}} \right) } |s_{i,m}{\rangle } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}} hd_{i}s_{i,m} \right) } \right) } \nonumber \\&{\times } {\Bigg (} {\delta }_{s'_{i,1},s_{i,M+1}} {\frac{Z_{i}}{Z_{\{i,j\}}}} {\sum _{s_{j,1}{\in }{\Omega }}}{\sum _{s'_{j,1}{\in }{\Omega }}} {\delta }_{s_{j,1},s'_{j,1}} \nonumber \\&{\times } {\sum _{s_{j,2}{\in }{\Omega }}} {\cdots } {\sum _{s_{j,M}{\in }{\Omega }}} {\delta }_{s'_{j,1},s_{j,M+1}} {\prod _{m=1}^{M}}{\Bigg (} {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}} {\left( Js_{i,m}s_{j,m} \right) } \right) } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}hd_{j}s_{j,m} \right) } \nonumber \\&{\times } {\langle }s_{j,m}| {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}} {\sum _{l{\in }{\partial }j{\setminus }\{i\}}}{\boldsymbol{{\lambda }_{l{\rightarrow }j}}} \right) } |s_{j,m}{\rangle } \nonumber \\&{\times } {\langle }s_{i,m}|{\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma }{\boldsymbol{{\sigma }^{x}}}\right) }|s_{i,m+1}{\rangle } {\langle }s_{j,m}|{\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma }{\boldsymbol{{\sigma }^{x}}}\right) }|s_{j,m+1}{\rangle } {\Bigg )} {\Bigg )}. \end{aligned}$$
(10.432)

A sufficient condition for Eq. (10.432) is given by

$$\begin{aligned}&{\delta }_{s'_{i,1},s_{i,M+1}} {\prod _{m=1}^{M}}{\Bigg (} {\langle }s_{i,m}| {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}} {\boldsymbol{{\lambda }_{j{\rightarrow }i}}} \right) } |s_{i,m}{\rangle } {\langle }s_{i,m}| {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}} {\Gamma }{\boldsymbol{{\sigma }^{x}}} \right) } |s_{i,m+1}{\rangle } {\Bigg )} \nonumber \\&= {\delta }_{s'_{i,1},s_{i,M+1}} {\frac{Z_{i}}{Z_{\{i,j\}}}} {\sum _{s_{j,1}{\in }{\Omega }}}{\sum _{s'_{j,1}{\in }{\Omega }}} {\delta }_{s_{j,1},s'_{j,1}} \nonumber \\&\qquad{}{\times } {\sum _{s_{j,2}{\in }{\Omega }}} {\cdots } {\sum _{s_{j,M}{\in }{\Omega }}} {\delta }_{s'_{j,1},s_{j,M+1}} {\prod _{m=1}^{M}}{\Bigg (} {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}Js_{i,m}s_{j,m} \right) } {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}hd_{j}s_{j,m} \right) } \nonumber \\&{\times } {\langle }s_{j,m}| {\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}} {\sum _{l{\in }{\partial }j{\setminus }\{i\}}}{\boldsymbol{{\lambda }_{l{\rightarrow }j}}} \right) } |s_{j,m}{\rangle } \nonumber \\&{\times } {\langle }s_{i,m}|{\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma }{\boldsymbol{{\sigma }^{x}}}\right) }|s_{i,m+1}{\rangle } {\langle }s_{j,m}|{\exp }{\left( {\frac{1}{k_\mathrm{{B}}TM}}{\Gamma }{\boldsymbol{{\sigma }^{x}}}\right) }|s_{j,m+1}{\rangle } {\Bigg )}. \nonumber \\&\end{aligned}$$
(10.433)
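The limit \(M{\rightarrow }+{\infty }\) appearing in Eqs. (10.432)–(10.433) is the Suzuki–Trotter decomposition: a product of exponentials of non-commuting operators, each scaled by \(1/M\), converges to the exponential of their sum. The following sketch illustrates this convergence numerically for two non-commuting \(2{\times }2\) operators built from the Pauli matrices \({\boldsymbol{{\sigma }^{z}}}\) and \({\boldsymbol{{\sigma }^{x}}}\); the coefficients are illustrative placeholders, not values from the text.

```python
# Numerical illustration of the Suzuki-Trotter limit M -> +infinity:
# (e^{A/M} e^{B/M})^M converges to e^{A+B} for non-commuting A and B.
import numpy as np
from scipy.linalg import expm

# Pauli matrices; the coefficients 1.0 and 0.8 are illustrative placeholders
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
sz = np.array([[1.0, 0.0], [0.0, -1.0]])
A = 1.0 * sz   # diagonal ("classical") part, e.g. a longitudinal-field term
B = 0.8 * sx   # off-diagonal part, e.g. a transverse-field term Gamma sigma^x

exact = expm(A + B)
errors = []
for M in (1, 10, 100):
    step = expm(A / M) @ expm(B / M)       # one Trotter slice
    trotter = np.linalg.matrix_power(step, M)
    errors.append(np.linalg.norm(trotter - exact))

# The Frobenius-norm error shrinks as O(1/M) for this first-order splitting
print(errors)
```

The same mechanism underlies the path-integral representation above: each factor \(m=1,{\ldots },M\) is one Trotter slice, and the error of the splitting vanishes in the limit.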

By taking the summations \({\displaystyle {{\sum _{s_{i,2}{\in }{\Omega }}}{\cdots }{\sum _{s_{i,M}{\in }{\Omega }}}}}\) and the limit \(M{\rightarrow }+{\infty }\) on both sides of Eq. (10.433), modified message passing rules can be derived as follows:

$$\begin{aligned}&{\exp }{\left( {\frac{1}{k_\mathrm{{B}}T}} {\boldsymbol{{\lambda }_{j{\rightarrow }i}}} + {\frac{1}{k_\mathrm{{B}}T}} {\Gamma }{\boldsymbol{{\sigma }^{x}}} \right) } \nonumber \\&\qquad{} = {\frac{Z_{i}}{Z_{\{i,j\}}}} \mathrm{{Tr}}_{{\setminus }i}{\Bigg [} {\exp }{\Bigg (}{\frac{1}{k_\mathrm{{B}}T}} {\Big (} J {\left( {\boldsymbol{{\sigma }^{z}}}{\otimes }{\boldsymbol{I}} \right) }{\left( {\boldsymbol{I}}{\otimes }{\boldsymbol{{\sigma }^{z}}} \right) } + {\Gamma }{\left( {\boldsymbol{{\sigma }^{x}}}{\otimes }{\boldsymbol{I}} \right) } {\Big )} \nonumber \\&\qquad\qquad{} + {\frac{1}{k_\mathrm{{B}}T}} {\boldsymbol{I}}{\otimes }{\Big (}hd_{j}{\boldsymbol{{\sigma }^{z}}} +{\Gamma }{\boldsymbol{{\sigma }^{x}}} +{\sum _{l{\in }{\partial }j{\setminus }\{i\}}}{\boldsymbol{{\lambda }_{l{\rightarrow }j}}} {\Big )} {\Bigg )} {\Bigg ]}. \end{aligned}$$
(10.434)

We remark that the modified message passing rules of Eq. (10.434) can also be derived by considering the Bethe free energy functional in the cluster variation method with a ladder-type basic cluster for the probabilistic graphical model [107] in Eqs. (10.429)–(10.430). While the conventional framework of quantum belief propagation was given as a quantum cluster variation method in Ref. [90], several extensions of loopy belief propagation have been proposed in Refs. [108,109,110,111] from the standpoint of quantum statistical mechanics.
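As a concrete illustration, a single message update in the spirit of Eq. (10.434) can be sketched numerically for spin-1/2 (\({\Omega }=\{+1,-1\}\)): the right-hand side is a matrix exponential of a two-site operator followed by a partial trace over site j, and the updated message \({\boldsymbol{{\lambda }_{j{\rightarrow }i}}}\) is recovered through a matrix logarithm. In the sketch below, \(k_\mathrm{{B}}T=1\), the factor \(Z_{i}/Z_{\{i,j\}}\) is absorbed into a trace normalization, and the values of J, \({\Gamma }\), h, \(d_{j}\), and the incoming message are illustrative placeholders.

```python
# One quantum belief propagation message update in the spirit of Eq. (10.434),
# for spin-1/2: exp(lambda_{j->i} + Gamma sigma^x) ∝ Tr_j exp(H_pair),
# with k_B T = 1 and Z_i / Z_{ij} absorbed into a normalization.
import numpy as np
from scipy.linalg import expm, logm

sx = np.array([[0.0, 1.0], [1.0, 0.0]])
sz = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)

J, Gamma, h, d_j = 1.0, 0.5, 0.3, 1.0   # illustrative couplings and datum
lam_in = np.zeros((2, 2))               # sum of incoming messages lambda_{l->j}

# Two-site operator: J (sz ⊗ sz) + Gamma (sx ⊗ I) + I ⊗ (h d_j sz + Gamma sx + lam_in)
H = (J * np.kron(sz, sz)
     + Gamma * np.kron(sx, I2)
     + np.kron(I2, h * d_j * sz + Gamma * sx + lam_in))

# Partial trace over site j (the second tensor factor) of exp(H)
M4 = expm(H)
traced = M4.reshape(2, 2, 2, 2).trace(axis1=1, axis2=3)

# Normalize (this plays the role of Z_i / Z_{ij}) and invert the exponential
traced /= np.trace(traced)
lam_out = logm(traced) - Gamma * sx     # updated message lambda_{j->i}
```

Iterating such updates over all directed edges until the messages converge is the quantum analogue of the classical loopy belief propagation sweep.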

6 Concluding Remarks

This chapter explored sublinear modeling based on statistical mechanical informatics for statistical machine learning. In statistical machine learning, we need to compute statistical quantities in massive probabilistic graphical models, and statistical mechanical informatics provides many approximate computational techniques for this purpose. One is the advanced mean-field framework, which includes mean-field methods and loopy belief propagation methods such as the Bethe approximation, and which can provide good accuracy for statistical quantities, including averages and covariances. Some statistical quantities in probabilistic graphical models exhibit phase transitions when they are computed with the advanced mean-field methods. As we have already shown in Sect. 10.3.3, two familiar types appear, namely, first- and second-order phase transitions. Each step of the EM algorithm is often affected by the first-order phase transition because the internal energy of the prior probabilistic model has a discontinuity. This difficulty appears in the convergence procedure of the EM algorithm, in which the trajectory of a hyperparameter passes through not only the equilibrium state but also metastable and unstable states in the loopy belief propagation for probabilistic segmentation in Sect. 10.3.5. We showed in Sect. 10.3.6 that some algorithms based on loopy belief propagation for probabilistic segmentation can be accelerated by inverse real-space renormalization group techniques.

The second part of this chapter explored quantum statistical machine learning and some approximate algorithms from quantum statistical mechanical informatics for realizing the framework. Quantum mechanical computation for machine learning is developing rapidly in terms of both academic research and industrial implementation. In Sect. 10.4, we explained the modeling framework of density matrices and some of its fundamental mathematics, and extended the framework to the quantum expectation-maximization algorithm. In Sect. 10.5, we presented the fundamental frameworks of quantum loopy belief propagation and quantum statistical mechanical extensions of the adaptive TAP method. Moreover, we reviewed the Suzuki-Trotter expansion and the real- and momentum-space renormalization groups for sublinear modeling of density matrices.

Recently, massive fundamental mathematical modeling frameworks have emerged in statistical machine learning theory for many practical applications, most notably sparse modeling [4, 5] and deep learning [10]. Many academic researchers are interested in interpreting such models from the standpoint of probabilistic graphical models in statistical mathematics [2, 3, 7] and statistical mechanical informatics [8, 9, 13, 17]. We now also have novel technologies for realizing quantum computing as quantum mechanical extensions of statistical mechanical informatics, for example, the D-Wave quantum annealer. Some results in which D-Wave quantum annealers have achieved high-performance computing have appeared in Refs. [112,113,114,115,116]. Recent developments in probabilistic graphical modeling and in the static and dynamical analysis of the advanced mean-field methods, the Suzuki-Trotter decompositions, and the replica methods for realizing sublinear modeling are presented, from the statistical mechanical point of view, in the subsequent Chaps. 12 and 13 of this part of the book.