Energy–Information Trade-off Induces Continuous and Discontinuous Phase Transitions in Lateral Predictive Coding



A. Introduction
Predictive coding is an influential theory in computational neuroscience with a long history tracing back to Kant and Helmholtz [1][2][3][4], and its recent conceptual advances include the Bayesian brain theory and the free energy principle of information processing and decision making [5][6][7][8]. The predictive coding theory regards the brain as a multi-layered hierarchical neural network which builds an internal model of the external world and employs it to perceive sensory signals and make predictions. So far most studies on predictive coding have focused on the feedforward and feedback interactions between different layers, and have largely ignored lateral (horizontal, recurrent) interactions within the same layer of neurons. But lateral interactions are abundant in biological neural networks and they play significant roles in perception and inference [9][10][11][12]. A child's brain consumes 50% of the body's total metabolic energy, and this fraction decreases to 20% in adulthood [13]. As a big energy consumer, it is vital for the brain to be capable of energy-efficient information processing [14][15][16][17][18]. Lateral predictive coding (LPC) might have been an elegant evolutionary solution to the brain's tight energy budget.
One basic function of lateral interactions is to reduce the energy cost of internal representations. By adapting the synaptic weights according to the input correlations between spatially adjacent neurons, the external drive to one neuron is partially cancelled by the messages from its neighboring neurons, and its activity becomes weaker and sparser [19][20][21][22]. On the other hand, reducing the magnitude and redundancy of the internal representation compromises its information content, making it less robust to external and internal noises [23][24][25]. How to balance the conflicting demands of energy reduction and information robustness is an important issue of LPC. Yet LPC has been a rarely discussed topic in the physics community, in stark contrast to the huge and enduring theoretical enthusiasm for other basic neural network models such as the perceptron, the Hopfield model of associative memory, and the restricted Boltzmann machine [26].
Here we study phase transitions induced by the energy–information trade-off in the simplest model of linear LPC [27]. We introduce a temperature parameter T to incorporate the fitness effect of information robustness, and define a free energy, which binds together energy and function, as the minimization objective [28][29][30]. We find that even if the input signals are statistically symmetric among all the N units, the optimal synaptic weight matrix spontaneously breaks permutation (reciprocal) symmetry at a certain critical temperature and forms cyclic-dominant interaction patterns. The energy E of the multi-unit (N ≥ 3) LPC networks then obeys an ideal-gas law E = (N/2)T, achieving the upper bound of efficiency in energy utilization. At low temperatures the weight matrix experiences several further phase transitions, one of them being a discontinuous transition to a tight balance of excitatory and inhibitory interactions. Internal symmetries within sub-groups of units may also break, leading to multi-level functional differentiation.
Our theoretical results reveal the emergence of structural complexity in LPC systems. These collective properties of non-reciprocity, correlation reduction, and excitation-inhibition (EI) balance are qualitatively similar to empirical observations on real-life recurrent neural networks. As an initial attempt we have left many important issues untouched, such as nonlinear synaptic interactions and non-quadratic energetic cost. More efforts from the physics community are needed to fully appreciate LPC as a basic neural computing circuit.

B. Model and free energy
Consider a fully connected network of N units (each may be a single neuron or a small region of the brain). Denote an internal state vector as x⃗ = (x_1, ..., x_N)^⊤ and a sensory input as s⃗ = (s_1, ..., s_N)^⊤. Upon receiving an input s⃗, unit i responds by changing its state x_i according to a linear LPC dynamics [19,21,27],

dx_i/dt ∝ s_i − x_i − Σ_{j≠i} w_ij x_j ,    (1)

where a positive (negative) synaptic weight w_ij means that unit j inhibits (excites) unit i. The steady-state encoding from sensory input to internal representation is x⃗^s⃗ = (I + W)^{−1} s⃗, with I being the identity matrix and W the synaptic weight matrix. Convergence of the dynamics (1) requires the real part of every eigenvalue of (I + W) to be positive (we set a threshold value 10^{−5} and call it the eigenvalue bottom-line). We interpret Σ_{j≠i} w_ij x_j as the predicted sensory input to unit i, and x_i as the prediction error between the actual s_i and the predicted value [9]. Assuming the energy cost of maintaining and transmitting a prediction error to be quadratic in x_i, the mean energy E of an internal state is

E = Σ_{s⃗} P(s⃗) Σ_i (x_i^{s⃗})² = Tr[(I + W)^{−1} C (I + W^⊤)^{−1}] ,    (2)

where P(s⃗) is the probability distribution of sensory inputs and C is the input correlation matrix with elements c_ij ≡ Σ_{s⃗} P(s⃗) s_i s_j. The total energy cost of representing M ≫ 1 sensory inputs is then ME.
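As a minimal numerical sketch (our own illustration; the function names are ours, not from the paper), the steady-state encoding and the mean energy can be evaluated directly from W and the input correlation matrix C:

```python
import numpy as np

def steady_state(W, s):
    """Steady-state internal representation x = (I + W)^{-1} s."""
    N = W.shape[0]
    return np.linalg.solve(np.eye(N) + W, s)

def mean_energy(W, C):
    """Mean energy E = Tr[(I+W)^{-1} C (I+W)^{-T}] for the quadratic cost sum_i <x_i^2>."""
    A = np.linalg.inv(np.eye(W.shape[0]) + W)
    return np.trace(A @ C @ A.T)

# Example: with no lateral interactions, x equals s and E = Tr(C).
C = np.array([[1.0, 0.8], [0.8, 1.0]])
print(mean_energy(np.zeros((2, 2)), C))  # Tr(C) = 2.0
```

With nonzero weights, part of each unit's drive is cancelled by its neighbors and the energy drops below Tr(C) for suitably adapted W.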
An infinitesimal volume of the sensory space is mapped by x⃗^s⃗ to an infinitesimal volume of the internal representation space, with Jacobian determinant 1/Det(I + W). The entropy of the internal states x⃗ is then

S = −ln Det(I + W) ,    (3)

plus a constant, and the mutual information between s⃗ and x⃗ is equal to this entropy up to a constant [31].
We can take MS as the total entropy of M internal states. It is desirable for the entropy S to be large, so that the encoding is sensitive to variations in s⃗ and robust to noises in representing and transmitting x⃗ (the information maximization principle [28]).
It turns out that minimizing the energy (cost) E and maximizing the entropy (information) S are mutually conflicting goals. We introduce a temperature parameter T to quantify this energy–information trade-off. In the thermodynamic limit of an infinite number of sensory inputs (M → ∞), the optimization objective is the total free energy MF, with the coefficient

F = E − T S ,

which combines both energetic and entropic effects [28][29][30]. The temperature embodies all the external and internal fitness stresses. At high T values, information sensitivity and robustness is the main fitness drive and the adaptation of synaptic weights will favor the entropy S; in the other limit of low T, reducing the energy E becomes the dominant fitness concern and the weight matrix will evolve towards energy minimization. The thermodynamic equation accompanying free-energy minimization is simply T = dE/dS.

FIG. 1. Phase diagram for the N = 2 system. The symmetry-breaking transition to w_12 ≠ w_21 is continuous for c < 2/3 (blue dashed line) and discontinuous for c > 2/3 (red solid line). The green dot marks the tricritical point at c = 2/3, T = 9/8. Inset exhibits the continuous transition for c = 0.6 and the discontinuous one for c = 0.8.
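A correspondingly small sketch (again our own illustration) evaluates the entropy S = −ln Det(I + W) and the objective F = E − TS for any admissible weight matrix:

```python
import numpy as np

def entropy(W):
    """Representation entropy S = -ln Det(I + W), up to an additive constant."""
    sign, logdet = np.linalg.slogdet(np.eye(W.shape[0]) + W)
    return -logdet

def free_energy(W, C, T):
    """Per-sample objective F = E - T*S of the energy-information trade-off."""
    A = np.linalg.inv(np.eye(W.shape[0]) + W)
    return np.trace(A @ C @ A.T) - T * entropy(W)

# With W = 0 the encoding is the identity map: S = 0 and F = Tr(C).
print(free_energy(np.zeros((2, 2)), np.eye(2), 1.0))  # 2.0
```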
We search for the global minimum of F at each fixed value of T by a slow simulated annealing process and determine the corresponding optimal weight matrix [31]. We find that E(T) and S(T) are singular at several critical points of T, and the optimal weight matrix W changes its qualitative properties at these points. Discontinuities of the energy susceptibility (dE/dT) imply continuous phase transitions, and discontinuities in the energy itself manifest discontinuous phase transitions. To demonstrate most explicitly the endogenous nature of such phase transitions, we will focus on symmetric and homogeneous sensory inputs: the self-correlation c_ii = 1 for all inputs s_i, and the pair correlation c_ij = c the same (c < 1) for the inputs of any two different units. The default internal model W is then reciprocal (w_ij = w_ji) and permutation-symmetric (identical non-diagonal elements) [31].
The phase diagram of the simplest two-unit system (Fig. 1) is a good starting point for understanding optimal LPC. There is a continuous reciprocity-breaking phase transition for c < 2/3, with w_12 and w_21 gradually deviating from each other below the critical temperature [31]. When the input correlation c exceeds 2/3, a local minimum of the free energy with w_12 ≠ w_21 first emerges within the reciprocal (w_12 = w_21) phase, and then becomes the global minimum as T decreases to a certain critical value strictly higher than T^rb_2. A discontinuous transition then occurs, from the reciprocal phase to a reciprocity-broken phase in which unit 2 inhibits unit 1 and unit 1 excites unit 2.

C. Reciprocity-breaking and ideal-gas law
For multi-unit systems containing N ≥ 3 units, our numerical optimization results and local stability analysis reveal that the optimal internal model W is reciprocal (w_ij = w_ji) and permutation-symmetric (all w_ij being equal) at high temperatures, but these properties break down spontaneously as T drops below the critical value T^rb_N [31]. However, Eq. (35) does not apply for N = 2, indicating the emergence of new collective properties for N ≥ 3 at this continuous reciprocity-breaking phase transition. Indeed we find that the energy of the reciprocity-broken (w_ij ≠ w_ji) optimal system exactly obeys an ideal-gas law E = (N/2)T [31] in an extended temperature interval (T^iglb_N, T^rb_N], with T^iglb_N being the lower-bound temperature at which this ideal-gas law starts to be violated [Fig. 2(a)].
One type of interaction pattern realizing the ideal-gas law is the rotation-symmetric solution with weight parameters w_ij = w_{(j−i) mod N} (we require w_1 ≤ ... ≤ w_{N−1} to remove trivial degeneracies). There is cyclic dominance (CD) among the N units, such that unit i+1 is most strongly inhibited by unit i while it most strongly inhibits unit i+2, and so on. Rotation-symmetric solutions are impossible for N = 2, and there is only one such solution for N = 3, 4 but infinitely many degenerate solutions for N ≥ 5 [31].
We can define a quantity O_CD, which compares the most dominant global cycle of directed interactions with the reverse directed cycle of interactions, to measure the degree of cyclic dominance, where (p_1, p_2, ..., p_N) denotes a permutation of the N units. For example, the weight matrix shown in the right panel of Fig. 2(b) has O_CD = 0.996.

FIG. 2. Examples of cyclic-dominant and EI-balanced optimal matrices at T ≈ 0.1383, and F(T) and E(T) for these two branches of solutions. Solid (dashed) links in the interaction graphs indicate positive (negative) weights.
We find through numerical computations that the ideal-gas law can be achieved by cyclic-dominant weight matrices with or without rotational symmetry, and also by two-component matrices composed of excitatory and inhibitory units [31].
What is the significance of the ideal-gas law (6)? In terms of the eigenvalues ϵ_i of the symmetric matrix (I + W)^{−1} C (I + W^⊤)^{−1}, we have E = Σ_i ϵ_i and 2S = Σ_i ln ϵ_i − ln Det(C). The entropy is then upper-bounded by S ≤ (N/2) ln(E/N) − (1/2) ln Det(C) at each energy E, and equality holds only if all the eigenvalues ϵ_i are identical [31]. If this entropy upper bound can be saturated within a continuous interval of E, then Eq. (6) is achieved as a consequence of the energy–information competition, and it means the most efficient utilization of energy to reach the information-robustness upper bound. An appealing statistical property is that the internal states are composed of independent components of equal magnitudes [19,32], with ⟨x_i²⟩ = E/N to achieve energy equipartition and ⟨x_i x_j⟩ = 0 for j ≠ i to eliminate internal pair correlations (here ⟨•⟩ denotes averaging over all the samples). A probably closely related biological fact is that the neuron pair correlations of the visual cortex are very weak even for highly correlated sensory inputs [33].
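The entropy bound and its saturation condition can be checked numerically; the following sketch (our own, with an arbitrary random W) verifies S ≤ (N/2) ln(E/N) − (1/2) ln Det(C):

```python
import numpy as np

# Check the entropy bound S <= (N/2) ln(E/N) - (1/2) ln Det(C), with
# E = sum_i eps_i and 2S = sum_i ln(eps_i) - ln Det(C), where eps_i are
# the eigenvalues of M = (I+W)^{-1} C (I+W^T)^{-1}.
rng = np.random.default_rng(0)
N, c = 5, 0.3
C = (1 - c) * np.eye(N) + c * np.ones((N, N))
W = 0.1 * rng.standard_normal((N, N))
np.fill_diagonal(W, 0.0)

A = np.linalg.inv(np.eye(N) + W)
eps = np.linalg.eigvalsh(A @ C @ A.T)      # positive-definite, so eps > 0
logdetC = np.linalg.slogdet(C)[1]
E = eps.sum()
S = 0.5 * (np.log(eps).sum() - logdetC)
bound = (N / 2) * np.log(E / N) - 0.5 * logdetC
print(S <= bound + 1e-12)                  # AM-GM: equality iff all eps_i coincide
```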
As the temperature decreases to the lower critical value T^iglb_N, it is no longer possible for an optimal weight matrix to exactly obey Eq. (6) and at the same time keep all its complex eigenvalues above the bottom-line. The ideal-gas law then breaks down and a kink in the energy function E(T) is observed. For N = 5 and c = 0.8, T^iglb_5 = 0.3360, at which the energy slope changes from 2.5 to 2.0 [Fig. 2(a)]. This continuous transition is not associated with any symmetry change.

D. Discontinuous phase transitions
The energy discontinuity at T = 0.1383 in the example curve of Fig. 2(a) signifies a discontinuous phase transition. Indeed the free energy in the vicinity of this temperature has two minima, which organize into two branches with distinct energies [Fig. 2(b)]. One branch corresponds to the one-component CD network, which is almost fully inhibitory. The other branch corresponds to a two-component EI-balanced network: there is permutation symmetry within the two excitatory units 1 and 2 of group g_E, while the three units 3, 4, 5 of group g_I have cyclically dominant interactions among themselves and strongly inhibit group g_E. Such a pattern is in agreement with the vital biological fact of excitation-inhibition competition and balance [34][35][36].
One simple measure of EI balance, O_EI, is computed from the net input weights of all the individual units. A value of O_EI significantly above zero indicates a highly EI-balanced network in which the excitatory inputs are largely cancelled by the inhibitory inputs for most of the units. For example, O_EI = 0.626 for the weight matrix shown in the middle panel of Fig. 2(b).
After the transition to the optimal EI-balanced network, the ideal-gas law is approximately (though not exactly) recovered at lower temperatures [Fig. 2(a)], indicating that EI balance is beneficial for the efficiency of energy usage. This cannot be achieved if all the synaptic weights are restricted to be positive (inhibitory) [31].
The phase diagram for the system containing N = 5 units is shown in Fig. 3. The β_1 and β_2 regions in this diagram correspond to one-component cyclic-dominant optimal networks (a subregion of β_1 also contains EI-balanced ones). The γ_1 to γ_4 regions all correspond to different types of EI-balanced optimal networks. The phase boundaries are precisely determined by the singularities of E(T), and we can also define quantitative measures such as O_CD and O_EI to describe these different phases [31]. The ideal-gas law is exact in regions β_1 and γ_1, but it only approximately holds in γ_2 to γ_4. The discontinuous transition from γ_2 to γ_3 is associated with breaking the internal rotational symmetry of group g_I, and it leads to functional differentiation of the three units 3, 4, 5 (see Ref. [31] for more details). The phase diagram for the smallest multi-unit system (N = 3) is qualitatively similar to Fig. 3; we also report numerical results on the larger system of N = 10 units to demonstrate the generality of our conclusions [31].

E. Discussion
Our theoretical work attempted to consider energy and function simultaneously in a single model, and revealed that the trade-off between energy reduction and information robustness can induce spontaneous breaking of permutation symmetry in the lateral predictive coding system, driving the formation of cyclic dominance and the sudden emergence of excitation-inhibition balance among the network units. We discovered an ideal-gas law (6) of internally representing the sensory inputs as equal-magnitude independent components. Our theoretical results are in qualitative agreement with important biological facts such as non-reciprocal interactions, excitation-inhibition balance, and weak internal pair correlations. They may also have algorithmic implications.
For simplicity we have assumed the input correlation matrix C to be uniform with a single parameter c. A natural extension of the present work is to study the ideal-gas law (6) for heterogeneous sensory inputs. Our ongoing analytical and numerical computations confirm that Eq. (6) and the continuous and discontinuous phase transitions will also be observed for non-uniform matrices C [31,39].
The existence of discontinuous phase transitions and the associated multiple free-energy minima may render the optimal weight matrices hard to acquire through local and gradual Hebbian learning processes [12,37,38]. This issue waits to be further explored. It is also very interesting to study phase transitions in nonlinear LPC models, with the energy cost being a sum Σ_i |x_i| of absolute values and the internal prediction being the rectified linear function max(0, Σ_{j≠i} w_ij x_j) or another more complicated nonlinear function [40]. We expect to observe rich continuous and discontinuous phase transitions in these more realistic models too. Such nonlinear LPC systems may lead to enhanced sensitivity to fine details in typical input vectors.
F. Mutual information between sensory inputs and internal states

Suppose there is noise in the mapping from the sensory input s = (s_1, ..., s_N)^⊤ to the internal representation x = (x_1, ..., x_N)^⊤,

x_i = g_i(s) + ξ_i ,

where the ξ_i for i = 1, ..., N are independent Gaussian random variables with mean zero and variance σ². For the linear lateral predictive coding (LPC) system, the deterministic function g_i(s) is

g_i(s) = Σ_j [(I + W)^{−1}]_ij s_j ,

where I is the identity matrix with I_ii = 1 and I_ij = 0 for any i ≠ j. The mutual information between x and s is

I[x; s] = H[x] − H[x|s] ,

where H[x] and H[s] are the entropies of the internal states x and the sensory inputs s, respectively, and H[x|s] is the conditional entropy of x given s. Notice that the entropy H[s] only depends on the probability distribution P_input(s) of the sensory inputs s, so it can be considered a constant. The conditional entropy H[x|s] only depends on the property of the noise vector (ξ_1, ..., ξ_N)^⊤, so it can also be regarded as a constant. Then the mutual information is I[x; s] = H[x] + I_0 with I_0 being a constant. To increase this mutual information we need to increase the entropy H[x] of the internal states x.
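For Gaussian sensory inputs these entropies have closed forms, and the shift H[x] − H[s] = −ln Det(I + W) can be verified directly (the specific C and W below are arbitrary choices of ours):

```python
import numpy as np

def gaussian_entropy(Sigma):
    """Differential entropy of a zero-mean Gaussian with covariance Sigma."""
    N = Sigma.shape[0]
    return 0.5 * (N * np.log(2 * np.pi * np.e) + np.linalg.slogdet(Sigma)[1])

N, c = 3, 0.4
C = (1 - c) * np.eye(N) + c * np.ones((N, N))    # input covariance
W = np.array([[0.0, 0.2, -0.1],
              [0.3, 0.0, 0.2],
              [-0.1, 0.3, 0.0]])
A = np.linalg.inv(np.eye(N) + W)                 # x = A s
H_s = gaussian_entropy(C)
H_x = gaussian_entropy(A @ C @ A.T)
# The linear encoding shifts the entropy by exactly -ln Det(I + W):
print(H_x - H_s, -np.linalg.slogdet(np.eye(N) + W)[1])
```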
The marginal distribution P_output(x) of the internal state x is

P_output(x) ≈ Det(I + W) P_input((I + W) x) .    (14)

Equation (14) is valid in the situation of small noises (σ² being sufficiently small). The entropy of x is then

H[x] = H[s] − ln Det(I + W) .

Again, because H[s] does not depend on the weight matrix W, we see that H[x] = −ln Det(I + W) + H_0 with H_0 being a constant. For our LPC problem, the real parts of all the eigenvalues of I + W are required to be positive, so the determinant of I + W must be positive. This explains Eq. (3) of the main text.
G. The two-unit system (N = 2)

When there are only two units, the steady-state relationship between the sensory input s and the internal state x is

x_1 = (s_1 − w_12 s_2)/(1 − w_12 w_21) ,  x_2 = (s_2 − w_21 s_1)/(1 − w_12 w_21) .

The mean energy of the system is

E = [c_11 − 2 w_12 c_12 + w_12² c_22 + c_22 − 2 w_21 c_12 + w_21² c_11] / (1 − w_12 w_21)² .

Here ⟨•⟩ denotes averaging over all the input samples, and c_ij ≡ ⟨s_i s_j⟩ is an element of the input correlation matrix C. The entropy of the system is

S = −ln(1 − w_12 w_21) ,

which only depends on the product of the two synaptic weights. We can determine the minimum value of the energy E at each fixed value of w_12 w_21; some example curves of E versus w_12 w_21 are shown in Fig. 4. The minimum value of the free energy F is determined at each value of the temperature T by choosing the optimal value of w_12 w_21. Then the energy E as a function of T can be obtained. Some example results are shown in Fig. 5 for symmetric inputs (c_11 = c_22 = 1).
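The closed-form N = 2 energy E = [c_11 − 2w_12c_12 + w_12²c_22 + c_22 − 2w_21c_12 + w_21²c_11]/(1 − w_12w_21)² can be cross-checked against the general matrix expression E = Tr[(I + W)^{−1} C (I + W^⊤)^{−1}]; this consistency check is our own illustration, with arbitrary parameter values:

```python
import numpy as np

def energy_2unit(w12, w21, c11, c22, c12):
    """Explicit N = 2 mean energy from the closed-form inverse of I + W."""
    num = (c11 - 2 * w12 * c12 + w12**2 * c22
           + c22 - 2 * w21 * c12 + w21**2 * c11)
    return num / (1 - w12 * w21)**2

# cross-check against E = Tr[(I+W)^{-1} C (I+W)^{-T}]
w12, w21, c = 0.4, -0.2, 0.8
C = np.array([[1.0, c], [c, 1.0]])
W = np.array([[0.0, w12], [w21, 0.0]])
A = np.linalg.inv(np.eye(2) + W)
print(energy_2unit(w12, w21, 1.0, 1.0, c), np.trace(A @ C @ A.T))
```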
If the synaptic weights are restricted to be inhibitory (w_ij ≥ 0), then depending on the input pair correlation c_12 there are three different outcomes [Fig. 5(a)]. If c_12 is close to unity (say 0.95), synaptic reciprocity w_12 = w_21 is always kept and both E and S are smooth functions of T. At lower values of c_12 (e.g., 0.80), interaction reciprocity breaks down discontinuously at a certain critical value of T, with an associated sudden drop in the energy E and in the entropy S. At still lower values of c_12 (e.g., 0.60), the breaking of reciprocity becomes a continuous transition: both E and S are continuous at the transition point but their first-order derivatives are discontinuous. The minimum-energy network is reached when one of the synaptic weights vanishes. The minimum energy cannot be arbitrarily small, because the input to one unit (say s_1) is not cancelled by the output x_2 of the other unit.
If the synaptic weights are allowed to take negative (excitatory) values, the mean energy of the network at low temperatures can be remarkably reduced [Fig. 5(b)]. The optimal strategy for achieving sufficiently low energy E appears to be that one unit inhibits, and is excited by, the other unit. There is a continuous phase transition for c < 2/3 and a discontinuous one for c > 2/3. The phase diagram for this case of arbitrary weights is reported in Figure 1 of the main text.
When c_11 ≠ c_22 there is no permutation symmetry to break and the continuous phase transition is absent. We find that the discontinuous phase transition is still possible in this non-symmetric case. Setting c_22 = 1, Fig. 6 shows the region of the c_11–c_12 plane in which the discontinuous phase transition can be observed at a certain critical temperature.

H. Local stability of the reciprocal and permutation-symmetric solution
For symmetric input vectors s with c_ii ≡ ⟨s_i²⟩ = 1 and c_ij ≡ ⟨s_i s_j⟩ = c for any i ≠ j, the corresponding symmetric internal model is the permutation-symmetric matrix W with all its non-diagonal elements equal to w. Reciprocity obviously holds in this internal model (w_ij = w_ji for all pairs). We now determine the value of this uniform weight w and study the local stability of this reciprocal and permutation-symmetric solution.
First, notice that the homogeneous correlation matrix C can be written as

C = (1 − c) I + c 1⃗ 1⃗^⊤ ,  with 1⃗ ≡ (1, 1, ..., 1)^⊤ ,

from which it is easy to see that C has one eigenvalue 1 + (N − 1)c with the corresponding eigenvector u⃗_0 = 1⃗/√N, and the (N − 1)-fold degenerate eigenvalue 1 − c, whose eigenvectors are orthogonal to u⃗_0 and to each other. The symmetric matrix I + W can be diagonalized in the same way, with eigenvalue 1 + (N − 1)w along u⃗_0 and eigenvalue 1 − w with multiplicity N − 1. Since all the eigenvalues of the correlation matrix must be non-negative, c ∈ [−1/(N − 1), 1]. To ensure that the short-time predictive coding dynamics is convergent, the eigenvalues of I + W are required to be positive, which means w ∈ (−1/(N − 1), 1). When the matrix W is permutation-symmetric, the entropy S is

S = −ln[1 + (N − 1)w] − (N − 1) ln(1 − w) .

Let us define an auxiliary symmetric matrix M for a general weight matrix W as

M ≡ (I + W)^{−1} C (I + W^⊤)^{−1} .    (22)

In the special case of W being permutation-symmetric, M has the eigenvalue [1 + (N − 1)c]/[1 + (N − 1)w]² along u⃗_0 and the (N − 1)-fold eigenvalue (1 − c)/(1 − w)², and we obtain the following expression for the energy:

E = [1 + (N − 1)c]/[1 + (N − 1)w]² + (N − 1)(1 − c)/(1 − w)² .

By minimizing the free energy F = E − TS with respect to the weight parameter w, we obtain the following relationship between w and the temperature T:

T = 2 { (1 − c)/(1 − w)³ − [1 + (N − 1)c]/[1 + (N − 1)w]³ } / { 1/(1 − w) − 1/[1 + (N − 1)w] } .    (25)

The local stability of this symmetric solution is studied by assuming w_ij = w + ε_ij, with ε_ij a tiny perturbation of the weight w_ij. Denote by ε the perturbation matrix, whose diagonal elements are all exactly zero. Expanding the free energy in powers of ε, the first-order deviation is

∆F^(1) = Tr[ε (T I − 2M)(I + W)^{−1}] .

If this first-order deviation is identically zero for all perturbation matrices ε which do not violate the eigenvalue constraints of W, then the matrix W must be an extremal point of the free energy F. The permutation-symmetric matrix with w satisfying Eq. (25) is such an extremal solution.
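The stated spectrum of the homogeneous correlation matrix is easy to confirm numerically (a small check of our own):

```python
import numpy as np

# Spectrum of C = (1-c) I + c 11^T: one eigenvalue 1 + (N-1)c along
# u0 = (1,...,1)/sqrt(N), and the (N-1)-fold degenerate eigenvalue 1 - c.
N, c = 6, 0.5
C = (1 - c) * np.eye(N) + c * np.ones((N, N))
eigs = np.sort(np.linalg.eigvalsh(C))
print(eigs[-1], eigs[:-1])   # 3.5 and five copies of 0.5
```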
The second-order deviation (∆F^(2)) of the free energy is expressed as a quadratic form in ε, with coefficients given by the symmetric Hessian matrix L, whose element L_ij,kl (i ≠ j and k ≠ l) is computed according to Eq. (33). When the minimum eigenvalue of this N(N−1) × N(N−1) symmetric matrix L changes from positive to negative, the synaptic matrix W is no longer a locally stable solution of the free-energy minimization problem. For the permutation-symmetric matrix W, the Hessian matrix is easy to compute through Eq. (33). Our numerical results for the cases of N = 2, 3, 4, 5, 10 are shown in Fig. 7. This reciprocity-breaking (and also permutation-symmetry-breaking) temperature is denoted as T^rb_N in the main text. When N = 2 the Hessian matrix L is a 2 × 2 matrix, and it is straightforward to obtain an analytical expression of T^rb_2 [Eq. (34)], which is in full agreement with the numerical results obtained at N = 2. Figure 7 demonstrates that the curve of T^rb_2 for N = 2 is different from the corresponding curves for N ≥ 3. After a careful examination of the numerically obtained eigenvectors of the Hessian matrices L for N = 2 and N ≥ 3, we realize that systems containing N ≥ 3 units have more ways to break permutation symmetry and reciprocity than the two-unit system. For example, one additional symmetry for N ≥ 3 is the rotation symmetry among the N units. Our numerical results suggest that the permutation symmetry of the system starts to be unstable at T^rb_N and may evolve towards an optimal solution with rotation symmetry among the N units. This is easy to understand: a rotation-symmetric weight matrix W can be regarded as a long-wavelength perturbation of a permutation-symmetric one.
Based on this insight, we compute the minimum eigenvalue of the Hessian matrix L corresponding to a rotation-symmetric perturbation {ε_ij}. By locating the point at which this eigenvalue changes sign, we obtain an explicit expression of the critical temperature T^rb_N for N ≥ 3:

T^rb_N = (2/N²) [ (N − 1)√(1 − c) + √(1 + (N − 1)c) ]² .    (35)

Another way of deriving this expression will be described in the next section. This analytical formula is in full agreement with our numerical results in Fig. 7. The N → ∞ limit of this expression, T^rb_{N→∞} = 2(1 − c), is also shown in this figure.

I. On the ideal-gas law, cyclic dominance, and excitation-inhibition balance

The ideal-gas law E = (N/2)T, possible only for N ≥ 3, was first observed through numerical computations. It was found to hold in an extended temperature range immediately after interaction reciprocity (and permutation symmetry) is broken. We now confirm these numerical results through rigorous analytical derivations.
First, as noticed in the main text, the energy and entropy can be computed as

E = Σ_i ϵ_i ,  S = (1/2) Σ_i ln ϵ_i − (1/2) ln Det(C) ,    (36)

with ϵ_1, ..., ϵ_N being the N non-negative eigenvalues of the symmetric matrix M defined in Eq. (22). By applying the sum-product inequality

(1/N) Σ_i ϵ_i ≥ (Π_i ϵ_i)^{1/N} ,    (37)

we obtain the following upper bound for the entropy:

S ≤ (N/2) ln(E/N) − (1/2) ln Det(C) .    (38)

This upper bound of S is attained only if all the eigenvalues ϵ_i take the same value E/N. If this is the case within a continuous range of E, then by applying T = dE/dS, as required by the minimization of the free energy F, we get the ideal-gas law E = (N/2)T. From the ideal-gas law we can easily derive that M = (T/2) I, which means that the general form of the matrix I + W is

I + W = √(2/T) C^{1/2} U ,    (40)

where U is a real orthogonal matrix satisfying U U^⊤ = U^⊤ U = I, and the square root of C is expressed as

C^{1/2} = √(1 − c) I + [√(1 + (N − 1)c) − √(1 − c)] u⃗_0 u⃗_0^⊤ .    (41)

Because the diagonal elements of W are identical to zero, we see from Eq. (40) that the i-th column u⃗_i of the orthogonal matrix U needs to satisfy the additional condition

(C^{1/2} u⃗_i)_i = √(T/2) .    (42)

If the temperature T is too large, it is impossible to satisfy these N conditions (42) by any orthogonal matrix U. When T is lowered to the critical value T^rb_N as given by Eq. (35), we find that these conditions are uniquely satisfied by the orthogonal matrix U = I. At this critical value, the matrix W is still permutation-symmetric and all its non-diagonal elements are

w = [√(1 + (N − 1)c) − √(1 − c)] / [(N − 1)√(1 − c) + √(1 + (N − 1)c)] .    (43)

For the ideal-gas law to hold at T < T^rb_N, the constraints (42) require the i-th column vector of U to lie on the intersection of an (N − 1)-dimensional ellipsoid and the N-dimensional hypersphere of unit radius. These N constraints are compatible with the constraint of U being an orthogonal matrix as long as N ≥ 3. To demonstrate this most clearly, we consider the type of weight matrices with rotational symmetry.
The general form of such a matrix is the circulant

(I + W)_mn = w_{(n−m) mod N} ,    (44)

where w_0 ≡ 1 and w_1, ..., w_{N−1} are the other weight parameters. Let us define N discrete frequencies ω_j ≡ 2πj/N for j = 0, ..., N − 1 (so ω_0 = 0, ω_1 = 2π/N, ...). Then it is easy to check that the j-th eigenvalue is

λ_j = Σ_{k=0}^{N−1} w_k e^{i k ω_j} ,

for j = 0, ..., N − 1, and the corresponding complex eigenvector is r⃗_j = (1, e^{iω_j}, e^{2iω_j}, ..., e^{(N−1)iω_j})^⊤/√N. It is easy to check that r⃗_j^⊤ r⃗_j^* = 1 and r⃗_i^⊤ r⃗_j^* = 0 for j ≠ i. The relationship between the eigenvalues {λ_j} and the weight parameters {w_k} can also be written as

w_k = (1/N) Σ_{j=0}^{N−1} λ_j e^{−i k ω_j} .    (45)

If N is odd, so that N = 1 + 2N_h, then one of the eigenvalues (λ_0) is real and the remaining N − 1 eigenvalues form N_h conjugate pairs, namely λ_j = λ*_{N−j} for j = 1, ..., N_h. Because e^{−iω_j} = e^{iω_{N−j}}, we can check from Eq. (45) that all the weight parameters w_k must then be real-valued. If N is even, with N = 2N_h, then two of the eigenvalues (λ_0 and λ_{N_h}) are real-valued, while the remaining N − 2 eigenvalues form N_h − 1 conjugate pairs (λ_j = λ*_{N−j} for j = 1, ..., N_h − 1). Notice that e^{−iω_{N_h}} = −1. We can again check that all the weight parameters determined by Eq. (45) are real-valued for N even.
The mean energy E and the entropy S for this type of matrix are, respectively,

E = [1 + (N − 1)c]/λ_0² + (1 − c) Σ_{j=1}^{N−1} 1/|λ_j|² ,    (46)

S = −Σ_{j=0}^{N−1} ln |λ_j| .    (47)

Using the sum-product inequality for the eigenvalues |λ_j| [see Eq. (37)], we again obtain the entropy upper bound (38). The equality holds only if the |λ_j|² for all j = 1, ..., N − 1 are equal to each other and

|λ_j|² = λ_0² (1 − c)/[1 + (N − 1)c] ,    (49)

for which the temperature is T = 2[1 + (N − 1)c]/λ_0², and therefore E = (N/2)T. Let us continue to investigate the existence and the degeneracy of the rotation-symmetric matrices satisfying the ideal-gas law. There are multiple constraints on the N eigenvalues λ_j, including Eq. (49), the conjugate property λ_j = λ*_{N−j}, the condition

Σ_{j=0}^{N−1} λ_j = N ,    (50)

as inferred from w_0 ≡ 1, and, furthermore, that the real part of every eigenvalue λ_j must be positive to ensure the convergence of the predictive coding dynamics. First consider the limiting case of w_j = w for all j ≥ 1, which corresponds to λ_0 = 1 + (N − 1)w and λ_j = 1 − w for j ≥ 1. We get from Eqs. (49) and (50) that the only solution is

w = [√(1 + (N − 1)c) − √(1 − c)] / [(N − 1)√(1 − c) + √(1 + (N − 1)c)] ,  T = T^rb_N .    (51)

This point (w, T) is nothing but the critical point at which the permutation-symmetric solution becomes unstable [see Eqs. (35) and (43)].
Notice that for N = 2 it is impossible to have rotational symmetry when reciprocity is broken (w 12 ̸ = w 21 ), so the analysis of this section does not hold.The instability of the permutation symmetry in N = 2 is not towards rotational symmetry, but towards one unit dominating the other one.

For N = 3, the solution is unique at each valid temperature, and there is no non-trivial degeneracy within the rotation-symmetric solutions.
For N = 4, we have two real eigenvalues λ_0 and λ_2 and two conjugate complex eigenvalues λ_{1,3} = a_1 ± i b_1. The solution is determined by Eqs. (49) and (50). At each valid temperature T, there is only one solution for λ_0, and the values of λ_2, a_1, and b_1 are then all uniquely determined. So there is again no non-trivial degeneracy within the rotation-symmetric solutions.
For N = 5, we have one real-valued eigenvalue λ_0 and two pairs of complex eigenvalues, λ_{1,4} = a_1 ± i b_1 and λ_{2,3} = a_2 ± i b_2. To eliminate the trivial symmetry we may require a_1 ≥ a_2. Given the temperature T, the value of λ_0 is uniquely determined; the sum a_1 + a_2 is then constrained to a fixed value, but the individual values of a_1 and a_2 are not strictly constrained. Therefore there is an infinite degeneracy within the rotation-symmetric solutions. The same conclusion holds for all N ≥ 5. Rotation-symmetric weight matrices are special examples of more general cyclic-dominant (CD) matrices. The concept of cyclic dominance is easy to understand intuitively. Roughly speaking, it means that one unit (say p_1) of the system is most strongly inhibited by another unit (say p_2) while p_1 only weakly affects p_2, unit p_2 is most strongly inhibited by yet another unit p_3 while p_2 only weakly affects p_3, ..., and finally unit p_N is most strongly inhibited by unit p_1 while p_N only weakly affects p_1. The directed cycle p_1 ← p_2 ← p_3 ← ... ← p_N ← p_1 is a global closed loop of dominance. It is relatively easy to quantitatively measure the degree of cyclic dominance. For example, we may design an order parameter O_CD which compares the most dominant cycle of directed interactions among the N units with the reverse directed cycle, where (p_1, p_2, ..., p_N) denotes a permutation of the indices of the N units. If there is no or only very weak cyclic dominance, then O_CD ≈ 0; if there is strong cyclic dominance (the limiting case being that the N units form a single cycle of one-way interactions with no reverse interactions), then O_CD takes a value close to unity. The order parameter O_CD for the five units shown in the right panel of Fig.
2(b) of the main text is 0.996. Besides rotation-symmetric solutions there are other types of matrices, such as cyclic-dominant matrices without rotational symmetry and various forms of EI-balanced networks (some concrete examples are given in the next sections). These bring in additional degrees of degeneracy within the optimal solution space, further increasing the flexibility of achieving the ideal-gas law.
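The cycle-comparison idea above can be made concrete in a few lines of code. The exact formula for O_CD is not reproduced in this extraction, so the function below uses an assumed (hypothetical) form matching the verbal description: for every directed Hamiltonian cycle we compare the total forward weight with the total weight of the reversed cycle, and report the best contrast.

```python
from itertools import permutations

import numpy as np

def o_cd(W):
    """Assumed form of the cyclic-dominance order parameter O_CD.

    For each directed cycle p1 <- p2 <- ... <- pN <- p1 we compare the
    summed weight of the cycle with that of the reversed cycle; O_CD is
    the best contrast (fwd - rev) / (fwd + rev) over all cycles.
    """
    N = W.shape[0]
    best = 0.0
    # Fix the first index to avoid counting rotations of the same cycle.
    for perm in permutations(range(1, N)):
        cyc = (0,) + perm
        fwd = sum(W[cyc[k], cyc[(k + 1) % N]] for k in range(N))
        rev = sum(W[cyc[(k + 1) % N], cyc[k]] for k in range(N))
        if fwd + rev > 0:
            best = max(best, (fwd - rev) / (fwd + rev))
    return best
```

Consistent with the limits stated above, a pure one-way cycle gives O_CD = 1 and any symmetric matrix gives O_CD = 0 under this form.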
The units in a cyclic-dominant optimal weight matrix W may be regarded as forming a single component at the global level. We also encounter EI-balanced optimal weight matrices in our numerical computations. Roughly speaking, an EI-balanced weight matrix W is formed by two components of units. Group g_I contains inhibitory units i whose output synaptic weights w_ki to other units k are mostly inhibitory (i.e., w_ki > 0); the other group g_E contains excitatory units j whose output synaptic weights w_kj to other units k are mostly excitatory (i.e., w_kj < 0). An example of an EI-balanced optimal matrix is shown in Fig. 2(b) of the main text.
One quantitative measure of EI balance is to compute the net synaptic weight on each of the N units. Notice that if the sum of input weights to unit i satisfies Σ_{j≠i} w_ij ≈ 0, then this unit is EI-balanced and contributes a value ≈ 1 to the order parameter O_EI; while if all the input weights to unit i are of the same sign, then it contributes 0 to the order parameter. The EI-balanced optimal matrix of Fig. 2(b) has a value of O_EI = 0.626. Another, more coarse-grained, measure of EI balance is simply to compare the number M+ of positive synaptic weights and the number M− of negative synaptic weights. The EI-balanced matrix of Fig. 2(b) attains the maximum value 1.0 in terms of this order parameter.
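Since the displayed formulas for the two EI-balance measures are elided in this extraction, the sketch below uses assumed forms that reproduce the stated limiting behaviors: a unit whose input weights sum to zero contributes 1, a unit with single-signed inputs contributes 0, and the coarse measure is maximal when M+ = M−.

```python
import numpy as np

def o_ei(W):
    """Assumed form of the fine-grained EI-balance order parameter.

    A unit whose input weights sum to ~0 contributes ~1; a unit whose
    input weights all share one sign contributes 0.
    """
    N = W.shape[0]
    total = 0.0
    for i in range(N):
        inputs = np.delete(W[i], i)          # input weights w_ij, j != i
        denom = np.abs(inputs).sum()
        if denom > 0:
            total += 1.0 - abs(inputs.sum()) / denom
    return total / N

def o_ei_coarse(W):
    """Coarse EI measure comparing the counts M+ and M- of positive and
    negative off-diagonal weights (assumed form; maximal at M+ = M-)."""
    off = W[~np.eye(W.shape[0], dtype=bool)]
    m_plus = int((off > 0).sum())
    m_minus = int((off < 0).sum())
    total = m_plus + m_minus
    return 0.0 if total == 0 else 1.0 - abs(m_plus - m_minus) / total
```

Both functions return values in [0, 1], with 0 for a purely inhibitory (or purely excitatory) matrix.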

J. Numerical optimization methods
Analytical solutions of the free energy minimization problem are hard to achieve for systems containing N ≥ 3 units, especially at low temperatures. We resort to numerical optimization methods to solve this problem. Two simulated annealing search processes are implemented to construct optimal weight matrices. One process runs at a fixed temperature value T and aims at reaching the global minimum of the free energy F. The other works at a fixed entropy value S and attempts to reach the minimum energy E that is compatible with this entropy.
We introduce an annealing parameter κ and assign it a small initial value (say κ = 10^2). We then run n_0 (e.g., n_0 = 10^5) stochastic trials to update W at each epoch of the optimization process. After one such epoch of trials, the next epoch follows with a slightly increased value of κ (such as κ ← 1.01 × κ), with W initialized to the best weight matrix reached so far in this process. This annealing process is continued for hundreds of epochs until κ reaches a high value (say 10^8). The tentative optimal weight matrix is then further refined (if possible) by running greedy descent (κ = ∞) for 10 × n_0 trials, and the final matrix W is returned as one candidate optimal solution. To make sure that the global minimum of free energy (at fixed T) or energy (at fixed S) has been achieved, we repeat this whole κ-annealing process a number of times, starting from different initial conditions and random number seeds, and check whether the outcomes converge to the same minimum value.
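The epoch structure just described can be sketched as follows. This is a minimal illustration, not the production procedure: the objective is passed in as a black box, the trial counts are placeholders (the paper uses n_0 ~ 10^5), and the eigenvalue bottomline check and step-size adaptation discussed below are omitted here.

```python
import numpy as np

def kappa_annealing(objective, W0, kappa0=1e2, kappa_max=1e8,
                    growth=1.01, n_trials=1000, step=0.1, seed=0):
    """Sketch of the kappa-annealing search: each epoch runs n_trials
    Metropolis trials at annealing parameter kappa, then kappa is
    multiplied by `growth` and the next epoch restarts from the best
    matrix found so far."""
    rng = np.random.default_rng(seed)
    W_best = np.array(W0, dtype=float)
    f_best = objective(W_best)
    kappa = kappa0
    while kappa < kappa_max:
        W, f = W_best.copy(), f_best          # restart from best-so-far
        for _ in range(n_trials):
            W_new = W + rng.uniform(-step, step, W.shape)
            f_new = objective(W_new)
            dF = f_new - f
            if dF <= 0 or rng.random() < np.exp(-kappa * dF):
                W, f = W_new, f_new
                if f < f_best:
                    W_best, f_best = W.copy(), f
        kappa *= growth
    return W_best, f_best
```

As κ grows, uphill moves become exponentially unlikely and the search crosses over from a random walk to greedy descent.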

Temperature T being fixed
If the temperature T is fixed during the whole simulation process, in each elementary trial we propose a new weight matrix W′ based on the current weight matrix W according to one of the following updating rules:
(a) Single-element perturbation. The current weight matrix W is copied to the new matrix W′, but with a single randomly chosen element w_ij changed to w′_ij = w_ij + δ_ij, with the deviation δ_ij drawn uniformly from the range (−∆_1, ∆_1).
(b) All-elements perturbation. The current weight matrix W is copied to the new matrix W′, but with every element changed to w′_ij = w_ij + δ_ij. Each deviation δ_ij is an independent and uniformly distributed random number in the range (−∆_2, ∆_2).
The eigenvalues of the matrix I + W′ are first computed. If the real part of any eigenvalue is below a bottomline value (fixed to 10^{-5} in this work), then W′ is considered invalid and the proposed update is rejected. If the eigenvalue bottomline constraint is satisfied by all the eigenvalues, then the free energy difference δF induced by the proposed weight matrix W′ is computed. If δF ≤ 0 then W′ is copied back to W (the proposed update is accepted). If δF > 0, then the proposed W′ is accepted only with probability e^{−κδF} (W is replaced by W′) and is rejected with the remaining probability 1 − e^{−κδF} (the old W is kept).
We choose single-element and all-elements perturbations with equal probability 1/2 at each elementary trial. To improve the search efficiency, both ∆_1 and ∆_2 are dynamically adjusted so that the success rate of weight matrix changes is approximately 0.5. We also set a lower bound of 10^{-6} on the values of ∆_1 and ∆_2 to avoid the simulation process being trapped in a local minimum. If this lower bound is reached by (say) ∆_1 in one epoch, we reset its value to ∆_1 = 10^{-4} in the next epoch to further increase the chance of escaping a local minimum of F.
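One elementary trial at fixed T can be sketched as below. The free-energy function is passed in as a black box (its actual form is defined elsewhere in the paper), and the step widths are fixed rather than adaptively tuned, so this illustrates only the proposal and acceptance logic.

```python
import numpy as np

BOTTOMLINE = 1e-5   # lower limit on Re(eigenvalues) of I + W'

def elementary_trial(W, free_energy, kappa, delta1=0.01, delta2=0.01,
                     rng=None):
    """One fixed-temperature trial: propose W' by a single-element or
    all-elements perturbation, reject it if any eigenvalue of I + W'
    has real part below the bottomline, otherwise Metropolis-accept
    with probability min(1, exp(-kappa * dF))."""
    rng = np.random.default_rng() if rng is None else rng
    N = W.shape[0]
    W_new = W.copy()
    if rng.random() < 0.5:                   # (a) single-element move
        i, j = rng.integers(N, size=2)
        W_new[i, j] += rng.uniform(-delta1, delta1)
    else:                                    # (b) all-elements move
        W_new += rng.uniform(-delta2, delta2, W.shape)
    if np.linalg.eigvals(np.eye(N) + W_new).real.min() < BOTTOMLINE:
        return W                             # bottomline violated: reject
    dF = free_energy(W_new) - free_energy(W)
    if dF <= 0 or rng.random() < np.exp(-kappa * dF):
        return W_new
    return W
```

Because invalid proposals are rejected outright, the returned matrix always satisfies the eigenvalue bottomline constraint whenever the input matrix does.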

Entropy S being fixed
If the whole simulation process is run under the constraint of fixed entropy, then in each elementary trial a new weight matrix W′ is generated by modifying a single row or a single column of the old matrix W. Consider a row modification which generates the i-th row of W′ as w′_ik = w_ik + δ_ik. The constraint of fixed entropy S imposes a linear constraint on the N − 1 perturbations δ_ik. We can first generate N − 1 independent deviations δ_ik by sampling uniformly in the range (−∆_1, ∆_1), and then add a common correction term δ_0 to all these deviations to satisfy the constraint (61).
Similarly, if a column j of the new matrix W′ differs from the old matrix W by w′_kj = w_kj + δ_kj, then the deviations δ_kj shall satisfy an analogous linear constraint. After a new matrix W′ has been proposed by row modification (with probability one half) or by column modification (again with probability one half), the eigenvalues of the matrix I + W′ are first computed. If the real part of any eigenvalue is below the bottomline value (10^{-5}), the proposed update is rejected. If the eigenvalue bottomline constraint is satisfied by all the eigenvalues, then the energy difference δE induced by the proposed weight matrix W′ is computed. If δE ≤ 0 then W is replaced by W′. If δE > 0, then the proposed W′ is accepted with probability e^{−κδE} and rejected with the remaining probability 1 − e^{−κδE}.
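The common-shift trick for enforcing the fixed-entropy condition can be sketched as follows. Constraint (61) itself is not reproduced in this extraction; the function below assumes a generic linear constraint Σ_k a_k δ_k = 0, with the coefficients a_k standing in for whatever the entropy condition prescribes.

```python
import numpy as np

def constrained_deviations(a, delta1, rng=None):
    """Sample deviations delta_k ~ U(-delta1, delta1), then add one
    common shift delta0 so that the linear constraint
    sum_k a_k * delta_k = 0 holds exactly (assumes sum(a) != 0)."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a, dtype=float)
    delta = rng.uniform(-delta1, delta1, size=a.shape)
    delta0 = -np.dot(a, delta) / a.sum()     # common correction term
    return delta + delta0
```

Because δ_0 is the same for every component, a single scalar solve restores the constraint exactly while preserving the randomness of the individual deviations.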

K. Some example matrices for N = 5
The phase diagram for N = 5 is shown in Fig. 3 of the main text. Here we offer some additional concrete samples of optimal weight matrices to help appreciate this phase diagram. The different phase regions are discussed by decreasing the temperature T along a vertical line of fixed pair correlation c. Here we mainly consider c = 0.6, for which the relationship between E and S is shown in Fig. 8.

Phase region β1
We show two degenerate optimal matrices here for the point c = 0.6, T = 0.45152 in phase region β1. These two solutions have the same entropy S = −2.50 and the same energy E = 1.1288. The left one is a one-component system with cyclic-dominant (CD) interactions but without rotation symmetry or other obvious symmetry, and the eigenvalues of I + W are {3.8092, 0.5785 ± 1.2062i, 0.0169 ± 1.3367i}. The right network is an EI-balanced two-component system with group g_E containing units 1 and 2 and group g_I containing the remaining three units, and the eigenvalues are {1.3427, 1.8103 ± 1.3213i, 0.0184 ± 1.3439i}. There is no permutation symmetry within g_E, and there is cyclic dominance but no rotation symmetry within g_I.

Phase region β2
For c = 0.6, the EI-balanced solution mentioned above reaches the eigenvalue bottomline at a critical temperature 0.3872, and its free energy starts to be higher than that of the CD solution. The CD solution reaches the eigenvalue bottomline at a much lower critical temperature T = 0.2720, at which its energy susceptibility drops from 5/2 to 2.0. This transition is entirely due to the eigenvalue constraint; there is no qualitative change in the symmetry of the solution. An example matrix in this phase, obtained at c = 0.6 and T = 0.2303, has entropy S = −4.1 and energy E = 0.56958 (the eigenvalues of I + W are {4.99996, 10^{-5} ± 1.8639i, 10^{-5} ± 1.8637i}). This is a CD matrix with all the complex eigenvalues touching the bottomline.

Phase region γ2
At c = 0.6, although the EI solutions are suppressed by the CD solutions in phase region β2, this suppression is reversed at a critical temperature T = 0.2195, where a discontinuous transition between the CD and EI solutions occurs. The EI solutions in the lower-temperature phase region γ2 appear to be permutation-symmetric within group g_E and cyclic-dominant with rotation symmetry within group g_I. For example, the EI solution at T = 0.2138 (immediately after the discontinuous transition) has entropy S = −4.35 and E = 0.5414. The eigenvalues of I + W are {2.0086, 1.4957 ± 2.8409i, 10^{-5} ± 1.9345i}.

Phase region γ3
As the temperature further decreases to another critical value T = 0.09505 (with c = 0.6), another discontinuous transition occurs in the optimal EI-balanced solution, caused by the breaking of internal rotation symmetry among the three units of group g_I. This transition is associated with a small drop of the entropy from −6.410 to −6.417 and a small drop of the mean energy from 0.2397 to 0.2390. To demonstrate the different symmetries of these two types of EI-balanced solutions, we show the two corresponding solutions at the same entropy value S = −6.414 and the same energy value E = 0.2394. Notice that the rotation-symmetry breaking of group g_I does not involve all five units of the system; it is an abrupt change only within the subnetwork g_I. Notice also that the permutation symmetry between units 1 and 2 of group g_E remains intact in the EI-balanced phase region γ3.

Phase region γ4
As the temperature T further decreases, the EI-balanced optimal solutions of phase region γ3 reach the eigenvalue bottomline, leading to a kink in the mean energy. This continuous transition occurs at T = 0.0383 for c = 0.6 (with entropy S ≈ −8.73). Notice that there is no qualitative change in the symmetry of the EI-balanced matrix; only that all four complex eigenvalues touch the bottomline.
As the temperature decreases even further, the permutation symmetry between the two units of group g_E also breaks. This breaking of internal symmetry is associated with a very weak continuous transition and a tiny kink in the mean energy. At very low temperatures there are then no internal symmetries within either group g_E or group g_I. As one such example, the matrix at the very low temperature T = 0.02652 has entropy S = −9.47 and mean energy E = 0.071037. The eigenvalues of I + W are {4.99996, 10^{-5} ± 7.1819i, 10^{-5} ± 7.0902i}. Notice that cyclic dominance still exists in group g_I and the three units of this group are all different.

Phase region γ1
Here we show an optimal solution obtained at c = 0.4 and T = 0.0978 in this second ideal-gas region γ1. The entropy is S = −7 and the energy is E = 0.2446. The eigenvalues of I + W are {3.5236, 0.6740 ± 4.9114i, 0.0642 ± 3.5581i}. This is an EI-balanced network with two groups of units. There is no permutation symmetry between units 1 and 2 of group g_E, but there is cyclic dominance among the three distinct units (3, 4, 5) of group g_I.

Phase region δ
This phase region is located at c close to unity and T relatively small. There is a discontinuous phase transition between the β2 phase and the δ phase. For example, at c = 0.99 this occurs at T = 0.0612. The optimal weight matrix at temperature T = 0.0613 (slightly above the critical temperature) has entropy S = 0.631 and energy E = 0.3210 (the eigenvalues of the matrix I + W are {4.99996, 10^{-5} ± 0.5712i, 10^{-5} ± 0.5711i}). It is a one-component cyclic-dominant network. Immediately after the discontinuous transition, the optimal matrix at T = 0.0608 has entropy S = −1.871 and energy E = 0.1679 (the eigenvalues of I + W are {4.99996, 10^{-5} ± 1.0680i, 10^{-5} ± 1.0672i}). Notice that unit 1 excites units 2, 3, 5 but inhibits unit 4, while the remaining four units are all inhibitory units. At a lower critical temperature T = 0.03445 there is again a discontinuous phase transition, from an optimal network of the δ phase to an optimal network of the γ3 phase. At the phase transition point, the optimal network of the δ phase has entropy S = −3.50 and energy E = 0.0914 (the eigenvalues of I + W are {4.99996, 10^{-5} ± 1.6044i, 10^{-5} ± 1.6041i}). The optimal network after this transition has entropy S = −3.6 and energy E = 0.08796 (the eigenvalues of I + W are {0.8492, 2.0754 ± 6.5191i, 10^{-5} ± 0.9596i}).

L. Ideal-gas law and phase diagram for the three-unit system (N = 3)

The ideal-gas law for general input correlation matrix

First, it is instructive to study, by exhaustive search, the N = 3 system with symmetric input and a single parameter c for the pair correlations. To achieve the ideal-gas law, a certain condition must be satisfied by the synaptic weight matrix. We consider a general form of I + W, and we may require c_1 ≤ c_2 to eliminate the trivial degeneracy of the synaptic weight matrix. To determine the weight parameters, we express them through a parameter θ together with x = 2/T − 1 ≥ 0 (so T ≤ 2) and y = 2c/T, which determine a_1 and b_1; similar expressions determine a_2 and b_2. A compatibility condition must also be satisfied by a_1, b_1, a_2, b_2, and further constraints ensure that the weight values are real. All the eigenvalues of I + W should have positive real parts. At each value of θ there are up to four distinct solutions of (a_1, a_2, b_1, b_2). They can be determined accurately by numerical computations. The numerical results are the following.
For the EI-balanced network, numerical results indicate that the upper-bound temperature for the ideal-gas law can be obtained directly. For example, the upper-bound EI-balanced temperature at c = 0.35 is T = 0.264684; the corresponding network breaks the internal permutation symmetry within group g_E (units 2 and 3) to achieve the ideal-gas law. On the other hand, the lower-bound temperature for the ideal-gas law of the EI-balanced network can be computed numerically from the eigenvalues of W. The temperature upper bound and lower bound are shown in Fig. 9(a).
The other types of networks realizing the ideal-gas law can all be obtained similarly. For N = 3 these are cyclic-dominant networks. The temperature upper bound and lower bound for these ideal-gas CD networks are shown in Fig. 9(b). By comparing Figs. 9(a) and 9(b) we see that there is a region of the phase space in which EI-balanced and CD optimal solutions coexist. The above analysis also works for the case of C having different entries, with parameters c_22, c_33, c_12, c_13, and c_23 (the value of c_11 is fixed to be c_11 = 1 and we require that 1 ≤ c_22 ≤ c_33). Then we still have Eq. (77) for the weight parameters c_1 and c_2, and the equations for a_1 and b_1 now involve x_2 = 2c_22/T − 1 and y_1 = 2c_12/T. Similarly, the expressions for a_2 and b_2 involve x_3 = 2c_33/T − 1 and y_2 = 2c_13/T, and the remaining constraint involves y_3 = 2c_23/T. We now examine a particular example of the correlation matrix. Examining all the possible solutions of the ideal-gas-law condition (84), we find that the ideal-gas law is achievable for temperatures T ∈ [0.19402, 1.98262].
In the temperature range 1.70899 < T < 1.98262 there are two degenerate ideal-gas matrices, W^(1) and W^(2).
Notice that W^(1) is cyclic-dominant, and W^(2) contains a fully inhibitory unit, a fully excitatory unit, and a mixed unit.
In the temperature range 1.42383 < T < 1.70899 there are four degenerate ideal-gas matrices, for example at T = 1.6. Matrices W^(1) and W^(3) are cyclic-dominant, and W^(2) and W^(4) contain a fully inhibitory unit, a fully excitatory unit, and a mixed unit.
In the temperature range 1.1927 < T < 1.42383 there are six degenerate ideal-gas matrices, for example at T = 1.3. Matrices W^(1) and W^(3) are cyclic-dominant, W^(2) and W^(4) contain a fully inhibitory unit, a fully excitatory unit, and a mixed unit, and matrices W^(5) and W^(6) are two excitation-inhibition balanced networks with one inhibitory unit and two excitatory units.
In the temperature range 0.31636 < T < 1.1927 there are eight degenerate ideal-gas matrices, for example at T = 0.7. Matrices W^(1) and W^(3) are cyclic-dominant, and the other six matrices are all formed by three functionally distinct units.
The degeneracy then decreases at lower temperatures. First the matrix W^(1) reaches the eigenvalue bottomline at T = 0.31636, followed by W^(3) at T = 0.31346, by W^(6) at T = 0.23364, by W^(8) at T = 0.22835, by W^(5) at T = 0.21387, by W^(4) at T = 0.21290, by W^(2) at T = 0.19830, and finally by W^(7) at T = 0.19402, which is the lowest temperature at which an ideal-gas-law matrix exists. The above detailed results on the problem instance (88) demonstrate the abundance of ideal-gas optimal matrices. They indicate that, when the temperature T is in an appropriate range, ideal-gas optimal solutions might be relatively easy to obtain. We expect these features to extend to higher-dimensional systems with N > 3 units.

Phase diagram for uniform input correlation matrix
For the N = 3 system with symmetric sensory inputs, the phase diagram in the plane of c and T is shown in Fig. 10. This phase diagram is qualitatively similar to the one for N = 5 (Fig. 3 of the main text) and is much richer than the phase diagram for N = 2 (Fig. 1 of the main text), confirming the intuitive expectation that three-body and multi-body (effective) interactions are capable of producing high degrees of complexity. We briefly describe the properties of these different phase regions here. α: one-component networks with permutation symmetry.
whose entropy is S = −4.9 and E = 0.1198, and the eigenvalues of I + W are {2.99998, 10^{-5} ± 6.6906i}. The permutation symmetry between the two excitatory units 2 and 3 is preserved in phase region γ6.
δ: An intermediate phase region between β 2 and γ 3 at high values of c.
Our numerical computations suggest that the intermediate phase region δ exists only for c > 0.9678. To demonstrate the discontinuous transitions between β2 and δ and between δ and γ3, we plot the free energy curves of these three phases for c = 0.99 at low temperatures T close to 0.09 in Fig. 11. There is a discontinuous transition between the β2 phase and the δ phase at T = 0.09905, at which the weight matrix changes from a purely inhibitory one with S = 0.5 and E = 0.4300 (the eigenvalues of I + W are {2.99998, 10^{-5} ± 0.4496i}) to one with a single strong excitatory weight and five inhibitory weights, with S = −2.1 and E = 0.1725 (the eigenvalues of I + W are {2.99998, 10^{-5} ± 1.6499i}). This second discontinuous transition only causes a relatively small drop in the energy and in the entropy, but the change in the optimal matrix W is quite comprehensive.
M. Some numerical results for N = 10 and c = 0.8

Besides N = 3 and N = 5, we also study systems with other numbers of units, up to N = 10. The phase diagram for the system with N = 4 units has also been obtained by numerical computations; it is qualitatively similar to the phase diagrams for N = 3 and N = 5. Here we show some numerical results obtained on the larger system of N = 10, which reveal the competition among multiple local minima of the free energy. The input pair correlation is fixed to c = 0.8 for simplicity.
We confirm that there is a continuous permutation-symmetry-breaking transition at a high critical temperature T_c^(1) = 0.9489 for N = 10 and c = 0.8, at which cyclic dominance starts to emerge and the energy-versus-temperature relationship starts to follow the ideal-gas law E = 5T. The optimal solutions are not unique but degenerate. For example, we observe two distinct solutions W with the same entropy S = −4.6 and the same mean energy E = 1.1555 at temperature T = 0.2311. One of them is a one-component network with mostly positive synaptic weights, while the other is a two-component EI-balanced solution containing a single purely excitatory unit and nine inhibitory units.
As the temperature decreases to another critical value T_c^(2) = 0.1763, the ideal-gas law E = 5T finally breaks down as all the complex eigenvalues of the optimal solutions reach the bottomline. The energy susceptibility then drops from 5 to 4.5. At temperatures slightly below this critical value, the minimum free energy solution is the two-component EI-balanced one with group g_E containing a single excitatory unit and group g_I containing the remaining nine inhibitory units (we call this the 1e-9i structure).
As the temperature decreases to well below T_c^(2), we find that two other types of EI-balanced networks, the 3e-7i type with g_E containing three excitatory units and g_I containing seven inhibitory units, and the 5e-5i type with g_E containing five excitatory units and g_I containing five inhibitory units, start to compete with the 1e-9i networks. To demonstrate this competition, we draw in Fig. 12(a) the free energy differences of these three types of EI-balanced solutions in the vicinity of T = 0.107.
The free energy f_5 of the 5e-5i solution starts to be lower than the value f_3 of the 3e-7i solution as the temperature decreases below the threshold value 0.1084, so the 5e-5i structure becomes more favorable than the 3e-7i structure. At the lower critical value 0.1077 the 5e-5i structure also starts to dominate over the 1e-9i structure. The 3e-7i structure also has lower free energy than the 1e-9i structure at T < 0.1070, but it cannot compete with the 5e-5i structure. We therefore conclude that there is a discontinuous phase transition at T_c^(3) = 0.1077 between the 1e-9i and 5e-5i EI-balanced network structures. This transition is associated with a discontinuity in the entropy and in the mean energy [Fig. 12(b)].
N. The effect of restricting all the synaptic weights to be non-negative

Some neurons in the biological brain are purely inhibitory, and the networks formed by them have no internal excitations. If the predictive coding system is composed purely of inhibitory units, then all the synaptic weights are non-negative. Figure 5(a) has clearly demonstrated the significant low-temperature effect of restricting the synaptic weights to be non-negative. Here we present some additional results obtained on multi-unit systems (N = 5). To compare with Fig. 2 of the main text, we fix the input correlation to c = 0.8; see Fig. 13.
If the temperature is high enough, the restriction that all weights be non-negative has no effect on the system's properties. The weight matrix is completely symmetric at high temperatures (w_ij = w_ji = w > 0). A continuous transition occurs at T = 1.179 with reciprocity breaking (w_ij ≠ w_ji), the formation of cyclic dominance among any set of three units, and the emergence of one global strongest-inhibition cycle involving all the units. The energy susceptibility, χ ≡ dE/dT, has a discontinuity at this transition point, and the ideal-gas law holds after this permutation-symmetry-breaking transition.
As T decreases to 0.336 the susceptibility drops from 2.5 to 2.0 but the energy E itself is continuous, indicating another continuous transition. Another signature of this transition is that the real parts of all the complex eigenvalues of I + W touch the bottomline value 10^{-5}. At this transition point all the synaptic weights are still positive, so the non-negativity constraint does not affect this transition.
As T further decreases to 0.15, some of the weights reach zero and the mean energy E starts to level off. There is then a considerable drop in E at T = 0.128, caused by a sudden and relatively large change in the weight matrix (see Fig. 13, T = 0.127 and 0.130).
As T keeps decreasing, the weight matrix experiences several additional abrupt changes before reaching the completely one-directional form at very low T values. The discontinuous nature of the weight changes is confirmed by the strong hysteresis behavior of the energy at low temperatures [Fig. 13, inset]. The nearly minimum-energy weight matrix at T = 0.013 has a clear hierarchical structure: neuron 1 is inhibited by neuron 5 and inhibits neuron group {2, 3, 4}, which in turn inhibits neuron 5, and there are internal cyclic inhibitions within group {2, 3, 4} and within group {1, 2, 5}. Because excitation-inhibition balance cannot be realized in the system, at low temperatures the ideal-gas law has no chance of being even approximately restored. The mean energy saturates to a value slightly below 0.35 as T decreases to zero.
FIG. 1. Phase diagram for the N = 2 system. The symmetry-breaking transition to w12 ≠ w21 is continuous for c < 2/3 (blue dashed line) and discontinuous for c > 2/3 (red solid line). The green dot marks the tricritical point at c = 2/3, T = 9/8. The inset exhibits the continuous transition for c = 0.6 and the discontinuous one for c = 0.8.

FIG. 2. Results for N = 5 and c = 0.8. (a) E versus T. The thin fitting lines have slopes 2.5 and 2.0, respectively (the critical temperatures are T^rb_5 = 1.1783 and T^iglb_5 = 0.3360). (b) Examples of cyclic-dominant and EI-balanced optimal matrices at T ≈ 0.1383, and F(T) and E(T) for these two branches of solutions. Solid (dashed) links in the interaction graphs indicate positive (negative) weights.
(a) for statistically symmetric inputs with c11 = c22 = 1, and the corresponding synaptic weights w12 and w21 are shown in Fig. 4(b). We see that w12 ≠ w21 when the product w12 w21 is below a certain threshold value, at each fixed value c12 of the input correlation.

FIG. 5. Optimal predictive coding for the two-unit system with symmetric inputs (c11 = c22 = 1). Synaptic weights are restricted to be non-negative in (a) but are unrestricted in (b). The mean energy E and the weights w12 and w21 are shown for three representative values of the input pair correlation: c12 = 0.60 (blue long-dashed line), 0.80 (black solid line), and 0.95 (red dashed line).

FIG. 6. A discontinuous phase transition is possible for the N = 2 system with c22 = 1 and c11 < 1 in the region between the two lines of this c11-c12 plane. The upper boundary line corresponds to the limiting value c12 = √c11. The lower boundary line is determined numerically.

FIG. 7. The critical temperature T at which the permutation-symmetric network becomes locally unstable to perturbations, for a symmetric input correlation matrix C with cii = 1 and all non-diagonal elements equal to c. The results for N = 2, 3, 4, 5, 10 are obtained by numerical eigenvalue computation on the Hessian matrix (33) and are in full agreement with the analytical predictions [Eq. (35)]; the N = ∞ curve is obtained by analytical asymptotic analysis.

FIG. 8. The numerically obtained relationship between entropy S and energy E for N = 5 and c = 0.6.

FIG. 9. The upper-bound and lower-bound temperatures for the ideal-gas law of EI-balanced optimal solutions (a) and of cyclic-dominant optimal solutions (b). The number of units is N = 3, and c is the input correlation parameter.

FIG. 10. Phase diagram for the N = 3 system with symmetric inputs (cii = 1 and pair correlation cij = c for any i ≠ j). Region α is the permutation-symmetric phase. Region β1: one-component and two-component optimal networks obeying the ideal-gas law. Region β2: one-component networks with cyclic dominance. Region γ1: two-component networks obeying the ideal-gas law. Regions γ2-γ6: two-component networks with excitation-inhibition balance. Region δ is an intermediate region between β2 and γ3. Dashed lines indicate continuous transitions associated with symmetry changes; dotted lines indicate continuous transitions caused by the eigenvalue bottomline constraint; solid lines mark discontinuous transitions between different types of networks.

β 1 :
containing both one-component networks with cyclic dominance and two-component networks with excitation-inhibition balance. The ideal-gas law is obeyed.

γ 1 :
two-component EI-balanced networks without internal permutation symmetry within the group of two units. The ideal-gas law is obeyed. One example network, obtained at c = 0.35 and T = 0.2643 (slightly below the γ2 → γ1 critical temperature 0.2644), has entropy S = −2.87 and energy E = 0.3965; the eigenvalues of I + W are {2.2177, 0.3912 ± 2.7928i}.

γ 2 :
two-component EI-balanced networks with internal permutation symmetry within the group of two units. One example for c = 0.35 and T = 0.2661 (slightly above the γ2 → γ1 critical temperature 0.2644) has entropy S = −2.86 and energy E = 0.3992, and the eigenvalues of I + W are {2.2106, 0.3947 ± 2.7827i}.

γ 3 :

two-component EI-balanced networks at high values of c without permutation symmetry within the group of two units.

γ 4 :
two-component EI-balanced networks at low values of c, constrained by the eigenvalue bottomline. One example obtained at c = 0.35 and T = 0.1404 has entropy S = −3.8 and energy E = 0.21335, and the eigenvalues of I + W are {2.99998, 10^{-5} ± 3.8601i}.

γ 5 :
two-component EI-balanced networks at high values of c constrained by the eigenvalue bottomline.There is no permutation symmetry within the group of two units.

γ 6 :
two-component EI-balanced networks constrained by the eigenvalue bottomline, with permutation symmetry within the group of two units. The transitions between γ4 and γ6 and between γ5 and γ6 are both continuous. An example was obtained at c = 0.35 and T = 0.0464.
brings a large drop in the energy and in the entropy. At a lower temperature T = 0.08369, there is another discontinuous transition, from a matrix of the δ phase with S = −2.45 and E = 0.1407 (the eigenvalues of I + W are {2.99998, 10^{-5} ± 1.9645i}) to an EI-balanced matrix of the γ3 phase with S = −2.6 and E = 0.1281 (the eigenvalues of I + W are {0.6148, 1.1926 ± 4.5251i}).

FIG. 12. Results obtained for N = 10 and c = 0.8, indicating a discontinuous phase transition at T = 0.1077. (a) Free energy differences. The 1e-9i, 3e-7i and 5e-5i type solutions are indicated by the labels 1, 3 and 5, respectively, and their free energies are f1, f3 and f5. The inset shows the entropy-energy curves for the three types of solutions. (b) Energy versus temperature, and entropy versus temperature (inset).

FIG. 13. Optimal predictive coding with non-negative weights for N = 5 and input pair correlation c = 0.8. The energy E has two kinks at T = 1.179 and T = 0.336 (the slopes of the fitting lines are 2.5 and 2.0, respectively), and it experiences a series of small drops at low temperatures T < 0.15. The inset demonstrates the energy hysteresis during a slow cycle of T increasing (solid thin line) and decreasing (dotted thin line), in comparison with equilibrium (thick solid line). Also shown are the optimal weight matrices at several T values and the corresponding interaction graphs. The units are clustered into groups and distinguished by different colors in each graph. The width of the link from unit j to unit i is proportional to the weight wij (if wij is less than 10% of the maximum value, the edge is not shown).