The critical temperature of the 2D-Ising model through deep learning autoencoders

We investigate deep learning autoencoders for the unsupervised recognition of phase transitions in physical systems formulated on a lattice. We focus our investigation on the 2-dimensional ferromagnetic Ising model and then test the application of the autoencoder on the anti-ferromagnetic Ising model. We use spin configurations produced for the 2-dimensional ferromagnetic and anti-ferromagnetic Ising model in zero external magnetic field. For the ferromagnetic Ising model, we study numerically the relation between one latent variable extracted from the autoencoder and the critical temperature T_c. The proposed autoencoder reveals the two phases, one in which the spins are ordered and one in which they are disordered, reflecting the restoration of the Z_2 symmetry as the temperature increases. We provide a finite volume analysis for a sequence of increasing lattice sizes. For the largest volume studied, the transition between the two phases occurs very close to the theoretically extracted critical temperature. We define as a quasi-order parameter the absolute average latent variable z̃, which enables us to predict the critical temperature. One can define a latent susceptibility and use it to determine the critical temperature T_c(L) at different lattice sizes; these values suffer from only small finite scaling effects. We demonstrate that T_c(L) extrapolates to the known theoretical value as L → ∞, suggesting that the autoencoder can also be used to extract the critical temperature of the phase transition to an adequate precision. Subsequently, we test the application of the autoencoder on the anti-ferromagnetic Ising model, demonstrating that the proposed network can detect the phase transition successfully in a similar way.

Trained neural networks can thus help distinguish phases in simple statistical systems (the structure of which is known) but, more importantly, in more complex systems where the underlying phase structure is unknown. In this work, we would like to examine whether the proposed, fully-connected (Dense), deep learning autoencoder, which does not require supervision during training, can shed light on the phase structure of the 2D-Ising model and enable the identification of the critical temperature in the thermodynamic limit. Furthermore, we would like to investigate whether deep learning autoencoders can be used as tools to extract physical observables with better statistical accuracy, that is, to define new observables which exhibit different features from well-known quantities such as the order parameter of the theory.
As mentioned above, this is not the only work where autoencoders are used. Autoencoders have been previously used for the identification of phase transitions in references [6,13]. In reference [6] the authors studied the phase transition of the 2D-Ising model using, in addition to PCA, an autoencoder based on Convolutional Neural Networks (CNNs) and observed that they could actually "see" a transition and differentiate the two distinct states of the 2D-Ising model through the latent variable. Nevertheless, this result was obtained for one lattice volume and a finite volume analysis has not been carried out. Thus, it is not clear whether the observed critical point converges to the known critical temperature in the limit of infinite volume.
In reference [13] the author used a variational autoencoder as well as an autoencoder to demonstrate that the latent parameter of the autoencoder, resulting from feeding it with configurations of the 2D-Ising model, corresponds to the known order parameter, i.e. the magnetization. Although this was carried out at one lattice volume, it is clear that since the latent variable is identified as the magnetization, the critical temperature extracted at finite volume will converge to the known, theoretically extracted, critical temperature in the infinite volume limit. At this point, it should be made clear that the goal of this work is not to use the autoencoder to reproduce the order parameter, but as a technical tool enabling us to study particular features of the model and identify new quantities which might prove useful for analyzing the phase behaviour of statistical and possibly gauge theoretical models. It actually turns out that, using the proposed network with the chosen activation functions, the latent variable we observe is not identified as the magnetization but as a different quantity which is affected less by finite volume effects, leading to faster convergence in the thermodynamic limit.
Moreover, in the field of Computational Physics, autoencoders are also used by exploiting their generative context. In other words, there have been investigations on how to use variational autoencoders towards the reconstruction of physically meaningful configurations of statistical systems such as the 2D-Ising model [24] and the 2D XY model [25]. Although this is a hot and promising topic since it can potentially reduce the cost of the production of such configurations, the successful reconstruction of 2D-Ising configurations is beyond the scope of this work.
Deep learning autoencoders are frequently used to uncover interesting structure hidden in raw datasets. They can, therefore, be used to discover interesting structure in ensembles produced for a range of a parameter that characterizes the phase space of the model (with different sectors having different physical properties). One such example is the ferromagnetic Ising model, for which at the critical temperature T_c the system undergoes a transition from the ordered phase to the disordered one. A minor variation of this model is the anti-ferromagnetic Ising model, where the system also undergoes a similar phase transition.
In this work, we investigate the action of unsupervised machine learning, namely the deep learning autoencoder (not variational), towards the identification of the phase transition of the 2D-Ising (anti)ferromagnetic model. More specifically, we produce decorrelated configurations for the 2D-Ising model for a given range of temperatures, and then we apply the autoencoder trying to understand what characteristics of the phase structure we can capture. Hence, technically, this work combines the production of configurations using Monte Carlo methods as well as the deep learning autoencoder algorithm. We observe that the autoencoder can capture the underlying Z 2 symmetry and can indeed find out where the transition occurs by identifying a relevant, quasi-order parameter: the mean value of the absolute latent variable. Although this quantity is not suitable for predicting the order of the transition, it can determine the critical temperature with small finite scaling effects.
This article is organized as follows: In Section 2 we present a brief description of the 2D-Ising ferromagnetic and anti-ferromagnetic model, explaining the production of the configurations as well as its phase structure. In Section 3, we discuss the deep learning autoencoder, explain how it works and provide the structure of the network. Subsequently, in Section 4 we provide our results for the ferromagnetic 2D-Ising model. Additionally, as a test, we apply our autoencoder on the anti-ferromagnetic 2D-Ising model and demonstrate our results in Section 5. Finally, in Section 6, we present our conclusions.

The 2-Dimensional Ising model
One of the most interesting physical phenomena in nature is magnetism. It is known that ferromagnetic materials exhibit a spontaneous magnetization in the absence of an external magnetic field. Such magnetization occurs only if the temperature of the system is lower than a known critical temperature T_c, the so-called Curie temperature. If the temperature of the system is raised so that T > T_c, then the magnetization vanishes. In other words, the critical temperature T_c separates the microstates of the system between being ordered, i.e. magnetized, for T < T_c, and being randomly oriented, resulting in zero magnetization; these two phases correspond to the ferromagnetic and the disordered phases, respectively.
(Anti)ferro-magnetism has a quantum mechanical nature and, thus, much effort is invested towards its understanding. Albeit quantum mechanical, simple classical models can help to gain insight into this effect. The 2D-Ising model is a classical model that is commonly used to study magnetism. The 2D-Ising model can be considered as a lattice with N = N_x × N_y sites, on each of which a two-valued spin s_i is located, either in an "up" orientation, denoted by ↑ or s_i = +1, or "down", denoted by ↓ or s_i = −1.
The macroscopic properties of the 2D-Ising system are determined by the nature of the accessible micro-states. Thus, it is useful to know the dependence of the Hamiltonian on the spin configurations. The total energy is given by

E = −J Σ_{nn(i,j)} s_i s_j − µh Σ_i s_i, (1)

where J is the self-interaction between neighbouring spins, h the external magnetic field and µ is the atomic magnetic moment. Note that in the first sum, the notation nn(i,j) represents nearest-neighbour pairs; the sum is taken over all nearest-neighbouring pairs. The sign of J determines whether we have a ferromagnetic (J = 1) or an anti-ferromagnetic (J = −1) system. In the case of the canonical ensemble, in other words, when the system is attached to a thermal reservoir and kept at a constant temperature T, as time passes the spins are left to fluctuate with rates depending on the reservoir's temperature. This behaviour can be captured in a Monte Carlo (MC) simulation in the canonical ensemble. In the ferromagnetic case at T = 0, the system is frozen with all spins pointing in one direction, either down or up. On the other hand, in the anti-ferromagnetic case at T = 0, the system splits into two sub-systems in a checkerboard pattern, and the difference between the spins of these two sub-systems points in one direction, either up or down. The orientation of the spins is arbitrary; however, the dynamics enforce the system to choose one of the two directions. This corresponds to the spontaneous breaking of the Z_2 global symmetry group in the ferromagnetic case. In the anti-ferromagnetic case, the existence of a checkerboard pattern corresponds to the spontaneous breaking of the translation symmetry. Although the Hamiltonian of the system is invariant under Z_2 and translation transformations, the degenerate ground states are not invariant but get interchanged under such transformations.
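As a concrete illustration, the total energy of a configuration on a periodic L × L lattice can be computed as in the following sketch; the function name and layout are our own, and each nearest-neighbour pair is counted exactly once via the right and down neighbours:

```python
def ising_energy(spins, J=1, h=0, mu=1):
    """Total energy E = -J * sum_nn(i,j) s_i s_j - mu*h * sum_i s_i
    on an L x L lattice with periodic boundary conditions."""
    L = len(spins)
    pair_sum = 0
    for i in range(L):
        for j in range(L):
            # Count each nearest-neighbour pair once by looking only
            # at the right and down neighbours (with wrap-around).
            pair_sum += spins[i][j] * (spins[i][(j + 1) % L] + spins[(i + 1) % L][j])
    spin_sum = sum(sum(row) for row in spins)
    return -J * pair_sum - mu * h * spin_sum

# A fully ordered 2x2 periodic lattice has 8 bond terms, so E = -8 for J = 1, h = 0:
print(ising_energy([[1, 1], [1, 1]]))   # -8
```

For the anti-ferromagnetic case one simply passes J = −1, for which the checkerboard configuration minimizes the energy.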
For small, nonzero values of the temperature, the spins of the whole system (in the ferromagnetic case), or of the subsystems (in the anti-ferromagnetic case), still form large sectors where all spins are correlated and point in one direction. Above the critical temperature T_c, the spins are disordered and the Z_2 symmetry is restored.
The question that we address in this work is whether the behaviour described above can be captured by a deep learning autoencoder when we pass it ensembles for a sequence of temperatures separated by some δT . More precisely, we seek to understand if a qualitative description of the phase structure of the Ising model can be extracted and whether one can determine the critical temperature T c .

Swendsen-Wang algorithm
The MC simulation for the 2D-Ising model is conventionally performed using the Metropolis algorithm. Since this algorithm is based on local updates, it faces the problem of critical slowing down near the critical temperature, where the correlation length diverges. In order to tackle this problem, we have implemented the Swendsen-Wang cluster algorithm [28,29], which is based on global updates of the spin configurations. This algorithm relies on the formation of bonds between every pair of nearest neighbours (i, j) that are aligned at a given temperature T, with a probability p_ij = 1 − exp(−2βJ), where β = 1/(k_B T) (k_B is the Boltzmann constant). A single cluster is defined as all the spins which are connected via bonds. The global update is defined as the collective flipping, with probability 1/2, of all the spins in each cluster [30,31]. This step works because of the so-called Fortuin-Kasteleyn mapping of the Ising model onto the random-cluster model. Thus, global updates enable us to produce equilibrium configurations close to T_c with a few thermalization steps.
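A minimal sketch of one Swendsen-Wang update for the ferromagnetic case (J > 0) is given below, assuming k_B = 1 and using a union-find structure to build the clusters; the function and variable names are our own, and the code is illustrative rather than the exact implementation used in this work:

```python
import math
import random

def swendsen_wang_update(spins, T, J=1.0, rng=random):
    """One Swendsen-Wang cluster update on an L x L periodic lattice (J > 0)."""
    L = len(spins)
    p_bond = 1.0 - math.exp(-2.0 * J / T)   # bond probability for aligned pairs, k_B = 1

    # Union-find over lattice sites.
    parent = list(range(L * L))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Place bonds between aligned nearest neighbours with probability p_bond.
    for i in range(L):
        for j in range(L):
            for di, dj in ((0, 1), (1, 0)):          # right and down neighbours
                ni, nj = (i + di) % L, (j + dj) % L
                if spins[i][j] == spins[ni][nj] and rng.random() < p_bond:
                    union(i * L + j, ni * L + nj)

    # Flip every cluster collectively with probability 1/2.
    flip = {}
    for i in range(L):
        for j in range(L):
            root = find(i * L + j)
            if root not in flip:
                flip[root] = rng.random() < 0.5
            if flip[root]:
                spins[i][j] = -spins[i][j]
    return spins
```

Because entire clusters are flipped at once, correlations decay much faster near T_c than under single-spin Metropolis updates.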

Monte-Carlo simulation setup
In this work we chose to investigate the case of zero external magnetic field (h = 0) and for simplicity we have set J = ±1 and k_B = 1. In this case, the theoretically calculated value of the critical temperature is

T_c = 2 / ln(1 + √2) ≈ 2.26918. (2)

To extract this quantity experimentally, one has to investigate the order parameter of the theory. The first question that we address is whether we can get an approximate estimate of this temperature by using unsupervised learning. For this purpose, we define a sequence of different values of the temperature. Then, for each one, we start from a "hot" configuration of spins (where the spins are oriented randomly), perform a large enough number of thermalization sweeps and then save the configuration. For every single temperature, we repeat the procedure 200 times. The same results could be obtained by starting from a "cold" configuration, letting the Markov chain evolve, and then sampling configurations along the single chain, but the former procedure guarantees a higher degree of de-correlation within the data.
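As a worked check of equation (2), the exact value follows directly from Onsager's solution with J = k_B = 1:

```python
import math

# Onsager's exact critical temperature for the 2D-Ising model with J = k_B = 1:
# T_c = 2 / ln(1 + sqrt(2))
T_c = 2.0 / math.log(1.0 + math.sqrt(2.0))
print(f"{T_c:.5f}")   # 2.26919
```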

Phase structure, observables and order parameters
The phase structure of the 2D-Ising model can be reduced to the study of the magnetic order of the system [32,33]. If we suppose that there are N↑ spins pointing upwards and N↓ spins pointing downwards, then the total magnetic moment would be N↑ − N↓ (µ = 1). The largest possible magnetic moment would, therefore, be N. Thus, for the ferromagnetic case, we can define the magnetic order parameter, or magnetization per spin configuration, naturally as

m = (1/N) Σ_{i=1}^{N} s_i, (3)

while the average magnetization is M = ⟨m⟩. M can take values between −1 and 1, and the average of the absolute magnetization, m̄ = ⟨|m|⟩, is just the magnetic order. Hence, if m̄ is close to 0, then the system is highly disordered and, thus, not magnetized, with approximately half of the spins pointing up and the other half pointing down. On the other hand, if m̄ is approximately 1, the system is ordered and, thus, magnetized, with nearly all the spins pointing in the same direction.
In the anti-ferromagnetic case, the relevant magnetic order parameter is the staggered magnetization per spin configuration. In the checkerboard lattice, if we label black sites as (+) and white sites as (−), and we define m_+ and m_− using (3) on each sublattice, then the staggered magnetization per spin configuration m_s can be defined as

m_s = (m_+ − m_−) / 2, (4)

while the average staggered magnetization is M_s = ⟨m_s⟩. Similar to the ferromagnetic case, the magnetic order is the average of the absolute staggered magnetization, m̄_s = ⟨|m_s|⟩. Therefore, if m̄_s is close to 0, then the system is highly disordered and approximately not magnetized. On the contrary, if m̄_s is close to 1, the system has exactly zero magnetization, because every spin is surrounded by spins of the opposite orientation among its neighbours, which makes it exactly ordered. The point T = T_c is called the critical point and separates the ordered (T < T_c) phase and the disordered (T > T_c) phase. At T = T_c the system undergoes a second order phase transition, i.e., à la Ehrenfest [34], the first derivative of the free energy with respect to the external field, which is the order parameter, is continuous, while the second derivative of the free energy is discontinuous. Since there exists a bijective map between the spin fields of the ferromagnetic and anti-ferromagnetic cases of the 2D-Ising model, the phase transition in both cases is identical.
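Equations (3) and (4) can be sketched as follows; the helper names are our own, and the two checkerboard sublattices are identified by the parity of i + j:

```python
def magnetization(spins):
    """Magnetization per spin, m = (1/N) * sum_i s_i  (eq. (3))."""
    L = len(spins)
    return sum(sum(row) for row in spins) / (L * L)

def staggered_magnetization(spins):
    """Staggered magnetization per spin, m_s = (m_+ - m_-) / 2  (eq. (4)),
    where m_+ and m_- are the per-spin magnetizations of the two
    checkerboard sublattices (each with N/2 sites)."""
    L = len(spins)
    n_half = L * L / 2
    m_plus = sum(spins[i][j] for i in range(L) for j in range(L)
                 if (i + j) % 2 == 0) / n_half
    m_minus = sum(spins[i][j] for i in range(L) for j in range(L)
                  if (i + j) % 2 == 1) / n_half
    return (m_plus - m_minus) / 2

# Ordered ferromagnet: m = 1, m_s = 0; perfect checkerboard: m = 0, m_s = 1.
ordered = [[1] * 4 for _ in range(4)]
checker = [[1 if (i + j) % 2 == 0 else -1 for j in range(4)] for i in range(4)]
print(magnetization(ordered), staggered_magnetization(checker))   # 1.0 1.0
```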

Deep learning autoencoders
The concept of autoencoders has existed for decades [35,36], with conventional autoencoders used for feature learning and dimensionality reduction. In recent years, work has been conducted to join autoencoders and probabilistic latent variable models: alternate forms of autoencoders have become popular for so-called generative modelling [37,38]. Autoencoders are a variant of artificial neural networks utilized for learning efficient data codings in an unsupervised manner [39,40]. An autoencoder aims to define a representation (encoding) for an assemblage of data, usually performing dimensionality reduction. An autoencoder encodes the input data ({X}) from the input layer into a latent variable ({z}), and then uncompresses that latent variable into an approximation of the original data ({X̂}). The autoencoder engages in dimensionality reduction by learning how to ignore the noise and recognize significant characteristics of the input data. As Figure 1 shows, an autoencoder consists of two components, the encoder function g_φ and a decoder function f_θ, and the reconstructed input is X̂ = f_θ(g_φ(X)). The first layer of an autoencoder might learn to encode simple, identifiable and local features. The second layer, by using the output of the first layer, learns to encode more complex and less local features. This continues for higher-order layers until the final layer of the encoder learns to identify and encode the most complex and global characteristics of the input data. The same process in reverse holds for the decoder, where the goal is to go from the compressed latent variable back to the original input.
The activation functions have been chosen in such a way so that the latent variable provides as sharp as possible transition at the assumed critical point. In other words, we investigated which combination of activation functions leads to a steeper change on the scattered latent variable (like for instance Fig. 3), in order to identify with as much accuracy as possible the critical temperature for a given lattice volume. Other combinations of activation functions have also been investigated and will be presented in a forthcoming work [52].
In the training phase, the autoencoder learns the parameters φ and θ together, so that f_θ(g_φ(X)) can approximate an identity function. Various metrics can be used to measure the error between the original input X and the reconstruction X̂, but the simplest and most commonly used is the Mean Square Error (MSE), as provided in equation (5), where n_data is the number of data points:

MSE = (1/n_data) Σ_{i=1}^{n_data} (X_i − X̂_i)^2. (5)
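Equation (5) reduces to a one-line helper; this also makes explicit why, for spins taking values +1 and −1, the worst possible reconstruction error is 4 per component:

```python
def mse(x, x_hat):
    """Mean squared error between an input configuration and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

# Each spin component contributes at most (1 - (-1))^2 = 4,
# so the largest possible MSE over a configuration is 4:
print(mse([1, -1, 1, -1], [-1, 1, -1, 1]))   # 4.0
```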

Proposed autoencoder model
For the analysis of the proposed method, an eight-layer, fully connected (Dense) autoencoder is proposed, as Figure 1 shows, where the encoder compresses the configurations into a single latent variable. Through experimentation, we determine that the best model to detect the transition consists of an encoder whose input layer and first, second and third hidden layers have 625, 256, 64 and 1 neurons, respectively. The input layer has size equal to L × L and, thus, it changes as we alter the lattice size. The activation function used is ReLU (rectified linear unit), as shown in equation (6), for all layers except the third hidden layer, where tanh is used, as shown in equation (7). For the decoder, the first, second and third hidden layers use 64, 256 and 625 neurons, respectively. For the output layer, the number of neurons is set equal to the number of lattice points in the configuration under investigation. The activation function used is ReLU, as given in equation (6), for all hidden layers, and for the output layer tanh is used, as per equation (7).
For the proposed autoencoder model, we use the so-called dropout regularization technique [41] on 30% of the neurons at each layer. The dropout regularization technique refers to temporarily deactivating neurons from each layer, randomly, during training. It was successful at reducing over-fitting in our case. For the training of the proposed autoencoder model, the data are split into training (2/3) and testing (1/3) sets and the training is performed for 2000 iterations. For training, the starting learning rate was set to 0.001 and was reduced by 20% whenever learning stagnated for 30 epochs, with a minimum learning rate of 0.000001. The implementation was performed using Keras [42] and Tensorflow [43].

In order to identify signals of the phase structure of the 2D-Ising ferromagnetic model, as a first step, we investigate how the latent variable z_conf behaves as a function of the temperature T for each configuration. We produce 40 000 configurations, namely 200 configurations for every single temperature. The produced configurations are for 200 different values of the temperature within the range T = 1 − 4.5, separated by δT = 0.0175. We choose this range to make sure that we cover the two extreme cases, the nearly "frozen" at T ≈ 1 and the completely disordered at T ≈ 4.5. Furthermore, we assume that we have no prior knowledge of what is happening in between these two extremes.
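The encoder stack described above (625 → 256 → 64 → 1, ReLU on the hidden layers and tanh on the latent layer) can be sketched as a plain NumPy forward pass; the weights here are random placeholders for illustration, whereas in the paper they are learned with Keras, and the helper names are our own:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# Layer widths from the text for L = 25 (input 625): encoder 625 -> 256 -> 64 -> 1.
sizes = [625, 256, 64, 1]
weights = [rng.normal(0.0, 0.05, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def encode(x):
    """Encoder forward pass: ReLU on hidden layers, tanh on the latent layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)
    return np.tanh(h @ weights[-1] + biases[-1])   # single latent variable z

config = rng.choice([-1.0, 1.0], size=625)         # one spin configuration, flattened
z = encode(config)
print(z.shape)                                     # (1,)
```

The tanh on the latent layer is what bounds z to [−1, 1], which matters for the two plateaus discussed below; the decoder mirrors this structure in reverse.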
After training the proposed autoencoder on configurations of the 2D-Ising model, the reconstruction error was found to be relatively high (0.6 − 0.7) for both training and testing sets. We clarify that the autoencoder is trained on all temperatures together in one dataset for each lattice size L. Table 1 shows the results. As the configurations consist of spins with values of 1 and −1, the maximum possible MSE is 4.
It appears that both training and testing errors increase with the lattice size. This was expected as the dimensionality (the number of configuration components) increases. Table 1 also provides the average reconstruction accuracy, which appears to be low and to decrease with the size of the lattice. This indicates the complexity of the problem in terms of reconstruction from the encoded variables, as the error increases with the size of the configurations. A crucial point is the very low number of latent variables, in this case, only one.

Fig. 3. The latent variable for each configuration as a function of the temperature for four different lattice volumes. These are scatter plots where no averaging was done for every single input data point. The dashed line represents the analytically extracted value of the critical temperature (Eq. (2)). The red shaded area in the plot for L = 150 is the region where (by fitting to a constant) we expect to find T_c(L = 150). The colour gradient on the right denotes the temperature T.

This is also demonstrated
in Figure 2, where we present the result of the reconstruction for two configurations at two different temperatures close to the critical point and for lattice size L = 50. This result resembles the poor reconstruction accuracy observed in reference [25] when using a variational autoencoder to reconstruct configurations for the 2D XY model. The reconstruction error appears to be small for low temperatures, with an average MSE of ∼ 0.015 at T = 1. As the temperature increases, the error rises until it reaches T_c, at which point it becomes ∼ 1 and stops changing further with the temperature. In Figure 3 we show the latent variable for each different configuration, as a function of the temperature T, for four different lattice sizes, L = 25, 35, 50, 150. Figure 3 has the following features:

- For low temperatures, we obtain two plateaus, one located at z = 1 and one at z = −1. A first simplistic explanation for this pattern would be that it corresponds to two distinct states that are not connected through any kind of transformation. This reflects the spontaneously broken Z_2 ≡ {−1, 1} global symmetry group. One can interpret these two plateaus as the two cases where all spins are up or down. This interpretation is confirmed by the results presented in Figure 4, where we show the absolute correlation coefficient C_{z,m} between the latent variable z and the magnetization m, defined as

C_{z,m} = |⟨zm⟩ − ⟨z⟩⟨m⟩| / (σ_z σ_m),

where σ_z and σ_m are the standard deviations of z and m. The fact that at low temperatures the absolute correlation coefficient is 1 demonstrates that the two different values of the latent variable, −1 and 1, correspond to the two orientations of the spins. Finally, the two plateaus become more distinct as the lattice size increases.
- At some temperature range ∆T_trans the aforementioned behaviour collapses to one state, which is located around z = 0. This reflects the restoration of the Z_2 symmetry. In other words, it corresponds to the case where the spins are disoriented.
- There is a critical point where there is a change in the pattern.
As the lattice size increases, the width of this transition decreases, with ∆T_trans → 0, and the step becomes steeper and steeper. As L → ∞ the transition localizes right at the analytically extracted critical temperature T_c.
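The absolute correlation coefficient C_{z,m} between latent variable and magnetization can be estimated per temperature from paired samples using the standard Pearson formula; the helper name is our own:

```python
import math

def abs_corr(z, m):
    """Absolute Pearson correlation coefficient |cov(z, m)| / (sigma_z * sigma_m)."""
    n = len(z)
    mean_z, mean_m = sum(z) / n, sum(m) / n
    cov = sum((a - mean_z) * (b - mean_m) for a, b in zip(z, m)) / n
    sigma_z = math.sqrt(sum((a - mean_z) ** 2 for a in z) / n)
    sigma_m = math.sqrt(sum((b - mean_m) ** 2 for b in m) / n)
    return abs(cov) / (sigma_z * sigma_m)

# Perfectly (anti-)correlated samples give 1, as observed at low temperature:
print(abs_corr([1.0, -1.0, 1.0, -1.0], [-0.9, 0.9, -0.9, 0.9]))   # ≈ 1.0
```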
Evidently, plotting the latent variable as a function of the temperature demonstrates that the autoencoder "notices" the two phases. It also provides a good approximation of the critical temperature. In fact, from Figure 3 the transition appears to occur right at the critical temperature for L = 150. We fit the points of the latent variable which, to a good approximation, behave linearly, to a constant as a function of T. This enables us to estimate that the collapse of the two states located at 1 and −1 occurs at T ≃ 2.28(4). This temperature region is denoted in Figure 3 as the shaded area in red. In Figure 4 we observe that within this temperature region the value of the absolute correlation coefficient C_{z,m} starts to decrease from 1. This demonstrates that, although highly correlated, the latent variable and the magnetization are two different quantities. This result shares similarities with the findings of reference [19], where the authors showed that the latent variable resulting from an autoencoder and a variational autoencoder has some level of correlation with the order parameter. The authors used an autoencoder with one fully connected hidden layer with 256 neurons and a ReLU activation function, and a final layer with a sigmoid activation function. A simple explanation of the ability of the autoencoder to detect the phase transition is that the variance of the input data becomes maximal at the point of the phase transition. A possible explanation of why the behaviour of the latent variable is steeper than that of the magnetization close to the critical point is that the choice of non-linear activation functions in the network "distorts", and possibly enhances, certain contributions to the variances. Namely, we have observed that different choices of activation functions affect the steepness of the average latent variable.
As a matter of fact, this combination of activation functions maximizes the steepness of the average latent variable close to the critical point. A comprehensive study of how different choices of layers, number of neurons and activation functions are affecting the behaviour of the latent variable is under way [52].
Finally, we observed that for low temperatures the latent variable is, to a good approximation, equally distributed between the values z = 1 and z = −1. This can be seen in Figure 5, where we present the average latent variable ⟨z⟩ for each temperature as a function of the temperature for L = 150.
One could also investigate what happens within different "temperature windows". For instance, we can use a temperature window within the range T = 1 − 2 and apply the autoencoder. The outcome would be the behaviour presented in the left panel of Figure 6, where only the two ordered states are visible without the presence of a critical point. Since there is no visible signal for a phase transition behaviour within this range of T , it is reasonable to use another temperature window. If we choose T = 3−4.5, for instance, the corresponding latent variable would be the one given on the right panel of Figure 6 where no particular pattern is observed. A sensible next step would be to investigate what happens within a range of temperatures located between the two previous temperature windows, for instance, T = 1 − 4.

The absolute average latent variable
Since the latent variable per configuration is symmetric with respect to the T axis, it is reasonable to define the average absolute latent variable as a parameter indicating the phase:

z̃ = ⟨|z|⟩.

Figure 3 shows that the latent variable resembles the behaviour of the magnetization per spin configuration as a function of the temperature. The absolute average magnetization defines the order parameter of the system, distinguishing the two different phases. For the case of the autoencoder, we can define an additional, quasi-order parameter as the absolute average latent variable.
On the left-hand side of Figure 7 we provide the magnetization as a function of the temperature, while on the right-hand side we provide the absolute latent variable. Indeed, the absolute latent variable looks similar to the magnetization, albeit becoming steeper as the lattice size increases. Clearly, the magnetization behaves as an order parameter with the characteristics of a second order phase transition, while the absolute latent variable exhibits behaviour consistent with a first-order phase transition. Of course, a more careful study needs to be carried out in order to better understand whether the latent variable can actually capture a first-order phase transition. We can, therefore, conclude that the absolute average latent variable can be used as an order parameter to identify the critical temperature, but cannot capture the right order of the phase transition. The fact that z̃ as a function of the temperature becomes steeper as the lattice size increases suggests that the critical temperature T_c(L) as a function of the lattice size L extracted from the autoencoder data will suffer less from finite-size scaling effects, as discussed in detail in Section 4.3.
Traditionally, T_c(L) can be extracted by probing the peak of the magnetic susceptibility χ at zero magnetic field h, where

χ = βN (⟨m^2⟩ − ⟨|m|⟩^2).

According to finite-size scaling theory, close enough to T_c, the magnetic susceptibility χ scales as

χ ∝ |t|^(−γ),

where t = (T − T_c)/T_c is the reduced temperature and γ = 7/4 a critical exponent [33]. The magnetic susceptibility measures the ability of a spin to respond to a change in the external magnetic field. In the same manner we define the latent susceptibility as

χ_z̃ = βN (⟨z^2⟩ − ⟨|z|⟩^2).

Another conventional route to obtain T_c is by computing the fourth-order cumulant of the order parameter, also known as the Binder cumulant [47,48], defined as

U_4 = 1 − ⟨m^4⟩ / (3 ⟨m^2⟩^2). (13)

This quantity aims at capturing the non-trivial fluctuations of higher order in the spin, thus excluding the trivial Gaussian fluctuations. In the thermodynamic limit (L → ∞), the cumulant becomes 2/3 for T < T_c and 0 for T > T_c.

Fig. 7. The average absolute magnetization (left) compared to the average latent variable (right) as a function of the temperature for five different lattice volumes. Magnetization has a behaviour in accordance with a second order phase transition, while the average latent variable appears to have behaviour resembling a first order phase transition. The data presented on the right plot stem from testing data. Each data point has been extracted as an ensemble average of the observable for fixed volume and temperature.
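The two estimators can be sketched as follows; note that the susceptibility form with ⟨|x|⟩, a common finite-size choice, is our reconstruction of the equation lost from the text, so the exact normalisation should be treated as an assumption:

```python
def susceptibility(samples, n_sites, T):
    """chi = beta * N * (<x^2> - <|x|>^2), applicable to m or to the latent z
    (a common finite-size estimator; assumed form, k_B = 1 so beta = 1/T)."""
    n = len(samples)
    mean_sq = sum(x * x for x in samples) / n
    mean_abs = sum(abs(x) for x in samples) / n
    return n_sites * (mean_sq - mean_abs ** 2) / T

def binder_cumulant(samples):
    """U_4 = 1 - <x^4> / (3 <x^2>^2)  (eq. (13))."""
    n = len(samples)
    m2 = sum(x ** 2 for x in samples) / n
    m4 = sum(x ** 4 for x in samples) / n
    return 1.0 - m4 / (3.0 * m2 ** 2)

# In a perfectly ordered ensemble (x = +/-1) fluctuations vanish and U_4 = 2/3:
print(susceptibility([1, -1, 1, 1], n_sites=25, T=2.0))   # 0.0
print(binder_cumulant([1, -1, 1, 1]))                     # 2/3
```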
We define a similar quantity with respect to the latent variable z̃, as

Ũ_4 = 1 − ⟨z^4⟩ / (3 ⟨z^2⟩^2).

The Binder cumulants U_4 and Ũ_4 have been plotted for the largest volume in Figure 8. The U_4 obtained from Monte Carlo simulations is consistent with the thermodynamic limit. On the other hand, the Ũ_4 obtained using the latent variable exhibits a plateau below 0, which makes it ambiguous to comment on the order of the phase transition using autoencoders.
Binder cumulants can be further utilised in the cumulant ratio intersection method to extract T_c [49], independently of the critical exponents. In Figure 9, we demonstrate the application of this method to pinpoint T_c using U_4 and Ũ_4. The weak dependence of the Binder cumulant U_4 on L keeps it close to the (universal but non-trivial) fixed-point value; the cumulant ratio for Ũ_4, on the other hand, exhibits noisy behaviour while crossing y = 1, which is the prime reason we resorted to the latent susceptibility method, discussed in the next section, to extract T_c. For the extraction of T_c one realizes, by looking at Figure 7, that more data points close to the critical region are needed to extract the critical temperature from the latent susceptibility. Hence, we produce configurations for a grid of temperatures near the critical regime. More specifically, we produce 200 configurations per temperature, for 200 different values of T in the range T = 2 − 2.8 with δT = 0.004, for all the volumes considered. In addition, for L = 100 and L = 150 we produce 200 configurations for each value of T in the range T = 2.22 − 2.34 with δT = 0.0006. These new configurations, however, are not used to train the autoencoder. Instead, we use the synaptic weights extracted during training and predict the latent variable for the new configurations. Hence, this serves as a confirmation that our data do not suffer from over-fitting.
In Figure 10, we present the results of applying the autoencoder weights to the new configurations produced in the region close to the critical point. We compare with results extracted using configurations produced in the range T = 1−4.5. Both datasets agree, and the behaviour of the absolute average latent variable continues smoothly into the critical regime. This confirms that the execution of the encoder does not suffer from over-fitting and, at the same time, that more data points can be used for the extraction of χ_z̃. To avoid any confusion, we state that all generated data presented in figures have been extracted exclusively from the test datasets. Furthermore, the plot for L = 150 behaves nearly as a step function, with the step occurring right at the theoretically extracted Tc. By fitting the second moment of the latent variable, as described in Section 4.3, one sees that the transition occurs at Tc(L = 150) = 2.2779(3); this value is very close to the theoretically extracted value Tc = 2.26918.
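Reusing the trained synaptic weights on newly generated configurations amounts to a single forward pass through the encoder, with no further training. The sketch below assumes ReLU hidden layers and a linear one-dimensional latent output; the actual architecture and activations of the paper's autoencoder may differ, and the toy weights are purely hypothetical.

```python
import numpy as np

def encode(config, weights):
    """Forward pass of a trained dense encoder.
    config  : flattened +/-1 spin configuration (1D array of length L*L)
    weights : list of (W, b) pairs, one per layer; ReLU hidden layers and
              a linear final (latent) layer are assumed here."""
    a = np.asarray(config, dtype=float)
    for W, b in weights[:-1]:
        a = np.maximum(W @ a + b, 0.0)   # assumed ReLU activation
    W, b = weights[-1]
    return W @ a + b                      # linear latent variable z

# toy example with hypothetical weights: identity hidden layer,
# then a sum into a single latent variable
w_hidden = (np.eye(4), np.zeros(4))
w_latent = (np.ones((1, 4)), np.zeros(1))
z = encode([1, -1, 1, 1], [w_hidden, w_latent])
```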
In the following section, we present the analysis of our data in order to investigate the latent susceptibility χ_z̃ and, subsequently, to extract the critical temperature Tc(L) from the corresponding peak.

The latent susceptibility and the critical temperature
In the previous sections, we provided strong evidence that the latent variable resulting from the proposed autoencoder reveals the underlying phase transition and that it can also be used as a rough estimate of the critical temperature Tc. Nevertheless, as the finite lattice size L increases, we need to make sure that Tc(L) tends to the right limit, i.e. that it converges to the theoretically extracted value given in equation (2) as L → ∞.
To investigate the convergence of Tc(L), we first extract Tc(L) for each lattice size and then extrapolate to infinite L. Tc(L) can be extracted by probing the peak of the latent susceptibility for each L. The latent susceptibility as a function of the temperature for the five different lattice sizes is presented on the left-hand side of Figure 11. Compared to the magnetic susceptibility, presented on the right-hand side of Figure 11, the latent susceptibility is much sharper, with peaks lying closer to the known critical temperature Tc. This means that the critical temperature for each L suffers from smaller finite-size scaling effects.
Our temperature grid is fine enough to enable an adequate extraction of Tc(L) from the data of Figure 11; hence, there is no need to use multi-histogram reweighting techniques [5]. The latent variable behaves to a large extent as a step function, and thus its derivative tends to ∝ δ(T − Tc) as L → ∞. In addition, the derivative of the latent susceptibility appears to be continuous, so we can also use a Gaussian fit to estimate the critical temperature.

Fig. 11. Each data point has been extracted as an ensemble average of the observable for fixed volume and temperature.

Fig. 12. The critical temperature Tc(L) extracted from fitting the magnetic (red) and the latent (blue) susceptibilities as a function of 1/L to equation (15). The error bands are estimated using the jackknife fit errors on the fit parameters.
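On a fine temperature grid, the peak of the latent (or magnetic) susceptibility can be located with a simple local fit. The sketch below fits a parabola around the maximum, which near its peak gives the same vertex estimate as a Gaussian fit; the window size is an arbitrary illustrative choice.

```python
import numpy as np

def peak_location(T, chi, window=5):
    """Estimate Tc(L) as the vertex of a parabola fitted to the `window`
    grid points on each side of the susceptibility maximum."""
    T, chi = np.asarray(T, dtype=float), np.asarray(chi, dtype=float)
    i = int(np.argmax(chi))
    lo, hi = max(i - window, 0), min(i + window + 1, len(T))
    a, b, _ = np.polyfit(T[lo:hi], chi[lo:hi], 2)
    return -b / (2.0 * a)   # vertex of the fitted parabola
```

On synthetic data with a peak placed at T = 2.27, the estimator recovers the peak position to high accuracy.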
In Figure 12 we present Tc(L), extracted from fitting the latent susceptibility and the magnetic susceptibility, as a function of 1/L. Results obtained using the latent susceptibility suffer less from finite-size scaling effects than those obtained using the magnetic susceptibility. Adopting the usual finite-size scaling behaviour, we fit both susceptibilities to the ansatz

Tc(L) = Tc(L = ∞) + αL^(−1/ν). (15)

Our findings are listed in Table 2.
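With ν = 1 for the 2D Ising universality class, the ansatz Tc(L) = Tc(L = ∞) + αL^(−1/ν) is linear in 1/L, so the extrapolation can be sketched as an ordinary least-squares fit. Error estimation (e.g. via jackknife, as in Figure 12) is omitted in this sketch.

```python
import numpy as np

def extrapolate_tc(Ls, Tcs, nu=1.0):
    """Fit Tc(L) = Tc_inf + alpha * L**(-1/nu) and return (Tc_inf, alpha).
    For nu = 1 this is a straight-line fit in x = 1/L."""
    x = np.asarray(Ls, dtype=float) ** (-1.0 / nu)
    alpha, tc_inf = np.polyfit(x, np.asarray(Tcs, dtype=float), 1)
    return tc_inf, alpha

# synthetic Tc(L) data on the lattice sizes used in the paper
Ls = np.array([25, 35, 50, 100, 150])
tc_inf, alpha = extrapolate_tc(Ls, 2.26918 + 0.5 / Ls)
```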
As expected, fitting the data for Tc(L) resulting from the magnetic susceptibility yields a value of Tc(L = ∞) consistent with the theoretical expectation; likewise, the data for Tc(L) resulting from the latent susceptibility, fitted to equation (15), yield a value for Tc(L = ∞) which is in accordance with the theoretical expectation. One can observe that the Tc curves intersect the theoretical value in the thermodynamic limit. This provides good evidence that the deep learning autoencoder not only predicts the phase regimes of the 2D-Ising model and gives an estimate of the critical temperature, but can also lead to a precise evaluation of the critical temperature.

Results for the anti-ferromagnetic Ising model
Having demonstrated the use of the autoencoder on the ferromagnetic 2D-Ising model, we now turn to the anti-ferromagnetic model, where we simply test the application of the network on the produced configurations. We investigate how the latent variable per configuration, z^i_conf, behaves as a function of the temperature T. We generate 6000 configurations, namely 200 configurations for each of 30 different values of the temperature within the range T = 1−4.5.
Once more, we make sure that we cover the whole range of temperatures between the two extreme cases of the anti-ferromagnetic Ising behaviour: the nearly "frozen" system at T ≈ 1 and the completely disordered one at T ≈ 4.5. We choose the temperature grid to be dense close to the theoretical critical temperature. Our assumption is that, since the ferromagnetic model is connected to the anti-ferromagnetic one via a bijective map between the spin fields, the autoencoder should be able to "notice" the phase transition. Hence, we check whether one can approximately detect the critical temperature using the latent variable on a small number of configurations.

[Figure caption: the color of the gradient bar on the right denotes the temperature T.]
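The bijective map invoked here flips the spins on one checkerboard sublattice, turning a ferromagnetic ground state into a Néel state and vice versa (valid at zero external field on a bipartite lattice). A minimal sketch:

```python
import numpy as np

def flip_sublattice(spins):
    """Map between ferro- and anti-ferromagnetic configurations by
    flipping the spins on the odd checkerboard sublattice.
    The map is an involution: applying it twice returns the input."""
    i, j = np.indices(spins.shape)
    out = spins.copy()
    out[(i + j) % 2 == 1] *= -1
    return out
```

Applied to the fully ordered ferromagnetic state, the map produces the checkerboard (Néel) state, and applying it twice recovers the original configuration.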
For the anti-ferromagnetic Ising model we restrict the analysis to lattice volumes of L = 50, 100 and 150. After training the autoencoder on configurations of the anti-ferromagnetic 2D-Ising model, the reconstruction error (MSE) was found to be 0.68, 0.69 and 0.705 for L = 50, 100 and 150, respectively. In Figure 13 we show the latent variable for each configuration, as a function of the temperature T, for the three lattice sizes. The behaviour of the latent variable z^i_conf in these plots resembles that of the latent variable for the ferromagnetic 2D-Ising model presented in Figure 3.
Representing the latent variable as a function of temperature makes it clear that the autoencoder "notices" the two phases also for the anti-ferromagnetic 2D-Ising model and provides a good approximation of the critical temperature. This reflects the spontaneously broken Z₂ ≡ {−1, 1} symmetry group for the bijectively mapped sub-lattices, as well as the spontaneously broken translation symmetry of the anti-ferromagnetic model. In Figure 13, the plot for L = 150 indicates that the transition occurs right at the critical temperature. One can fit the points of the latent variable which behave linearly to a constant as a function of the temperature, and find that the collapse of the two states located at 1 and −1 occurs at T = 2.288(21). This is in good agreement with the theoretical prediction.
On the left-hand side of Figure 14 we present the staggered magnetization as a function of the temperature, while on the right-hand side we provide the absolute latent variable for the anti-ferromagnetic model. As for the ferromagnetic model, the absolute latent variable looks similar to the staggered magnetization, albeit becoming steeper as the lattice size increases. This demonstrates that the absolute latent variable, and as a consequence the latent susceptibility, enables an extraction of the critical temperature with smaller finite-size scaling effects compared to the staggered magnetization and the staggered susceptibility, respectively. Further analysis of the anti-ferromagnetic Ising model will be presented in [52].
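The staggered magnetization referred to above is the standard order parameter of the anti-ferromagnetic Ising model; a minimal estimator reads:

```python
import numpy as np

def staggered_magnetization(spins):
    """Staggered magnetization per site,
    m_s = (1/L^2) * sum_{i,j} (-1)**(i+j) * s_ij,
    which is +/-1 for a perfect Neel state and ~0 deep in the
    disordered phase (or for a uniform ferromagnetic state)."""
    i, j = np.indices(spins.shape)
    return float(np.sum((-1.0) ** (i + j) * spins)) / spins.size
```

It plays the same role for the anti-ferromagnet as the ordinary magnetization does for the ferromagnet: the perfect Néel state gives |m_s| = 1, while the uniform state gives m_s = 0 on an even-sized lattice.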

Conclusions and outlook
In this work, we apply a deep learning auto-encoder on configurations produced for the 2D (anti)ferromagnetic Ising model for performing classification in an unsupervised manner. Hence, with no prior knowledge on the system, we demonstrate that we can predict the phase structure of this system qualitatively as well as quantitatively by determining both phase regions and the critical temperature.
For the ferromagnetic model at low temperatures, by making use of the latent variable per configuration, the autoencoder predicts two states reflecting the broken Z₂ symmetry. As the temperature increases, these two states collapse into a single state located around zero, and the underlying symmetry is restored. This behaviour becomes more distinct as the volume of the lattice increases, and the point where the two states collapse becomes increasingly localized; this corresponds to the critical point of the phase transition.
One can define the average absolute latent variable z̃, which displays some of the characteristics of an order parameter; namely, it can identify the phase but cannot capture the order of the phase transition. Although it resembles the behaviour of the magnetization, it becomes steeper as the lattice volume increases, tending to a step function. The second moment of the absolute latent variable defines a susceptibility, named the latent susceptibility, the peak of which can determine the critical temperature Tc(L). By extrapolating the values of Tc(L) to L → ∞ for the sequence of lattice sizes L = 25, 35, 50, 100, 150, we obtain Tc(L = ∞) = 2.266(4), in agreement with the exact value Tc = 2.26918 calculated analytically. This suggests that the proposed (fully-connected) deep learning autoencoder can identify, in an unsupervised manner, the phase structure of the 2D ferromagnetic Ising model, and can also lead to a precise extraction of the critical temperature in the infinite-volume limit. As shown in Figure 12, the values of Tc(L) suffer less from finite-size effects than those usually extracted using the peak of the magnetic susceptibility, and one would thus expect that the autoencoder could give a more precise prediction for Tc. Of course, to test this hypothesis we would need to extract Tc(L) for larger volumes, for instance up to L = 1024 as in reference [5], and obtain the extrapolated value of Tc(L = ∞). This requires a different autoencoder with more layers, since memory limitations render the current autoencoder inadequate for such volumes. This is a future extension of this work.
Applying the deep learning autoencoder to configurations produced for the 2D anti-ferromagnetic Ising model, we observe that the results resemble to an adequate extent those for the ferromagnetic model. Namely, by using the latent variable per configuration, the autoencoder predicts two states reflecting the broken Z₂ symmetry of the two bijectively connected sub-lattices, as well as the broken translation symmetry. Once more, this behaviour becomes more distinct as we increase the lattice size, with the collapse of the two states becoming steeper. This special point corresponds to the critical point of the phase transition. In the same manner as for the ferromagnetic case, we can make use of the average absolute latent variable, which behaves similarly to the average staggered magnetization, albeit becoming steeper with the lattice size.
This work provides a good indication that, with the right choice of parameters, deep learning autoencoders can be used as tools to define new quantities which are less affected by finite-size scaling effects and lead to a more precise evaluation of observables related to the phase structure of statistical models. This could prove beneficial for theories in which the production of thermalized, uncorrelated configurations requires a large amount of computational resources.
There are several other related directions in which this work can be extended. Since our proposed autoencoder has been tested only on the 2D-Ising model, it would be important to investigate its generalization to other physical systems with non-trivial phase structure. An important question which could be answered is whether this neural network is capable of identifying the phases in cases where an order parameter is either not known or does not exist; such an example is the Hubbard model [44], describing the transition between conducting and insulating systems. Another relevant question is how this particular autoencoder behaves in cases where the phase transition is of a different order, or of infinite order, such as in the 2D XY spin model, where the relevant phase transition is the Kosterlitz-Thouless transition, which is of infinite order [45]. Finally, our future plans involve testing the autoencoder as a tool for the unsupervised extraction of the phase structure of physical systems with continuous symmetries. These involve quantum field theories formulated on the lattice, such as the 3D φ⁴ theory with O(2) symmetry [46], where the phase transition is of second order and belongs to the same universality class as the 2D-Ising model, the 3D U(1) gauge theory [50], for which the phase transition is of infinite order and belongs to the same universality class as the 2D XY model, as well as the 3D SU(N) gauge theory [51], which has a second-order phase transition for N ≤ 3, a weakly first-order one for N = 4 and a first-order one for N ≥ 5.
Open Access funding provided by Università di Pisa within the CRUI-CARE Agreement. We would like to thank Giannis Koutsou, Nikos Savva, Spyros Sotiriadis, Mike Teper and Savvas Zafeiropoulos for fruitful discussions. We would also like to express our gratitude to Barak Bringoltz, Biagio Lucini and Davide Vadacchino for performing a critical reading of the manuscript.

Author contribution statement

C. Alexandrou has contributed to the critical analysis of the results and provided the computational resources on which the results have been produced. A. Athenodorou has contributed to the analysis of the results as well as to the writing of the manuscript. C. Chrysostomou designed and trained the deep learning autoencoder. S. Paul has produced the 2D-Ising model configurations, contributed to the analysis and written parts of the manuscript.
Publisher's Note The EPJ Publishers remain neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.