1 Introduction

Since the seminal work of Landauer [1], the question regarding the energetic cost of irreversible logical operations, e.g., the erasure of a bit of memory, has become a classic in the thermodynamics of computation [2].

According to Landauer, in order to erase a bit of information stored in a register being immersed in an environment at temperature T, an amount of energy of at least the order of kT must be spent [1]. More precisely, thermodynamic arguments suggest that at least \(kT \ln 2\) of work must be invested:

$$\begin{aligned} W \ge k T\ln 2 \, . \end{aligned}$$
(1)

Here we address the issue of the energetic cost of information erasure by focussing on the implementation of the RESET TO ONE process in an Ising ferromagnet. Using the Ising model, which features first- and second-order phase transitions, to investigate the link between thermodynamics and information theory was first proposed in Ref. [3], which clarified the thermodynamics of the Szilard engine and illuminated the role that spontaneous symmetry breaking (SSB) plays in the process of information erasure. See also Ref. [4] for recent theoretical and experimental developments of that idea.

At variance with the work of Ref. [3], which was carried out at a time when work fluctuations were not yet in the limelight of non-equilibrium thermodynamic research, here we look at Landauer’s erasure principle, Eq. (1), through the lens of the Jarzynski equality [5], which, for a cyclic process such as the RESET TO ONE, reads:

$$\begin{aligned} \langle {e^{-\beta W}}\rangle =1\, . \end{aligned}$$
(2)

As is well known, this, combined with Jensen’s inequality, implies the bound

$$\begin{aligned} \langle W \rangle \ge 0\, , \end{aligned}$$
(3)

which is universally recognised to express the second law of thermodynamics in the formulation of Kelvin [5,6,7]. That is, one cannot extract energy with any generic cyclic transformation from a system in contact with a single thermal bath. Note that, within this picture the lower bound, 0, is achieved in the quasi-static limit. The fact that in the RESET TO ONE process the lower bound is not simply that dictated by the second law, Eq. (3), but is lifted by an amount \(kT \ln 2\), suggests that there is an extra source of dissipation, which is associated with information erasure, that cannot be eliminated by decreasing the speed of the process. This observation adheres perfectly to Landauer’s view that “Computing, like all processes proceeding at a finite rate, must involve dissipation. [...] however [...] there is a minimum heat generation, independent on the rate of the process” [1]. One might wonder, then, what is the mechanism that lifts the bound from zero to \(kT \ln 2\) when information is erased. Arguably, that is related to the absence of a quasi-static limit, a situation that typically occurs when some time-scale diverges and ergodicity breaks, e.g., as a consequence of spontaneous symmetry breaking [8], as Ref. [3] and the recent experiment of Ref. [9] suggest.
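
For completeness, the step from Eq. (2) to Eq. (3) can be spelled out as follows. This is only the standard convexity argument (Jensen's inequality applied to the convex function \(e^{-\beta x}\)), written with \(\beta = 1/kT > 0\):

$$\begin{aligned} 1 = \langle e^{-\beta W} \rangle \ge e^{-\beta \langle W \rangle } \quad \Longrightarrow \quad -\beta \langle W \rangle \le 0 \quad \Longrightarrow \quad \langle W \rangle \ge 0 \, . \end{aligned}$$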

With this work we investigate this insight further and put forward the idea, corroborated by extensive numerical simulations, that the Landauer bound (i.e. \(kT \ln 2\), Eq. (1)) literally emerges from the underlying second law bound (i.e., 0, Eq. (3)), as a consequence of a mechanism of spontaneous breaking of the symmetry that is at the basis of Jarzynski equality, namely the fluctuation symmetry, a.k.a., the work fluctuation relation [7, 10, 11].

2 The Conjecture

The issue of the emergence of Landauer’s bound, Eq. (1), from the underlying second law bound, Eq. (3), is similar to a classic issue in statistical physics, namely how a net magnetisation \(\langle M_z \rangle \) is possible in an Ising ferromagnet at thermal equilibrium when its free energy is invariant under the reversal of the magnetic field [12]. It is now clear that while for any finite sample size the net magnetisation is null at zero applied external field, it presents a discontinuity at zero field in the thermodynamic limit. Namely the limits of vanishing field and infinite size of the sample do not commute:

$$\begin{aligned} 0 = \lim _{N\rightarrow \infty }\lim _{h\rightarrow 0^\pm } \langle M_z \rangle \ne \lim _{h\rightarrow 0^\pm }\lim _{N\rightarrow \infty } \langle M_z \rangle = \pm M_0. \end{aligned}$$
(4)

This is the mechanism at the basis of spontaneous symmetry breaking.

A similar phenomenon is at the basis of the process of erasure. Let us take a close look at how erasure is realised in practice in a uni-axial Ising ferromagnet. Let us say that positive macroscopic magnetisation encodes the ONE state and negative macroscopic magnetisation encodes the ZERO state. Below the Curie temperature, the RESET TO ONE operation is implemented in practice by simply switching on an external magnetic field along the positive magnetisation axis, and then switching it off, thus implementing a cyclic and time-symmetric protocol. With this procedure, regardless of its initial magnetisation, the sample will end up in the ONE state in the overwhelming majority of repetitions of the protocol. The crucial point to notice is that, in the thermodynamic limit, it will always end up there, giving rise to a perfect erasure of information [12].

When that happens, however, the Jarzynski equality breaks. To understand why, let us recall a well-known fact that often makes the Jarzynski equality difficult to apply: the statistical average of the exponential \(e^{-\beta W}\) is dominated by rare events for which \(W<0\). In fact, as established by Jarzynski [13], the number of realisations \({\mathcal {N}}\) necessary to efficiently sample such dominant rare events grows, for a cyclic and time-symmetric process, like \(e^{\beta \langle {W}\rangle }\). So if \(\langle {W}\rangle \) scales, e.g., linearly with some parameter N, the number of realisations that need to be sampled scales exponentially in N, \({\mathcal {N}} \propto e^{\beta w N}\), where w is a constant prefactor. In the \(N\rightarrow \infty \) limit, and at finite \(\beta \), no matter how large your statistical sample is, you are going to miss the dominant rare events. As a result, the Jarzynski equality undergoes a spontaneous symmetry breaking. In formulae:

$$\begin{aligned} 1= \lim _{N\rightarrow \infty } \lim _{{\mathcal {N}}\rightarrow \infty } \langle e^{-\beta W} \rangle _{N, {\mathcal {N}}} \ne \lim _{{\mathcal {N}} \rightarrow \infty } \lim _{N\rightarrow \infty } \langle e^{-\beta W} \rangle _{N, {\mathcal {N}}} = \gamma \end{aligned}$$
(5)

with some \(\gamma \le 1\), where \(\left\langle \cdot \right\rangle _{N,{\mathcal {N}}}\) denotes the empirical average obtained from a finite sample of size \({\mathcal {N}}\) for a system of size N. That is: \(\langle e^{-\beta W} \rangle _{N, {\mathcal {N}}}=(1/{\mathcal {N}})\sum _{i=1}^{{\mathcal {N}}} e^{-\beta W_i}\), with \(W_i\) denoting the value of the work of item i in the sample.Footnote 1 Equation (5) has to be understood in the following way. If you have an ideally infinitely large statistical sample, \({\mathcal {N}} = \infty \), then no matter how large N is you will observe \(\langle e^{-\beta W} \rangle =1\); this is the first equality in Eq. (5). If you have a finite statistical sample (no matter how large), but \(N=\infty \), then you will observe \(\langle e^{-\beta W} \rangle =\gamma \); this is the second equality in Eq. (5). In this latter case, we say that the fluctuation symmetry (the Tasaki–Crooks relation [10, 11]), which is at the basis of the Jarzynski equality, undergoes a spontaneous breaking. As will be detailed below, for an Ising ferromagnet below the critical temperature, the parameter N that drives the symmetry breaking is the size of the system, but it can be some other quantity in different physical implementations of a memory.
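
To give a feeling for the scaling \({\mathcal {N}} \propto e^{\beta w N}\) quoted above, the following sketch tabulates the order of magnitude of the required sample for a few lattice sizes. The function name, the prefactor `w = 0.5` and the choice of setting the proportionality constant to one are illustrative assumptions, not values fitted to the simulations reported below.

```python
import math


def required_realisations(beta: float, w: float, n_spins: int) -> float:
    """Order-of-magnitude estimate exp(beta * w * N).

    The proportionality constant is set to 1 (assumption)."""
    return math.exp(beta * w * n_spins)


beta = 1.0 / 1.5  # inverse temperature in units where k = 1
w = 0.5           # hypothetical average work per spin, for illustration only

for n_spins in (4, 9, 16, 64):
    estimate = required_realisations(beta, w, n_spins)
    print(f"N = {n_spins:3d}  ->  required sample size ~ {estimate:.2e}")
```

Even with such a modest hypothetical prefactor, the estimate grows by many orders of magnitude between the smallest and the largest lattice, which is the practical content of the non-commuting limits in Eq. (5).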

According to the fluctuation symmetry, each dynamical trajectory y has a time-reversal conjugate \(\tilde{y}\), and their respective probabilities are linked (for cyclic and time-reversal symmetric forcing protocols) by the relation

$$\begin{aligned} \frac{p(y )}{p(\tilde{y})}= e^{\beta W(y)} \end{aligned}$$
(6)

with W(y) the work associated with the trajectory y. For an erasure process, the conjugates of the trajectories \(\tilde{y}\) that carry a large (order N or larger) work W have a finite probability at finite size, but are de facto excluded from the statistics (have zero probability) in the \(N \rightarrow \infty \) limit. The phenomenon whereby some trajectories do not have their mirror-image companion is referred to in the literature as “absolute irreversibility” [14, 15].
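
The link between Eq. (6) and the Jarzynski equality, Eq. (2), can be made explicit in one line. Writing the average as a sum over trajectories (a discrete notation adopted here only for simplicity) and using that the map \(y \rightarrow \tilde{y}\) is one-to-one, one has

$$\begin{aligned} \langle e^{-\beta W} \rangle = \sum _y p(y)\, e^{-\beta W(y)} = \sum _y p(\tilde{y}) = 1 \, . \end{aligned}$$

If a set R of rare trajectories (our notation) is missing from the statistics, the sum runs over \(y \notin R\) only, and one is left with

$$\begin{aligned} \sum _{y \notin R} p(y)\, e^{-\beta W(y)} = \sum _{y \notin R} p(\tilde{y}) = 1 - \sum _{y \in R} p(\tilde{y}) \le 1 \, , \end{aligned}$$

which plays the role of the value \(\gamma \) in Eq. (5).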

Using Jensen’s inequality with Eq. (5) one gets two distinct bounds to the work:

$$\begin{aligned} \lim _{N\rightarrow \infty } \lim _{{\mathcal {N}}\rightarrow \infty } \langle W \rangle _{N, {\mathcal {N}}}&\ge 0, \end{aligned}$$
(7)
$$\begin{aligned} \lim _{{\mathcal {N}} \rightarrow \infty } \lim _{N\rightarrow \infty } \langle W \rangle _{N, {\mathcal {N}}}&\ge - k T\ln \gamma \ge 0. \end{aligned}$$
(8)

The top line expresses the second law, while the bottom line expresses Landauer’s principle. The former has a fundamental status and universal validity: since \(\gamma \le 1\), it is always true that \(\langle W \rangle \ge 0\). We remark that there is actually no issue regarding the commutation of limits for the quantity \( \langle W \rangle _{N, {\mathcal {N}}} \), and in fact the two equations above can be conveniently condensed into a single inequality

$$\begin{aligned} \langle W \rangle&\ge - kT \ln \gamma , \end{aligned}$$
(9)

where it is understood that \(\gamma =1\) in the absence of information erasure and of SSB of the fluctuation symmetry, and takes a lower value \(0<\gamma <1\) otherwise.
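
As a small numerical illustration of Eq. (9), the sketch below evaluates the bound \(-kT \ln \gamma \) in joules. The function name and the choice of room temperature (300 K) are ours and serve only as an example.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def work_lower_bound(temperature_kelvin: float, gamma: float) -> float:
    """Right-hand side of Eq. (9), -k T ln(gamma), in joules."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in the interval (0, 1]")
    # ln(1/gamma) = -ln(gamma); written this way to avoid a signed zero at gamma = 1
    return K_B * temperature_kelvin * math.log(1.0 / gamma)


# No erasure (gamma = 1): the bound reduces to the second law, Eq. (3)
print(work_lower_bound(300.0, 1.0))
# Two symmetric logical states (gamma = 1/2): the Landauer bound k T ln 2, Eq. (1)
print(work_lower_bound(300.0, 0.5))
```

At 300 K the second value is of the order of \(3 \times 10^{-21}\) J per erased bit.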

This situation is the non-equilibrium analogue of the observed spontaneous magnetisation in an Ising ferromagnet at equilibrium. For the latter, the time scale associated to a sign reversal goes like

$$\begin{aligned} \tau _0 \sim e^{N^{(d-1)/d}}, \end{aligned}$$
(10)

with d the spatial dimension (see Ref. [12], Sect. 2.10). So, practically, for \(d>1\) and sufficiently large systems, one is never going to see a reversal, and one concludes on an empirical basis that, on average, the magnetisation is not null. Similarly, in the erasure process, one can never see the rare events, and one concludes on empirical grounds that the second law bound should be lifted from zero to \(-kT \ln \gamma \).

For a memory featuring two symmetric states, \(\gamma \) would take on the value 1/2, and the second line, Eq. (8), would boil down to the Landauer principle in the form of Eq. (1). To see that, consider the case of a uni-axial Ising ferromagnet. The configuration space can be split into two sectors \({\mathcal {S}}_{\pm }\) pertaining to positive or negative magnetisation, respectively. Hence there are four main families of events that can take place during the cyclic process of switching the field on and off: a) \({\mathcal {S}}_+ \rightarrow {\mathcal {S}}_+\), b) \({\mathcal {S}}_+ \rightarrow {\mathcal {S}}_-\), c) \({\mathcal {S}}_- \rightarrow {\mathcal {S}}_+\), d) \({\mathcal {S}}_- \rightarrow {\mathcal {S}}_-\). Let us consider the case in which we are below the critical temperature. Events of the type \({\mathcal {S}}_+ \rightarrow {\mathcal {S}}_+\), apart from fluctuations, are such that the system goes along a branch of the hysteresis loop and then traces it back. These are reversible processes, with ideally small dissipation. They are very frequent, occurring almost every time the system already starts in \({\mathcal {S}}_+\), hence with a probability \( \lesssim 1/2\). Events of the type \({\mathcal {S}}_+ \rightarrow {\mathcal {S}}_-\) roughly correspond to traversing a hysteresis branch in the opposite direction, and are associated with an extensive gain in energy, i.e., a large negative dissipation. They are extremely rare, and their relative frequency is expected to vanish in the thermodynamic limit. Events of the type \({\mathcal {S}}_- \rightarrow {\mathcal {S}}_+\) correspond to traversing half of a hysteresis loop; they are very typical and frequent, and are associated with a positive dissipation. They occur almost every time the system starts in \({\mathcal {S}}_-\), hence with a probability \(\lesssim 1/2\). Events of the type \({\mathcal {S}}_- \rightarrow {\mathcal {S}}_-\) correspond to very small dissipation, and are very infrequent. Thus, roughly, one might expect a tri-modal work PDF like the one sketched in Fig. 1, with a peak centred around a value \(W\gtrsim 0\), one peak around a positive extensive value \(W\simeq N\omega _0\), and a peak at \(W\simeq - N\omega _0\). The central and right peaks contain basically all the probability, and approximately share it equally, that is, they each enclose an area of about 1/2. Due to the Crooks relation, the left peak is exponentially smaller than the right one. Now, when calculating \(\langle e^{-\beta W} \rangle \), the left peak counts exponentially more than the right peak. The result is that, since the events in the left peak are so rare that you miss them all, and the events in the right peak do not count, only the events in the central peak count in practice. The integral \(\langle e^{-\beta W} \rangle \), in real experiments on a large but finite sample, then amounts to the probability of the central peak only, which is roughly 1/2, hence \(\gamma \simeq 1/2\). Above the critical temperature it is expected that only the central peak is there, or that other peaks exist but are not too far separated from each other. Accordingly, there is no missing statistics and one should get \(\langle e^{-\beta W} \rangle =1\).

We stress that the above argument on the work PDF is not meant to be mathematically rigorous, but only to illustrate the main idea behind the process of symmetry breaking. In the following we report the results of numerical simulations on the 2D Ising model, display the actual form of the work PDF (see Fig. 2b) and give a precise estimate of how hard it is to sample the rare events from our statistics.
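
The heuristic argument above can be mimicked with a deliberately crude caricature of the tri-modal PDF, in which the work takes only three values, 0, \(+A\) and \(-A\), with weights obeying \(p(-A)=p(+A)e^{-\beta A}\). This toy model, its function names and the choice of central weight 1/2 are our assumptions; it is not the work PDF of the Ising model. Its exact exponential average equals one by construction, while a finite sample misses the left peak when \(\beta A\) is large.

```python
import math
import random


def toy_work_distribution(beta_a: float):
    """Work values (in units of 1/beta) and weights of a three-peak toy model.

    The weights satisfy p(-A) = p(+A) * exp(-beta * A), so that the exact
    average of exp(-beta * W) equals 1. The central weight 1/2 is an assumption.
    """
    p_central = 0.5
    p_right = 0.5 / (1.0 + math.exp(-beta_a))
    p_left = p_right * math.exp(-beta_a)
    return [0.0, beta_a, -beta_a], [p_central, p_right, p_left]


def exact_average(beta_a: float) -> float:
    """Exact <exp(-beta W)> of the toy model."""
    values, weights = toy_work_distribution(beta_a)
    return sum(p * math.exp(-w) for w, p in zip(values, weights))


def empirical_average(beta_a: float, n_samples: int, seed: int = 0) -> float:
    """Empirical <exp(-beta W)> from a finite sample of the toy model."""
    values, weights = toy_work_distribution(beta_a)
    rng = random.Random(seed)
    sample = rng.choices(values, weights=weights, k=n_samples)
    return sum(math.exp(-w) for w in sample) / n_samples


for beta_a in (1.0, 40.0):
    print(beta_a, exact_average(beta_a), empirical_average(beta_a, 100_000))
```

For a small separation (\(\beta A = 1\)) the empirical value is close to the exact one, whereas for a large separation (\(\beta A = 40\)) the left peak is never drawn in a sample of this size and the empirical value stays close to the weight of the central peak.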

Fig. 1

Mechanism of spontaneous breaking of the fluctuation symmetry. Top: Sketch of a work pdf p(W). Bottom: Sketch of the time-reversed companion of p(W), namely \(\tilde{p}(-W)=p(W)e^{-\beta W}\). The exponential average \(\langle e^{-\beta W} \rangle =\int dW e^{-\beta W}p(W)\) is contributed about half and half by the central and left peaks. As we argue, in a real experiment, the left peak is missing due to lack of statistics. Accordingly the empirical value is contributed by the central peak only, \(\langle e^{-\beta W} \rangle \simeq 1/2\), while the true ideal value is \(\langle e^{-\beta W} \rangle = 1\). The colours refer to \({\mathcal {S}}_+ \rightarrow {\mathcal {S}}_-\) transitions (orange), \({\mathcal {S}}_- \rightarrow {\mathcal {S}}_+\) transitions (green) and no-transitions (blue) (Color figure online)

It is important to remark that, although in the case of the Ising ferromagnet the size N of the system is the parameter that drives the fluctuation-symmetry breaking, the breakdown may be driven by other parameters, depending on the specific physical scenario considered for erasure. The thermodynamic limit is not essential for our analysis. For example, with reference to the erasure of information stored in a single particle in a double-well potential, studied, e.g., in Refs. [16,17,18,19,20], the symmetry-breaking parameter would be the height of the energy barrier E separating the two logical states.Footnote 2 The difficulties in sampling rare trajectories in an erasure process, and so in verifying the Jarzynski equality, were in fact well illustrated in that case by the experiments reported in Refs. [18, 21]; we shall comment further on this in Sect. 4.1.

3 The Numerics

In the following we corroborate our argument with numerical experiments. To this end we simulated the dynamics of a 2D Ising ferromagnet on a square lattice of side length L, evolving according to Glauber dynamics [22]. The Hamiltonian reads:

$$\begin{aligned} H(t) = h(t) \sum _i \sigma _i^z - J \sum _{\langle i,j \rangle } \sigma _i^z \sigma _j^z. \end{aligned}$$
(11)

In our simulations the initial state of the system is randomly sampled from a Gibbs thermal equilibrium at temperature T, and the RESET TO ONE protocol is performed by linearly ramping up h(t) to some positive value \(h_0\) in a time \(\tau /2\) and linearly ramping it down to zero in the same time \(\tau /2\), so that our protocol is cyclic and time-symmetric. Each randomly sampled initial state of the system is evolved according to Glauber dynamics at temperature T, so as to produce a trajectory \(y= \{S_i(t)\}_{i=1\dots N}^{t\in [0,\tau ]}\), with \(N=L^2\). The dynamics of the model was generated using a Markov chain Monte Carlo approach, which we parallelised using the Python package Numba [23] to increase our sampling capabilities. The code we used to run our numerical experiments is publicly available at https://github.com/Buffoni/landauer_parallel.
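
The ingredients just described can be sketched in a few lines of plain Python. This is a minimal, unoptimised illustration and not the code of the repository cited above: the function names, the periodic boundary conditions, the identification of one lattice sweep with one unit of time, and the random (rather than Gibbs-sampled) initial configuration are all assumptions made for brevity. The energy change follows Eq. (11) exactly as written, including the sign of the field term.

```python
import math
import random


def delta_energy(spins, i, j, h, coupling=1.0):
    """Energy change for flipping spin (i, j), with H as written in Eq. (11).

    Periodic boundary conditions are an assumption of this sketch."""
    size = len(spins)
    neighbours = (spins[(i + 1) % size][j] + spins[(i - 1) % size][j]
                  + spins[i][(j + 1) % size] + spins[i][(j - 1) % size])
    return 2.0 * spins[i][j] * (coupling * neighbours - h)


def glauber_sweep(spins, h, beta, rng, coupling=1.0):
    """One sweep: L*L single-spin-flip attempts with Glauber acceptance."""
    size = len(spins)
    for _ in range(size * size):
        i, j = rng.randrange(size), rng.randrange(size)
        d_e = delta_energy(spins, i, j, h, coupling)
        if rng.random() < 1.0 / (1.0 + math.exp(beta * d_e)):
            spins[i][j] = -spins[i][j]


def ramp(t, tau, h0):
    """Linear ramp 0 -> h0 on [0, tau/2], then h0 -> 0 on [tau/2, tau]."""
    return h0 * (1.0 - abs(2.0 * t / tau - 1.0))


rng = random.Random(0)
side = 4
# Random initial configuration; equilibration at h = 0 is omitted here (assumption)
spins = [[rng.choice((-1, 1)) for _ in range(side)] for _ in range(side)]
tau = 200
for step in range(tau):
    glauber_sweep(spins, ramp(step, tau, h0=1.0), beta=1.0 / 1.5, rng=rng)
print(sum(sum(row) for row in spins))  # net magnetisation at the end of the cycle
```

A production run would replace the Python loops with compiled, parallel code and would repeat the protocol for many independent initial states.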

For each generated trajectory y we record the net magnetisation \(M(t)=\sum _i S_i(t)\) at each time during the evolution, and use the formula

$$\begin{aligned} W(y) = \int dt \dot{{h}}(t) {M}(t) \end{aligned}$$
(12)

to calculate the corresponding work. By repeating the erasure protocol \({\mathcal {N}}\) times we construct the statistical ensemble \(\mathcal P_{N,{\mathcal {N}}}(y)\), over which we evaluate average values of trajectory-dependent quantities O(y):

$$\begin{aligned} \langle O(y)\rangle _{N,{\mathcal {N}}} = \sum _y O(y) \mathcal P_{N,{\mathcal {N}}}(y). \end{aligned}$$
(13)
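
In a time-discretised simulation, Eqs. (12) and (13) reduce to simple sums. The sketch below assumes that the field is updated in steps while the magnetisation is held fixed during each update (a left-point rule); this discretisation, the function names and the toy magnetisation record are our choices for illustration.

```python
import math


def work_from_trajectory(h_values, m_values):
    """Discretised Eq. (12): W ~ sum_k [h(t_{k+1}) - h(t_k)] * M(t_k).

    h_values must contain one more entry than m_values."""
    if len(h_values) != len(m_values) + 1:
        raise ValueError("need len(h_values) == len(m_values) + 1")
    return sum((h_values[k + 1] - h_values[k]) * m
               for k, m in enumerate(m_values))


def empirical_average(observable, works):
    """Eq. (13) for an observable that depends on the trajectory only via W."""
    return sum(observable(w) for w in works) / len(works)


# Toy cyclic protocol and a hypothetical magnetisation record
h_values = [0.0, 0.5, 1.0, 0.5, 0.0]
m_values = [1, 1, -1, -1]
work = work_from_trajectory(h_values, m_values)

beta = 1.0 / 1.5
print(work, empirical_average(lambda w: math.exp(-beta * w), [work]))
```

Note that a magnetisation that stays constant over the whole cycle gives zero work, since the field returns to its initial value.
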
Fig. 2

Panel a: Average exponentiated work \(\langle e^{- \beta W(y)}\rangle _{N, {\mathcal {N}}}\) (solid lines) as a function of \({\mathcal {N}}\) for a reset protocol on square lattices of different sizes N. The top dashed line represents the value 1, while the bottom dashed line represents the value 1/2. Panel b: work distributions \(\mathcal P_{N,{\mathcal {N}}}(W)\) at different sizes N for \({\mathcal {N}} =10^6\), \(h_0=1 ,\tau = 200\), \(T=1.5\)

In Fig. 2, we report the quantity \(\langle e^{- \beta W(y)}\rangle _{N,{\mathcal {N}}}\) as a function of \({\mathcal {N}}\) for a reset protocol on square lattices of sizes \(N=[4,9,16,64]\); here \(h_0= 1\) and \(\tau = 200\). Note how for the \(N=4,9\) spin systems the asymptotic value 1 is quickly approached as the sample size is increased, while the picture changes drastically for \(N=16\), i.e., upon increasing the side of the lattice by a single unit. Now the average \(\langle e^{- \beta W(y)}\rangle _{N,{\mathcal {N}}}\) remains far below the value 1, and close to 1/2, for the same values of \({\mathcal {N}}\). Note how the running average slowly increases with \({\mathcal {N}}\). This reflects the fact that, as the sample size is increased, more rare events are sampled. These are well visible as jumps in the plotted curve. Their relative number remains, however, greatly insufficient to guarantee convergence to the ideal value 1. This could be achieved only with sample sizes orders of magnitude larger than the maximal value \({\mathcal {N}}= 10^6\) employed here. In the case \(N=64\) the curve \(\langle e^{- \beta W(y)}\rangle _{N,{\mathcal {N}}}\) does not significantly depart from the value 1/2. In the thermodynamic limit, there will be no departure at all. This rich phenomenology is compactly summarised in Eq. (5).

Fig. 3

Average exponentiated work \(\langle e^{- \beta W(y)}\rangle _{N,{\mathcal {N}}}\) (blue dots) and magnetisation density \(\langle M\rangle _{N,{\mathcal {N}}}/N\) (orange crosses), as a function of system size N for a sample of size \({\mathcal {N}}=10^6\) realisations of the RESET TO ONE protocol with \(h_0=1, \tau =200 \), \(T=1.5\) (Color figure online)

Panel b of Fig. 2 reports the work statistics \(\mathcal P_{N,{\mathcal {N}}}(W)\) for \(N=[4,9,16,64]\) and \({\mathcal {N}}=10^6\). Note that, besides a central and a right peak, a negative tail of the distribution is visible only in the \(N=4\) case. For \(N\ge 9\), the histograms still display a central peak and a right peak, which moves farther away from the origin of the W axis, while no negative tail is visible: the sampling of rare events was greatly insufficient in these cases.

Fig. 4

Panel a: average work \(\langle W \rangle _{N,{\mathcal {N}}}\) as a function of N for a fixed sample size \({\mathcal {N}}\) and various temperatures, in log-log scale. Panel b: the probability \(e^{-\beta \langle W \rangle _{N,{\mathcal {N}}}}\) of rare events, as a function of temperature T, for various lattice sizes N at fixed \(\mathcal N =10^4\), \(h_0=1 , \tau =200\). The dashed line represents probability one for comparison

Figure 3 shows, for a sample of size \({\mathcal {N}} = 10^6\), both \(\langle e^{- \beta W(y)}\rangle _{N,{\mathcal {N}}}\) and the magnetisation density \(\langle M\rangle _{N,{\mathcal {N}}}/N\) as a function of system size N, with N up to 256, at \(T = 1.5<T_C \) (we recall that the critical temperature is \(T_C\sim 2.27\)). Note how, as the system size increases, the erasure protocol becomes more and more successful, i.e., the quantity \(\langle M\rangle _{N,{\mathcal {N}}}/N\) approaches the value 1, while, accordingly, the observed value of \(\langle e^{- \beta W(y)}\rangle _{N,{\mathcal {N}}}\) approaches 1/2.

Figure 4, left panel, shows plots of the average work \(\langle W \rangle _{N,{\mathcal {N}}}\), as a function of N, for fixed \({\mathcal {N}} = 10^4\) and various temperatures. All plots, in log-log scale, evidence an approximately linear increase with N, \(\langle W \rangle _{N,{\mathcal {N}}} \simeq \omega _0 N\), with coefficients \(\omega _0\) that quickly vanish as the temperature is raised above the Curie temperature \(T_C \simeq 2.27\). Figure 4, right panel, shows plots of the quantity \(\mathcal P \propto e^{-\beta \langle W \rangle _{N,{\mathcal {N}}}}\), which is a rough estimate of the probability of observing a rare fluctuation [13]. Note how, far above the Curie temperature \(T_C\simeq 2.27 \), the fluctuation probability is always close to one, while as the temperature is decreased well below the Curie point, the decrease with N becomes drastic. As discussed above, the consequence of this fact is that an exponentially large sample is needed in order to effectively sample the rare events, which, accordingly, get completely excluded from the statistics in the thermodynamic limit.

Figure 5, left panel, shows \(\langle W \rangle _{N,{\mathcal {N}}}/ k_B T \ln 2\) as a function of the duration \( \tau \) of the erasure protocol, for various system sizes \(N=[36,49,64]\) and fixed \({\mathcal {N}} = 10^4 \), \(h_0=1\) and \(T=1.5\). We note that, as expected, the average work decreases with increasing duration. Figure 5, right panel, shows the corresponding average exponentiated work \(\langle e^{- \beta W(y)}\rangle _{N,{\mathcal {N}}}\). The plots evidence that in the limit of large \(\tau \) the quantity departs from the value 1/2 and approaches 1, regardless of N. However, the departure from 1/2 occurs at a take-off time that grows with N. In other words, in the large N limit, the quantity \(\langle e^{- \beta W(y)}\rangle _{N,{\mathcal {N}}}\) remains at 1/2, regardless of the value of \(\tau \). This suggests that the quantity \(\langle W \rangle _{N,{\mathcal {N}}}\) would tend to \(kT\ln 2\) in the limit of large N, followed by the large \(\tau \) limit, and finally the large \({\mathcal {N}}\) limit.

Fig. 5

Panel a: average work \(\langle W \rangle _{N,{\mathcal {N}}}/ k_B T \ln 2\) as a function of the total protocol time \(\tau \) (on a log scale) and for different sizes N. The dashed line indicates the Landauer bound \(\langle W \rangle _{N,{\mathcal {N}}}/ k_B T \ln 2 = 1\) . Panel b: average exponentiated work \(\langle e^{- \beta W(y)}\rangle _{N,{\mathcal {N}}}\) as a function of protocol duration \(\tau \) (on a log scale) for various system sizes N. The two dashed lines indicate the values 1 and 1/2. For both panels we have \({\mathcal {N}}=10^4\), \(h_0=1\), \(T=1.5\)

4 Discussion

4.1 Relation to Previous Works

The results presented above are closely related to those obtained in various previous works, reporting on experiments of erasure of information carried in double well systems [9, 18, 20, 21]. Specifically, in the work of Ref. [21], thanks to a coarse grained version of the fluctuation relation [20, 24], the averaged exponentiated work is expressed in terms of the probability, \(P_S\), of success of the reset to zero process as:

$$\begin{aligned} \langle e^{-\beta W}\rangle = P_S \langle e^{-\beta W}\rangle _{\rightarrow 0} + (1-P_S)\langle e^{-\beta W}\rangle _{\rightarrow 1} =1 \quad (0< P_S <1) \end{aligned}$$
(14)

where the symbol \(\langle \cdot \rangle _{\rightarrow X}\) denotes the average conditioned on the outcome of the process being the logical state X, with \(X=0,1\). Note that, since Eq. (14) holds for any \(0<P_S<1\), it holds as well in the limit \(P_S\rightarrow 1\).

The authors of [21] also observe that, in the limit \(P_S\rightarrow 1\), it is \( \langle e^{-\beta W}\rangle _{\rightarrow 0} = 1/2. \) Noting that, by definition, at \(P_S=1 \), the conditional average coincides with the full average, namely \(\langle \cdot \rangle = \langle \cdot \rangle _{\rightarrow 0}\), one gets,

$$\begin{aligned} \langle e^{-\beta W}\rangle = 1/2 \quad (P_S=1). \end{aligned}$$
(15)

That is, the limit of perfect erasure is a singular limit, a fact that was not noted in Ref. [21]. Here we have spelled this fact out in a most explicit manner, and expressed it in terms of lack of commutation of limits. To make the connection clearer, perfect erasure occurs in the case of a particle in a double well when the height of the energy barrier that separates the two logical states diverges. The height of the barrier is accordingly the parameter that drives the fluctuation symmetry breaking in that case. If this limit is taken first, that is, if one is exactly at \(P_S=1\), Eq. (15) applies no matter how large the statistical sample one is using. Notably, the experimental data collected in Ref. [21] adhere to the prediction of Eq. (15) rather than Eq. (14), in agreement and analogy with our discussion above of the 2D Ising ferromagnet. We remark that, as the authors of Ref. [21] first noted, it is Eq. (15), rather than the ideal Jarzynski equality (14), that allows one to obtain the Landauer bound via application of Jensen’s inequality.Footnote 3 Based on this observation, one might argue that the Landauer bound is a consequence of the breakdown of the Jarzynski equality, rather than a consequence of its validity.
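
The singular character of the limit can be made explicit by solving Eq. (14) for the contribution of the failed resets. Assuming the limiting value \(\langle e^{-\beta W}\rangle _{\rightarrow 0} \rightarrow 1/2\) quoted above from Ref. [21], one finds

$$\begin{aligned} (1-P_S)\, \langle e^{-\beta W}\rangle _{\rightarrow 1} = 1 - P_S\, \langle e^{-\beta W}\rangle _{\rightarrow 0} \ \longrightarrow \ \frac{1}{2} \quad (P_S \rightarrow 1) \, . \end{aligned}$$

Hence the conditional average over failed resets must grow like \(1/[2(1-P_S)]\): events that become vanishingly rare keep contributing one half of the exponential average as long as \(P_S<1\), and drop out altogether at \(P_S=1\).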

4.2 Landauer Principle is not the Second Law of Thermodynamics

It is important to remark that the Landauer principle may also be obtained from an inequality first derived in Ref. [25], (see also Refs. [24, 26,27,28]) reading, for a system weakly coupled to a thermal bath and initially at thermal equilibrium:

$$\begin{aligned} W - \Delta F \ge T D[\rho _t \vert \vert \rho _t^\text {eq}] \end{aligned}$$
(16)

where \(\Delta F= F(t)-F(0)\), with \(F(t) = -\beta ^{-1}\ln \text{ Tr }e^{-\beta H(t)}\) denoting the free energy, \(\rho _t\) is the distribution describing the state of the system at time t, \(\rho _t^\text {eq} = e^{-\beta [H(t)-F(t)]}\) is the reference equilibrium distribution at time t, and the symbol \(D[\rho \vert \vert \sigma ]=\text{ Tr }\rho (\ln \rho -\ln \sigma )\) denotes the relative entropy (Kullback–Leibler divergence).Footnote 4 Note that, since both T and the relative entropy are non-negative quantities, Eq. (16) implies the second law of thermodynamics:

$$\begin{aligned} W \ge \Delta F\, . \end{aligned}$$
(17)

Accordingly, Eq. (16) contains more information than that contained in the second law, Eq. (17).

As noted in Ref. [28], introducing the non-equilibrium functional

$$\begin{aligned} f[\rho _t] = E[\rho _t] - T s[\rho _t] \end{aligned}$$
(18)

where \(s[\rho _t]= -\text{ Tr }\rho _t \ln \rho _t\) is the Von-Neumann information, \(E[\rho _t]=\text{ Tr }H(t) \rho _t\) is the energy expectation, and using the identity

$$\begin{aligned} f[\rho _t]-F(t) = T D[\rho _t \vert \vert \rho _t^\text {eq}]\, , \end{aligned}$$
(19)

Eq. (16) may equivalently be rewritten as \( W - F(t)+F(0) \ge f[\rho _t] - F(t)\), or, for an initial equilibrium state, such that \(f[\rho _0]=F(0)\), as:

$$\begin{aligned} W \ge \Delta f . \end{aligned}$$
(20)
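
The identity (19) used above follows directly from the definitions. With \(\ln \rho _t^\text {eq} = -\beta [H(t)-F(t)]\), \(\text{ Tr }\rho _t = 1\), and units in which \(k=1\) (so that \(T\beta =1\), as implicit in Eq. (16)), one has

$$\begin{aligned} T D[\rho _t \vert \vert \rho _t^\text {eq}] = T\, \text{ Tr }\rho _t \ln \rho _t + \text{ Tr }\rho _t [H(t)-F(t)] = -T s[\rho _t] + E[\rho _t] - F(t) = f[\rho _t]-F(t) \, . \end{aligned}$$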

Despite the appearance, the latter is not the second law of thermodynamics, because it presents the non-equilibrium quantity f instead of the equilibrium thermodynamic potential F. For this reason, in Ref. [28] Eq. (20) is referred to as the “non-equilibrium second law”, as opposed to the “second law”, Eq. (17).Footnote 5 Note that, since (for an initial state of thermal equilibrium) it is \(\Delta f \ge \Delta F\), then Eq. (20) presents a stricter bound than Eq. (17).

In the limit of perfect erasure where the initial equilibrium statistics \(\rho _0=\rho _0^\text {eq}\) gets finally squeezed into half of its initial support and achieves the according state of local thermal equilibrium, \(\rho ^{\text {eq},X}\), it is \(\Delta f = kT \ln 2\), and Eq. (20) reduces to Landauer’s principle, Eq. (1) [29]. Note that the equilibrium free energy change is null in that case, \(\Delta F =0\), while the second law, Eq. (17) would read as Eq. (3).
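
The value \(\Delta f = kT \ln 2\) can be checked numerically on a toy memory made of two identical sectors. In the sketch below the energy levels, the temperature and the function names are hypothetical choices made for illustration, and units with \(k=1\) are used.

```python
import math


def gibbs(energies, temperature):
    """Gibbs distribution over the given energy levels (k = 1)."""
    weights = [math.exp(-e / temperature) for e in energies]
    partition = sum(weights)
    return [w / partition for w in weights]


def free_energy_functional(probs, energies, temperature):
    """Non-equilibrium functional f[rho] = E[rho] - T s[rho], Eq. (18)."""
    energy = sum(p * e for p, e in zip(probs, energies))
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return energy - temperature * entropy


temperature = 1.5
sector = [0.0, 0.3, 1.1, 2.0]   # hypothetical energy levels of one sector
energies = sector + sector       # two symmetric sectors, same Hamiltonian at start and end

rho_initial = gibbs(energies, temperature)                      # global equilibrium
rho_final = gibbs(sector, temperature) + [0.0] * len(sector)    # local equilibrium in one sector

delta_f = (free_energy_functional(rho_final, energies, temperature)
           - free_energy_functional(rho_initial, energies, temperature))
print(delta_f, temperature * math.log(2.0))
```

The energy expectation is the same in the two states, so the whole change comes from the entropy term, which decreases by \(\ln 2\) when the support is halved.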

This last observation illustrates the crucial point that Eq. (20) is generally not equivalent to the second law of thermodynamics, Eq. (17). Specifically, the Landauer principle, in contrast to a widespread opinion, is neither the second law of thermodynamics nor equivalent to it; in fact, it is a stricter bound. Clearly distinguishing the thermodynamic potentials (free energy F and entropy S) from their non-equilibrium functional generalisations (f and s respectively), and the inequalities that characterise them,Footnote 6 is a good practice that in the present case helps make sense of Landauer’s intuition that there is an extra unavoidable dissipation cost associated with information erasure. A similar warning regarding equilibrium vs. non-equilibrium quantities was also raised in Ref. [32].

4.3 SSB Adapted Version of the Jarzynski Equality

We have illustrated above that the Jarzynski equality breaks in the presence of spontaneous breaking of the fluctuation symmetry. However, there is a natural and suggestive way to extend it to include such cases. As discussed in the standard literature, e.g., [12], in the presence of SSB the Gibbs distribution becomes inappropriate to describe the observed physics. However, just like the local thermal equilibrium \(\rho ^{\text {eq},X}\) becomes the appropriate physical distribution, so the SSB adapted Jarzynski equality

$$\begin{aligned} \langle e^{-\beta W}\rangle = e^{-\beta [F_X(t)-F(0)]} \end{aligned}$$
(21)

becomes physically appropriate, as our analysis showed. Here, \(F=f[\rho ^\text {eq}]\) denotes the standard “global” free energy, and \(F_X=f[\rho ^{\text {eq},X}]\) the “local” free energy associated with subspace X. In the absence of SSB, \(F_X=F\) and the standard formula is recovered. It is interesting to note that Eq. (21) has in fact already been employed to study Landauer’s erasure [33], while the fundamental reason behind its validity was not clarified.
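
As a consistency check, Eq. (21) reproduces the value \(\gamma = 1/2\) for a memory with two symmetric states. For a cyclic protocol the global free energy is the same at the start and at the end, and, by the result \(\Delta f = kT \ln 2\) of Sect. 4.2, the local free energy exceeds it by \(kT \ln 2\):

$$\begin{aligned} F_X(t)-F(0) = f[\rho ^{\text {eq},X}] - f[\rho ^\text {eq}] = kT \ln 2 \quad \Longrightarrow \quad \langle e^{-\beta W}\rangle = e^{-\ln 2} = \frac{1}{2} \, . \end{aligned}$$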