1 Introduction

The last decades have seen increasing interest in establishing a thermodynamic understanding of nonequilibrium processes occurring in small systems. However, unlike the macroscopic systems to which thermodynamics applies, small systems are characterised by relatively large fluctuations and by time scales comparable to their relaxation times, rendering a standard thermodynamic description inappropriate.

A great advance in our understanding of the thermodynamics of small systems has been made in the last decades, due mainly to two developments [13]: on the one hand, the discovery of the nonequilibrium fluctuation relations [11], which relate the statistics of molecular fluctuations to the microscopic symmetries of the dynamics of small systems far from equilibrium, ultimately giving theoretical support to the validity of the second law of thermodynamics [10, 12, 14, 18, 23]; on the other hand, the development of stochastic thermodynamics, which provides a thermodynamic framework valid for single fluctuating trajectories [21, 27,28,29]. These results have been successfully applied to a broad range of disciplines, including physics [9], chemistry [25], biological systems [27], active matter [30], and computer processing [33], among many others.

One such nonequilibrium process is life itself, maintained at the molecular level by molecular motors that determine the finely tuned kinetics of the cell. Molecular motors are responsible for essential life processes such as vesicle transport, cell division [15, 26], and the synthesis of ATP [31]. Notwithstanding the strong fluctuations these molecular complexes display, it is of central importance to understand the efficiency with which molecular motors operate.

Another instance in which efficiency plays a central role is processing in modern computers. On the one hand, every computation involves a thermodynamic energetic cost. On the other hand, due to the miniaturisation of computer processors, modern transistors operate at scales at which stochastic thermodynamics applies. However, the scaling of computer transistors nowadays presents technological challenges, mostly related to the increase in dissipation, which limits the growth in the number of transistors and therefore the possible optimisation of the speed of computation [22]. This calls for the development of highly efficient computing processes that operate at minimal dissipation.

A fundamental process for computers is memory erasure. The minimal amount of energy required to erase a bit of memory is given by the celebrated Landauer limit [17], which states an energetic lower bound of \(k_\mathrm {B}T \ln 2\), where T is the temperature and \(k_\mathrm {B}\) the Boltzmann constant. This energy is eventually dissipated into the environment as heat. The Landauer limit is achieved when the erasure of a bit occurs along a quasistatic process. In practice, memory swaps happen on short, finite time scales in a process that is inherently noisy and out of equilibrium.

Fourteen years ago Krzysztof became interested in the mathematical formalisation of the, at that time incipient, area of nonequilibrium fluctuation relations [8], as well as in the Landauer limit [1].

The memory erasure process can be stated as a finite-time nonequilibrium process between an initial state, in which a bit is observed to be in a specific state, and a final state, in which the bit is absent; indeed, the Landauer limit was experimentally verified in this way [4]. In general, an external control of the nonequilibrium transition can be devised to obtain optimal processes that minimise, e.g., dissipation [2, 24]. In particular, in Ref. [2] this optimisation was shown to be solved, for Langevin dynamics in the overdamped limit, by the Monge–Kantorovich optimal mass transport and the Burgers equation. This is particularly relevant in computer processing, where one searches for minimal dissipation but, at the same time, fast processing.

In 2012, Krzysztof and his colleagues published a paper [1] in which they related the Landauer Principle [17] to the Monge–Kantorovich optimal mass transport. In a talk in Geneva in April 2013, “2nd law of thermodynamics for fast random processes”, Krzysztof showed how the Landauer Principle is related to an overdamped Langevin evolution from an initial state to a different final state. In discussions with him, the question came up of how to efficiently and precisely compute the dissipated energy. There were many methods around at the time, and Krzysztof based his discussion on the papers by Benamou and Brenier [6, 7]. He also gave a sequence of three talks in Helsinki on this subject, “Fluctuations relations in stochastic thermodynamics”, where an extensive list of references can be found [13]. Since the authors of [1] restricted the discussion to one dimension, one can use methods from differential equations which are more precise than the methods used in that paper. This will be part of our contribution, which illustrates the breadth of Krzysztof’s interests and competence. We miss him.

2 Stochastic Thermodynamics

We consider finite-time nonequilibrium transitions in d dimensions, with dynamics described by the Langevin equation in the overdamped limit

$$\begin{aligned} \mathrm {d}{\varvec{x}}_t= -\mu \ \partial _{\varvec{x}}U({\varvec{x}}_t,t) \mathrm {d}t + \mathrm {d}{\varvec{\xi }}_t \ , \end{aligned}$$
(1)

where \(U({\varvec{x}}_t,t)\) is a smooth control potential and \({\varvec{\xi }}_t\) white noise with zero mean \(\langle \mathrm {d}{\varvec{\xi }}^i_t \rangle = 0\), and covariance \(\langle \mathrm {d}{\varvec{\xi }}^i_t \ \mathrm {d}{\varvec{\xi }}^j_{t^\prime } \rangle = 2 D^{ij} \delta (t-t^\prime ) \mathrm {d}t\).

The diffusion matrix D and the mobility matrix \(\mu \) appearing above are assumed to be positive and to satisfy the Einstein relation \( D = k_\mathrm {B}T \mu \), where \(k_\mathrm {B}\) is the Boltzmann constant, so that the noise models the fluctuations of a thermal bath at temperature T.
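As an illustration (our addition; the static harmonic potential and all parameter values below are ad hoc choices, not taken from the original discussion), the overdamped dynamics of Eq. (1) can be integrated with a simple Euler–Maruyama scheme:

```python
import numpy as np

def euler_maruyama(grad_U, x0, tau, n_steps, mu=1.0, kBT=1.0, seed=0):
    """Integrate dx = -mu * grad_U(x, t) dt + dxi, with <dxi^2> = 2 mu kBT dt."""
    rng = np.random.default_rng(seed)
    dt = tau / n_steps
    x = np.asarray(x0, dtype=float).copy()
    for i in range(n_steps):
        noise = rng.normal(0.0, np.sqrt(2.0 * mu * kBT * dt), size=x.shape)
        x += -mu * grad_U(x, i * dt) * dt + noise
    return x

# Static harmonic trap U(x) = x^2 / 2: the stationary Gibbs density is a
# Gaussian with variance kBT, which the long-time ensemble should reproduce.
x_final = euler_maruyama(lambda x, t: x, np.zeros(5000), tau=20.0, n_steps=2000)
```

Running an ensemble of trajectories for many relaxation times and checking that the empirical variance approaches \(k_\mathrm {B}T\) is a convenient sanity check of the discretisation.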

During the nonequilibrium transition, we assume that the control potential changes from \(U({\varvec{x}}_0 ,0) = V_i({\varvec{x}})\) at time \(t=0\) to \(U({\varvec{x}}_\tau ,\tau ) = V_f({\varvec{x}})\) at time \(t=\tau \). Given an initial probability density \(\varrho ({\varvec{x}}_0,0) = \varrho _i({\varvec{x}})\), \({\varvec{x}}_t\) defines a Markov diffusion process for times \(t>0\), with generator

$$\begin{aligned} \mathcal {L}_t = -\left( \partial _{\varvec{x}}U({\varvec{x}}_t,t) \right) \cdot \mu \partial _{\varvec{x}}+ k_\mathrm {B}T \partial _{\varvec{x}}\cdot \mu \partial _{\varvec{x}}\ . \end{aligned}$$

The probability density evolves according to the Fokker–Planck equation

$$\begin{aligned} \partial _t \varrho ({\varvec{x}}_t,t) = \mathcal {L}_t^\dagger \varrho ({\varvec{x}}_t,t) \ , \end{aligned}$$
(2)

where

$$\begin{aligned} \mathcal {L}_t^\dagger = \partial _{\varvec{x}}\cdot \mu \left( \partial _{\varvec{x}}U({\varvec{x}}_t,t) \right) + k_\mathrm {B}T \partial _{\varvec{x}}\cdot \mu \partial _{\varvec{x}}\ , \end{aligned}$$

is the adjoint of \(\mathcal {L}_t\).

Following Ref. [28], the energy balance for the single fluctuating trajectories of the process Eq. (1) leads to the framework of stochastic thermodynamics, developed to give a thermodynamic description of small systems in contact with a heat bath and driven out of equilibrium (see, e.g., Refs. [21, 27] for a modern review).

Defining the work done on the system during the time interval \(\tau \) as

$$\begin{aligned} W = \int _0^\tau \partial _t U({\varvec{x}}_t,t) \mathrm {d}t \ , \end{aligned}$$

and the heat released by the system into the environment in the Stratonovich convention as

$$\begin{aligned} Q = - \int _0^\tau \partial _{\varvec{x}}U({\varvec{x}}_t,t) \circ \mathrm {d}{\varvec{x}}_t\ , \end{aligned}$$

the balance

$$\begin{aligned} W - Q = \Delta U \ , \end{aligned}$$

with \(\Delta U = U({\varvec{x}}_\tau ,\tau ) - U({\varvec{x}}_0 ,0)\), expresses the conservation of energy that holds for every fluctuating trajectory of the transition process, in analogy to the first law of thermodynamics.
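The trajectory-wise balance can be checked numerically. The sketch below (our illustration; the breathing harmonic trap \(U(x,t)=k(t)x^2/2\) and all parameter values are ad hoc choices) accumulates the work with the mid-point rule and the heat with the Stratonovich (mid-point) discretisation, and verifies \(W - Q = \Delta U\) along a single trajectory of Eq. (1):

```python
import numpy as np

def first_law_check(tau=1.0, n_steps=20000, mu=1.0, kBT=1.0, seed=1):
    """Simulate U(x,t) = k(t) x^2 / 2 with a stiffness ramp k(t) = 1 + 2 t / tau
    and return (W, Q, dU) for a single trajectory."""
    rng = np.random.default_rng(seed)
    dt = tau / n_steps
    k = lambda t: 1.0 + 2.0 * t / tau
    x = rng.normal(0.0, np.sqrt(kBT / k(0.0)))       # sample the initial Gibbs state
    x0, W, Q = x, 0.0, 0.0
    for i in range(n_steps):
        t = i * dt
        x_new = x - mu * k(t) * x * dt + rng.normal(0.0, np.sqrt(2 * mu * kBT * dt))
        x_mid = 0.5 * (x + x_new)
        W += (k(t + dt) - k(t)) * x_mid**2 / 2       # \partial_t U dt (mid-point)
        Q += -k(t + dt / 2) * x_mid * (x_new - x)    # -\partial_x U \circ dx (Stratonovich)
        x = x_new
    dU = k(tau) * x**2 / 2 - k(0.0) * x0**2 / 2
    return W, Q, dU

W, Q, dU = first_law_check()
```

Since the stiffness only increases, the work increments are non-negative along every trajectory, while the residual \(W - Q - \Delta U\) vanishes as the time step is refined.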

To obtain a fluctuating version of the second law of thermodynamics, we first notice that the entropy change associated with the transition from time 0 to \(\tau \), can be split into two contributions

$$\begin{aligned} \Delta S_{\mathrm {tot}} = \Delta S_{\mathrm {sys}} + \Delta S_{\mathrm {env}} \ . \end{aligned}$$
(3)

The first term on the right-hand side of Eq. (3) corresponds to the entropy change of the system due to the evolution of the probability density

$$\begin{aligned} \Delta S_{\mathrm {sys}} = S_{\mathrm {sys}}(\tau ) - S_{\mathrm {sys}}(0) \ , \end{aligned}$$
(4)

where

$$\begin{aligned} S_{\mathrm {sys}}(t) = -k_\mathrm {B}\int \varrho ({\varvec{x}}_t,t) \ln \left( \varrho ({\varvec{x}}_t,t)\right) \mathrm {d}{\varvec{x}}_t\ , \end{aligned}$$

is simply the Gibbs–Shannon entropy with respect to the instantaneous probability density.

The second contribution on the right-hand side of Eq. (3) corresponds to the change of entropy of the environment due to the dissipated heat

$$\begin{aligned} \Delta S_{\mathrm {env}} = \frac{1}{T} \langle Q \rangle \ , \end{aligned}$$
(5)

where \(\langle Q \rangle \) is the mean heat released during the transition.

To obtain the second law, it is expedient to define the current velocity of the process \({\varvec{v}}({\varvec{x}}_t,t)\) [20]. We first note that the instantaneous probability density for a Markov diffusion process can be written as

$$\begin{aligned} \varrho ({\varvec{x}}_t,t) = \langle \delta \left( {\varvec{x}}- {\varvec{x}}_t\right) \rangle \equiv \exp \left( -\frac{R({\varvec{x}}_t,t)}{k_\mathrm {B}T} \right) \ . \end{aligned}$$

Moreover, the Fokker–Planck equation  (2), yielding the evolution of \(\varrho ({\varvec{x}}_t,t)\) can be rewritten as the advection equation

$$\begin{aligned} \partial _t \varrho + \partial _{\varvec{x}}(\varrho {\varvec{v}}) = 0 \ , \end{aligned}$$
(6)

with the current velocity defined as

$$\begin{aligned} {\varvec{v}}({\varvec{x}}_t,t)= & {} -\mu \left( \partial _{\varvec{x}}U ({\varvec{x}}_t,t) + \frac{k_\mathrm {B}T}{\varrho ({\varvec{x}}_t,t)} \partial _{\varvec{x}}\varrho ({\varvec{x}}_t,t) \right) \nonumber \\= & {} -\mu \ \partial _{\varvec{x}}\left( U ({\varvec{x}}_t,t)-R ({\varvec{x}}_t,t)\right) \ . \end{aligned}$$
(7)

The current velocity, defined through an appropriate limiting procedure [1, 3, 20], has the interpretation of the mean local velocity of the process \({\varvec{x}}_t\). Correspondingly, in terms of the current velocity, the Fokker–Planck equation is equivalent to deterministic mass transport.

Now, using Eq. (6), the entropy change of the system Eq. (4) can be written after integration by parts as

$$\begin{aligned} \Delta S_{\mathrm {sys}}= & {} \int _0^\tau \mathrm {d}t {\dot{S}}_{\mathrm {sys}} \nonumber \\= & {} \int _0^\tau \mathrm {d}t \left( -k_\mathrm {B}\int \left( 1 + \ln (\varrho ({\varvec{x}}_t,t))\right) \partial _t \varrho ({\varvec{x}}_t,t) \ \mathrm {d}{\varvec{x}}\right) \nonumber \\= & {} \frac{1}{T} \int _0^\tau \mathrm {d}t \int \partial _{\varvec{x}}R({\varvec{x}}_t,t) \cdot {\varvec{v}}({\varvec{x}}_t,t) \ \ \varrho ({\varvec{x}}_t,t) \ \mathrm {d}{\varvec{x}}\ , \end{aligned}$$
(8)

where \({\dot{S}}_{\mathrm {sys}}\) is the time derivative of \(S_{\mathrm {sys}}\).

In the same way, the change of entropy of the environment becomes

$$\begin{aligned} \Delta S_{\mathrm {env}}= & {} -\frac{1}{T} \int _0^\tau \mathrm {d}t \int U({\varvec{x}}_t,t) \partial _t \varrho ({\varvec{x}}_t,t) \ \mathrm {d}{\varvec{x}}\nonumber \\= & {} -\frac{1}{T} \int _0^\tau \mathrm {d}t \int \partial _{\varvec{x}}U({\varvec{x}}_t,t) \cdot {\varvec{v}}({\varvec{x}}_t,t) \ \ \varrho ({\varvec{x}}_t,t) \ \mathrm {d}{\varvec{x}}\ . \end{aligned}$$
(9)

Combining both contributions and using Eq. (7), the total entropy change Eq. (3) becomes

$$\begin{aligned} \Delta S_{\mathrm {tot}} = \frac{1}{T} \int _0^\tau \mathrm {d}t \int {\varvec{v}}({\varvec{x}}_t,t) \cdot \mu ^{-1} {\varvec{v}}({\varvec{x}}_t,t) \ \varrho ({\varvec{x}}_t,t) \ \mathrm {d}{\varvec{x}}\ . \end{aligned}$$
(10)

This expression implies immediately the fluctuating analogue of the second law of thermodynamics, namely

$$\begin{aligned} \Delta S_{\mathrm {tot}} \ge 0 \ . \end{aligned}$$
(11)
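As a simple illustrative consequence of Eq. (10) (an example we add here, not part of the original argument), consider a density that is rigidly translated by a distance \(L\) in time \(\tau \), with a scalar mobility \(\mu \). The current velocity is uniform, \({\varvec{v}} = L/\tau \), the advection equation (6) is satisfied, and

$$\begin{aligned} \Delta S_{\mathrm {tot}} = \frac{1}{T} \int _0^\tau \mathrm {d}t \int \frac{L}{\tau } \ \mu ^{-1} \frac{L}{\tau } \ \varrho \ \mathrm {d}{\varvec{x}}= \frac{L^2}{\mu T \tau } \ , \end{aligned}$$

which is strictly positive for every finite \(\tau \) and vanishes only in the quasistatic limit \(\tau \rightarrow \infty \).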

3 The Optimal Mass Transport

The mass transport problem was first considered by Gaspard Monge in 1781 [19]. Roughly speaking, the problem consists in calculating the most economical way of moving a volume of mass between two places. The modern approach to Monge’s optimal mass transport problem was formalised by Kantorovich in 1942 [16] (see also [6, 32]); optimal mass transport is nowadays referred to as the Monge–Kantorovich problem. Here we adopt Monge’s exposition.

Let \(\varrho _i(x)\) and \(\varrho _f(x)\) be two probability densities, bounded and with compact support on the real line, satisfying

$$\begin{aligned} \int \varrho _i(x) \mathrm {d}x = \int \varrho _f(x) \mathrm {d}x = 1 \ . \end{aligned}$$
(12)

The optimisation problem is to find an invertible smooth map \(\varphi : {\varvec{x}}_i \mapsto {\varvec{x}}_f({\varvec{x}}_i)\) that is measure preserving, namely

$$\begin{aligned} \int _{{\varvec{x}}_i(A)} \varrho _i({\varvec{x}}) \ \mathrm {d}{\varvec{x}}= \int _A \varrho _f({\varvec{x}}) \ \mathrm {d}{\varvec{x}}\ , \end{aligned}$$

and minimises the objective function

$$\begin{aligned} \int \mathcal {C}\left( {\varvec{x}},{\varvec{x}}_f({\varvec{x}})\right) \varrho _i({\varvec{x}}) \ \mathrm {d}{\varvec{x}}\ , \end{aligned}$$
(13)

where \(\mathcal {C}\left( {\varvec{x}},{\varvec{x}}_f({\varvec{x}})\right) \) is the cost of transporting a unit of mass, so that Eq. (13) is the total cost of transporting the initial distribution \(\varrho _i\) into the final distribution \(\varrho _f\). In its original formulation, Monge considered the Euclidean distance as the cost function, \(\mathcal {C}\left( {\varvec{x}},{\varvec{x}}_f({\varvec{x}})\right) =\left| {\varvec{x}}- {\varvec{x}}_f({\varvec{x}})\right| \). More generally, the cost function can be taken as \(\mathcal {C}\left( {\varvec{x}},{\varvec{x}}_f({\varvec{x}})\right) =\left| {\varvec{x}}- {\varvec{x}}_f({\varvec{x}})\right| ^r\). We will show that the case \(r=2\) is particularly relevant to formulate a refined Landauer limit for finite-time processes. This case was solved by Benamou and Brenier in 1999 [7], and was used by Krzysztof and colleagues in 2012 in relation to the Landauer limit [1].
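A quick numerical illustration (ours; the sample points are arbitrary) of why the one-dimensional quadratic case is so tractable: for equally weighted point masses, the monotone (sorted) pairing minimises the quadratic cost, which can be checked against brute force over all assignments.

```python
import itertools
import random

def quadratic_cost(xs, ys):
    """Total cost sum |x - y|^2 of the assignment xs[i] -> ys[i]."""
    return sum((xx - yy) ** 2 for xx, yy in zip(xs, ys))

random.seed(0)
xs = sorted(random.uniform(-1.0, 1.0) for _ in range(6))
ys = [random.uniform(-1.0, 1.0) for _ in range(6)]

# Brute force over all 6! assignments of initial to final points.
best = min(quadratic_cost(xs, p) for p in itertools.permutations(ys))

# Monotone pairing: sort both sides and pair in order.
monotone = quadratic_cost(xs, sorted(ys))
```

The same monotone map is optimal for any strictly convex cost \(r>1\) in one dimension; the quadratic case additionally connects to the dynamical Benamou–Brenier formulation used below.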

4 Minimal Dissipation Memory Erasure

The second law states that for any thermodynamic transformation between given initial and final states, the total entropy change must be non-negative, as in Eq. (11). This holds for any thermodynamic transformation, even for quasistatic processes occurring infinitely slowly.

Consider now a thermodynamic transformation constrained to be completed in a fixed finite time \(\tau \). Dissipation is then naturally expected to be larger than in the quasistatic transformation, and the question arises: what is the minimal possible dissipation produced in a finite-time transformation? This question was answered in Ref. [1] for isothermal stochastic systems with the Langevin dynamics in the overdamped limit of Sect. 2.

This question is particularly relevant to computer processing because of the Landauer cost of information processing, which states that the erasure of a bit of information dissipates no less than \(k_\mathrm {B}T \ln 2\) of heat [17]. The Landauer limit continues to be the main reference in information processing because bit erasure is the elementary operation that produces maximal dissipation in universal computing with transistor logic gates [22].

In this section, we review briefly this optimal solution by following Ref. [1].

Consider the stochastic process \({\varvec{x}}_t\) of Eq. (1), driven out of equilibrium by the control \(U({\varvec{x}}_t,t)\) from a state \(\varrho _i({\varvec{x}})\) at time \(t=0\) to a state \(\varrho _f({\varvec{x}})\) at time \(t=\tau \). The goal is to obtain the optimal choice of the control \(U({\varvec{x}}_t,t)\) that minimises the dissipation Eq. (10) required to drive the system along such a transformation, over all densities \(\varrho ({\varvec{x}}_t,t)\) and all velocity fields \({\varvec{v}}({\varvec{x}}_t,t)\) satisfying Eq. (6), under the constraint

$$\begin{aligned} \varrho ({\varvec{x}}_0,0) = \varrho _i({\varvec{x}}) \ , \qquad \varrho ({\varvec{x}}_\tau ,\tau ) = \varrho _f({\varvec{x}}) \ . \end{aligned}$$
(14)

In other words, we need to minimise the functional

$$\begin{aligned} \mathcal {A}\left[ \varrho _i,{\varvec{v}}\right] = \int _0^\tau \mathrm {d}t \int {\varvec{v}}({\varvec{x}}_t,t) \cdot \mu ^{-1} {\varvec{v}}({\varvec{x}}_t,t) \ \varrho ({\varvec{x}}_t,t) \ \mathrm {d}{\varvec{x}}\ . \end{aligned}$$
(15)

Apart from a factor \(\tau \), the minimisation of Eq. (15) over the fields \(\varrho ({\varvec{x}}_t,t)\) and \({\varvec{v}}({\varvec{x}}_t,t)\), subject to Eqs. (6) and (14), was solved by Benamou and Brenier [7] (see also [1]). There it was shown that the optimal current velocity minimising Eq. (15) is a gradient,

$$\begin{aligned} {\varvec{v}}({\varvec{x}}_t,t) = \nabla \phi ({\varvec{x}}_t,t) \ , \end{aligned}$$
(16)

where \(\phi ({\varvec{x}}_t,t)\) is convex and a solution of the Hamilton–Jacobi equation. Through Eq. (7), the optimal solution Eq. (16) determines the optimal control \(U({\varvec{x}}_t,t)\), and \({\varvec{v}}\) is also the local velocity of the optimally controlled process.

Restricting ourselves to smooth velocity fields \({\varvec{v}}\) such that the Lagrangian trajectories \({\varvec{x}}(t)\) satisfy \({\dot{{\varvec{x}}}}(t) = {\varvec{v}}({\varvec{x}}(t),t)\), the solution to the advection equation (6) is given by

$$\begin{aligned} \varrho ({\varvec{x}}_t,t) = \int \delta \left( {\varvec{x}}_t- {\varvec{x}}(t;{\varvec{x}}_i)\right) \varrho _i({\varvec{x}}_i) \ \mathrm {d}{\varvec{x}}_i \ , \end{aligned}$$
(17)

where \({\varvec{x}}(t;{\varvec{x}}_i)\) denotes the Lagrangian trajectory that at time \(t=0\) passes through \({\varvec{x}}_i\). Under the controlled transformation, the Lagrangian map \({\varvec{x}}_i \mapsto {\varvec{x}}_f({\varvec{x}}_i)\) should transport the initial density \(\varrho _i\) into the final density \(\varrho _f\).

Substituting Eq. (17) into Eq. (15), we can replace the minimisation of the functional \(\mathcal {A}\) over the velocity fields \({\varvec{v}}\) by the minimisation of

$$\begin{aligned} \mathcal {A}\left[ \varrho _i,{\varvec{v}}\right] = \int _0^\tau \mathrm {d}t \int {\dot{{\varvec{x}}}}({\varvec{x}}_i ,t) \cdot \mu ^{-1} {\dot{{\varvec{x}}}}({\varvec{x}}_i ,t) \ \varrho _i({\varvec{x}}_i) \ \mathrm {d}{\varvec{x}}_i \ , \end{aligned}$$

over the Lagrangian flows satisfying \({\varvec{x}}_i \mapsto {\varvec{x}}({\varvec{x}}_i,\tau ) \equiv {\varvec{x}}_f({\varvec{x}}_i)\) such that

$$\begin{aligned} \varrho _f({\varvec{x}}) = \int \delta \left( {\varvec{x}}- {\varvec{x}}_f({\varvec{x}}_i)\right) \varrho _i({\varvec{x}}_i) \ \mathrm {d}{\varvec{x}}_i \ . \end{aligned}$$

This constraint is equivalent to

$$\begin{aligned} \varrho _f\left( {\varvec{x}}_f({\varvec{x}}_i)\right) \ \frac{\partial ({\varvec{x}}_f({\varvec{x}}_i))}{\partial ({\varvec{x}}_i)} = \varrho _i({\varvec{x}}_i) \ , \end{aligned}$$
(18)

where \(\frac{\partial ({\varvec{x}}_f({\varvec{x}}_i))}{\partial ({\varvec{x}}_i)}\) is the Jacobian of the Lagrangian map. We require the Lagrangian map to be smooth and invertible, with a smooth inverse \({\varvec{x}}_f \mapsto {\varvec{x}}_i({\varvec{x}}_f)\).

Minimising first over time under the above constraints, we realise that for a positive definite matrix \(\mu \) the minimising Lagrangian trajectories are straight lines,

$$\begin{aligned} {\varvec{x}}(t;{\varvec{x}}_i) = \frac{\tau -t}{\tau } {\varvec{x}}_i + \frac{t}{\tau } {\varvec{x}}_f({\varvec{x}}_i) \ . \end{aligned}$$
(19)

Therefore, the optimal solution is completed once the functional

$$\begin{aligned} \mathcal {C}\left( {\varvec{x}}_f({\varvec{x}}_i)\right) = \int \left( {\varvec{x}}_f({\varvec{x}}_i) - {\varvec{x}}_i\right) \cdot \mu ^{-1} \left( {\varvec{x}}_f({\varvec{x}}_i) - {\varvec{x}}_i\right) \varrho _i({\varvec{x}}_i) \mathrm {d}{\varvec{x}}_i \ , \end{aligned}$$
(20)

is minimised over all Lagrangian maps \({\varvec{x}}_i \mapsto {\varvec{x}}_f({\varvec{x}}_i)\).

Equation (20) corresponds to the Monge–Kantorovich transportation problem of Eq. (13) with a quadratic cost, solved in Ref. [7], and in Refs. [1,2,3] in the context of stochastic thermodynamics. In Ref. [2] it was shown that the minimisation of Eq. (20) is solved by the Burgers equation for the velocity potential \(\phi \), with mass transported by the Burgers velocity field according to Eq. (6).

Once the minimiser \({\varvec{x}}_f({\varvec{x}}_i) \) is obtained, the minimal value of the functional Eq. (15) is

$$\begin{aligned} \mathcal {A}_{\mathrm {min}} = \frac{1}{\tau } \mathcal {C}_{\mathrm {min}} \ , \end{aligned}$$

where \(\mathcal {C}_{\mathrm {min}}\) is the value of the quadratic cost Eq. (20) at the minimising Lagrangian map Eq. (19). It then follows that the minimal entropy production during a transition from \(\varrho _i({\varvec{x}})\) to \(\varrho _f({\varvec{x}})\) satisfying Eq. (12) in a fixed time \(\tau \) is

$$\begin{aligned} \langle \Delta S_{\mathrm {tot}} \rangle _{\mathrm {min}} = \frac{1}{\tau T} \mathcal {C}_{\mathrm {min}} > 0 \ . \end{aligned}$$
(21)

Finally, combining Eq. (5) with the minimal value Eq. (21), and using that the erasure of one bit of information corresponds to \(\Delta S_{\mathrm {sys}} = -k_\mathrm {B}\ln 2\), we obtain a finite-time Landauer bound for the average heat dissipated in overdamped Langevin dynamics,

$$\begin{aligned} \langle Q \rangle \ge \frac{1}{\tau } \mathcal {C}_{\mathrm {min}} + k_\mathrm {B}T \ln 2 \ . \end{aligned}$$

5 Numerical Solution of the Assignation Problem

Given initial and final densities \(\varrho _i({\varvec{x}})\) and \(\varrho _f({\varvec{x}})\), in this section we deal with the problem of numerically solving the assignation problem, i.e. finding the minimiser of the Lagrangian map. This was done in Ref. [1] by means of several methods. Direct integration of the constraint Eq. (18) was found to become unstable at values of \({\varvec{x}}_i\) for which the derivative \(\mathrm {d}{\varvec{x}}_f({\varvec{x}}_i)/\mathrm {d}{\varvec{x}}_i\) diverges (and similarly for the inverse map). Similar problems were also found using a rearrangement “auction” algorithm [5]. After discussing this with Krzysztof in Geneva in 2012, we decided to use another, hopefully faster and more precise method. In the rest of this section, we show its implementation and discuss its performance and general limitations.

We assume that \(\varrho _i({\varvec{x}})\) and \(\varrho _f({\varvec{x}})\) are given and without loss of generality, satisfy Eq. (12). The minimiser of the Lagrangian map, namely the optimal map \({\varvec{x}}_i \mapsto {\varvec{x}}_f({\varvec{x}}_i)\) that transports \(\varrho _i\) into \(\varrho _f\) and minimises the cost Eq. (20), satisfies Eq. (18).

Fig. 1

The initial distribution \(\varrho _i(x)\) (black) and the final distribution \(\varrho _f(x)\) (red). While the problem is intuitively quite easy, the issue is how to find the “best” numerical solution. (color figure online)

We consider the optimal mass transport problem studied in Ref. [1]:

$$\begin{aligned} \varrho _i(x)= & {} \frac{1}{Z_i}\exp \left( -\frac{a}{k_\mathrm {B}T} \left( x^2-\alpha ^2\right) ^2\right) ~, \nonumber \\ \varrho _f(x)= & {} \frac{1}{Z_f}\exp \left( -\frac{a}{k_\mathrm {B}T} (x-\alpha )^2 \left( (x-\alpha )^2+3 \alpha (x-\alpha )+4 \alpha ^2\right) \right) ~, \end{aligned}$$
(22)

where \(Z_i\) and \(Z_f\) are normalisation factors so that Eq. (12) is satisfied, and the constants \(a=112\,k_\mathrm {B}T\,\mu \mathrm {m}^{-4}\) and \(\alpha =0.5\,\mu \mathrm {m}\) were chosen to match the experimental realisation of Ref. [4]. As a matter of fact, Eq. (1) models the dynamics in the experiment with a mobility \(\mu = \frac{0.213877}{k_\mathrm {B}T} \frac{\mu \mathrm {m}^{2}}{s}\). Furthermore, the control potential \(U({\varvec{x}}_t,t)\) is effectively one-dimensional. The initial and final densities of Eq. (22) are shown in Fig. 1.

In one dimension, the optimal map \({\varvec{x}}_i \mapsto {\varvec{x}}_f({\varvec{x}}_i)\) that transports \(\varrho _i\) into \(\varrho _f\) and minimises the cost Eq. (20) satisfies

$$\begin{aligned} \int _{-\infty }^{{\varvec{x}}_i} \varrho _i({\varvec{x}}) \ \mathrm {d}{\varvec{x}}= \int _{-\infty }^{{\varvec{x}}_f({\varvec{x}}_i)} \varrho _f({\varvec{x}}) \ \mathrm {d}{\varvec{x}}\ , \end{aligned}$$
(23)

for all \({\varvec{x}}_i\), and uniqueness is guaranteed if for each \({\varvec{x}}_i\) one chooses the minimal \({\varvec{x}}_f({\varvec{x}}_i)\).
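Equation (23) can also be solved directly on a grid by matching cumulative distribution functions. The sketch below (our illustration; the grid extent and resolution are ad hoc, with lengths in \(\mu \mathrm {m}\) and \(k_\mathrm {B}T=1\)) computes the map for the densities of Eq. (22):

```python
import numpy as np

a, alpha = 112.0, 0.5                 # constants of Eq. (22), kBT = 1
x = np.linspace(-1.2, 1.5, 200001)
dx = x[1] - x[0]

rho_i = np.exp(-a * (x**2 - alpha**2) ** 2)
u = x - alpha
rho_f = np.exp(-a * u**2 * (u**2 + 3 * alpha * u + 4 * alpha**2))
rho_i /= rho_i.sum() * dx             # enforce the normalisation of Eq. (12)
rho_f /= rho_f.sum() * dx

# Cumulative distributions; Eq. (23) reads F_i(x_i) = F_f(x_f(x_i)),
# solved by interpolating the inverse CDF of rho_f at the values F_i.
F_i = np.cumsum(rho_i) * dx
F_f = np.cumsum(rho_f) * dx
x_f = np.interp(F_i, F_f, x)          # the optimal map on the grid
```

At \(x_i=0\), where \(F_i = 1/2\) by the symmetry of \(\varrho _i\), this reproduces the critical value \(x_f(0)\approx 0.49311\); as discussed next, the CDFs are only reliable away from the tails, where they saturate at machine precision.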

Fig. 2

Vector field of the solution of the Lagrangian map \({\varvec{x}}_f({\varvec{x}}_i)\) for the example of Eq. (22)

However, a basic problem to take into account when solving Eq. (23) is that, if we are given \(\varrho _i\) and \(\varrho _f\), then the normalisation of the integrals is only guaranteed numerically to the available precision of the computer. This implies that the numerical solution ceases to exist near the ends of the supports of \(\varrho _i\) and \(\varrho _f\).

To have a better control over this difficulty one can solve the equivalent constraint Eq. (18), that in one dimension reads

$$\begin{aligned} \frac{\mathrm {d}x_f(x_i)}{\mathrm {d}x_i}=\frac{\varrho _i(x_i)}{\varrho _f(x_f(x_i))} ~. \end{aligned}$$
(24)

The problem of finding the minimiser map is thus transformed into solving an ODE. Equation (24) defines a vector field in the \((x_i, x_f)\) plane that can easily be obtained numerically. For any point in this plane, the vector field determines the local evolution of Eq. (24). We show this in Fig. 2 for a number of points \((x_i, x_f)\) chosen randomly.

Note that the vector field is not really defined outside the central region (the central lobe in Fig. 2), as there the derivative in Eq. (24) diverges in the vertical direction. The same happens at the left and right ends of the central area, where the derivatives are zero and thus the inverse of the Lagrangian map is not well defined.

In view of Fig. 2, solving Eq. (24) requires choosing appropriate initial conditions. It is clear that it is a good idea to start somewhere in the centre, at some height \(x_f\sim 0.5\), and to integrate both backwards and forwards. Up to a numerical precision of about \(10^{-20}\), we found that, keeping \(x_i=0\) fixed, the best numerical initial condition when integrating backwards is \(x_f=0.493113178303063601340966142029819\) and when integrating forwards

\(x_f=0.493113178303063601752771313946797\).

These values represent the limits of the numerical possibilities for the example of Eq. (22). They were obtained by shooting as follows: we start with \(x_f(0)=0.5\), which is about the centre of Fig. 2. We integrate backwards and check at which \(x_i\) the solution ceases to exist, either because the solution blows up (the graph becomes vertical) or because the solution becomes constant (the graph becomes horizontal). We then try another value of \(x_f(0)\) (e.g. \(x_f(0)=0.46\)), measure the divergence value for this new initial condition, and then solve by bisection for better and better values of \(x_f(0)\), until the value of \(x_i\) at which divergence is observed cannot be improved any more.
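A minimal sketch of this shooting procedure in double precision (our reimplementation, with ad hoc step sizes and thresholds; the original computation worked at a precision of about \(10^{-20}\)): integrating Eq. (24) backwards from \(x_i=0\), initial conditions below the critical \(x_f(0)\) blow up downwards, those above flatten out, and bisection brackets the critical value.

```python
import math

A, ALPHA = 112.0, 0.5                       # constants of Eq. (22), with kBT = 1

def rho_i(x):
    return math.exp(-A * (x * x - ALPHA ** 2) ** 2)

def rho_f(x):
    u = x - ALPHA
    return math.exp(-A * u * u * (u * u + 3 * ALPHA * u + 4 * ALPHA ** 2))

# Normalisation constants Z_i, Z_f of Eq. (22) by the rectangle rule.
N, LO, HI = 200000, -1.2, 1.5
DX = (HI - LO) / N
ZI = sum(rho_i(LO + k * DX) for k in range(N + 1)) * DX
ZF = sum(rho_f(LO + k * DX) for k in range(N + 1)) * DX

def slope(xi, xf):
    """Right-hand side of Eq. (24); None signals that the solution left the lobe."""
    if xf < -1.5:
        return None
    den = rho_f(xf) / ZF
    if den < 1e-280:
        return None
    return (rho_i(xi) / ZI) / den

def blows_up(xf0, h=1e-3):
    """RK4 integration of Eq. (24) backwards from (x_i, x_f) = (0, xf0).
    Returns True if the solution turns vertical before x_i = -1.2."""
    xi, xf = 0.0, xf0
    while xi > -1.2:
        k1 = slope(xi, xf)
        k2 = slope(xi - h / 2, xf - h / 2 * k1) if k1 is not None else None
        k3 = slope(xi - h / 2, xf - h / 2 * k2) if k2 is not None else None
        k4 = slope(xi - h, xf - h * k3) if k3 is not None else None
        if k4 is None:
            return True
        xf -= h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        xi -= h
    return False                            # solution flattened out: xf0 too high

# Bisection on x_f(0): below the critical value the backward solution blows up,
# above it the solution becomes flat before reaching the left tail.
lo, hi = 0.3, 0.7
for _ in range(50):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if blows_up(mid) else (lo, mid)
xf_critical = (lo + hi) / 2
```

With double precision the bracket stalls at a width of order \(10^{-16}\), which is why the quoted initial conditions above required a working precision of about \(10^{-20}\).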

To illustrate the shooting, the approximations obtained through the bisection procedure are shown in Fig. 3 for the backward integration (left panel) and the forward integration (right panel). We needed about 110 iterations to reach maximal precision. The speed of convergence can certainly be improved by, e.g., quadratic interpolation.

The solution we obtain in this way is shown in Fig. 4 as the blue curve. It yields a discrete approximation of the minimiser of the continuous Lagrangian map, that can be improved by increasing the numerical precision.

Fig. 3

The successive improvement of the shooting solution by changing the initial condition until the solution exists over a maximal extent

Fig. 4

The vector field of the flow of the numerical convergence, with the minimiser of the Lagrangian map overlaid in blue (color figure online)

Finally, since the current velocity is gradient (see Eq. (16)), the current velocity is simply the time derivative of Eq. (19) and the optimal control potential is obtained from Eq. (7) (see Ref. [1] for further details).

6 Conclusions

In this paper, we have explored the numerical solution of the Monge problem of optimal transportation, applied to the bit erasure problem and the Landauer limit explored by Krzysztof in Ref. [1].

The densities of Eq. (22) are appropriate for discussing the problem of erasing a bit: at the initial time \(t=0\), the probability for a particle to be at the left (around \(x=-0.5\)) or at the right (around \(x=0.5\)) is the same. This corresponds to the bit being in state 0 or 1 with equal probability, and is well described by the Gibbs state \(\varrho _i\propto \exp \left( -\frac{1}{k_\mathrm {B}T} V_i(x)\right) \) of Eq. (22), with a potential \(V_i(x)\) with two symmetric wells separated by a sufficiently high barrier. At the final time \(t=\tau \), the final Gibbs state \(\varrho _f\) corresponds to a potential with only one of the two wells, in our case a well centred at \(x=0.5\). This means that the final state of the bit is 1, irrespective of its initial state.

The particular choice of Eq. (22) was made to reasonably match the states used in the experiment reported in Ref. [4]. The entropy change between \(\varrho _i\) and \(\varrho _f\) is \(\Delta S_\mathrm {sys} \approx -0.74312\, k_\mathrm {B}\), slightly smaller than \(-k_\mathrm {B}\ln 2\).

To obtain the minimal dissipated heat during this transition, we reduced the optimal mass transport in one dimension to an ODE problem, and showed that shooting allows one to find the optimal solution through bisection. The crucial information needed to obtain a convergent method was the knowledge of the vector field of the solution space (see Fig. 2).

Plugging the minimising Lagrangian map into Eq. (20), we obtain the minimal cost \(\mathcal {C}_\mathrm {min} = 1.98897268\, k_\mathrm {B}T\). In conclusion, erasing a bit through a transformation whose dynamics is described by Eq. (1), evolving in a finite time \(\tau \) between the states of Eq. (22), can be performed with an average dissipated heat satisfying

$$\begin{aligned} \langle Q \rangle \ge \left( \frac{1}{\tau } 1.98897268 + 0.74312 \right) k_\mathrm {B}T \ge k_\mathrm {B}T \ln 2 \ . \end{aligned}$$
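As a cross-check of this value (our computation; the grid choices are ad hoc), one can bypass the ODE altogether and evaluate the cost Eq. (20) on the quantile-matching map of Eq. (23), with the mobility \(\mu = 0.213877/(k_\mathrm {B}T)\,\mu \mathrm {m}^2/\mathrm {s}\) of Sect. 5 and \(k_\mathrm {B}T = 1\):

```python
import numpy as np

a, alpha, mu = 112.0, 0.5, 0.213877   # Eq. (22) constants and mobility (kBT = 1)
x = np.linspace(-1.2, 1.5, 400001)
dx = x[1] - x[0]

rho_i = np.exp(-a * (x**2 - alpha**2) ** 2)
u = x - alpha
rho_f = np.exp(-a * u**2 * (u**2 + 3 * alpha * u + 4 * alpha**2))
rho_i /= rho_i.sum() * dx
rho_f /= rho_f.sum() * dx

# Optimal map from CDF matching, Eq. (23) ...
F_i = np.cumsum(rho_i) * dx
F_f = np.cumsum(rho_f) * dx
x_f = np.interp(F_i, F_f, x)

# ... plugged into the quadratic cost of Eq. (20) with scalar mobility.
C_min = np.sum((x_f - x) ** 2 * rho_i) * dx / mu
```

This evaluates the same minimal cost independently of the ODE shooting method.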

We consider the value of the minimal cost that we obtained to be an improvement on the value \(1.996\, k_\mathrm {B}T\) that Krzysztof obtained in [1]. We wish we could have discussed this with him.