1 Introduction

The Lattice Boltzmann Method (LBM) is a computationally efficient method for the simulation of fluid flows in a wide range of regimes. LBM solves a set of macroscopic equations via the time evolution of a (minimal) discrete version of the continuum Boltzmann equation, following the stream-and-collide paradigm.

While its original formulation mostly targets isothermal, weakly compressible fluid flows, over the years several algorithmic developments have extended the method to support the simulation of a wide range of complex flows, such as multi-phase [1, 2], turbulence [3], thermo-hydrodynamics [4, 5], non-Newtonian flows [6, 7], radiative transport [8], semi-classical fluids [9], relativistic flows [10], and many others [11], with an outlook toward exa-scale computing [12]. Most of these algorithmic enhancements have targeted the modeling of the collision process and, as a result, a large variety of collision models have been proposed to extend the applicability and overcome the shortcomings of the standard LBM. Notable examples extending the single-relaxation-time Bhatnagar–Gross–Krook (BGK) collision operator [13] are given by the two-relaxation-time (TRT) [14] and multi-relaxation-time (MRT) [15, 16] operators, which can be combined with regularization procedures [17,18,19], and local viscous corrections ensuring the validity of the H-theorem after the velocity discretization [20, 21]. More recent developments have taken into consideration the ellipsoidal statistical BGK [22] and the Shakhov model [23], which allow the thermal relaxation to be decoupled from the viscous one. They have also made it possible to compute equilibrium distributions numerically, in principle allowing an arbitrary number of moments of the Maxwell-Boltzmann distribution to be reproduced [24]. For a comprehensive review comparing collision models for LBM the interested reader is referred to [25].

In recent years, there has been increased interest in the adoption of machine learning (ML) models, typically artificial neural networks (NN), to approximate various kernels/operators in the simulation of physical systems. Artificial neural networks form a class of nonlinear parametric models satisfying the universal approximation property [26]. In the last decade, this property, coupled with efficient computational tools for automatic differentiation and sensitivity analysis in forward and backward propagation, has led to outstanding results in fields such as computer vision [27] and natural language processing [28].

However, until recently, the biggest achievements of ML in the scientific domain have been limited to approaches that are data-driven but agnostic to traditional scientific modeling of the underlying physics. Integrating modern ML with physical modeling is the major challenge of what we today call Physics-Informed Machine Learning (PIML) [29, 30]. In fluid dynamics, in particular, there has been significant PIML activity in recent years. Examples include embedding physical constraints, such as Galilean invariance and rotational invariance, into closure models [31, 32] and PIML models infusing physical constraints into the neural networks [33, 34]. Other efforts on turbulence modeling are summarized in [35, 36]. In addition to developing closure models, novel ML approaches have been used to learn turbulence dynamics [37], where a Convolutional Long Short-Term Memory (ConvLSTM) neural network was developed to learn spatio-temporal turbulence dynamics; to study super-resolution, allowing turbulence fields to be reconstructed from under-resolved data [38]; to use Neural Ordinary Differential Equations (Neural ODEs) for turbulence forecasting [39]; and to measure [40], model and control flows [41].

Up to now, very few works have proposed applications of ML to LBM. Most of these have focused on accelerating the calculation of steady-state flows using convolutional neural networks [42,43,44], while Bedrunka et al. [45] employed a fully connected feed-forward neural network to tune the parameters of an MRT collision operator.

Since LBM entails a mesoscopic representation, it employs substantially more degrees of freedom (i.e., the number of discrete particle distribution functions) than the macroscopic observables of interest. These extra degrees of freedom suggest the possibility of using ML to encode more information in the model in order, for example, to extend its applicability and accuracy, and to enhance its numerical stability. Indeed, deriving collision operators for LBM that can handle different types of fluid flows is an open problem with much ongoing research. There is therefore a need for new and more general approaches, and data-driven techniques may offer an answer to this quest.

In this work, we take a first step in this direction and consider the problem of learning a custom collision operator from reference data. The collision operator is represented by a NN that takes pre-collision populations as input and returns post-collision populations. As a proof of concept we evaluate different neural network architectures to identify design choices that improve the performance of the learned collision operator. To make the performance evaluation more straightforward, we consider a large synthetic dataset of pre- and post-collision population pairs that was itself generated by a collision operator, specifically the BGK collision operator. In theory, in the limit of infinite data and infinite training resources it should be possible to recover the underlying operator. In practice, however, there will always be an error that (as we show later) depends significantly on the architecture of the NN. We show that constraining the NN to respect physical properties such as conservation laws and symmetries is key for accuracy. We evaluate the accuracy of the learned collision operator both on single-step (static) collisions and on multi-step (dynamic) collisions, interleaved with advection steps, for the simulation of standard benchmarks. The focus of this work is on exposing the main ingredients needed to accurately learn a collision operator from data, while, for the moment, no attention is paid to computational efficiency.

This article is structured as follows: in Sect. 2, we provide a brief description of the Lattice Boltzmann Method. In Sect. 3, we define a PIML approach for learning a collision operator from data, focusing in particular on the embedding of relevant physical properties. In Sect. 4, we report simulation results for two numerical benchmarks in which we have replaced the collision term of LBM simulations with a neural network; here, we also compare the accuracy achieved by different neural network architectures. Concluding remarks and future directions are summarized in Sect. 5.

2 Lattice Boltzmann method

In this section, we give a short introduction to the Lattice Boltzmann Method; the interested reader is referred to, e.g., Refs. [11, 46] for a thorough introduction.

LBM simulates the evolution of macroscopic quantities (such as density and velocity) through a mesoscopic approach based on the synthetic dynamics of a set of discrete velocity distribution functions

$$\begin{aligned} f_i(\varvec{x},t), \ i = 0, \dots , q-1, \end{aligned}$$

to which we will refer as lattice populations.

At each grid node \(\varvec{x}\), the lattice populations are defined along the discrete components of the stencil \(\{ \varvec{\xi }_i \}, \ i = 0, \ldots , q-1\). It is customary to distinguish between different LBM schemes using the DdQq nomenclature, in which d refers to the number of spatial dimensions and q to the number of discrete components.

Fig. 1 Example of a \(3\times 3\) LBM grid (with a single grid point shown on the right-hand side) making use of the D2Q9 model, where the lattice populations can move along 9 possible directions

In this work we adopt the D2Q9 model, based on the stencil in Fig. 1, where populations can move along 9 possible directions, defined by the following discrete velocity vectors:

$$\begin{aligned} \varvec{\xi }_i= {\left\{ \begin{array}{ll} (0,0) & i = 0,\\ (1,0),(0,1),(-1,0),(0,-1) & i = 1,2,3,4,\\ (1,1),(-1,1),(-1,-1),(1,-1) & i = 5,6,7,8. \end{array}\right. } \end{aligned}$$

In general, the velocity sets, \(\varvec{\xi }_{i}\), are chosen such that any spatial vector \(\varvec{\xi }_{i} \varDelta t\) points from one lattice site to a neighboring lattice site. This guarantees that the populations \(f_i\) always reach another lattice site during a time step \(\varDelta t\).

The time evolution of each lattice population is ruled by the lattice Boltzmann equation which, in the absence of external forces, reads as:

$$\begin{aligned} f_i(\varvec{x}+ \varvec{\xi }_{i} \varDelta t , t + \varDelta t) - f_i(\varvec{x},t) = \varOmega \left( f_i(\varvec{x},t) \right) , \end{aligned}$$
(1)

where \(\varOmega \) is the collision operator. Among various possible choices, in this work we consider the BGK [13] operator

$$\begin{aligned} \varOmega (f_i(\varvec{x},t)) = -\frac{\varDelta t}{\tau } \left( f_i(\varvec{x},t) - {f}_{i}^{\textrm{eq}} (\varvec{x},t)\right) , \end{aligned}$$
(2)

which models collisions as a linear relaxation process of the distribution function toward its equilibrium. Here, \(\tau \) is the relaxation time, \(\varDelta t\) is the time step, and \(f_i^{\textrm{eq}}(\varvec{x},t)\) is the discrete equilibrium distribution, for which we employ a second-order Hermite-expansion of the Maxwell-Boltzmann distribution:

$$\begin{aligned} f^{\textrm{eq}}_{i}(\rho , \varvec{u}) = \, w_i \rho \left( 1 + \frac{ \varvec{u} \cdot \varvec{\xi }_{i}}{c_s^2} + \frac{(\varvec{u} \cdot \varvec{\xi }_{i})^2 -(c_s |\varvec{u}|)^2 }{2 c_s^4}\right) , \end{aligned}$$
(3)

with \(w_i\) a lattice-dependent set of weighting factors. For the D2Q9 lattice these read

$$\begin{aligned} w_0&= 4/9, \quad w_1=w_2=w_3=w_4=1/9, \\ w_5&=w_6=w_7=w_8=1/36. \end{aligned}$$

In lattice units, \(\varDelta t=1\), while the speed of sound in the lattice for the D2Q9 model is \(c_s = 1 / \sqrt{3}\). Finally, \(\rho \) and \(\varvec{u}\) indicate, respectively, the macroscopic density and velocity fields. These macroscopic observables can be computed in terms of the moments of the velocity distribution functions as

$$\begin{aligned} \rho = \sum _{i=0}^{q-1} f_i \qquad \text {and} \qquad \rho \varvec{u} = \sum _{i=0}^{q-1} f_i \varvec{\xi }_{i}. \end{aligned}$$
(4)

Following an asymptotic analysis, like the Chapman-Enskog expansion [47], it can be shown that Eq. 1 delivers a second-order approximation of the Navier-Stokes equations. In particular, the following relation between the relaxation time parameter \(\tau \) and the kinematic viscosity \(\nu \) of the fluid holds:

$$\begin{aligned} \nu = \left( \tau - \frac{1}{2} \right) c_s^2 . \end{aligned}$$
(5)

We conclude this section by sketching the LBM algorithm. Provided a suitable initialization of the particle distribution functions, each time iteration of the algorithm entails the following steps (a minimal code sketch is given after the list):

  1. Perform the streaming step:

     $$\begin{aligned} f_i^{\textrm{pre}}(\varvec{x}, t) = f_i(\varvec{x} - \varvec{\xi }_{i} \varDelta t , t) . \end{aligned}$$
     (6)

  2. Compute the macroscopic fields using Eq. 4.

  3. Calculate the equilibrium distribution function using Eq. 3.

  4. Apply the collision operator:

     $$\begin{aligned} f_i^{\textrm{post}}&= f_i(\varvec{x}, t + \varDelta t) = f_i^{\textrm{pre}}(\varvec{x},t) \nonumber \\&\quad - \frac{\varDelta t}{\tau } \left( f_i^{\textrm{pre}}(\varvec{x}, t) - {f}_{i}^{\textrm{eq}} ( \rho (\varvec{x},t), \varvec{u}(\varvec{x},t) ) \right) . \end{aligned}$$
     (7)
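To make the algorithm concrete, the sketch below implements one full stream-and-collide iteration, i.e., Eqs. 3, 4, 6 and 7, on a periodic domain. This is a minimal NumPy realization; the array layout and function names are our own choices, not part of the original algorithm description.

```python
import numpy as np

# D2Q9 stencil: discrete velocities (cf. Fig. 1) and weights
XI = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
               [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4/9] + [1/9]*4 + [1/36]*4)
CS2 = 1.0 / 3.0  # squared lattice speed of sound, c_s^2 = 1/3

def equilibrium(rho, u):
    """Second-order equilibrium distribution, Eq. 3.
    rho: (nx, ny) density; u: (nx, ny, 2) velocity."""
    xiu = np.einsum('id,xyd->xyi', XI, u)      # xi_i . u
    uu = np.einsum('xyd,xyd->xy', u, u)        # |u|^2
    return W * rho[..., None] * (1.0 + xiu / CS2
                                 + xiu**2 / (2 * CS2**2)
                                 - uu[..., None] / (2 * CS2))

def lbm_step(f, tau):
    """One stream-and-collide iteration (pull scheme, Delta t = 1),
    with periodic boundaries. f: (nx, ny, 9)."""
    # streaming, Eq. 6: f_i^pre(x, t) = f_i(x - xi_i, t)
    f_pre = np.stack([np.roll(f[..., i], tuple(XI[i]), axis=(0, 1))
                      for i in range(9)], axis=-1)
    rho = f_pre.sum(axis=-1)                                  # Eq. 4
    u = np.einsum('xyi,id->xyd', f_pre, XI) / rho[..., None]  # Eq. 4
    # BGK collision, Eq. 7
    return f_pre - (f_pre - equilibrium(rho, u)) / tau
```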

2.1 Collision invariants and equivariances

The operator \(\varOmega \) carries physical properties of the Boltzmann collision, which can be phrased in terms of invariances and equivariances. Respecting these physical properties will turn out to be central to the performance of the machine learning models discussed in the next sections. In particular, \(\varOmega \) satisfies the following:

P1:

Scale equivariance. Scale factors \(\lambda > 0\) applied to all the pre-collision populations are preserved, i.e.,

$$\begin{aligned} \varOmega (\lambda f_i^{pre}) = \lambda \varOmega ( f_i^{pre})\ . \end{aligned}$$
(8)

In other words, the collision operator is degree-1 homogeneous.

P2:

Rotation and reflection equivariance. Generic two-dimensional collisions are equivariant with respect to the two-dimensional orthogonal group O(2). This translates into the independence of the collision from the rotation and mirroring of the observer's viewpoint. As we restrict to a D2Q9 lattice, this property reduces to preserving the order-8 dihedral symmetry group of the lattice, \(D_{2n}\subset O(2)\) with \(n=4\). This group is generated by a 90 degree rotation and a mirroring with respect to a symmetry axis of the cell (e.g., the x axis). Naming these two operations, respectively, r and s, and identifying with I the identity operation, the 8 elements of \(D_{8}\) are

$$\begin{aligned} D_8 = \{I,r,r^2,r^3,s,rs,r^2s,r^3s\}. \end{aligned}$$
(9)

Here, the n-th power indicates n subsequent applications of the same operator (i.e., \(r^2\) is a 180 degree rotation).

In three dimensions the corresponding symmetry group of the lattice contains 48 elements.

When applied to the populations, these operators effectively yield permutations of the population indices (cf. Figure 2). Finally, in formulas, the rotation and mirroring equivariance of collisions reads

$$\begin{aligned} \varOmega (\sigma f_i^{pre}) = \sigma \varOmega ( f_i^{pre}),\ \forall \sigma \in D_8. \end{aligned}$$
(10)
P3:

Mass and momentum invariance. In the D2Q9 LBM model, mass and momentum are preserved “exactly” by the collision. This holds thanks to the underlying Gaussian quadrature used in the discretization of the velocity space [48, 49]:

$$\begin{aligned} \sum _{i=0}^{8} \left( f_i^{\textrm{post}} - f_i^{\textrm{pre}} \right)&= 0, \nonumber \\ \sum _{i=0}^{8} \left( f_i^{\textrm{post}} - f_i^{\textrm{pre}} \right) \varvec{\xi }_i&= \varvec{0}. \end{aligned}$$
(11)

Finally, we shall require positivity (P4) for the post-collision lattice populations (\(f_i^{post} > 0\) for all i), since they represent discrete velocity distribution functions.
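For reference, the action of \(D_8\) on the D2Q9 populations can be made explicit as index permutations. The following sketch (our own construction, using the index convention of Fig. 1) builds the group as the closure of the two generators and verifies that it has exactly 8 elements, as in Eq. 9.

```python
import numpy as np

# Permutation arrays p acting on the populations as (sigma f)_i = f_{p[i]}:
# R = 90-degree counterclockwise rotation, S = mirror about the x axis
R = np.array([0, 4, 1, 2, 3, 8, 5, 6, 7])
S = np.array([0, 1, 4, 3, 2, 8, 7, 6, 5])

def compose(p, q):
    """Successive application of two permutations: f[p][q] == f[compose(p, q)]."""
    return p[q]

# closure of the generators {r, s}: the 8 elements of D_8 (Eq. 9)
group = {tuple(range(9))}
frontier = [np.arange(9)]
while frontier:
    p = frontier.pop()
    for g in (R, S):
        c = compose(p, g)
        if tuple(c) not in group:
            group.add(tuple(c))
            frontier.append(c)

assert len(group) == 8   # |D_8| = 8
```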

3 Machine learning approach

In this section we describe a machine learning approach, hinging on a neural network, to approximate the collision operator. Such a neural network acts as a replacement for the right-hand side of Eq. 1. Our learning problem aims at finding a neural network \(\varOmega ^{\textrm{NN}}\) such that \(\varOmega ^{\textrm{NN}}\approx \varOmega \), i.e., formally,

$$\begin{aligned} {\left\{ \begin{array}{ll} \tilde{f}_i^{\textrm{post}} = \varOmega ^{\textrm{NN}} (f_i^{\textrm{pre}}), \ \ i = 0, \dots , 8,\\ \tilde{f}_i^{\textrm{post}}\approx f_i^{\textrm{post}} , \end{array}\right. } \end{aligned}$$
(12)

where the input of the network, \(f_i^{\textrm{pre}}\), is given by the pre-collision (post-streaming) lattice populations, and the network output, \(\tilde{f}_i^{\textrm{post}}\), targets the post-collision populations \(f_i^{\textrm{post}}\).

In the remainder of this section we will define:

  • The loss function whose minimization drives the NN training process. This will also formalize our desired approximation \(\tilde{f}_i^{\textrm{post}}\approx f_i^{\textrm{post}}\).

  • The training and testing datasets.

  • The network architecture, addressing the strategies that we considered to embed symmetries and conservations.

Loss function and training procedure. We train the neural network to minimize the Mean Squared Relative Error (MSRE) between ground-truth post-collision populations, \( f_i^{\textrm{post}}\), and the neural network approximations, \(\tilde{f}_i^{\textrm{post}}\), accumulated across the populations:

$$\begin{aligned} \textrm{MSRE} = \sum _{i=0}^{8} \left( \frac{\tilde{f}_{i}^{\textrm{post}} - f_{i}^{\textrm{post}}}{f_{i}^{\textrm{post}}}\right) ^2. \end{aligned}$$
(13)

Here, the use of a relative error metric is crucial to achieving good accuracy: since the lattice populations generally take values proportional to the corresponding lattice weights \(w_i\), an absolute error metric would lead the NN to learn the rest population \(f_0\) (typically the one taking the largest value) with higher accuracy, at the expense of the others.

From an implementation perspective, we consider a mini-batch stochastic gradient descent approach driven by the standard adaptive moment estimation (Adam) optimizer [50].
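A minimal sketch of this training procedure, assuming PyTorch and placeholder values for the hyper-parameters of Table 1, could read:

```python
import torch

def msre(f_pred, f_true):
    """Mean squared relative error, Eq. 13, averaged over the mini-batch."""
    return (((f_pred - f_true) / f_true) ** 2).sum(dim=-1).mean()

def train(omega_nn, f_pre, f_post, epochs=200, batch_size=256, lr=1e-3):
    """Mini-batch gradient descent with Adam on (pre, post) population pairs.
    batch_size and lr are placeholders for the values listed in Table 1."""
    loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(f_pre, f_post),
        batch_size=batch_size, shuffle=True)
    opt = torch.optim.Adam(omega_nn.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            msre(omega_nn(x), y).backward()
            opt.step()
    return omega_nn
```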

Training and testing datasets. In order to control the distribution of the macroscopic parameters appearing in the training set, we rely on synthetic data rather than actual simulation data. The training set consists of N pairs of 9-tuples

$$\begin{aligned} \{( f_{i,k}^{\textrm{pre}}, \varOmega (f_{i,k}^{\textrm{pre}}) ), k=1,2,\ldots ,N\}, \end{aligned}$$
(14)

where the pre-collision distributions are generated as

$$\begin{aligned} f_i^{\textrm{pre}} = f_i^{\textrm{eq}}(\rho , \varvec{u}) + f_i^{\textrm{neq}} . \end{aligned}$$
(15)

In the above, the equilibrium distribution \(f_i^{\textrm{eq}}\) is calculated using Eq. 3 from a set of randomly sampled macroscopic variables \(\rho , \varvec{u}\). The non-equilibrium part \(f_i^{\textrm{neq}}\) is generated by randomly drawing each population from a Gaussian distribution, after which corrections are introduced to ensure that it makes no contribution to the lower-order moments, i.e.,

$$\begin{aligned} \sum _{i=0}^{8} f_i^{\textrm{neq}}&= 0, \nonumber \\ \sum _{i=0}^{8} f_i^{\textrm{neq}} \varvec{\xi }_i&= \varvec{0}. \end{aligned}$$
(16)

See Appendix A for further details.
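The generation of the synthetic dataset can be sketched as follows. The sampling range for \(\rho\) and the noise amplitude are illustrative assumptions (the exact procedure is detailed in Appendix A), the velocity range matches the training interval quoted in Sect. 4.4, and the moment constraints of Eq. 16 are enforced here via an orthogonal projection, one possible realization of the correction step.

```python
import numpy as np

XI = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
               [1, 1], [-1, 1], [-1, -1], [1, -1]], dtype=float)
W = np.array([4/9] + [1/9]*4 + [1/36]*4)
CS2 = 1.0 / 3.0

def f_eq(rho, u):
    """Equilibrium of Eq. 3 for a batch: rho (N,), u (N, 2)."""
    xiu = u @ XI.T
    uu = (u ** 2).sum(axis=-1, keepdims=True)
    return W * rho[:, None] * (1 + xiu / CS2 + xiu**2 / (2 * CS2**2)
                               - uu / (2 * CS2))

def make_dataset(N, tau, u_max=0.03, noise=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    rho = rng.uniform(0.9, 1.1, N)              # assumed density range
    u = rng.uniform(-u_max, u_max, (N, 2))      # velocity range of Sect. 4.4
    feq = f_eq(rho, u)
    # Gaussian non-equilibrium part, projected so that Eq. 16 holds exactly
    G = np.vstack([np.ones(9), XI[:, 0], XI[:, 1]])     # mass and momentum rows
    P = np.eye(9) - G.T @ np.linalg.solve(G @ G.T, G)   # projector onto ker(G)
    fneq = rng.normal(0.0, noise, (N, 9)) @ P
    f_pre = feq + fneq                                  # Eq. 15
    f_post = f_pre - (f_pre - feq) / tau                # BGK target, Eq. 2
    return f_pre, f_post
```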

Table 1 List of hyper-parameters used in the training of the NNs presented in this work

3.1 Neural network architectures

Fig. 2 Sketch of a neural network architecture implementing the group-averaging method. The core network (gray box on the left-hand side) is evaluated 8 times on rotated/mirrored versions of the input. The inverse transformation is applied to the 8 outputs, which are then averaged in order to produce the final prediction

We consider variations of a fully connected feed-forward neural network, henceforth referred to as NN Naive, composed of two hidden layers of 50 neurons each. We use ReLU (rectified linear unit) activation functions and no biases in the linear layers.

The Naive NN, as it concatenates bias-less linear layers and ReLU activations, all degree-1 homogeneous functions, is itself degree-1 homogeneous. It is therefore hardwired to respect the scale equivariance P1. Yet, no other properties, such as conservation of mass and momentum or \(D_8\) equivariance, are imposed; hence the denomination naive.
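In PyTorch, the Naive architecture amounts to a few lines; the snippet below also verifies property P1 numerically.

```python
import torch
import torch.nn as nn

# NN Naive: 9 -> 50 -> 50 -> 9, ReLU activations, no biases
naive_nn = nn.Sequential(
    nn.Linear(9, 50, bias=False), nn.ReLU(),
    nn.Linear(50, 50, bias=False), nn.ReLU(),
    nn.Linear(50, 9, bias=False),
)

# bias-less linear layers and ReLU are all degree-1 homogeneous, hence
# so is their composition: Omega(lambda f) = lambda Omega(f)  (P1, Eq. 8)
f = torch.rand(4, 9)
lam = 3.7
assert torch.allclose(naive_nn(lam * f), lam * naive_nn(f), atol=1e-5)
```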

To remedy this, in the remainder of this section we consider three further architectures:

  • NN Sym, satisfying properties P1, P2, P4;

  • NN Cons, satisfying properties P1, P3;

  • NN Sym+Cons, satisfying properties P1, P2, P3.

Before detailing the structure of these networks, we present a more general approach to satisfying P1, which we will use in all three architectures. It hinges on considering pre- and post-collision populations normalized by the corresponding macroscopic density (an invariant, P3). In formulas, we effectively consider and train a NN, \(\hat{\varOmega }^{\textrm{NN}}\), operating as

$$\begin{aligned} \tilde{\phi }_i^{\textrm{post}} =\hat{\varOmega }^{\textrm{NN}}(\phi _i^{\textrm{pre}}) , \end{aligned}$$
(17)

where the normalized pre-collision populations are defined as

$$\begin{aligned} \phi _i^{\textrm{pre}} = f_i^{\textrm{pre}} / \rho = f_i^{\textrm{pre}} / \sum _{i=0}^8 f_i^{\textrm{pre}}. \end{aligned}$$
(18)

The normalized post-collision populations are defined analogously.

Our final collision approximator, \(\varOmega ^{\textrm{NN}}\), prepends and appends rescaling operations as

$$\begin{aligned} \tilde{f}_i^{\textrm{post}} = \varOmega ^{NN}(f^{\textrm{pre}}_i) = \rho \hat{\varOmega }^{NN}(\phi _i^{\textrm{pre}}). \end{aligned}$$
(19)

On this basis, we can enforce positivity, P4, by considering a softmax activation function at the final layer of the network (i.e., in place of a ReLU activation). Let \(y_0,\ldots ,y_8\) be the 9 inputs of the final activation, then the softmax outputs read

$$\begin{aligned} \tilde{\phi }_i^{\textrm{post}} = \frac{e^{y_i}}{Z}=\frac{e^{y_i}}{\sum _{j=0}^8 e^{y_j}} \ . \end{aligned}$$
(20)

Note that this returns normalized populations by construction (cf. Equation 18).
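The normalization wrapper of Eqs. 17-19, combined with the softmax of Eq. 20, can be sketched as a small PyTorch module (class and argument names are ours):

```python
import torch
import torch.nn as nn

class NormalizedCollision(nn.Module):
    """Collision network acting on density-normalized populations
    (Eqs. 17-19); the final softmax enforces positivity (P4) and returns
    normalized populations by construction (Eq. 20)."""

    def __init__(self, core):
        super().__init__()
        self.core = core  # maps 9 normalized populations to 9 logits

    def forward(self, f_pre):
        rho = f_pre.sum(dim=-1, keepdim=True)   # invariant density (P3)
        phi_pre = f_pre / rho                   # Eq. 18
        phi_post = torch.softmax(self.core(phi_pre), dim=-1)  # Eq. 20
        return rho * phi_post                   # Eq. 19
```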

3.2 \(D_8\) equivariance: NN Sym

We establish a collision NN, \(\bar{\varOmega }^{\textrm{NN}}\), in which we enforce the rotation and reflection equivariance (cf. Equation 10). We achieve this by applying a \(D_8\) group-averaging operation to a generic collision \(\varOmega ^{\textrm{NN}}\). In formulas, \(\bar{\varOmega }^{\textrm{NN}}\) operates as follows:

$$\begin{aligned} \tilde{f}_i^{\textrm{post}} = \bar{\varOmega }^{\textrm{NN}}(f_i^{\textrm{pre}}) = \frac{1}{|D_8|} \sum _{\sigma \in D_8}\sigma ^{-1}\varOmega ^{\textrm{NN}} (\sigma f_i^{\textrm{pre}}). \end{aligned}$$
(21)

A proof that Eq. 21 satisfies P2 (Eq. 10) is provided in Appendix D. Note that this approach is general: given any symmetry group, the average in Eq. 21 generates an operator that is equivariant with respect to the group action. Note also that Eq. 21 takes a convex combination of populations, hence ensuring their positivity, with weights summing to unity, which ensures preservation of the density (assuming the core operator \(\varOmega ^{\textrm{NN}}\) has these properties).

In Fig. 2, we illustrate our implementation of Eq. 21. Both at training time and at prediction time, the core network \(\varOmega ^{\textrm{NN}}\) is evaluated 8 times on rotated/mirrored versions of the input (\(\sigma f_i^{\textrm{pre}}\)). The outputs are then averaged after application of the inverse transformation (\(\sigma ^{-1}\)).
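A sketch of the group-averaged forward pass of Eq. 21, with the \(D_8\) elements represented as index permutations as in Sect. 2.1, could read:

```python
import torch

# the 8 elements of D_8 as index permutations, generated from a 90-degree
# rotation R and the x-axis mirror S (cf. the sketch in Sect. 2.1)
R = torch.tensor([0, 4, 1, 2, 3, 8, 5, 6, 7])
S = torch.tensor([0, 1, 4, 3, 2, 8, 7, 6, 5])
perms, frontier = {tuple(range(9))}, [torch.arange(9)]
while frontier:
    p = frontier.pop()
    for g in (R, S):
        c = p[g]
        if tuple(c.tolist()) not in perms:
            perms.add(tuple(c.tolist()))
            frontier.append(c)
D8 = [torch.tensor(p) for p in perms]

def group_averaged(core, f_pre):
    """Group-averaged collision, Eq. 21: evaluate the core network on all
    8 transformed inputs, map each output back with the inverse
    permutation, and average. f_pre: (..., 9)."""
    outs = [core(f_pre[..., p])[..., torch.argsort(p)] for p in D8]
    return torch.stack(outs).mean(dim=0)
```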

3.3 Conservation of mass and momentum: NN Cons

A possible approach to ensure that Eq. 11 is satisfied is to algebraically correct the lattice populations output by the NN (see also Ref. [51] for an example where hard constraints on conservation laws are imposed on the full Boltzmann equation). The method is based on the observation that all the conserved quantities are linear combinations of the lattice populations. Let

$$\begin{aligned} \varvec{f} = [f_0,\dots ,f_8]^T \end{aligned}$$
(22)

be the vector of the lattice populations, and \(\varvec{C}\) be an invertible matrix (representing a change of basis):

$$\begin{aligned} \varvec{C} = [\varvec{c}_0,\dots ,\varvec{c}_8]^T \end{aligned}$$
(23)

with

$$\begin{aligned} \varvec{c}_0 \cdot \varvec{f}&= \rho \nonumber \\ \varvec{c}_1 \cdot \varvec{f}&= \rho u_x \nonumber \\ \varvec{c}_2 \cdot \varvec{f}&= \rho u_y . \end{aligned}$$
(24)

Consequently, the remaining vectors \(\varvec{c}_3,\dots ,\varvec{c}_8\) are chosen to be linearly independent, complementing \(\varvec{c}_0,\varvec{c}_1,\varvec{c}_2\) to a basis of \({\mathbb {R}}^9\).

The matrix \(\varvec{C}\) represents an invertible map \({\mathbb {R}}^9\rightarrow {\mathbb {R}}^9\) expressing a change of basis:

$$\begin{aligned} \varvec{b} = \varvec{C} \varvec{f} \Longleftrightarrow \varvec{f} = \varvec{C}^{-1} \varvec{b}. \end{aligned}$$
(25)

Thus, the first three entries of \(\varvec{b}\) are the density and the momentum components.

Let \(\varvec{I}_1\) and \(\varvec{I}_2\) be two diagonal matrices adding up to the identity matrix (i.e., \(\varvec{I}_1 + \varvec{I}_2 = \varvec{I}\)), given by

$$\begin{aligned} \varvec{I}_1&= \text {diag}(1, 1, 1, 0, \dots , 0) \nonumber \\ \varvec{I}_2&= \text {diag}(0, 0, 0, 1, \dots , 1) \ . \end{aligned}$$
(26)

We define the algebraic corrections as

$$\begin{aligned}&\tilde{\varvec{f}}^{\textrm{post}} =\varOmega _c(\varvec{f}^{\textrm{pre}}) =\varvec{A} \varvec{f}^{\textrm{pre}} + \varvec{B} \varOmega ^{\textrm{NN}} (\varvec{f}^{\textrm{pre}}), \nonumber \\&\quad \text {with} \quad \varvec{A}=\varvec{C}^{-1}\varvec{I}_1\varvec{C} \quad \text {and} \quad \varvec{B}=\varvec{C}^{-1}\varvec{I}_2\varvec{C}. \end{aligned}$$
(27)

The choice of \(\varvec{A}\) and \(\varvec{B}\) is not unique. In what follows we will report results where the algebraic reconstruction is applied to the populations of index 2, 5 and 8, using:

$$\begin{aligned}
\varvec{A}&= \left[ \begin{array}{rrrrrrrrr}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & 1 & 2 & 1 & 0 & 2 & 2 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
-\frac{1}{2} & \frac{1}{2} & 0 & -\frac{3}{2} & -1 & 1 & -1 & -2 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
\frac{1}{2} & \frac{1}{2} & 0 & \frac{1}{2} & 1 & 0 & 0 & 1 & 1
\end{array}\right],\\
\varvec{B}&= \left[ \begin{array}{rrrrrrrrr}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
-1 & 0 & 0 & -2 & -1 & 0 & -2 & -2 & 0\\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\
\frac{1}{2} & -\frac{1}{2} & 0 & \frac{3}{2} & 1 & 0 & 1 & 2 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0\\
-\frac{1}{2} & -\frac{1}{2} & 0 & -\frac{1}{2} & -1 & 0 & 0 & -1 & 0
\end{array}\right].
\end{aligned}$$

A second example is provided in Appendix C. Since the reconstruction occurs after the last layer of the NN, in general it does not ensure strictly positive lattice populations, even when used in combination with the softmax activation function (nevertheless, we never observed negative populations in the numerical results reported in the coming sections).

Note that this approach makes it possible to enforce the conservation of mass and momentum at training time, and yields no additional hyperparameters to be tuned.
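The construction of \(\varvec{A}\) and \(\varvec{B}\) in Eq. 27 can be sketched as follows; completing the basis with unit vectors on the populations that are not reconstructed (indices 0, 1, 3, 4, 6 and 7) is one possible choice, and it recovers the matrices printed above.

```python
import numpy as np

XI = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
               [1, 1], [-1, 1], [-1, -1], [1, -1]], dtype=float)

# rows 0-2 of C extract the conserved moments (Eq. 24); the remaining rows
# are unit vectors on the non-reconstructed populations 0, 1, 3, 4, 6, 7
C = np.zeros((9, 9))
C[0] = 1.0          # c_0 . f = rho
C[1] = XI[:, 0]     # c_1 . f = rho u_x
C[2] = XI[:, 1]     # c_2 . f = rho u_y
for row, i in enumerate((0, 1, 3, 4, 6, 7), start=3):
    C[row, i] = 1.0

I1 = np.diag([1.0, 1, 1, 0, 0, 0, 0, 0, 0])
I2 = np.eye(9) - I1                       # Eq. 26
Cinv = np.linalg.inv(C)
A, B = Cinv @ I1 @ C, Cinv @ I2 @ C       # Eq. 27

def collide_conserving(omega_nn, f_pre):
    """Eq. 27: post-collision populations with exact mass and momentum
    conservation, for batched populations f_pre of shape (N, 9)."""
    return f_pre @ A.T + omega_nn(f_pre) @ B.T

# sanity check: conserved moments of the output equal those of the input
rng = np.random.default_rng(1)
f = rng.random((5, 9))
out = collide_conserving(lambda x: rng.random(x.shape), f)
assert np.allclose(out @ C[:3].T, f @ C[:3].T)
```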

An alternative approach, commonly adopted in the literature  [52,53,54], consists of introducing a soft constraint in the loss function in order to penalize mass and momentum mismatches. In formulas, this reads:

$$\begin{aligned} {\mathcal {L}} = \text {MSRE} + \alpha _1 | \tilde{\rho } - \rho | + \alpha _2 \Vert \tilde{\varvec{u}} - \varvec{u} \Vert \ , \end{aligned}$$
(28)

where \(\tilde{\rho }\) and \(\tilde{\varvec{u}}\) are the macroscopic quantities calculated from the lattice populations output by the network, \(\tilde{f_i}^{\textrm{post}}\), while \(\alpha _1\) and \(\alpha _2\) weight the relative importance of each constraint.
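For completeness, a sketch of the penalized loss of Eq. 28 is given below; here the target macroscopic quantities are recomputed from the ground-truth populations, which carry the same mass and momentum as the pre-collision ones.

```python
import torch

XI = torch.tensor([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
                   [1, 1], [-1, 1], [-1, -1], [1, -1]], dtype=torch.float32)

def soft_constrained_loss(f_pred, f_true, alpha1=1.0, alpha2=1.0):
    """Eq. 28: MSRE plus soft penalties on the mass and momentum
    mismatches; alpha1 and alpha2 are additional hyperparameters."""
    msre = (((f_pred - f_true) / f_true) ** 2).sum(dim=-1)
    rho_pred, rho_true = f_pred.sum(-1), f_true.sum(-1)
    u_pred = (f_pred @ XI) / rho_pred[..., None]
    u_true = (f_true @ XI) / rho_true[..., None]
    return (msre
            + alpha1 * (rho_pred - rho_true).abs()
            + alpha2 * torch.linalg.norm(u_pred - u_true, dim=-1)).mean()
```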

Since we have observed that the imposition of hard constraints via algebraic reconstruction systematically outperforms the soft-constraint based approach, the latter will not be covered in our analysis in the coming sections. Nevertheless, a few numerical results are reported in Appendix B where we highlight the shortcomings of this approach.

4 Numerical results

In this section, we present the results of LBM simulations where the collision term is replaced by one of the four neural networks introduced in the previous section: NN Naive, NN Sym, NN Cons, NN Sym+Cons. For each NN architecture we trained 50 instances, adopting random weight initialization. We stop the training process at 200 epochs. See Table 1 for the full list of training hyper-parameters.

Fig. 3 Static evaluation of the accuracy achieved by the four different NN architectures considered in this work. a Comparison of the absolute relative error on the post-collision populations of index i (cf. Figure 1). b Error committed in the conservation of momentum, with the uniformly filled boxplots representing the error associated with \(u_x\), and the boxplots with patterned filling the error associated with \(u_y\) (see Eq. 29 for the definition of the error metric). Note that the errors for NN Cons and NN Sym+Cons are zero to machine precision. c Error committed in violating rotation and mirroring equivariance (see Eq. 30 for the definition of the error metric). Note that for NN Sym and NN Sym+Cons the error is zero down to machine precision

We first provide a static evaluation of the NN prediction error on the post-collision lattice populations, and report on the physical properties of the learned collision operators. We then turn our analysis to the comparison of time-dependent flows, considering two standard benchmarks: a Taylor–Green vortex decay and a lid-driven cavity flow.

4.1 Static accuracy evaluation

We start by comparing the accuracy of the various NN architectures described in the previous section. In Fig. 3a we show the distribution of the absolute relative error on the post-collision populations committed by the NNs on the test dataset (generated following the procedure described in Appendix A). The boxplots compare the accuracy of 50 different instances of each NN architecture in the prediction of the populations of index i. Comparing the median values, we observe that the NNs implementing symmetries slightly, although systematically, outperform the Naive NN. On the other hand, hardwiring conservation laws does not lead to an improvement in the prediction of the lattice populations. This is due to the specific choice of algebraically reconstructing the populations of index 2, 5 and 8 to restore the conservation of mass and momentum; indeed, the plot shows that the largest errors are associated with these three elements. A major improvement is achieved when combining conservation with rotation and reflection equivariance (NN Sym+Cons): this improves the accuracy in the prediction of the single lattice populations by between one and two orders of magnitude with respect to all the previous cases.

Fig. 4 Time evolution of the average absolute value of the velocity field in a Taylor–Green vortex, comparing the analytic solution (gray dotted line) against simulations making use of NNs with different architectures. The boxplots show the variability among 50 different instances of each NN architecture. The NN with built-in symmetries and conservation properties (red) is the most accurate, followed by the NN with only conservation properties (blue), followed by the NN with only symmetries (orange). The Naive NN (green) is the least accurate

Fig. 5 Velocity profile from simulations of a Taylor–Green vortex decay, after 1000 time steps. The color map indicates the absolute value of the velocity vector, whereas white lines provide the velocity streamlines. We compare the ground truth from a LBM simulation against the results provided by different NN implementations

We now evaluate how well the different architectures comply with the physical properties of the collision operator. In Fig. 3b we show the distribution of the error committed by the various NNs in the conservation of momentum. We define

$$\begin{aligned} \varepsilon _1 = (u_j^{\textrm{pre}} - u_j^{\textrm{post}}) / c_s \ , \end{aligned}$$
(29)

with \(u_j^{\textrm{pre}}\) the momentum calculated from the pre-collision distribution functions, and \(u_j^{\textrm{post}}\) the momentum calculated from the distribution functions predicted by the NN; in the plot, the case \(j = x\) is represented by the boxplots with uniform filling, and \(j = y\) by the boxplots with patterned filling.

The \(\varepsilon _1\) error distribution for the Naive NN differs between the two spatial components, and is also asymmetric with respect to zero. We observe that the NN implementing the symmetries of the lattice (NN Sym) outperforms the Naive NN, restoring in turn the symmetry of the error distribution. By construction, the error for the NNs implementing conservation laws is systematically zero to machine precision.

Finally, in Fig. 3c we evaluate the distribution of the following error metric

$$\begin{aligned} \varepsilon _2 = \frac{1}{|D_8|}\sum _{i=0}^8 \sum _{\sigma \in D_8} \Big |\sigma \varOmega (f_i^{\textrm{pre}}) -\varOmega ( \sigma f_i^{\textrm{pre}} )\Big | \ , \end{aligned}$$
(30)

which quantifies the violation of the \(D_8\) equivariance. For \(D_8\)-equivariant collisions, i.e., those satisfying P2 (Eq. 10), the term within the absolute value is zero to machine precision. We evaluate \(\varepsilon _2\) over the entire test dataset. We observe that the network implementing conservation laws (NN Cons) commits a larger error than even the Naive NN. This is due to the fact that the algebraic reconstruction procedure used to implement the conservation laws leads to the error accumulating along certain lattice directions. The error metric is systematically zero for all the NNs implementing the group-averaging technique.

In the coming sections we compare the performance of the different NN in the simulation of time-dependent fluid flows.

4.2 Benchmark I: Taylor–Green vortex

We consider the time dynamics of a Taylor–Green vortex, a standard benchmark for the validation of fluid flow solvers, since it provides an exact solution to the Navier–Stokes equations.

Starting from the following initial conditions in a 2D periodic domain:

$$\begin{aligned} u_x(x,y)&= u_0 \cos {\left( x\right) } \sin {\left( y\right) },\nonumber \\ u_y(x,y)&=-u_0 \cos {\left( y\right) } \sin {\left( x\right) }, \quad x,y \in [0, 2 \pi ] \end{aligned}$$
(31)

with \(u_0\) the initial amplitude of \(|\varvec{u}|\), it is simple to show that the flow decays exponentially in time, proportionally to

$$\begin{aligned} F(t) = \exp {\left( -2 \nu t \right) } , \end{aligned}$$
(32)

where \(\nu \) is the kinematic viscosity of the fluid (Eq. 5). This benchmark allows us to evaluate the time dynamics of a flow, covering different orders of magnitude in the values of the macroscopic parameters, and also to assess the preservation of symmetries by observing the structure of the vortices.
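A sketch of the corresponding initialization in lattice units is given below. Initializing the populations at equilibrium is a common choice which we assume here, and the wavenumber \(2\pi/n\) is our conversion of Eqs. 31 and 32 to a grid of unit spacing; the defaults match the resolution and velocity used in the simulations described next.

```python
import numpy as np

def taylor_green_fields(n=32, u0=1e-2):
    """Initial density and velocity of Eq. 31 sampled on an n x n periodic
    grid; the populations are then initialized at equilibrium,
    f = f_eq(rho, u) via Eq. 3 (a common choice, assumed here)."""
    x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    X, Y = np.meshgrid(x, x, indexing='ij')
    u = np.stack([u0 * np.cos(X) * np.sin(Y),
                  -u0 * np.cos(Y) * np.sin(X)], axis=-1)
    return np.ones((n, n)), u

def analytic_decay(t, tau, n=32):
    """Eq. 32 converted to lattice units (grid spacing 1): the effective
    wavenumber is k = 2 pi / n, and nu follows from Eq. 5 with
    c_s^2 = 1/3 and Delta t = 1."""
    nu = (tau - 0.5) / 3.0
    k = 2.0 * np.pi / n
    return np.exp(-2.0 * nu * k**2 * t)
```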

We consider a \(32 \times 32\) grid, with \(u_0 = 10^{-2}\) and \(\tau = 1\). In Fig. 4, we compare the time decay of the average absolute value of the velocity field from simulations making use of different NNs against the analytic solution. Once again, for each type of NN we have evaluated the results from 50 different networks trained starting from a random choice of the initial weights. The plot highlights the variability in the results from the different NNs by means of boxplots. We can see that the Naive NN is able to correctly follow the flow decay for only 20-40 iterations, after which not only does the flow stop decaying, but we also observe an increase in the kinetic energy. By employing a NN satisfying the symmetries of the lattice it is possible to restore the decaying trend of the flow, although we observe a deviation from the correct decay rate. This can be attributed to the network not being able to preserve momentum. On the other hand, NNs enforcing the conservation laws provide a more accurate dynamics, with only small variability around the analytic solution, which can be further reduced by combining conservation and preservation of symmetries.

Fig. 6 Steady-state profiles of a \(u_x\) along the vertical centerline, and b \(u_y\) along the horizontal centerline of the domain of a lid-driven cavity flow at \(\textrm{Re} = 10\). Simulations are performed on a square grid of side \(L = 64\). We compare the results of a LBM simulation (black line) against results obtained employing the four NN architectures considered in this work. The boxplots report the variability in the results from 50 instances of each NN architecture

Fig. 7 Steady-state profiles of a \(u_x\) along the vertical centerline, and b \(u_y\) along the horizontal centerline of the domain of a lid-driven cavity flow at \(\textrm{Re} = 100\). Dotted lines represent results obtained using the NN Sym+Cons architecture for an increasing number of nodes L along the grid side. We compare the results against a LBM simulation (black line, \(L = 256\)) and reference data from Ghia et al. [55] (orange dots, \(L = 129\))

The importance of embedding conservation laws and symmetries together in the NN is even more evident in Fig. 4, where evolution statistics are shown for the four types of NN designs. Embedding symmetries or conservation properties shows an immediate and dramatic improvement over the Naive NN in the ability to capture the decay rate of the average velocity field. Enforcing conservation properties is appreciably more important (for the purpose of this test) than enforcing symmetries. Yet, enforcing both symmetries and conservation properties produces the most accurate results, capturing the decay of the average velocity with minimal variability all the way to machine precision, which is a remarkable result, especially compared to the performance of the Naive NN. Moreover, we should stress that a NN with a lower training error will not necessarily guarantee better results when employed in simulations; for example, NN Cons, which in Fig. 3a presents the largest training error, is among the best performing ones when looking at Fig. 4.

On a more qualitative basis, in Fig. 5 we provide snapshots of the velocity field at a later stage of the dynamics (after \(t = 1000\) iterations), comparing the ground truth given by a plain LBM simulation against an example of the profile provided by each of the different NN implementations. The figure shows that, besides failing to reproduce the decay of the flow, the Naive NN is also not able to preserve the structure of the vortices. The NN with symmetries, on the other hand, nicely preserves the geometric structure, although the amplitude of the velocity is slightly off with respect to the reference LBM profile. The NNs enforcing conservation laws correctly capture on average the decay rate (cf. Figure 4); however, Fig. 5 clearly shows that the structure of the vortices is no longer symmetric. This can be attributed to the fact that the algebraic reconstruction is performed on 3 lattice populations, leading to a less balanced distribution of the error (cf. Figure 3c). Only by combining conservation and symmetries in the NN is it possible to correctly reproduce the velocity profile.

4.3 Benchmark II: Lid driven cavity flow

As a second example, we consider the lid-driven cavity flow, a wall-bounded benchmark in a very simple geometry that still leads to a non-trivial dynamics. Since there is no analytic solution for this flow, this time we compare only against reference LBM simulations.

The setup consists of a top lid moving at a constant velocity (\(u_0\)), with no-slip boundary conditions at the bottom and side walls. We consider an \(L \times L\) grid, with the relaxation time set to \(\tau = 1\), and report results for simulations at two different Reynolds numbers, \(\textrm{Re} = 10\) and \(\textrm{Re} = 100\), with

$$\begin{aligned} \textrm{Re} = \frac{u_0 L}{\nu } \ . \end{aligned}$$
(33)
Fig. 8 Steady-state profile of the velocity field for a lid-driven cavity flow at \(\textrm{Re} = 100\), comparing the results of a LBM simulation against the results provided by different NN implementations. Colors map the absolute value of the fluid velocity normalized by the lid velocity, whereas white lines provide the velocity streamlines. Simulations are performed on a square grid of side \(L = 128\)

In the simulations the NN does not handle the evolution of the boundary nodes. Instead, we employ standard LBM approaches for implementing the boundary conditions. In particular, the bounce-back rule is used to implement the no-slip condition: the lattice populations that during the streaming step interact with a solid wall get reflected back to their original location with their velocity reversed:

$$\begin{aligned} f_{\bar{i}} (\varvec{x}, t + 1) = f_i (\varvec{x}, t) \ , \end{aligned}$$
(34)

where \(f_{\bar{i}}\) is the population of index \(\bar{i}\) such that \(\varvec{\xi }_{\bar{i}} = - \varvec{\xi }_i\). For the top wall we use a Dirichlet boundary condition

$$\begin{aligned} f_{\bar{i}} (\varvec{x}, t + 1) =f_i (\varvec{x}, t) - 2 w_i \rho _w \frac{\varvec{\xi }_i \cdot \varvec{u}_w}{c_s^2} \ , \end{aligned}$$
(35)

where \(\rho _w\) and \(\varvec{u}_{w} = (u_0, 0)\) are, respectively, the density and the velocity at the top wall.
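A sketch of how these boundary rules act on the post-collision populations of the wall-node layers is given below. This is a simplified on-node variant restricted to the bottom wall and the moving lid (the side walls are handled analogously), and rho_w = 1 is an assumed approximation of the wall density.

```python
import numpy as np

XI = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
               [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4/9] + [1/9]*4 + [1/36]*4)
OPP = np.array([0, 3, 4, 1, 2, 7, 8, 5, 6])   # xi_OPP[i] = -xi_i
CS2 = 1.0 / 3.0

def apply_cavity_walls(f_post, u_lid, rho_w=1.0):
    """Bounce-back rules of Eqs. 34-35 applied to the post-collision
    populations of the bottom (y = 0) and top (y = ny - 1) wall layers
    of a cavity whose lid moves at velocity (u_lid, 0).
    f_post: (nx, ny, 9)."""
    f = f_post.copy()
    for i in range(1, 9):
        j = OPP[i]
        if XI[i, 1] < 0:   # populations leaving through the bottom wall
            f[:, 0, j] = f_post[:, 0, i]                    # Eq. 34
        if XI[i, 1] > 0:   # populations leaving through the moving lid
            f[:, -1, j] = (f_post[:, -1, i]
                           - 2.0 * W[i] * rho_w
                           * (XI[i, 0] * u_lid) / CS2)      # Eq. 35
    return f
```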

In Fig. 6, we show the steady-state velocity profiles along the vertical (a) and horizontal (b) centerlines of the lid-driven cavity for \(\textrm{Re} = 10\), comparing the results of a plain LBM simulation against results obtained employing NNs with different architectures. All simulations are performed on a square grid of side \(L = 64\). Once again we show data collected by simulating 50 different instances of each NN architecture, with the boxplots reporting the variability in the obtained results. We observe that in this case the results of the Naive NN are much closer to the reference data than in the previous benchmark. This can be attributed to the boundary conditions constraining the flow. Both NN Sym and NN Cons provide an improvement over the Naive NN; however, it is interesting to point out that the results provided by the latter present a much higher variability than that observed in the simulation of the Taylor–Green vortex. Indeed, the plot clearly shows that only NN Sym+Cons is able to correctly reproduce the results of the LBM simulation. We select this NN architecture to perform simulations at a higher Reynolds number. In Fig. 7 we show the results obtained at \(\textrm{Re} = 100\), varying the grid size, and comparing with both a LBM simulation and reference data from Ghia et al. [55]. The results from the simulation using the finest grid resolution (\(L=256\)) are found to be in excellent agreement with the reference data. On the other hand, we see that for coarser grid sizes the NN struggles to correctly reproduce the velocity in the proximity of the moving plate (see Fig. 7b). We discuss the origin of this mismatch in the next subsection.

In Fig. 8, we show a more qualitative comparison for the case \(\textrm{Re} = 100\), presenting snapshots of the velocity field at the steady state and comparing the results from a LBM simulation with the results produced by the different NN architectures. It is interesting to observe that each NN makes a different prediction for the location of the main vortex, and only a few reproduce the secondary vortex located at the bottom right corner. As expected from the analysis above, NN Sym+Cons provides results in excellent agreement with the LBM simulation.

4.4 Extrapolation

In Fig. 7b, we observed significant deviations in the numerical results produced by the NN Sym+Cons architecture in the proximity of the moving plate, in particular for coarse grids. Since in the simulations we keep the kinematic viscosity and the Reynolds number fixed, it follows from Eq. 33 that decreasing the grid resolution increases the numerical value of the lid velocity \(u_0\). For \(L=64\) the numerical value used at the top lid, \(u_0 \approx 0.26\), falls well outside the range of values shown to the network at training time. It is, therefore, interesting to investigate the extrapolation capabilities of the different NNs. In Fig. 9, we show the average MSRE over 50 instances of each NN architecture, calculated in the prediction of the equilibrium distribution \(f_i^{\textrm{eq}}(\rho = 1, u_x, u_y=0)\) for varying values of \(u_x\). The continuous lines show the performance of the NNs trained on a dataset where the macroscopic velocity takes values in the interval \((-0.03, 0.03)\); likewise, the dotted lines show the results for NNs trained on values of the macroscopic velocity in the interval \((-1/3, 1/3)\). The corresponding gray continuous (dotted) vertical lines identify the boundaries of the two training datasets. Within the range of values seen by the NNs during training, NN Sym+Cons outperforms all the other network architectures. On the other hand, this NN commits the largest extrapolation error, i.e., the largest error in predicting the equilibrium distribution outside of the values of the training set. While the reason for this behavior is currently unclear to us and will be the object of further analysis in future work, these results explain the discrepancies observed in Fig. 7, where simulations with numerical values of the top-lid velocity outside of the training dataset led to larger discrepancies with respect to the reference solution. This, in turn, points to the need for extra care in the preparation of the training dataset.

Fig. 9 Comparison of the accuracy of the different NN architectures within and outside the training dataset. The plot shows the average MSRE, computed over 50 instances of each NN architecture, in the prediction of the equilibrium distribution \(f_i^{\textrm{eq}}(\rho = 1, u_x, u_y=0)\) for varying values of \(u_x\). The continuous lines refer to NNs trained on a dataset where the macroscopic velocity takes values in the interval \((-0.03, 0.03)\), while the interval \((-1/3, 1/3)\) has been used to train the NNs corresponding to the dotted lines. The gray continuous (dotted) vertical lines identify the boundaries of the two training datasets

5 Conclusion

In this work, we have presented a machine learning approach for learning a collision operator for the Lattice Boltzmann Method from data. As a proof of concept, we have developed a neural network capable of approximating the BGK collision operator to good accuracy. We have discussed in detail a few methods which allow enriching the structure of the neural network so as to enforce relevant physical properties of the collision operator. We have shown that only by embedding both conservation laws and lattice symmetries in the neural network is it possible to correctly reproduce the time dynamics of a fluid flow.

This work can be regarded as a first step toward the application of neural networks for extending the applicability of LBM to kinetic regimes not supported by the basic method. To give an example, in future extensions of the present work we plan to evaluate the possibility of using our approach for learning collision operators from molecular dynamics and Monte Carlo simulations in regimes beyond the hydrodynamic limit. Moving in this direction, we expect that dealing with boundary conditions will become increasingly important, and one can think of training multiple NNs implementing diverse types of boundary conditions.

Moreover, we will take into consideration other approaches for embedding symmetries in the network, allowing for a scalable extension to the three-dimensional case and for employing higher-order stencils.