1 Introduction

Machine learning (ML) and artificial intelligence (AI) methods are increasingly being applied to scientific research, and the field of computational fluid dynamics (CFD) is no exception. This has led to the emergence of at least two hybrid AI/numerical simulation paradigms: AI-in-the-loop and AI-around-the-loop. Both are characterized by the coupling of an AI method with the simulation to form a new hybrid, CFD+ML algorithm. In the case of AI-in-the-loop, the AI method is embedded within the simulation as part of the numerical solver. For example, this could be an artificial neural network (ANN) surrogate model for sub-grid-scale physics or an ML model trained in situ as the simulation progresses. AI-around-the-loop refers to the application of AI methods that interact with the simulation via its inputs and outputs. Common examples include automated parameter tuning using black-box optimization techniques, in which the simulation output forms part of the objective function. In both cases, the most widely used programming language is Python, though common ML frameworks provide C++ Application Programming Interfaces (APIs) as well.

Implementing hybrid CFD+ML algorithms in simulation codes like OpenFOAM presents challenges when operating at high-performance computing scales. This paper specifically focuses on three questions:

  • How should ML be embedded into OpenFOAM as part of a simulation?

  • For CPU-based codes like OpenFOAM, what computing architectures allow for efficient use of CPU and GPU resources?

  • What are the basic workflow design patterns that can be composed to create complex CFD+ML applications?

The lack of answers to these questions has thus far inhibited the integration of AI/ML, simulation methods, and High-Performance Computing (HPC). While the literature (particularly in the CFD realm) contains a number of examples of using ML in situ, these are largely custom integrations that are difficult for others to extend or re-use.

In response to these challenges, the OpenFOAM Special Interest Group for Data-driven Modelling [1] has been considering the following design hypothesis: scientific simulations should be loosely coupled to ML both from a software sense (e.g., the ML backends should not be linked directly into the application) and also in a compute sense (e.g., ML training and inference should be offloaded to a separate process). This loose coupling allows for a clean separation of concerns and delivers modularity when examining different ML frameworks. Additionally, we consider a data-sharing paradigm which uses a centralized, computation-enabled database to stage data in-memory.

In this paper, we implement this loosely coupled paradigm within the OpenFOAM context using the open-source libraries SmartSim and SmartRedis, developed by Hewlett Packard Enterprise. We use the latest releases of OpenFOAM (version 2312 [2]), SmartSim (version 0.6.2 [3]), and SmartRedis (version 0.5.2 [4]), and provide three examples of hybrid CFD+ML workflows, available in a public GitHub repository [5] and as a source code archive on the Zenodo data repository [6]:

  • Bayesian optimization for tuning the parameters of a turbulence model with the goal of matching the results of a low-resolution model with a higher fidelity reference case.

  • Streaming CFD data from the simulation to calculate its reduced basis via a partitioned singular value decomposition (SVD).

  • Using online training and online inference with CFD data to approximate mesh-point displacements in a mesh-motion problem.

Underlying these use cases is the expression of a new philosophy for scientific computation which considers workflow components like the simulation and ML applications as data producers and consumers. We focus on educational examples that can be easily understood and reproduced by the computational science community, and used as implementation starting points by the OpenFOAM community for developing more complex CFD+ML applications. The examples in this paper are therefore minimal working examples of complex CFD+ML workflows, representative of future large-scale applications that are currently in development.

This paper is organized into the following sections: Sect. 2 presents the overall architecture of both the software and the computation framework and its implementation using SmartSim, Sect. 3 describes the integration of the SmartSim communication clients into OpenFOAM, Sect. 4 describes the implementation and results of the three use cases, and finally Sect. 5 provides a summary of this work.

2 Architecture

Traditionally, CFD engineers consider their simulations to be single entities: the sole unit of work is the CFD case, integrated in time and space using numerical solvers. In the hybrid CFD+ML workflows that we describe in this paper, there are instead two separate entities to consider, the CFD algorithm and the ML algorithm. This necessitates a discussion of the overall architecture from both a software engineering and a computational view.

2.1 Computational and communication architecture

On modern supercomputing platforms, two distinct types of computational units exist: general-purpose CPUs and vector-optimized GPUs. OpenFOAM is parallelized using domain decomposition and message passing via MPI, so the mainstream versions of OpenFOAM are designed to run on CPUs. In contrast, due to their high arithmetic intensity, AI/ML methods, especially those that depend on matrix-multiply operations, run more efficiently on GPUs. While some HPC platforms are composed of nodes with identical hardware, it is also common to have heterogeneous nodes. For example, some nodes (attached to the same network fabric) may have GPUs, whereas others might only have CPUs.

In a loosely coupled CFD+ML framework, the CFD and ML algorithms can be considered separate entities and, thus, also deployed separately. In this paper, we take loosely coupled CFD+ML algorithms to also mean that the ML and CFD algorithms do not share memory, which creates a need to communicate data between the entities. Anticipating more complex workflows in the future, we reject a peer-to-peer communication model, as the number of connections scales with the product of the rank counts of every pair of communicating applications, which becomes untenable at even modest scales. Consider an OpenFOAM case deployed on 1,024 ranks, a second computational component deployed on 32 ranks, and a third application on 16 ranks. Fully connecting these applications requires 1024×32 + 1024×16 + 32×16 = 49,664 connections. The problem is further amplified if applications join or leave the ecosystem at various points of the workflow.

Instead, we consider a modified spoke-and-hub communication paradigm that enables entities to exchange data in a central location, i.e., a database. This type of communication topology concentrates the communication at the hub, which traditionally leads to performance degradation. In the solution proposed here, the hub has distributed components, which helps balance the load on the hub and allows performance to be maintained at larger scales. Using the same example as in the fully connected topology discussed previously, if we assume that the third application is the hub, the total number of connections is only 1024×16 + 32×16 = 16,896, roughly a third of those in the fully connected network, with the savings growing as applications and ranks are added. Additionally, adding or removing an entity from the ecosystem does not require a full resynchronization of every application, but only of that entity with the database.

We next consider the additional functionality gained if the distributed database is capable of computation. With this simple addition, data can be sent, transformed, and made available to any entity in the ecosystem. In the context of a hybrid CFD+ML workflow, this database can be used as an inference server, and thus the other entities do not require their own specialized hardware. SmartSim implements this distributed database using a Redis cluster that can be sharded over a number of nodes, as explained in the next section. Communication clients, provided by the related SmartRedis library and embedded in the application, connect to this database on application launch. The embedding of these clients into OpenFOAM is discussed in more detail in Sect. 3.1.

2.2 SmartSim and SmartRedis

To enable the CFD+ML use cases described in this work, two libraries are used together with OpenFOAM: SmartSim [3, 7], which sets up and runs the workflow’s computational infrastructure, and SmartRedis, a client library for Redis databases that is used to exchange data across workflow applications and to run ML models and other processing functions on such data.

SmartSim is an open-source Python library developed with the main goal of simplifying the orchestration and deployment of modern High Performance Computing (HPC) workflows mixing ML and numerical simulations. A standard SmartSim-orchestrated experiment is defined and executed through a Python driver script. Users define representatives of their applications (usually a simulation) and the AI-enabled database through the use of Model and Orchestrator objects respectively. A SmartSim Model represents the execution of an application, and therefore mainly consists of the name of the executable and a list of arguments to pass to it. A SmartSim Orchestrator object represents an instance of a Redis database, possibly spanning multiple system nodes. The orchestrator is used by models launched through SmartSim to upload, retrieve, and exchange data and to execute ML models on stored data. The ML management and execution features are achieved through the RedisAI module, which is automatically loaded by SmartSim when the Orchestrator is deployed.

This same driver script is used to define the order of execution of the different components, which – if enough resources are available to the user – can also be started in groups and run concurrently. To launch entities, SmartSim interacts with the system, usually by means of the available workload manager. Details about the entity execution, such as computational resources, environment variables, and constraints can be specified by the user and are converted by SmartSim into the corresponding system-dependent options. During the execution of the workflow, SmartSim interfaces with the workload manager to monitor the launched entities, allowing users to get real-time status updates.
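For orientation, a minimal driver script of this kind might look as follows; this is a hedged sketch in which the experiment name, solver invocation, and resource counts are illustrative assumptions rather than settings used in this paper:

```python
# Minimal sketch of a SmartSim driver script; names and resources are
# illustrative, not tied to the examples in this paper.
from smartsim import Experiment

exp = Experiment("cfd-ml-workflow", launcher="auto")

# Orchestrator: a Redis database, possibly sharded across several nodes.
db = exp.create_database(port=6379, db_nodes=1, interface="lo")

# Model: the CFD application, here an OpenFOAM solver run under MPI.
rs = exp.create_run_settings(exe="pimpleFoam", exe_args=["-parallel"],
                             run_command="mpirun")
rs.set_tasks(4)
sim = exp.create_model("openfoam-sim", rs)

exp.start(db)                 # deploy the database first
exp.start(sim, block=True)    # block=False would run entities concurrently
exp.stop(db)                  # tear down the database when done
```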

In a SmartSim experiment, applications communicate with orchestrators through SmartRedis, an open-source Redis client library with multi-language support, developed together with SmartSim. As an example, two applications can exchange data using an orchestrator as a broker: each application can upload its own data to the orchestrator and retrieve data uploaded by the other application. SmartRedis is also used to request that stored data be processed in place through ML models or other post-processing functions. All data produced by a call to SmartRedis immediately becomes available to all applications launched in the same experiment.
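The following sketch illustrates this broker pattern with the SmartRedis Python client; all key and tensor names are placeholders:

```python
# Two applications exchanging data through the orchestrator (names are
# placeholders). In practice, producer and consumer are separate processes.
import numpy as np
from smartredis import Client, Dataset

# Producer: pack fields into a dataset and upload it.
producer = Client(cluster=False)  # database address taken from the SSDB env var
snapshot = Dataset("snapshot_0")
snapshot.add_tensor("U", np.zeros((1000, 3)))
snapshot.add_tensor("p", np.zeros(1000))
producer.put_dataset(snapshot)

# Consumer: wait for the dataset to appear, then fetch it.
consumer = Client(cluster=False)
if consumer.poll_dataset("snapshot_0", 100, 1000):  # every 100 ms, 1000 tries
    ds = consumer.get_dataset("snapshot_0")
    U = ds.get_tensor("U")
```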

From a software development standpoint, SmartRedis is lightweight and minimally invasive: only a few calls to SmartRedis functions need to be added to existing application code to allow integration into the orchestrated workflow. SmartRedis can be used to instrument applications written in C, C++, Fortran, and Python. SmartSim and SmartRedis have been tested on CFD-like parallel applications running at scale on state-of-the-art supercomputers; results from these applications and discussions of their performance can be found in references [7,8,9]. These integrations are most easily done when the source code is user-modifiable (as in the case of OpenFOAM) or when the application has defined ways of injecting custom code (e.g., via user-defined functions or plugins).

The client/orchestrator architecture provided by SmartSim is uniquely suited for high-performance computing systems with heterogeneous nodes, i.e., where some nodes have accelerators and others are traditional CPU-only nodes. If the ML backends were linked directly into OpenFOAM, every node on which the AI-enabled portion of OpenFOAM runs would need to have a GPU. By decoupling the ML component and using the Orchestrator as an inference engine and/or an in-memory cache for training applications, GPU resources can be limited to only those portions of the workflow that absolutely require them. Additionally, for CPU-based simulation codes like OpenFOAM, the entire AI workload can be handled by a relatively small number of GPUs because the AI model sizes tend to be small [7, 9]. By scaling the database to the workload, users of these environments can ensure high utilization of the GPU resources.
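As a hedged sketch of the inference-server pattern, an application might upload a serialized TorchScript model once and then request in-database inference; the model file, key names, and device choice below are illustrative assumptions:

```python
# Sketch: the Orchestrator as an inference server (SmartRedis Python API).
# Model file, key names, and device choice are illustrative assumptions.
import numpy as np
from smartredis import Client

client = Client(cluster=False)

# Upload a serialized TorchScript model once; RedisAI executes it inside the
# database, so this process needs no GPU of its own.
client.set_model_from_file("surrogate", "surrogate.pt", "TORCH", device="GPU")

client.put_tensor("inputs", np.random.rand(128, 3).astype(np.float32))
client.run_model("surrogate", inputs=["inputs"], outputs=["outputs"])
predictions = client.get_tensor("outputs")
```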

2.3 Communication and signaling

The loose coupling of CFD+ML algorithms using OpenFOAM and SmartSim enables fast design and deployment of problem-specific, data-driven asynchronous and synchronous workflows. Consider, for example, Fig. 1, which shows a CFD+ML workflow developed with OpenFOAM and SmartSim for live post-processing of CFD results using a generic ML model. A concrete example of this workflow is described in Sect. 4.3, using a distributed version of the SVD applied to CFD results from OpenFOAM. An OpenFOAM solver or a function object communicates OpenFOAM fields to the SmartRedis database, and a SmartSim implementation of the ML workflow fetches the OpenFOAM fields and trains the ML model to approximate them. Both the simulation loop and the approximation loop iterate at user-defined frequencies: the simulation loop may iterate over time steps or pseudo time steps, while the approximation loop may iterate over one or more snapshots of flow fields.

The nature of the modelled problem defines the signaling frequency between the simulation loop and the approximation loop. The signaling is achieved by storing, deleting, and polling flags in the SmartSim database. In this context, polling the database means performing repeated queries at user-defined time intervals, up to a maximal number of attempts, until the polled information becomes available. Polling is necessary because SmartRedis internally manages and balances its input/output operations, which are generally asynchronous: when a client writes to SmartRedis, it requests a write operation, and there is no way to know when exactly the data will actually be written. Since each data object in SmartRedis has a name (i.e., a key), other clients can very efficiently check if a key is available and only fetch data once the key exists, i.e., poll the database for the key. This data-centric approach is at the core of our workflow, replacing numerous, complicated, and inefficient client-to-client communications on heterogeneous hardware with database-centric poll/get/put operations. In Fig. 1, the CFD algorithm communicates an appropriate flag to the ML algorithm through SmartRedis at the end of the simulation. In a batch-mode ML approximation loop, the signal would notify the ML algorithm that a sufficient number of simulation loop iterations have passed and that it is time to approximate the batched OpenFOAM fields stored in SmartRedis.
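A minimal sketch of this flag-based signaling, with illustrative flag names and polling parameters, reads:

```python
# Flag-based signaling between workflow entities (names are illustrative).
import numpy as np
from smartredis import Client

# CFD side: store a flag in the database at the end of the simulation.
cfd = Client(cluster=False)
cfd.put_tensor("end_of_simulation", np.array([1], dtype=np.int32))

# ML side: poll for the flag every 250 ms, giving up after 2000 attempts.
ml = Client(cluster=False)
if ml.poll_tensor("end_of_simulation", 250, 2000):
    ml.delete_tensor("end_of_simulation")  # consume the flag
    # ...finalize the approximation loop...
```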

Fig. 1
figure 1

Online ML approximation of CFD results from OpenFOAM in SmartSim. Solid lines represent transitions between algorithm steps; dashed lines are database operations

A bidirectional, loosely coupled CFD+ML workflow requires more complex communication and signaling between the CFD and ML algorithms, as shown in Fig. 2. The approximation of the displacements of an unstructured finite volume mesh using an ANN is one example of such a workflow (discussed in Sect. 4.4). In this case, the ANN must be updated with new data at every time step of the CFD simulation, called for inference with fields from the current state of the model, and the results retrieved to realize the actual movement of the mesh.

Figure 2 contains a schematic representation of a generic bidirectional CFD+ML workflow, using objects and concepts from OpenFOAM, SmartSim, and SmartRedis. The CFD algorithm communicates CFD data to the SmartRedis database in the form of SmartRedis tensors. A SmartSim entity implements the ML training algorithm (which also contains a SmartRedis client), obtaining the tensors needed to train the ML model. Once the ML model has been trained on CFD data, it is stored in the SmartRedis database. Concurrently with these operations, each MPI rank of the CFD algorithm (when run in parallel) sends the data for forward inference to SmartRedis and queries the SmartRedis database for an ML model availability flag. Once the ML model availability flag has been set by the ML client of the SmartRedis database, the CFD algorithm requests the forward inference within SmartRedis. Tensors resulting from the forward inference in SmartRedis are obtained by the OpenFOAM CFD client and used in the simulation. As in Fig. 1, the bidirectional CFD+ML algorithm from Fig. 2 synchronizes the approximation loop and the CFD loop using a flag for the end of the simulation, stored in SmartRedis by the CFD algorithm and polled for in the SmartRedis database by the SmartSim client.

Fig. 2
figure 2

A bidirectional CFD+ML algorithm with OpenFOAM and SmartSim. The PDE solution block is omitted from the CFD algorithm for brevity; solid lines represent transitions between algorithm steps, dashed lines are database operations

3 OpenFOAM and SmartSim integration

The loosely coupled CFD+ML workflows described in Sect. 2 significantly simplify the integration of SmartSim and OpenFOAM. The SmartSim and SmartRedis application programming interfaces (APIs) are very concise, because loose coupling only requires orchestrating OpenFOAM simulations with blocking or non-blocking behavior (cf. Figs. 1 and 2), sending and receiving tensors to and from SmartRedis, forward inference of ML models in SmartRedis, and writing and polling signaling flags in the SmartRedis database (more information can be found in the SmartSim documentation [10]). These tasks can be achieved by using the C++ client to communicate with the SmartRedis database. An element of the OpenFOAM CFD algorithm that takes part in the communication and signaling described in Sect. 2.3 links to the smartredis library, opens a communication channel to SmartRedis, and uses the SmartRedis C++ API [11] to perform the above-mentioned tasks.

Contrary to the classical CFD workflow, where OpenFOAM solvers are independently started in simulations, the loosely-coupled CFD+ML workflow introduces dependencies to OpenFOAM in the form of the database and the ML algorithm. The interaction between OpenFOAM, SmartSim, SmartRedis, and the ML algorithm is governed by the SmartSim driver script implemented in Python, which implements the overall workflow, driving OpenFOAM simulations and ML model training as SmartSim Models.

The following section describes a generalization of the communication between OpenFOAM and SmartRedis, by implementing the integration of SmartRedis in an OpenFOAM function object. OpenFOAM function objects are elements of OpenFOAM that enable user-defined calculations within a simulation, which can be attached to any OpenFOAM solver at runtime via configuration files, without modifying the solver [12].

3.1 Integrating SmartRedis in an OpenFOAM function object

3.1.1 Standardizing OpenFOAM-SmartRedis interactions

This section describes an API for interfacing OpenFOAM with the SmartRedis in-memory database. The API is organized into three layers, which support varying levels of control over the interaction with the SmartRedis database.

Most of the core API is implemented through the smartRedisClient class, which handles establishing connections, executing queries, and reading/writing data between OpenFOAM and SmartRedis. The class aims to enable communication for online ML workflows and to simplify OpenFOAM data aggregation on the SmartRedis side by setting conventions for naming database tensors and datasets. These naming conventions include fields for the time step (needed for transient simulations) and for a domain identifier (needed for distributed cases). Additionally, the methods provided here handle both the internal and boundary portions of OpenFOAM’s geometric fields [12].

The first API layer provides high-level services for common workflows on OpenFOAM’s geometric fields. This includes methods like sendGeometricFields, illustrated in listing 1, which, by default, packs the field’s internal data and sends it to the SmartRedis database. Notably, the service API aims to simplify integrating SmartRedis into OpenFOAM workflows by keeping both OpenFOAM-specific and SmartRedis-specific types out of method interfaces: only primitive types like strings and booleans appear in a method’s interface.

Listing 1
figure a

Example method from the service API

The second layer is useful when custom development is needed and considers SmartRedis Dataset objects as its building block. Methods from this developer layer do not interact directly with the SmartRedis database; instead, they handle a Dataset reference passed in as a first argument. All database interactions, including the querying, sending, and receiving operations needed to obtain the Dataset object, must happen prior to calling methods from this layer. To illustrate, listing 2 showcases the declaration of the packFields method template, which takes a Dataset object to pack fields into, as well as the names of target fields and boundary patches. Note that methods from this layer are specific to field types, hence the extensive use of templates.

Listing 2
figure b

Example method from the developer API

The third API layer facilitates generic interactions with the database, intended as fallback methods, by operating on OpenFOAM List objects as SmartRedis tensors. Methods from this layer offer a consistent approach to serializing and deserializing OpenFOAM List objects to and from SmartRedis tensors, maintaining the relevant tensor dimensions and storing them in a contiguous memory layout. The generic templates ensure conformity to the naming conventions set by the smartRedisClient class, even for interactions not fully managed by the class.

Listing 3
figure c

Example method from the generic API

Key benefits of implementing such API standards include:

  • Enabling online ML workflows where OpenFOAM and SmartRedis interact at runtime as the effort required to get field data from OpenFOAM is minimized.

  • Hiding the implementation details of the SmartRedis database from the OpenFOAM user while supporting lower-level methods for non-standard interactions.

  • Putting the API user in control of the data aggregation process on the SmartRedis side, which is a key component of online ML workflows.

3.1.2 A function object for interacting with SmartRedis

As a direct application of the service API described in Sect. 3.1.1, an OpenFOAM functionObject, named fieldsToSmartRedis, is provided to handle the task of sending portions of OpenFOAM fields to a SmartRedis database. The development of such function objects is greatly simplified by inheriting from the smartRedisClient class. The developer only needs to call sendGeometricFields(fieldNames, patchNames); in the execute method, which runs at the end of each time step. The fieldNames and patchNames are lists of strings defined by the user in their OpenFOAM case to denote the target OpenFOAM fields and the desired boundary patches, respectively. For all purposes, "internal" is considered a special boundary patch that refers to the internal field.

The proper usage of such a function object to send pressure, velocity, and face flux data for both the internal field and the inlet patch is shown in listing 4, where both field names and target patches can be selected dynamically by the user. A tutorial case showcasing how this function object can be used is provided in the associated repository.

Listing 4
figure d

Usage example for fieldsToSmartRedis functionObject in the case’s controlDict dictionary

Fetching the data in the ML code requires knowing the naming convention, which is echoed back by the function object itself, as shown in listing 5. The Jinja2 templating engine [13] can be used to hot-replace the placeholders with their desired values in the ML code if needed. To illustrate, the SmartRedis database needs to be queried for the pUPhi_time_index_0_mpi_rank_0.field_name_patch_inlet tensor if the user is looking for the inlet data of the pressure field from MPI rank 0 at the first time index.

Listing 5
figure e

A sample naming convention for the pUPhi function object as configured in listing 4

Another important component of the function object is the metadata dataset that is automatically submitted to the SmartRedis database. This special dataset is named pUPhi_metadata for the example in listing 4 and is intended to carry metadata that the master process wants to communicate to the ML code. These metadata are case-specific, but they always include the naming conventions for datasets and fields. To illustrate, listing 6 shows how this metadata dataset can be exploited to derive the corresponding tensor names for fields in a generic and automated way through Python scripting. Note that if SmartSim is running an ensemble-based model for parameter variation, data for a specific ensemble member can be fetched from the database programmatically by calling the set_data_source method on the SmartRedis client, so that the name returned by get_field_name from listing 6 is a valid database key.

Listing 6
figure f

Process the naming conventions posted by the function object to derive the actual tensor names on the database
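For illustration, a helper in the spirit of listing 6 might look as follows; the metadata string keys ("dataset", "field") and template variables below are assumptions made for this sketch, while the actual names are defined by the function object:

```python
# Hedged sketch of processing the naming-convention metadata; the metadata
# string keys ("dataset", "field") and template variables are assumptions.
from jinja2 import Template
from smartredis import Client

client = Client(cluster=False)
meta = client.get_dataset("pUPhi_metadata")

def get_field_name(field, patch, time_index=0, mpi_rank=0):
    """Render the templates posted by the function object into a tensor key."""
    ds_name = Template(meta.get_meta_strings("dataset")[0]).render(
        time_index=time_index, mpi_rank=mpi_rank)
    field_name = Template(meta.get_meta_strings("field")[0]).render(
        name=field, patch=patch)
    return f"{ds_name}.{field_name}"

p_inlet = client.get_tensor(get_field_name("p", "inlet"))
```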

4 Example use cases

This section presents three separate use cases that emphasize the novel workflows that can be achieved using the SmartSim/OpenFOAM integration described above. The first case is an example of the AI-around-the-loop paradigm (where the AI algorithm interacts with the simulation at its startup and finalization): Bayesian optimization is applied to ensembles of OpenFOAM simulations to identify values of tunable parameters that maximize a desired metric. The second case implements a distributed singular value decomposition algorithm as an example of online analysis (analysis performed on data streamed from the simulation, in contrast to post-hoc analysis). The third case demonstrates an example of AI-in-the-loop (the AI technique is embedded as part of the solver): a neural network is first trained on boundary displacements and then immediately used to query displacements on the interior portion of the mesh.

4.1 Data and software

Example use cases have been developed with OpenFOAM (version 2312 [2]), SmartRedis (version 0.5.2 [4]), and SmartSim (version 0.6.2 [3]). Both the software implementation and the starting files for the use case examples are publicly available on GitHub [5], as well as an archive of the project used to generate data in this manuscript [6].

4.2 Parameter tuning using ensembles of simulations

Many simulations have tunable coefficients that affect their subgrid-scale parameterizations and numerics. Finding optimal sets of parameters that maximize a certain goodness of fit is directly analogous to hyperparameter optimization when training ML models. Like the ML problem, this problem is characterized by a large, N-dimensional search space (one dimension for each tunable parameter) and by an objective function that is computationally costly to evaluate (in ML, the cost of training the model; for simulation, the cost of running a model to completion).

Bayesian optimization (BO) attempts to mitigate both these issues by building a statistical model of the underlying objective function [14]. The typical BO loop involves creating a prior distribution for the objective function, evaluating the function at select points, then using the results of the evaluation to construct a posterior distribution which can be used to generate the next set of query points. This has two primary advantages: 1) it does not need the derivatives of the actual function being evaluated, and 2) information from nearly any function evaluation improves the overall model. The former is particularly important for some numerical models for which an adjoint is not available. The latter is pertinent as well because shortening the time-to-solution is often difficult, but scaling the number of simulations performed in parallel is a simpler task.

The above features make BO a powerful technique for optimizing CFD problems because the solver can be treated as a black box. In this example, which maps onto the AI-around-the-loop paradigm, we apply BO to optimize the turbulence parameters of a Reynolds-averaged Navier–Stokes (RANS) simulation of a combustion chamber [15]. The problem geometry is 2D, with a backward-facing step near the inlet and a tapered nozzle near the outlet. Boundary conditions are fixed such that at the inlet \(u_{inlet}\) = 10 m s\(^{-1}\) with zero pressure gradient, whereas at the outlet both the velocity gradient and the pressure are zero. The case is implemented with OpenFOAM’s steady-state, incompressible solver and the \(k-\epsilon\) turbulence closure. The default setup converges within 6 s on a single core of an AMD EPYC 7763 processor.

The \(k-\epsilon\) turbulence closure traditionally has five free parameters. Based on physical arguments, this set can be reduced to three parameters [16], \(C_\mu\), \(C_1\), and \(C_2\). The last free parameter is the dissipation rate \(\epsilon\) at the inlet.

Given these four free parameters, the last component needed to define the optimization problem is the objective function itself. For simplicity, we make the same choice as in [16] and evaluate the pressure difference between the outlet and the inlet. Note that, because the pressure at the outlet is fixed at zero, the pressure difference depends solely on \(P_{inlet}\). The reference value for the large-scale, density-normalized pressure was computed from a large-eddy simulation (LES) to be about 1.9 m\(^2\)s\(^{-2}\) (averaged over the last 0.1 s of a 0.2 s simulation). This simulation took about 41 min on 8 cores of an AMD EPYC 7763 processor.

The minimization problem can thus be fully defined as

$$\begin{aligned} \begin{aligned} \min _{\epsilon , C_\mu , C_1, C_2} \quad&\left[ P_{inlet}^{RANS}(\epsilon , C_\mu , C_1, C_2) - 1.9\right] ^2 \\ \text {s.t.} \quad&2.97< \epsilon< 74.28 \\&0.05< C_\mu< 0.15 \\&1.1< C_1< 1.5 \\&2.3< C_2 < 3.0. \end{aligned} \end{aligned}$$
(1)

To demonstrate how to solve this optimization problem, we employed the Bayesian optimization implementation provided by Scikit-Optimize (version 0.8.1) [17]. This implementation provides an ask-tell interface: during the ask phase, the optimizer is queried for the points within the parameter space that should be explored; during the tell phase, the results of these evaluations are sent to the optimizer to update the Bayesian prior. The problem is initialized with the default values of the coefficients (Table 1), which yield an initial error (as compared to the LES simulation) of 3.49 m\(^2\)s\(^{-2}\). We then begin the optimization loop, where each iteration retrieves and evaluates five sets of the four parameters. This step takes advantage of SmartSim’s ability to configure, launch, manage, and collect output from ensembles. The results are then passed back through the tell interface to update the Bayesian prior in preparation for the next optimization loop. After 10 iterations (50 total simulations) of the Bayesian optimization loop, the optimal set of coefficients (Table 1) yields an error of only 0.01 m\(^2\)s\(^{-2}\).
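The following hedged sketch outlines this ask-tell loop with Scikit-Optimize; run_ensemble is a placeholder for configuring, launching, and post-processing the OpenFOAM ensemble via SmartSim:

```python
# Sketch of the ask-tell optimization loop (bounds from Eq. (1));
# run_ensemble is a placeholder, not part of the presented framework.
from skopt import Optimizer

bounds = [(2.97, 74.28),  # epsilon at the inlet
          (0.05, 0.15),   # C_mu
          (1.1, 1.5),     # C_1
          (2.3, 3.0)]     # C_2
opt = Optimizer(dimensions=bounds)

def run_ensemble(param_sets):
    """Placeholder: launch one OpenFOAM case per parameter set through a
    SmartSim Ensemble and return the resulting inlet pressures."""
    raise NotImplementedError

for _ in range(10):                    # 10 optimization iterations
    params = opt.ask(n_points=5)       # five parameter sets per iteration
    errors = [(p - 1.9) ** 2 for p in run_ensemble(params)]
    result = opt.tell(params, errors)  # update the posterior

best_params = result.x                 # best coefficients found so far
```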

The most dramatic change between the initial and final sets of parameters (Table 1) is in the boundary condition for \(\epsilon\), which nearly doubles. Generally, this value is determined via a length-scale argument, so a factor of two is not a priori unreasonable. The coefficients of the \(k-\epsilon\) model themselves fall within the range of probable values that reasonably fit previous experimental data [18].

Table 1 Initial and optimal values for the free parameters related to the \(k-\epsilon\) parameterization within the Pitz and Daily example case
Fig. 3
figure 3

The magnitude of the velocity from the large-eddy simulation (LES) of the Pitz and Daily case (a), the Reynolds-Averaged Navier Stokes using the default \(k-\epsilon\) coefficients (b), and the optimal coefficients (c). Note that for the LES simulation, the plotted field is the large-scale U that has had the eddy components filtered out

Comparing the output of the simulations shows improvements in some of the flow characteristics when using the ‘optimal’ parameters from the Bayesian optimization routine (Fig. 3). With the optimized \(k-\epsilon\) parameters, the recirculation zone in the bottom left portion of the domain does not stretch as far to the right as in the RANS case using the default \(k-\epsilon\) coefficients, and the inlet jet tapers off earlier in the domain. Combined with the higher boundary condition for \(\epsilon\) prescribed by the Bayesian optimizer, this suggests that the default \(k-\epsilon\) model coefficients do not dissipate enough of the mean flow for this problem.

While tuning unknown parameters is a useful demonstration of Bayesian optimization, many other use cases can be cast within this framework. For example, an emerging use case in aerodynamics is using an ML algorithm to optimize a body’s shape with respect to some desired quantity, e.g., drag over a hull or lift from an airfoil. Additionally, data assimilation and the development of control schemes can also take advantage of this type of ensemble-based exploration and optimization.

4.3 Partitioned singular value decomposition

SVD is a fundamental algorithm for analyzing and modeling fluid flows [19]. The flow’s state at time \(t_n = n\Delta t\) may be expressed as a vector \({\textbf{x}}_n \in {\mathbb {R}}^M\). Based on the cell-centered finite volume method implemented in OpenFOAM, \({\textbf{x}}_n\) may be composed of the cell-centered values of one or multiple fields, e.g., velocity, pressure, or temperature. A suitable normalization should be applied before building the state vector when mixing fields with different units [20]. The components of vector or tensor fields are concatenated. For example, the velocity field on a mesh with \(N_p\) cells yields a state vector of length \(M=3N_p\). Given a time series of N states, the state vectors are arranged into a data matrix:

$$\begin{aligned} {\textbf{X}} = \left[ {\textbf{x}}_1, {\textbf{x}}_2,\ldots , {\textbf{x}}_N\right] , \end{aligned}$$
(2)

such that \({\textbf{X}}\in {\mathbb {R}}^{M\times N}\). For typical CFD simulations, the data matrix is tall and skinny, meaning that \(M\gg N\). The economy SVD of the data matrix is a factorization of the form [19]:

$$\begin{aligned} {\textbf{X}} = \mathbf {U\Sigma V}^T, \end{aligned}$$
(3)

with \({\textbf{U}} \in {\mathbb {R}}^{M\times N}\), \(\mathbf {\Sigma } \in {\mathbb {R}}^{N\times N}\), and \({\textbf{V}} \in {\mathbb {R}}^{N\times N}\). The column vectors of \({\textbf{U}}\) and \({\textbf{V}}\) form optimal orthogonal bases spanning the column and row space of \({\textbf{X}}\), respectively. In this context, optimality is defined as the rank-r approximation to \({\textbf{X}}\) in the least-squares sense [19]:

$$\begin{aligned} \underset{{\textbf{X}}_r \text { s.t. } \textrm{rank}({\textbf{X}}_r)=r}{\textrm{argmin}} || {\textbf{X}} - {\textbf{X}}_r ||_2^2 = {\textbf{U}}_r\mathbf {\Sigma }_r{\textbf{V}}^T_r, \end{aligned}$$
(4)

with \({\textbf{U}}_r \in {\mathbb {R}}^{M\times r}\), \(\mathbf {\Sigma }_r \in {\mathbb {R}}^{r\times r}\), and \({\textbf{V}}_r \in {\mathbb {R}}^{N\times r}\). The SVD’s properties have several important applications in fluid mechanics:

  1. Data reduction: flows often exhibit coherent structures; in that case, an accurate reconstruction \({\textbf{X}}_r\) is possible with \(r\ll N\); storing the truncated SVD, i.e., \({\textbf{U}}_r\), \(\mathbf {\Sigma }_r\), and \({\textbf{V}}_r\), requires significantly less space than storing \({\textbf{X}}\).

  2. Flow analysis: due to the arrangement of states in \({\textbf{X}}\), the SVD separates variation in space, i.e., the column vectors in \({\textbf{U}}_r\), from variation in time, i.e., the column vectors of \({\textbf{V}}_r\); this modal decomposition aids the understanding of complex flows.

  3. Reduced-order modeling: the column vectors of \({\textbf{V}}_r\) form a low-dimensional basis that is ideally suited to create time series models; multiplying the predictions made by such models with \({\textbf{U}}_r\mathbf {\Sigma }_r\) yields a prediction of the full flow state.

While tremendously useful, the SVD poses a major memory challenge when applied to fluid-mechanical problems. Consider a case with a moderately sized mesh of \(N_p=10^6\) control volumes for which \(N=1000\) snapshots of velocity and pressure have been saved in double precision. Constructing the corresponding data matrix requires approximately 30 GB of memory, and computing the economy SVD requires at least an additional 30 GB. Even at relatively modest scales, this computation can quickly outrun the memory of commodity workstations or compute nodes. A distributed-memory parallel SVD implementation, e.g., the MPI-based variant available in ScaLAPACK [21], provides one workaround for the memory limitation. However, such implementations are not readily accessible from OpenFOAM or the typical ML ecosystem. Moreover, applying the SVD to state-of-the-art simulations in the automotive or aerospace sector can easily exceed the memory of an entire cluster.

The present work demonstrates a generic split-and-merge approach to compute the SVD. Specifically, we adopt the partitioned SVD by Liang et al. [22]. First, the data matrix is split into S partitions of approximately equal size:

$$\begin{aligned} {\textbf{X}} = \left[ {\textbf{X}}_1, {\textbf{X}}_2, \ldots , {\textbf{X}}_S \right] ^T. \end{aligned}$$
(5)

Each partition has exactly N columns, but the number of rows may vary. The domain decomposition in OpenFOAM provides a natural partitioning. However, we note that, at least in principle, the number of partitions could be larger or smaller than the number of sub-domains. Next, one SVD is computed for each partition:

$$\begin{aligned} {\textbf{X}}_i = {\textbf{U}}_i\mathbf {\Sigma }_i{\textbf{V}}^T_i,\quad i \in \left\{ j\in {\mathbb {Z}} | 1 \le j \le S \right\} . \end{aligned}$$
(6)

Each SVD computation may be performed in parallel or sequentially, depending on the available resources. Two additional steps are necessary to merge the domain-specific SVDs. First, all products \(\mathbf {\Sigma }_i{\textbf{V}}^T_i\) are concatenated into a new matrix:

$$\begin{aligned} {\textbf{Y}} = \left[ \mathbf {\Sigma }_1{\textbf{V}}^T_1, \mathbf {\Sigma }_2{\textbf{V}}^T_2, \ldots , \mathbf {\Sigma }_S{\textbf{V}}^T_S \right] ^T, \end{aligned}$$
(7)

such that \({\textbf{Y}}\in {\mathbb {R}}^{SN\times N}\). Note that \({\textbf{Y}}\) is relatively easy to handle, even if N and S are both \(O(10^3)\). By computing the SVD of \({\textbf{Y}}\), i.e., \({\textbf{Y}}={\textbf{U}}_y\mathbf {\Sigma }_y{\textbf{V}}^T_y\), the SVD of \({\textbf{X}}\) can be recovered. Let \({\textbf{U}}_{yi}\) denote the sub-matrix of \({\textbf{U}}_y\) formed by the rows \(j \in \left\{ k \in {\mathbb {Z}} | (i-1)N < k \le iN \right\}\), i.e., the i-th block of N rows. Then the SVD of \({\textbf{X}}\) is given by:

$$\begin{aligned} {\textbf{U}}&= \left[ {\textbf{U}}_1 {\textbf{U}}_{y1}, {\textbf{U}}_2 {\textbf{U}}_{y2}, \ldots , {\textbf{U}}_S {\textbf{U}}_{yS}\right] ^T,\end{aligned}$$
(8)
$$\begin{aligned} \mathbf {\Sigma }&= \mathbf {\Sigma }_y,\end{aligned}$$
(9)
$$\begin{aligned} {\textbf{V}}&= {\textbf{V}}_y. \end{aligned}$$
(10)

As before, the matrix multiplications in Eq. (8) may be computed in parallel or sequentially. Moreover, it is not necessary to store the full SVD. Instead, we can analyze the singular values on the diagonal of \(\mathbf {\Sigma }\), determine a suitable value for r, and keep the first r columns of each matrix. Finally, we note that the rank-r reconstruction of \({\textbf{X}}\) can also be performed independently for each partition (sub-domain), i.e., \({\textbf{X}}_{ri} = {\textbf{U}}_i{\textbf{U}}_{yi}\mathbf {\Sigma }_y{\textbf{V}}^T_y\).
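For concreteness, the split-and-merge procedure of Eqs. (5)-(10) can be sketched in a few lines of plain NumPy; this minimal illustration assumes every partition has at least N rows and is not the implementation used in the repository:

```python
# Minimal NumPy sketch of the partitioned SVD (Eqs. (5)-(10)); assumes each
# row-block X_i has at least N rows, as in the tall-and-skinny case.
import numpy as np

def partitioned_svd(partitions):
    """partitions: list of S row-blocks X_i, each of shape (M_i, N)."""
    N = partitions[0].shape[1]
    # Eq. (6): one economy SVD per partition (these may run in parallel).
    local = [np.linalg.svd(Xi, full_matrices=False) for Xi in partitions]
    # Eq. (7): stack the products Sigma_i V_i^T into Y with shape (S*N, N).
    Y = np.vstack([np.diag(s) @ Vt for _, s, Vt in local])
    Uy, sigma, Vt = np.linalg.svd(Y, full_matrices=False)
    # Eq. (8): recover the global U block by block.
    U = np.vstack([Ui @ Uy[i * N:(i + 1) * N]
                   for i, (Ui, _, _) in enumerate(local)])
    return U, sigma, Vt  # Eqs. (9) and (10): Sigma and V come from Y's SVD

# Self-check on random data: the merged factors reproduce X.
X = np.random.rand(1000, 20)
U, sigma, Vt = partitioned_svd(np.array_split(X, 4, axis=0))
assert np.allclose(U @ np.diag(sigma) @ Vt, X)
```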

In the complementary code repository, we apply the partitioned SVD to the laminar flow past a cylinder at Reynolds number \(Re=d U_\textrm{in}/\nu = 100\) (d: cylinder diameter, \(U_\textrm{in}\): mean inlet velocity, \(\nu\): kinematic viscosity). Details about the setup of this classic problem are given by Schäfer et al. [23], and reference results for the modal decomposition of such a flow can be found in [19]. The setup is comparatively simple and serves as a playground for development and testing. On an abstract level, our implementation is built as follows:

  1. Driver program: a Jupyter notebook with a Python kernel serves as the driver to start the SmartRedis database and to execute the steps listed below; to run OpenFOAM applications, we use SmartSim’s Experiment feature.

  2. Data transmission: during the simulation, snapshots are directly written to the database by employing the fieldsToSmartRedis function object described in Sect. 3.1.2; only the fields of the last available write time are kept on the disk in case a restart is required.

  3. Domain-specific SVDs: for each sub-domain i of the mesh, a parametrized Python script assembles the i-th data matrix from the database and computes the SVD using standard NumPy operations; the resulting matrix factorization is written back to the database; we employ the SmartSim Ensemble feature to run the script for all S sub-domains.

  4. SVD of \({\textbf{Y}}\): assembling the \({\textbf{Y}}\) matrix (7) and computing its SVD requires little effort, which is why we perform this computation in the driver notebook and write the result to the database.

  5. Global SVD: in a dedicated Python script, the global \({\textbf{U}}\) matrix (8) is evaluated individually for each sub-domain; once the domain-specific computation is complete, the corresponding \({\textbf{U}}_i\) matrix is deleted from the database; for visualization, we also save the rank-r reconstruction of the snapshots to the database; as before, the script is executed employing the Ensemble feature.

  6. Transfer of results: in the final step, the driver program executes the svdToFoam utility, which converts the reconstructed snapshots and the first r column vectors of \({\textbf{U}}\) to OpenFOAM fields; after the conversion, common OpenFOAM post-processing workflows can be applied, e.g., visualization in ParaView, conversion to VTK, execution of function objects, etc.

In the example, we build the state vector from the velocity field; hence, each state vector has a length of \(M=3N_p\). When converting the left singular vectors \({\textbf{u}}_j\) (the column vectors of \({\textbf{U}}\)) back to OpenFOAM fields, each vector is reshaped such that \({\textbf{u}}_j \in {\mathbb {R}}^{N_p \times 3}\) for visualization. Figure 4 shows the first four left singular vectors computed from 400 snapshots. In the complementary repository, we also provide reconstruction errors obtained with varying values of r.

As a final remark for this example, we note that Liang et al. [22] also provide a streaming version of the partitioned SVD employed here. Our implementation can be adjusted with relatively little effort to process the snapshots online as they become available in the database. We refer to algorithm 3 in [22] for more details about the streaming variant.

Fig. 4
figure 4

Comparative ParaView visualization of the first 4 left singular vectors (columns of \({\textbf{U}}\)); each subplot shows the magnitude of the respective vector field

4.4 Mesh motion using artificial neural networks

4.4.1 Problem definition

In this use case, we demonstrate how to leverage SmartSim to train an ML model on CFD data and perform inference at every time step of an OpenFOAM simulation, effectively implementing the bidirectional, AI-in-the-loop CFD+ML workflow from Fig. 2 for mesh motion, shown for a simplified example in Fig. 5.

The mesh starts from an initial state (Fig. 5a), deforms into an extreme state (Fig. 5b), and then returns to the initial state. This simplified use case mirrors the mesh motion widely used in CFD applications involving fluid-solid interaction or shape optimization.

OpenFOAM implements mesh motion following its modular philosophy [12], making it possible to extend mesh motion with ML by developing a new mesh motion solver that uses an Artificial Neural Network (ANN) to approximate non-uniform mesh displacements, e.g., from Fig. 5.

Fig. 5
figure 5

Mesh deformation using Artificial Neural Networks

The mesh motion problem is defined by the motion of the solution-domain boundary \(\partial \Omega\), given either by boundary displacements \({\textbf{d}}|_{\partial \Omega }\) or by boundary velocities \({\textbf{v}}|_{\partial \Omega }\). In our example, shown in Fig. 6, we use boundary displacements \({\textbf{d}}|_{\partial \Omega }\) without loss of generality. The internal cylinder boundary rotates with an amplitude of \(30^\circ\) over 2 s of simulation time.

Fig. 6
figure 6

Mesh motion boundary conditions on domain-boundary \(\partial \Omega\)

To move the mesh points, mesh motion requires the displacements of the internal mesh points, i.e., \({\textbf{d}}({\textbf{x}}), {\textbf{x}} \in \Omega {\setminus } \partial \Omega\), given \({\textbf{d}}|_{\partial \Omega }\), such that distorting the mesh introduces a minimal increase in discretization errors. We can define mesh motion M generally as

$$\begin{aligned} M: {\mathbb {R}}^3 \rightarrow {\mathbb {R}}^3, \end{aligned}$$
(11)

generating bulk displacements

$$\begin{aligned} {\textbf{d}}({\textbf{x}}) = M({\textbf{x}}, {\textbf{d}}|_{\partial \Omega }). \end{aligned}$$
(12)

For example, OpenFOAM provides mesh motion ‘engines’ (displacement maps) that define M as an interpolation or as the solution of a Laplace equation for \({\textbf{d}}\) with variable diffusivity (the Laplacian mesh motion), given boundary displacements \({\textbf{d}}|_{\partial \Omega }\).

4.4.2 Mesh motion in OpenFOAM

In this example, we leverage SmartSim to implement M as an ANN approximator of \({\textbf{d}}({\textbf{x}}), {\textbf{x}} \in \Omega {\setminus } \partial \Omega\), given boundary displacements \({\textbf{d}}|_{\partial \Omega }\).

Since OpenFOAM’s parallel implementation relies on domain decomposition and MPI, boundary displacements are decomposed both across boundary patches and across MPI ranks r. The decomposition of the domain boundary \(\partial \Omega\) into boundary patches is necessary for applying different boundary conditions for the systems of partial differential equations (PDEs) solved in OpenFOAM. The decomposition of the boundary \(\partial \Omega\) into patches (i.e., the cylinder and outer wall in Fig. 6), the domain decomposition over \(N_r\) MPI ranks, and the time integration of the simulation over discrete time steps \(\Delta t = t^{n+1} - t^n\) result in boundary displacements being split into sequences

$$\begin{aligned} {\textbf{d}}|_{\partial \Omega }(t^n) = \bigcup _{p \in 1, \dots , N_{\partial \Omega }}\bigcup _{r \in 1, \dots , N_r} {\textbf{d}}^n_{p,r}, \end{aligned}$$
(13)

with \({\textbf{d}}^n_{p,r}:={\textbf{d}}_{p,r}(t^n)\) denoting the displacements of the \(p\)-th boundary patch (out of \(N_{\partial \Omega }\) patches) on MPI rank r at time \(t^n\). Boundary points are decomposed equivalently into

$$\begin{aligned} {\textbf{p}}|_{\partial \Omega }(t^n) = \bigcup _{p \in 1, \dots , N_{\partial \Omega }}\bigcup _{r \in 1, \dots , N_r} {\textbf{p}}^n_{p,r}. \end{aligned}$$
(14)

Note that it is quite possible that \({\textbf{d}}^n_{p,r}, {\textbf{p}}^n_{p,r} = \emptyset\) (i.e., empty patches) if the domain decomposition results in MPI rank r not containing any part of the p-th boundary patch (e.g., the rank does not contain a part of the cylinder from Fig. 6). This seemingly obvious case must be handled explicitly when exchanging boundary patch data with SmartRedis in the workflow (cf. Fig. 2), as storing and reading zero-sized OpenFOAM fields in the database is meaningless and creates unnecessary overhead. It is also important to note that mesh motion generally does not apply Neumann-type boundary conditions for the displacements at \(\partial \Omega\); it uses Dirichlet-type (i.e., fixed-value) boundary conditions. The displacement map M from eqs. (11) and (12), implemented in this CFD+ML algorithm as an ANN, views the Dirichlet (fixed-value) boundary conditions simply as unstructured training data, with the boundary mesh points \({\textbf{p}}|_{\partial \Omega }(t^n)\) as features and the boundary mesh displacements \({\textbf{d}}|_{\partial \Omega }(t^n)\) as the labels of the ANN model of M. The decomposition given by eqs. (13) and (14) is not strictly necessary, as long as the discrete boundary displacements \({\textbf{d}}^n|_{\partial \Omega }\) match the discrete boundary points \({\textbf{p}}^n|_{\partial \Omega }\). Although the decomposition of the boundary into patches is not relevant for our mesh-motion example, we keep it to demonstrate how it impacts the communication and signaling workflow from Fig. 2 in the general case where Neumann-type conditions might be necessary, e.g., for training a Physics-Informed Neural Network on OpenFOAM data with Neumann-type boundary conditions.

Algorithm 1
figure g

Machine-Learning Mesh Motion Algorithm in OpenFOAM

The boundary decomposition from eqs. (13) and (14) increases the complexity of the CFD+ML workflow discussed in Sect. 2.3 and shown in Fig. 2. The CFD part of the OpenFOAM+SmartSim CFD+ML algorithm for mesh motion is summarized in pseudocode by algorithm 1. Boundary points and displacements that are used to train M as an ANN are aggregated over boundary patches for each MPI rank into SmartRedis datasets (\({\textbf{p}}^{n+1}_r\) and \({\textbf{d}}^{n+1}_r\)), and the datasets are appended to the respective points and displacements dataset lists, \(L_{{\textbf{p}}}\) and \(L_{{\textbf{d}}}\). This simplifies queries for available data in SmartRedis. Each MPI rank writes OpenFOAM data as SmartRedis tensors into SmartRedis using non-blocking communication for efficiency reasons. Another client of the SmartRedis database, in this example the Python-based ML training algorithm, can then simply query the size of the aggregation lists \(L_{\textbf{p,d}}\) until the target size \(N_r\), the number of MPI ranks, is reached and all data needed for the ANN training is available in SmartRedis. The OpenFOAM mesh motion solver, as a SmartRedis client, queries the SmartRedis database for the availability of M as an ANN trained on the rank-aggregated boundary points and displacements, \({\textbf{p}}^{n+1}_r\) and \({\textbf{d}}^{n+1}_r\), respectively. When M becomes available, mesh displacements are evaluated at mesh points. There is a peculiarity in OpenFOAM at this point: mesh points in OpenFOAM are not stored in a geometric field [12] that would make it possible to easily differentiate between internal and boundary points. After forward inference is performed on all mesh points in SmartRedis, algorithm 1 must therefore ensure that only internal displacements are overwritten. Overwriting \({\textbf{d}}^{n+1}|_{\partial \Omega }\) with approximated displacements would incur a loss of accuracy with respect to the exact displacements prescribed by the boundary conditions in OpenFOAM. As is done here, overwriting boundary conditions can easily be avoided in any bidirectional CFD+ML algorithm by evaluating the OpenFOAM boundary conditions on fields inferred from an ML model.

A key aspect of this setup is that the only components of the workflow that benefit from access to a GPU are the Redis database (for running inference) and the training script. The nodes running OpenFOAM can be kept on CPU-only nodes, alleviating problems with scaling on heterogeneous clusters. This is crucial for ensuring that OpenFOAM simulations do not unnecessarily reserve high-value GPU hardware.

4.4.3 Approximating mesh-motion displacements in SmartSim

Algorithm 2
figure h

Mesh-Motion Displacement Approximation in SmartSim

Fig. 7
figure 7

Approximating mesh-motion displacements with an ML model whose architecture ensures a level of quality comparable to the Laplacian mesh-motion

The ML part of the bidirectional CFD+ML algorithm (cf. Fig. 2) for mesh motion is outlined in algorithm 2. Algorithm 2 demonstrates the advantage of using SmartSim for developing CFD+ML algorithms in the Python programming language: besides a few additional calls to the SmartRedis database using its Python API for obtaining training data and storing the trained model in SmartRedis, the rest of the algorithm is a standard implementation of an ANN training loop.
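A condensed sketch of this pattern is given below; the list, tensor, and model names, the network architecture, and the training hyperparameters are illustrative assumptions rather than the settings used in the repository:

```python
# Hedged sketch of the algorithm-2 pattern: fetch rank-aggregated training
# data from SmartRedis, train a small PyTorch model, and store it in the
# database for in-database inference. Names and hyperparameters are
# illustrative assumptions.
import io
import numpy as np
import torch
from smartredis import Client

client = Client(cluster=False)
num_ranks = 4  # N_r, known from the workflow configuration

# Wait until every MPI rank has appended its dataset to the aggregation lists.
client.poll_list_length("points_list", num_ranks, 250, 2000)
client.poll_list_length("displacements_list", num_ranks, 250, 2000)
points = np.vstack([ds.get_tensor("points")
                    for ds in client.get_datasets_from_list("points_list")])
disps = np.vstack([ds.get_tensor("displacements")
                   for ds in client.get_datasets_from_list("displacements_list")])

# Standard supervised training loop: boundary points -> boundary displacements.
model = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 3))
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.from_numpy(points).float()
y = torch.from_numpy(disps).float()
for _ in range(1000):
    optim.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optim.step()

# Serialize as TorchScript and store in SmartRedis; the OpenFOAM ranks can
# now run forward inference inside the database (cf. Fig. 2).
buffer = io.BytesIO()
torch.jit.save(torch.jit.trace(model, x[:1]), buffer)
client.set_model("mesh_motion_net", buffer.getvalue(), "TORCH", device="CPU")
```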

We validate the example by comparing, in Fig. 7, the non-orthogonality error [12] resulting from the mesh motion map M approximated by the ANN in our CFD+ML algorithm with that of the existing Laplacian mesh motion solver in OpenFOAM, which solves a Laplace equation for the displacements. Non-orthogonality is a measure of mesh quality associated with face centers; however, we visualize cell averages of the non-orthogonality angles. The hyperparameter tuning of the ANN used to model M determines its accuracy. In this example, we manually found a set of hyperparameters that delivers distributions of the cell-averaged non-orthogonality angle similar to those generated by the Laplacian mesh-motion solver. In a real-world scenario, hyperparameter tuning would be integrated into the workflow, e.g., using the Bayesian optimization from Sect. 4.2. The described approach to mesh motion using loosely coupled, bidirectional CFD+ML algorithms can easily be extended to improve the quality of the deformed mesh with respect to existing mesh motion solvers in OpenFOAM, e.g., by tuning the ANN hyperparameters or by implementing derivative constraints on the mesh motion, both of which we will address in future work.

Algorithms 1 and 2 show the advantage of loosely coupled CFD+ML algorithms in SmartSim: apart from the added communication and signaling, each entity in a loosely coupled CFD+ML algorithm remains largely unchanged.

5 Summary and discussion

We present a straightforward computational framework for implementing CFD+ML algorithms, leveraging SmartSim’s Orchestrator to simplify CFD+ML programming and the SmartRedis database for scalable data exchange. We further simplify the already straightforward SmartSim and SmartRedis APIs for OpenFOAM users by providing an OpenFOAM function object that standardizes the data exchange between OpenFOAM and SmartRedis. We provide highly heterogeneous example use cases, including Bayesian optimization, a distributed singular value decomposition, and mesh motion using artificial neural networks, highlighting the efficiency and potential of combining ML techniques with CFD simulations using SmartSim and OpenFOAM.

The publicly available, open-source implementation of the examples provides a starting point for both beginner and experienced OpenFOAM users to conceive and build complex, concurrent CFD+ML algorithms using OpenFOAM and SmartSim. Work is currently underway to expand these examples into new avenues of research applications, and additional work is planned to make them even more accessible. Lastly, this work will be submitted to become an OpenFOAM module so that the community as a whole can benefit from its continued development.