Machine learning (i.e., modern data-driven optimization and applied regression) is a rapidly growing field of research that is having a profound impact across many fields of science and engineering. In the past decade, machine learning has become a critical complement to existing experimental, computational, and theoretical aspects of fluid dynamics. In this short article, we are excited to introduce this special issue highlighting a number of promising avenues of ongoing research to integrate machine learning and data-driven techniques in the field of fluid dynamics. We will also attempt to provide a broader perspective, outlining recent successes, opportunities, and open challenges, while balancing optimism and skepticism.

In the field of fluid dynamics, there is an interesting parallel between the rise of machine learning in recent years and the rise of computational science decades earlier. Neither approach fundamentally changes the scientific questions being asked nor the higher-level objectives. Rather, both approaches provide sophisticated tools for analysis based on emerging technologies, enabling the community to address scientific questions at a greater scale and a broader scope than was previously possible. In the early years of computational fluid dynamics, there were voices of both extreme skepticism and open-ended optimism that these new approaches would supplant existing techniques. In reality, computational techniques have provided another valuable perspective for scientific inquiry, complementing more traditional approaches. It is therefore reasonable to believe that machine learning and data-intensive analysis will have a similar impact, complementing other well-established techniques to expand our collective capabilities.

Machine learning offers a wealth of techniques to discover patterns in high-dimensional data [5, 6], extending traditional modal expansions that have been a cornerstone of fluid dynamics for decades [26, 27]. Despite this great potential, it is important to recognize that these algorithms must be used properly, and that a single tool alone will not be equipped to address every task. The same factors that make data-driven and machine learning methods so appealing—namely, that they are relatively easy to use and do not require expert knowledge—can also serve as a potential downfall. Much as users of CFD software are cautioned against blind application of these numerical tools without proper knowledge, training, and verification/validation, we must adopt a similar philosophy for the use of data-driven and machine learning tools.

Many in the fluid mechanics community are asking why machine learning and data science techniques have received a surge in attention over the past few years. Indeed, many machine learning and data-driven methods were introduced and studied within fluid dynamics decades ago [6, 21]. Yet these earlier studies did not benefit from more recent advances in machine learning, which have been driven by the present confluence of big data, high-performance computing, advanced algorithms, and considerable investment by industry, leading to open-source tools and benchmark data sets. One of the most significant milestones in the modern machine learning era is the rise of deep learning [13] and the achievement of human-level performance in several highly challenging tasks, such as image recognition [15] and control [18, 22]. The success of these methods has been based primarily on large labeled data sets [9], open software for reproducible research, and powerful parallel computation on consumer GPU hardware. There have also been significant advancements from the data science community with an eye toward handling, analyzing, compressing, and transferring massive data sets. Novel methods leveraging randomized numerical linear algebra, sparsity-promoting norms, and efficient optimization techniques have become powerful enablers for the analysis of big data.

It is noteworthy that many of the enabling technologies in machine learning have deep roots in the computational fluid mechanics community [21]. Fluid dynamics researchers have pushed the frontiers of data storage and transfer capabilities, computational hardware, and scalable algorithms. As an example, consider that in the same year that snapshot proper orthogonal decomposition (POD) was introduced to the fluids community [23], the same algorithm was also introduced to the image sciences community to efficiently represent human faces [24], creating a major branch of modern data-intensive image analysis. In the past decade, machine learning has made great advances, driven by commercial successes in the fields of image sciences, natural language processing, and advertising. These approaches are now coming full-circle, and fluids researchers are increasingly leveraging these powerful techniques for the modeling and control of flow physics. What is exciting for the field of fluid mechanics is that these emerging approaches are sufficiently powerful to handle large data sets describing complex nonlinear dynamics that are commonly encountered in fluid flow analysis.

The use of machine-learning and data-science inspired approaches should be encouraged to solve problems in fluid dynamics, especially those that are difficult to solve with traditional methods. Many goals in fluid dynamics, such as analysis, modeling, sensing, estimation, design optimization, and control, may be posed as optimization problems. These problems are challenging because fluids are nonlinear and multiscale in space and time, resulting in high-dimensional and non-convex optimization landscapes. Fortunately, machine learning is improving our ability to tackle these traditionally intractable optimization problems. In addition to the critical tasks of benchmarking and validating new approaches with canonical problems, we must strive to use these emerging techniques to go beyond what is possible with existing techniques to reveal physical insights or improve analytical capabilities. Yet, it is important to understand that machine learning is not a panacea that will make domain researchers obsolete. Quite to the contrary, applying these techniques effectively requires considerable expertise in how to formulate the relevant optimization problem, in generating and curating the training data, in selecting or designing the relevant machine learning architecture, and in evaluating the results to extract correct physical insights.

Fluid dynamics, and physics in general, presents a unique set of challenges and opportunities for machine learning. Although fluids data are vast in some dimensions, they may be quite sparse in others. Let us also emphasize that data-driven techniques rely on the quality and the coverage of the training data, following the garbage-in garbage-out principle. Coverage in particular is critical to ensure robustness, as extrapolation is typically unreliable and may lead to catastrophic results. For these reasons, the fluid flow data supplied to data-driven approaches must be reliable and accurate. If fluid dynamics is to capitalize on these advances, we must embrace open-source software and open data, such as the Johns Hopkins Turbulence Database [20], the Stanford Center for Turbulence Research database [28], and the NOAA database [1]. The use of data-driven techniques for fluid dynamics should be solidly founded on the ability to conduct high-quality fluid mechanics research. The natural conclusion is that in the age of data-driven fluid dynamics [10, 21], the performance of high-quality numerical simulations and experiments is ever more important.

1 Summary of articles in this special issue

In light of the emerging developments and applications of machine learning and data-driven techniques in fluid mechanics, we have put together a collection of papers written by some of the active research groups in these areas. These invited papers are published in a double issue entitled Machine Learning and Data-Driven Methods in Fluid Dynamics in Theoretical and Computational Fluid Dynamics. Covered in these issues are a wide range of topics, including analysis, modeling, numerical algorithms, sensing and estimation, and control for fluid flows with data-driven techniques. We summarize these papers below.

Several papers in this issue use machine learning for the reduced-order modeling of fluid dynamic phenomena. Loiseau [16] has used modal decomposition and sparse regression to obtain a nonlinear reduced-order model of a thermosyphon flow. The thermosyphon is an annular flow driven by a thermal instability, and it exhibits chaotic oscillations. Loiseau showed that these oscillations may be related to the simple Lorenz system, providing an interpretable nonlinear model for this complex flow. Hasegawa et al. [14] trained a machine-learned reduced-order model (ML-ROM) based on a convolutional neural network autoencoder (CNN-AE) and a long short-term memory (LSTM) network to capture the evolution of laminar bluff-body wakes. Their training data consist of flows over bodies of arbitrary shape, and the model is tested on unseen wakes. Their approach takes advantage of the low-dimensional nature of the latent space for laminar wake dynamics. Mendible et al. [17] developed a data-driven optimization procedure to segment and extract multiple traveling waves in fluid simulations. Typical dimensionality reduction techniques, such as POD and dynamic mode decomposition (DMD), often fail to capture symmetries and invariances, such as translation invariance. This work uses sparse optimization to extend dimensionality reduction to include known symmetries.
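To make the remark about translation invariance concrete, the following minimal sketch compares how many POD modes are needed to represent a traveling pulse in the laboratory frame and in a co-moving frame. It is only an illustration under stated assumptions and is not the procedure of Mendible et al. [17]: the Gaussian pulse, the grid sizes, the helper name modes_for_energy, and the fact that the shifts are known in advance are all hypothetical choices made for this example.

```python
import numpy as np


def modes_for_energy(snapshots, energy=0.99):
    """Count the POD modes needed to capture a given fraction of snapshot energy."""
    singular_values = np.linalg.svd(snapshots, compute_uv=False)
    cumulative = np.cumsum(singular_values**2) / np.sum(singular_values**2)
    return int(np.searchsorted(cumulative, energy)) + 1


# Hypothetical test signal: a Gaussian pulse advecting on a periodic domain.
n_x, n_t = 256, 100
x = np.linspace(0.0, 1.0, n_x, endpoint=False)
pulse = np.exp(-((x - 0.5) / 0.03) ** 2)
shifts = 2 * np.arange(n_t)  # integer grid shifts, assumed known here

# Each column is one snapshot of the translating pulse.
traveling = np.column_stack([np.roll(pulse, k) for k in shifts])

# Undo the known shift of each snapshot (move to the co-moving frame).
aligned = np.column_stack(
    [np.roll(traveling[:, j], -shifts[j]) for j in range(n_t)]
)

r_traveling = modes_for_energy(traveling)
r_aligned = modes_for_energy(aligned)
print(f"modes for 99% energy: lab frame {r_traveling}, co-moving frame {r_aligned}")
```

In the co-moving frame every snapshot is identical, so a single mode suffices, whereas the untransformed data require many modes. In practice the shifts are not known beforehand and must be inferred from the data, which is the harder problem.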

Closure modeling is one of the most promising avenues of machine learning research in fluid dynamics. Balachandar et al. [2] use machine learning to model the hydrodynamic forces within particle-laden flows, providing closures that make a large step toward fully particle-resolved DNS. They show that by accounting for interactions with neighboring particles, their closure models are able to better predict the hydrodynamic forces within Euler–Lagrange simulations. These models would not have been possible without the synergistic combination of both physics and machine learning. In the work of Pawar et al. [19], an artificial neural network (multilayer perceptron) and a convolutional neural network are used to construct data-driven subgrid-scale closure models for two-dimensional turbulence. For the development of the models, they consider a number of spatial points and variables in learning the subgrid-scale stresses at a spatial point. They find that in comparison with the dynamic Smagorinsky model, the machine-learned model can accelerate computations with comparable accuracy. They also provide cautionary notes on the use of data-driven models, stressing the importance of checking mechanisms, especially for safety-critical applications.

The use of data science and machine learning has also stimulated the development of novel algorithms for numerical simulations of fluid flows. Foti et al. [11] leverage recent techniques in low rank decompositions and sparse sampling for efficient adaptive mesh refinement (AMR). Stevens and Colonius [25] have developed an enhanced shock capturing scheme that trains a neural network to improve the performance of a WENO method. The resulting WENO-NN combines favorable aspects of existing numerical methods with the expressive power of neural networks, resulting in performance gains in regions of high numerical viscosity.

Flowfield and parameter estimation are naturally data-centric problems and have also benefited from recent advances in machine learning and data-driven methods, as is highlighted in this special issue. Fukami et al. [12] surveyed the use of supervised machine learning techniques for flow estimation problems. Their work covers the use of multilayer perceptrons, random forests, support vector regression, extreme learning machines, and convolutional neural networks on a number of canonical laminar and turbulent flows. They offer recommendations on how to perform the learning process and construct the networks to achieve accurate and robust estimates of flow fields from limited data. In the work of Canuto et al. [7], data assimilation is used to develop cardiovascular models from clinical measurements. They use an ensemble Kalman filter (EnKF) to estimate model parameters for patient-specific models, in an effort to help clinicians perform complex cardiovascular analyses in a reduced time frame for planning and diagnosis.
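As a rough illustration of how an ensemble Kalman filter can estimate a model parameter from noisy measurements, the sketch below applies a stochastic (perturbed-observation) EnKF update to a single scalar parameter. It is not the cardiovascular model or the implementation of Canuto et al. [7]: the linear forward_model, the noise level, the prior, and the ensemble size are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import numpy as np


def enkf_update(theta, predicted, observation, obs_std, rng):
    """Stochastic EnKF analysis step for a scalar parameter and scalar observation."""
    # Perturb the observation independently for each ensemble member.
    perturbed = observation + obs_std * rng.standard_normal(theta.size)
    # Sample cross-covariance and predicted-observation variance.
    cov_theta_pred = np.cov(theta, predicted)[0, 1]
    var_pred = np.var(predicted, ddof=1)
    gain = cov_theta_pred / (var_pred + obs_std**2)
    return theta + gain * (perturbed - predicted)


def forward_model(theta, x):
    """Hypothetical forward model; a stand-in for a real simulation."""
    return theta * x


rng = np.random.default_rng(0)
true_theta, obs_std = 2.0, 0.1
inputs = np.linspace(0.5, 2.0, 20)
observations = forward_model(true_theta, inputs) + obs_std * rng.standard_normal(inputs.size)

ensemble = rng.normal(loc=1.0, scale=0.5, size=200)  # deliberately biased prior
prior_spread = ensemble.std(ddof=1)

# Assimilate the measurements one at a time.
for x_k, y_k in zip(inputs, observations):
    ensemble = enkf_update(ensemble, forward_model(ensemble, x_k), y_k, obs_std, rng)

estimate = ensemble.mean()
posterior_spread = ensemble.std(ddof=1)
print(f"estimate {estimate:.3f}, spread {prior_spread:.3f} -> {posterior_spread:.3f}")
```

The ensemble mean moves from the biased prior toward the value used to generate the synthetic data, and the ensemble spread shrinks as measurements are assimilated. A realistic application would replace forward_model with the simulation of interest and would typically estimate several parameters at once.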

Recent trends in machine learning have provided powerful techniques for data-driven optimization and flow control. In this issue, Colvert and Kanso [8] have developed a sensing and control strategy that learns to orient the sensors in directions of maximal information. Although this flowtaxis is biologically inspired, there are many applications for such an adaptive sensing and control strategy. This work fits into a larger effort by the authors to efficiently and accurately classify wakes using techniques from machine learning. Bhattacharjee et al. [3] present a data-driven method for determining the optimal actuator location for controlling separated flow over an airfoil with minimal input energy. They also introduce complementary methods to extract controllable flow structures from flowfield data, helping identify physical mechanisms and instabilities that are exploited by the optimal actuation. Bieker et al. [4] have combined deep learning with model predictive control to manipulate the forces on bluff bodies. Instead of using a computationally expensive flow simulation or developing a reduced-order model with intrusive methods, they use a neural network to develop a surrogate model for bluff body flows with actuation, based on limited measurements of the forces on the bodies.

These articles collectively mark an exciting and growing avenue of research to bring machine learning and data science to bear on pressing problems in fluid dynamics. We believe that the applications of data-driven techniques are still in the early stages of their development and will continue to thrive over this decade. During that process, we expect that the fluid mechanics community will capitalize on the strengths of machine learning and data science, developing novel algorithms and gaining insights into the complex dynamics of fluid flows. The intersection of fluid mechanics and data science will continue to grow in importance, and we hope that this special issue provides inspiration for new research.