Visualizing Fluid Flows via Regularized Optimal Mass Transport with Applications to Neuroscience

The regularized optimal mass transport (rOMT) problem adds a diffusion term to the continuity equation in the original dynamic formulation of the optimal mass transport (OMT) problem proposed by Benamou and Brenier. We show that the rOMT model serves as a powerful tool in computational fluid dynamics for visualizing fluid flows in the glymphatic system. In the present work, we describe how to modify the previous numerical method for efficient implementation, resulting in a significant reduction in computational runtime. Numerical results applied to synthetic and real-data are provided.


Introduction
Optimal mass transport (OMT) treats the problem of optimally transporting a mass distribution from one configuration to another via the minimization of a given cost function.The OMT problem was first posed by Monge in 1781 in the context of the transportation of debris [17].This formulation was later given a modern relaxed formulation by Kantorovich in 1942 [12].In 2000, Benamou and Brenier reformulated OMT into a computational fluid dynamics (CFD) framework [1].In recent times, OMT theory has received extensive research attention with rich applications in machine learning [21], image processing/registration [7,8,9], network theory [3], and biomedical science [25].
The model employed in this work is based on the CFD approach proposed by Benamou and Brenier [1].Here OMT is formulated as an energy minimization problem with a partial differential equation (continuity) constraint.The continuity equation in the original version only involves advection.Our implementation includes an additional diffusion term that is of importance in our studies of glymphatic flows.This leads to the present modified formulation, which is referred as the regularized optimal mass transport (rOMT) problem.In addition to visualizing glymphatic flow, this type of model appears in many contexts including the Schrödinger bridge and entropic regularization [4,5].
Here we give the formal description of the rOMT model.Given two nonnegative density/mass functions ρ 0 (x) and ρ 1 (x) defined on spatial domain Ω ⊆ R 3 with equal total mass Ω ρ 0 (x)dx = Ω ρ 1 (x)dx, we consider the following optimization problem: subject to where ρ(t, x) : [0, 1] × Ω → R and v(t, x) : [0, 1] × Ω → R 3 are the timedependent density/mass function and velocity field, respectively, and σ > 0 is the constant diffusion coefficient.Eq. 2a is the advection-diffusion equation or the continuity equation in fluid dynamics.If we set σ = 0, one can recover the regular OMT problem proposed by Benamou and Brenier [1].By adding the non-negative diffusion term σ∆ρ into the continuity equation, we include both motions, advection and diffusion, into the dynamics of the system.Problem Eq. 1-Eq.2b solves for the optimal interpolation ρ(t, x) between the initial and final density/mass distributions, ρ 0 (x) and ρ 1 (x), and for the optimal velocity field v(t, x) which transports ρ 0 (x) into ρ 1 (x), during which the total kinetic energy is minimized and the dynamics follow the advection-diffusion equation.Continuing the work of the numerical method by Koundal et al. [6,13], we report a significant reduction in runtime by about 91% resulting from improvements of previous code.Some of the primary applications of the present work are concerned with fluid and solute flows in the brain, and in particular, the glymphatic system.The latter is a waste clearance network in the central nervous system that is mainly active during sleep and with certain anesthetics.Many neurodegenerative diseases, such as Alzheimer's and Parkinson's, are believed to be related to the impairment of the function of the glymphatic system.The glymphatic transport network has received enormous attention and efforts of a number of researchers to understand the fluid behaviors in the waste disposal process in the brain [2,11,18,19,24].The rOMT formulation described in the present work is highly relevant to analyzing glymphatic data due to the inclusion of both advection and diffusion terms in the continuity equation.In addition to solving the rOMT problem, we use Lagrangian coordinates for the rOMT model, which is especially useful for visualization of the time trajectories of the transport.

Material and Methods
This section outlines the numerical method of solving the rOMT problem Eq. 1-Eq.2b, and a post-processing Lagrangian method for practical purposes of tracing particles and visualizing fluid flows.

Numerical Solution of rOMT
The developed method is based on the assumption that the intensity of observed dynamic contrast enhanced MRI (DCE-MRI) data is proportional to the density/mass function in the rOMT model, and thus we treat the image intensity equivalently as the concentration of the tracer molecules in vivo.Suppose we are given the observed initial and final images, ρ obs 0 (x) and ρ obs 1 (x).In consideration of the image noise, instead of implementing a fixed end-point condition, we use a free end-point of the advection-diffusion process, which is realized by adding a fitting term ρ(1, x) − ρ obs 1 (x) 2 into the cost function and removing ρ from the optimized variables.This free end-point version of the rOMT problem for applications in noisy (e.g., DCE-MRI) data may be expressed as subject to Fig. 1 Numerical Pipeline of rOMT: From t i to t i+1 , the interpolated image ρ i ρ i ρ i is firstly advected via the velocity field v i v i v i by applying averaging matrix S(v i v i v i ) and is next diffused by applying matrix where β is the weighing parameter balancing between minimizing the kinetic energy and matching the final image.Given successive images ρ obs 0 , ρ obs where p > 2 and p ∈ N + , this method can be recursively run between adjacent images to guide the prolonged dynamic solution.
Next, a 3D version of the algorithm is detailed.Note that the proposed workflow also works for 2D problems with simple modifications.The spacial domain Ω is discretized into a cell-centered uniform grid of size n x × n y × n z and the time space is divided into m equal intervals.Let k s and k t be the volume of each spatial voxel and the length of each time interval, respectively.With t i = i • k t for i = 0, • • • , m denoting the m + 1 discrete time steps, we have discrete interpolations and velocity fields, . Note that a bold font is used to denote discretized flattened vectors to differentiate from continuous functions.For example, ρ ρ ρ is a vector of size mn × 1 and v v v is of size 3mn × 1 where n = n x n y n z is the total number of voxels.
The cost function Eq. 3 can be approximated with where Here ⊗ is the Kronecker product; is the Hadamard product; I i is the i × i identity matrix; [•|•] means forming block matrices.
An operator splitting technique is employed to separate the transport process into an advective and a diffusive step.From t i to t i+1 , in the firstly advective step, a particle-in-cell method is used to re-allocate transported mass to its nearest cell centers: ) is the interpolation matrix after the movement by velocity field v i v i v i .In the secondly diffusive step, we use Euler backwards scheme to ensure stability: where Q is the discretization matrix of the diffusive operator σ∆ on a cell-centered grid.Combining the two steps together, we get the discretized advection-diffusion Eq. 4a: 1).Consequently, the discrete model of the rOMT problem is given as follows: subject to One can prove that Hence, following Steklova and Haber [20], the Gauss-Newton method is used to optimize for the numerical solution where the gradient g ∂v v v 2 are computed to solve the linear system Hx = −g for x in each iteration.
Next, we elaborate on the analytical derivation of g(v v v) and H(v v v).Noticing that in F (v v v), ρ ρ ρ and ρ m ρ m ρ m are dependent on v v v following the advection-diffusion constraint, we have and where diag(•) is the function creating a diagonal matrix from the components of the given vector, and ∇ v v v is the operator of taking gradient with respect to v v v.
Considering the expressions of g and H, the difficulty lies in the computation of From the constraint Eq. 8a we have , so that for ∀j k, J k vj vj vj = 0 holds.Therefore, J is an upper-triangular block matrix of the form where where B(ρ j ρ j ρ j ) = ∂ ∂vj vj vj (S(v j v j v j )ρ j ρ j ρ j ), which by the particle-in-cell method is linear to v j v j v j and dependent on density ρ j ρ j ρ j .Notice that the second term of the Hessian matrix H given above, involves computing the multiplication of two matrices of sizes 3mn × n and n × 3mn, which is usually avoided in numerical implementation.Instead, we use a function handle that computes Hx in place of the coefficient matrix H so that the second term in Hx can be derived from twice the multiplication of a matrix and a vector.To sum up, we are going to compute We can create two functions getJmx and getJmT y to compute J m x and J T m y for any vector x and y, respectively.Within these two functions, J m x and J T m y can be computed iteratively in observation of the recursive format of Eq. 13.Given that J is upper-triangular, J T y can be computed by recursively calling the function referred to as getJmT y.We can use nested function getJmT y(getJmx(•)) to get the second term of Hx.However, this algorithm spends the vast majority of time (more than 90%) solving the linear system Hx = −g with the MATLAB built-in function pcg.We therefore modify the previous algorithm to reduce the time spent on this computational bottleneck.
The major contributions of the present work are as follows: 1. We pre-compute all S(v j v j v j ) and Consequently, we found a significant improvement in the efficiency of solving the linear system.See Algorithm 1 for the detailed process.

Lagrangian coordinates
Instead of observing the system under the usual Eulerian coordinates, one can get a Lagrangian representation of the above framework in the standard way.This is of course very useful for tracking the trajectories of particles and for investigating the characteristic patterns of fluid dynamics.This Lagrangian method has been used as a visualization method in [6,13].
Briefly, the method begins with defining the augmented velocity field ṽ = v − σ∇ log ρ and putting it into the advection-diffusion equation to get a zero on the right-hand side which gives a conservation form of the continuity equation Eq. 2a.We apply Lagrangian coordinates L(t, x) such that to track the pathlines (i.e.trajectories) of particles with the starting coordinates at t = 0 (Eq.16b) and the time-varying augmented velocity field ṽ.
Along each binary pathline, the speed s = ||v|| 2 may be calculated at each discrete time step, forming a pathline endowed with speed information which we call a speed-line, where || • || 2 represents the Euclidean norm.The speed-lines indicate the relative speed of the flow over time.
This representation provides a neat way of computing certain dimensionless constants that are very popular in CFD, in particular, the Péclet (P e) number.It has been used by several groups [10,16] in neuroscience to study the motion of cerebrospinal fluid (CSF) within the brain.Of special importance is the determination of regions where advection dominates or where diffusion dominates.In our model, we define P e number as follows: This measures the ratio of advection and diffusion.Similar to speed-lines, we can compute and endow P e along the binary pathlines to form the Péclet-lines.
In order to visualize in 3D rendering, we interpolate the speed-lines and Péclet-lines into the original grid size by taking the averages of the endowed speed and P e values within the same nearest voxel to derive the smoothed speed map and P e map, respectively.Additionally, the directional information of the fluid flow is thereby captured by connecting the start and end points of pathlines, obtaining vectors which we refer as velocity flux vectors.Even though they may lose intermediate path information compared to pathlines, these vectors can provide a clearer visualization of the dmovement.
The code for the Lagrangian method is available in https://github.com/xinannancy-chen/rOMTspdup.

Results
This section comes in three parts.In the first two parts, we test our methodology on self-created geometric dataset and DCE-MRI rat brain dataset, respectively.Last but not least, we compare the upgraded rOMT algorithm with the previous one on the two forementioned datasets and report a significant saving in runtime.

Gaussian Spheres
Five 3D Gaussian spheres of image size 50 × 50 × 50 are created, ρ 0 , • • • , ρ 4 , as successive input images fed into the rOMT algorithm and its Lagrangian post-processing (Fig. 2, first row).The initial mass distribution is a 3D dense Gaussian sphere, and it moves forward (advection) with mass gradually diffusing into the surrounding region over time.The unparalleled runtime for this dataset was about 26 minutes on a 2.6 GHz Intel Core i7-9750H, 16 GB RAM, running macOS Mojave (version 10.14.6) with MATLAB 2019b.Please refer to Table 1 for parameters used in the experiment.The resulting interpolated images from the rOMT algorithm are made into a video on Github.Other returned outputs are illustrated in Fig. 2. The binary pathlines, which are color-coded with the numerical start and end time, show the trajectories of particles.The resulting velocity flux vectors point in the direction of the movement and the color code shows relatively how far a particle is transported during the whole process.The speed-lines and the interpolated speed map indicate that the core of the Gaussian spheres are of higher speed compared to the outer regions.The Péclet-lines and P e map show that in the early stage of the transport, the motion is mainly advective in nature.However, in the later time, diffusion takes over.This change of dominated motion is within expectation in that dissolvable substances always have the tendency to eventually be equally mixed with the solution as a result of diffusion, regardless of the existence of an imposed velocity field.

DCE-MRI Rat Brain
To further test our method for practical uses, we ran our algorithm on a DCE-MRI dataset consisting of 55 rat brains.During the MRI acquisition, all the rats were anesthetized and an amount of tracer, gadoteric acid, was injected into the CSF from the neck, moving towards the brain.The DCE-MRI data were collected every 5 minutes and were further processed to derive the % signal change from the baseline.Data for each rat contained a 110-minute time period from 23 images of size 100×106×100, and we put every other image (in total 12 images) within a masked region into our Lagrangian rOMT method to reduce the computational burden.To avoid constantly introducing new data noise into the model, we utilize the final interpolated image from the previous loop as the initial image of the next loop.The computation of the rOMT model was performed consecutively using MATLAB 2018a on the Seawulf CPU cluster using 12 threads of a Xeon Gold 6148 CPU, which took about 4 hours for each rat.The Lagrangian post-processing method took between 2 to 3 minutes for each case.Please refer to Table 1 for parameters used in this experiment.
In Fig. 3, we display the data and results of an example 3-month-old rat.As the pathlines and velocity flux vectors illustrate, the tracer partly entered the brain parenchyma via the CSF sink and partly was drained out towards the nose.From the speed-lines and speed map, the higher speed occurred mainly along the large vessels, which is also recognized as advection-dominated transport according to the Péclet-lines and P e map.When the tracer entered the brain parenchyma, the movement motion was mainly dominated by diffusion due to the relative low values there.

Saving of Runtime
As detailed in Section 2.1, we improved the current rOMT code by eliminating repeated computation in nested functions and by optimization of the algorithm.An important part of this reduction in computational time was done by pre-computing intermediate results of the advective steps that were used either throughout the calculations or for a specific step.We compared the upgraded algorithm with the previous algorithm [13] by recording the runtime of rOMT code on the Gaussian sphere dataset at scaled image size (N=6, each of 4 loops) in Section 3.1 and the DCE-MRI dataset in Section 3.2 (N=55, each of 11 loops).
To analyze the time complexity of the algorithm, we considered the original Gaussian sphere images of size N 3 = 50 3 and scaled N by a factor of 0.5, 0.75, 1, 1.25, 1.5 and 1.75 to obtain spheres of sizes 25 3 , 38 3 , 50 3 , 63 3 , 75 3 and 88 3 , respectively.We put the six groups of data into the rOMT code using MATLAB 2019a on the Seawulf CPU cluster with 12 threads of a E5-2683v3 CPU.As illustrated in Fig. 4, the runtime increases drastically as N linearly scales up, especially for the previous code.For example, the previous code took 0.70 hours to run on the 25 3 size input, 13.61 hours on the 63 3 size According to the speed-lines and speed map, higher speed occurred mainly along the large vessels and quickly slowed down after entering the brain.The Péclet-lines and P e map identified the transport along vessels and in CSF mainly as advection-dominated by the relative higher P e numbers therein.In contrast, the movement motion was dominated by diffusion after the entry of tracer into deeper brain.
input, and 1 day and 10.91 hours on the 88 3 size input.The upgraded code without parallelization greatly reduced the runtime to 3 minutes, 1.14 hours, and 3.36 hours, respectively.By analyzing the six groups of data statistically, we found that in general a 91.25% ± 0.51% reduction and a 97.79% ± 0.36% reduction in runtime were realized by the upgraded code and the parallelized code, respectively, compared with the previous version.
For the rat brain dataset, the previous algorithm took 45.18 (± 7.64) hours to run a case.However, it took only 3.89 (± 0.35) hours for the upgraded algorithm to run and 0.41 (± 0.03) hours if run in parallel, resulting in a significant reduction in runtime by 91.21% (± 1.21%) and 99.08% (± 0.12%), respectively (Fig. 5).The runtime depends on various factors, such as the size of input image, the number of input images p, the diffusive coefficient σ, the number of time intervals m, etc.However, this significant improvement of efficiency is believed to be comprehensive as all parameters were kept fixed for the comparison.

Discussion
Our rOMT methodology (both Eulerian and Lagrangian) models the dynamic fluid flows based on advection-diffusion equation and the theory of OMT.This method is largely data-driven, meaning that there is no ground truth at hand to compare with especially when it comes to real-life image data.Taking σ = 0 in our model, the square root of the obtained infimum in the cost function Eq. 1 gives L 2 Wasserstein metric, which has vast applications to many fields [22,23].
It is interesting to note that rOMT is mathematically equivalent to the Schrödinger bridge [14,15], and thus the proposed algorithm may prove useful for a number of problems in which this mathematical model is relevant.We should note that while the Schrödinger bridge is formally similar to OMT (σ = 0), it has a substantially different motivation and interpretation.OMT was originally formulated in an engineering framework as the problem to optimally transport resources between sources and destinations.Erwin Schrödinger's motivation for the Schrödinger bridge was based on physics and the so-called "hot gas experiment" that led to a certain maximal likelihood problem.The aim was to link quantum theory to classical diffusion processes.In both cases (OMT and the Schrödinger bridge), one starts with two probability measures.In OMT, the measures are regarded as the initial and final configurations of resources whose transportation cost has to be minimized among all possible couplings: this is the Kantorovich formulation [22,23].In comparison, the measures employed in the Schrödinger bridge represent initial and final probability distributions of diffusive particles, and one searches for the most likely evolution from one to the other.It may be regarded as an entropy minimization problem in path space, and gives a natural data-driven model for a number of dynamical processes arising in both physics and biology, as described above in our models of fluid flow in the brain.
Numerical algorithms based on finite difference scheme and optimization can be very time-consuming on large 3D images.We realized a remarkable reduction of runtime of the rOMT algorithm, cutting a two-day running down to 4 hours and even less if run in parallel.One may notice that there can be some defects in the connection of the velocity fields when applying the rOMT algorithm independently to consecutive images in a given series run in parallel.For example, in Fig. 2 the speed map shows discontinued boundaries due to four independent loops.This can be ameliorated by feeding the final interpolated image from the previous loop into the next one as the initial image to give smoother velocity fields, as we did for the DCE-MRI rat brain dataset.However, by doing so parallelization is out of question because the loops are connected in time.The longer running time comes with the advantage of much smoother pathlines for meaningful visualization (Fig. 6).One must be wise weighing between a quicker algorithm and smoother velocity fields.If the continuity and smoothness of velocity fields is highly emphasized, a multimarginal model should be considered and longer time to run is also expected.

Conclusions
We introduced the rOMT methodology, both in the Eulerian and Lagrangian frameworks, as an efficient algorithm to track and visualize fluid flows, and in particular to track the trajectories of substances in glymphatic system using DCE-MRIs from a computational fluid dynamic perspective.Quantitative measurements, speed and the Péclet number are also provided along with the pathways to help uncover features of the fluid flows.We improved the previous code by removing redundant computation and significantly saved the runtime by 91%, and we offer the option to further cut the runtime down by putting the algorithm in parallel.

3 .
− 1 and make them inputs when calling functions getJmx and getJmT y to eliminate unnecessary redundant computations of advection-related matrices.2. We combine nested function getJmT y(getJmx(•)) into one function in light of the poor performance of transferring function handles in nested functions.We add the option of running the rOMT model with multiple input images ρ obs 0 , ρ obs 1 , • • • , ρ obs p−1 where p > 2 in parallel to further reduce runtime.
Compute gradient g and Hessian matrix function handle Hx Solve linear system Hx = −g for x Do line search to find length l if line search fails then return

Fig. 2 3D
Fig.23D Geometric Gaussian Spheres Dataset: First row: A series of synthetic Gaussian spheres as inputs of the rOMT model.The sphere is advectively moving forward while simultaneously diffusing locally.Second and third rows: Illustrative outputs from the Lagrangian rOMT methodology to visualize the fluid dynamics.Pathlines, color-coded with start and end time, show the trajectories of particle movement.The speed-lines show the relative speed at the corresonding location along pathlines.The Péclet-lines indicate the local transport motion of advection-dominated (higher value) or diffusion-dominated (lower value) along pathlines.The speed map and the P e map shown in 3D rendering are smoothed interpolations of speed-lines and Péclet-lines on the numerical grid, respectively.Velocity flux vectors are vectors obtained by connecting the initial and terminal points of the pathlines, which are color-coded with the length of vectors, illustrating the overall direction of the movement.These outputs show that the higher speed distributes mainly in the core of the Gaussian sphere and that the transport is first dominated by advection but quickly diffusion prevails.

Fig. 3
Fig. 3 3D DCE-MRI Rat Brain Dataset: (a) The input data is 12 successive images, ρ 0 , • • • , ρ 11 , selectively shown in 3D rendering.(b-g) The outputs from the Lagrangian rOMT methodology.The pathlines give the trajectories of tracer over the 110-minute period.The velocity flux vectors, color-coded with the length of vectors and shown scaled by 0.25, indicate that in addition to penetrating into the brain parenchyma, there are strong flows moving within CSF towards the nose.According to the speed-lines and speed map, higher speed occurred mainly along the large vessels and quickly slowed down after entering the brain.The Péclet-lines and P e map identified the transport along vessels and in CSF mainly as advection-dominated by the relative higher P e numbers therein.In contrast, the movement motion was dominated by diffusion after the entry of tracer into deeper brain.

Fig. 4
Fig. 4 Comparison of Runtime on the Gaussian Sphere Dataset: Left: The comparison of runtime of previous, upgraded and upgraded+parallel code at scaled image size.Right: The percent of saving in runtime compared with the previous code at scaled image size.The upgraded code realized 91.25% ± 0.51% reduction and can be further improved to 97.79% ± 0.36% if run in parallel.

Fig. 5
Fig. 5 Comparison of Runtime on the Rat Brain Dataset: Left: The comparison of runtime in logarithmic scale of previous (45.18± 7.64 hours), upgraded (3.89 ± 0.35 hours) and upgraded+parallel (0.41 ± 0.03 hours) code.There are in total 55 data points and each case consists of 12 input images resulting in 11 loops.Right: The percent of saving in runtime compared with the previous code.The upgraded code realized 91.21% ± 1.21% reduction and can be further improved to 99.08% ± 0.12% if run in parallel.

Fig. 6
Fig.6Comparison of unparallelized and parallelized algorithms on pathlines: The pathlines of an example rat brain viewed from the top.Left: pathlines from continuously feeding final interpolated image into next loop.Right: pathlines from parallelized computation.The left ones are smoother and more in shape of clusters.The right ones show some fluctuations in the obtained pathlines due to constantly introducing new data noise into the system, but the overall direction of movement is aligned with the left.The left requires about 10-fold that of on the right in runtime.

Table 1
Parameters used in rOMT algorithm