Reconstruction of low-rank aggregation kernels in univariate population balance equations

The dynamics of particle processes can be described by population balance equations which are governed by phenomena including growth, nucleation, breakage and aggregation. Estimating the kinetics of the aggregation phenomena from measured density data constitutes an ill-conditioned inverse problem. In this work, we focus on the aggregation problem and present an approach to estimate the aggregation kernel in discrete, low-rank form from given (measured or simulated) data. The low-rank assumption for the kernel allows the application of fast techniques for the evaluation of the aggregation integral (O(n log n) instead of O(n^2), where n denotes the number of unknowns in the discretization) and reduces the dimension of the optimization problem, allowing for efficient and accurate kernel reconstructions. We provide and compare two approaches which we illustrate in numerical tests.

Aggregation is driven by an aggregation kernel κ(u, v), a rate function of the particle properties u and v which takes into consideration the conditions of the underlying process (like temperature and saturation).
The number density distribution f(v, t) throughout this pure aggregation process is governed by the integro-differential equation

∂f(v, t)/∂t = Q_source(f)(v, t) − Q_sink(f)(v, t)   (1)

with the source term

Q_source(f)(v, t) = (1/2) ∫_0^v κ(u, v − u) f(u, t) f(v − u, t) du   (2)

and the sink term

Q_sink(f)(v, t) = f(v, t) ∫_0^∞ κ(u, v) f(u, t) du.   (3)

Starting from an initial distribution f(v, 0) = f_Init(v), the density distribution f varies dynamically over time as smaller particles combine and form larger particles.
The total mass of all particles (also called the first moment), μ_1(f, t) := ∫_0^∞ v f(v, t) dv, is conserved throughout this process. Here, we are concerned with the reconstruction of the kernel κ(u, v) from given data F(v, t_i) at m + 1 different time instances t_i, i ∈ {0, ..., m}. We denote the reference distributions by a capital F, while distributions computed with a reconstructed kernel are denoted by a lowercase f. Previous work in this field is often concerned with fitting a single parameter ([1], [15]) or a small number of parameters of a kernel that is assumed to be of a certain form, e.g. fitting exponents in rational functions [12]. Laurent polynomials were used in [7] to approximate the aggregation kernel. All these works determine coefficients for an approximation of a kernel on (0, ∞) × (0, ∞). A reconstruction on a bounded computational domain is presented in [5], where the property space is divided into cells and a (piecewise) bilinear basis is used to approximate the kernel.
There exist some methods for parameter estimation in differential equations that are based on discontinuous data in the presence of measurement noise ([2, 13] or [14]) without a connection to population balances. However, the differential equations considered in these works have only a few degrees of freedom and many more measurements available, making them inapplicable in this setting.
When formulating the optimization problem, there exist (at least) the following two approaches:
• Compute approximations to the derivatives on the left-hand side of (1) using the measured data and compute the right-hand side of (1) using a reconstructed kernel, where the reconstruction should minimize the error between the left- and right-hand sides of (1) in some appropriate norm.
• Simulate the aggregation process with a reconstructed kernel, where the reconstruction should minimize the error between the measured and simulated data.
The first approach minimizes the error of the derivative and is typically computationally less challenging, but it requires a high temporal resolution of the density distribution in order to provide useful results. This is often infeasible for applications with experimentally measured data where the time between measurements might be large. The second approach often challenges the selected optimization algorithm and is computationally expensive but, if successful, yields better results than the first approach. In this paper, we pursue the second approach, which becomes feasible in view of a low-rank assumption imposed on the kernel. This assumption allows us to use the fast evaluation techniques for the discrete aggregation process developed in [8, 10]. In particular, this paper includes the following novel contributions:
• The reconstruction of a discrete kernel in low-rank representation, both for a fixed and a variable kernel basis.
• Optimization with the objective to minimize the error between measured and simulated data, with a new error function based on the χ^2-measure.
• Numerical tests of this new framework.
The remainder of this paper is organized as follows: In Section 2, we introduce the discretization of the property space and the low-rank assumption for the kernel. We introduce the optimization problem and elaborate on its structure in Section 3. Section 4 is devoted to numerical results of the proposed optimization problem, showing the applicability of the reconstructed kernel in different simulations.

Kernel estimation
The first prerequisite for the numerical solution of this inverse problem is a suitable discretization of the property space. For this, we define a maximum particle property v_max ∈ R_+ and exclude any larger particle from further consideration, i.e. we assume that f(v, t) = 0 for v > v_max. We subdivide the property space with equal spacing h := v_max/n, resulting in a grid of pivots

G_p^{v_max} := {v_1, ..., v_n},   v_i := i · h.   (4)

We assume particles to be concentrated at these equidistant pivots and discretize f(v, t) by macroscopic variables at any given time t, which allows us to represent f(v, t) as a non-negative, time-dependent vector-valued function f(t) = (f_1(t), ..., f_n(t))^T ∈ R_{≥0}^n. We also represent the measured data F(v, t_i) with respect to the grid G_p^{v_max} as a time-dependent vector F(t_i) = (F_1(t_i), ..., F_n(t_i))^T ∈ R_{≥0}^n. In view of this discretization, the goal is to reconstruct a symmetric kernel matrix for an (unknown) kernel function κ(·, ·).
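As a minimal sketch of this discretization (the placement of the pivots at the right cell edges v_i = i · h is an assumption of this illustration), the grid and a discrete density vector can be set up as follows:

```python
import numpy as np

def make_pivot_grid(v_max, n):
    """Equidistant pivots v_1, ..., v_n with spacing h = v_max / n.

    Particles are assumed concentrated at these pivots; the density
    f(v, t) is then represented by a non-negative vector of length n.
    """
    h = v_max / n
    return h * np.arange(1, n + 1)

pivots = make_pivot_grid(1.0, 8)   # toy grid with n = 8, v_max = 1
```

Any measured distribution F(v, t_i) is represented the same way, as a vector of particle counts attached to the pivots.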
Since several of the physically motivated kernel functions have separable representations or approximations of low rank (see for example [10] for an analysis of the Brownian, shear and kinetic kernels), and since a separable kernel function of rank k implies a kernel matrix of rank (at most) k, we make the following assumption.

Assumption 1
To reduce the number of unknown coefficients in the kernel matrix K ∈ R^{n×n}, we assume that it is of rank k (≪ n), i.e. that it can be represented (or approximated) in the form

K = U S U^T   (5)

with matrices U ∈ R^{n×k} and S ∈ R^{k×k}. To enforce the symmetry of K, we require that S = S^T.
This reduces the number of degrees of freedom in the kernel matrix K from n(n+1)/2 to nk + k(k+1)/2. It also allows us to use the algorithms introduced in [8, 10] to accelerate the calculations: they provide the efficient evaluation of the source and sink terms (2) and (3) in the case of a separable kernel. We will refer to the (thin/rectangular) matrix U as the kernel basis.
The above discretization leads to a discrete source term of convolution type,

Q_source,j = (1/2) Σ_{p+q=j} K_{pq} f_p f_q,   (6)

which can be evaluated for all j = 1, ..., n in complexity O(kn log n) for given kernel factors U, S. The discrete sink term is given by

Q_sink,j = f_j Σ_{p=1}^n K_{jp} f_p   (7)

and evaluated in complexity O(kn). Our framework for the kernel estimation is related to [5] and [7], where coefficients for kernel functions within a linear space spanned by a number of given basis functions, e.g. Laurent polynomials b_1, ..., b_k, are to be found. The corresponding continuous kernel function in [7] can be expressed in the form κ(u, v) = b(u)^T S b(v) with b(v) := (b_1(v), ..., b_k(v))^T, where some of the coefficients of the symmetric matrix S ∈ R^{k×k} are subject to the optimization (while the others are fixed to zero). In our framework, this corresponds to a given (fixed) matrix U with its entries given by the basis functions evaluated at the pivots, i.e. U_{ij} = b_j(v_i). Here, we further generalize this framework by including U in the optimization process.
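The key observation behind the fast evaluation can be sketched as follows (an illustration of the idea, not the algorithm of [8, 10] itself): with K = U S U^T, the discrete source term decouples into convolutions of the weighted columns U[:, a] * f, each of which an FFT evaluates in O(n log n) instead of the O(n^2) direct double sum. Here `np.convolve` stands in for the FFT-based convolution; the indexing assumes pivots v_i = i · h, so that v_p + v_q = v_j corresponds to p + q = j.

```python
import numpy as np

def source_term_separable(U, S, f):
    """Source term via convolutions of the weighted basis columns."""
    n, k = U.shape
    W = U * f[:, None]                       # columns U[:, a] * f
    s = np.zeros(n)
    for a in range(k):
        for b in range(k):
            if S[a, b] == 0.0:
                continue
            c = np.convolve(W[:, a], W[:, b])  # stand-in for FFT convolution
            # pairs p + q = j (1-based) are 0-based indices p' + q' = j' - 1
            s[1:] += 0.5 * S[a, b] * c[:n - 1]
    return s

def source_term_direct(U, S, f):
    """Reference O(n^2) evaluation of the same double sum."""
    K = U @ S @ U.T
    n = len(f)
    s = np.zeros(n)
    for j in range(n):
        for p in range(j):                   # q = j - 1 - p stays in range
            q = j - 1 - p
            s[j] += 0.5 * K[p, q] * f[p] * f[q]
    return s
```

Both routines agree; only the cost differs, which is what makes repeated evaluation inside an optimization loop affordable.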

Optimization problem
With these prerequisites, we define the following minimization problem:

min_{U,S} E_2(U, S),   E_2(U, S) := ( Σ_{i=1}^m Σ_{j=1}^n (f_j(t_i) − F_j(t_i))^2 )^{1/2}.   (8)

Here, F_j(t_i) is the given (measured) data and the f_j(t_i) are computed by numerical simulation of the discrete aggregation process on the grid G_p^{v_max}. A possible disadvantage of the above minimization problem results from the fact that absolute errors are considered. Cells with a small number of particles may have only a small influence on the kernel estimation, since the error E_2 is dominated by index pairs (i, j) where F_j(t_i) is large. To increase the sensitivity with respect to those cells with a small number of particles, we define an error based on the χ^2-measure, leading to

min_{U,S} E(U, S),   E(U, S) := Σ_{i=1}^m Σ_{j=1}^n (f_j(t_i) − F_j(t_i))^2 / (f_j(t_i) + ε).   (9)

Here, we weigh the difference of simulated and observed particles higher when the simulation indicates a small number of particles in a class. Similar measures for curve fitting are also used in machine learning [11]. We add ε = 10^{-10} to the denominator in (9) to ensure that it is large enough to avoid numerical instabilities caused by the division. We set f_j(t_0) = F_j(t_0), j = 1, ..., n, as the initial distribution, which is always non-negative. The constraint f_j(t_i) ≥ 0 will be satisfied throughout the simulation when the step sizes in the time discretization of the differential equation are chosen sufficiently small.
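A minimal sketch of the χ^2-based error (names and array layout are illustrative): the squared differences are divided by the simulated counts plus ε, so cells where the simulation predicts few particles contribute more strongly.

```python
import numpy as np

def chi2_error(F, f, eps=1e-10):
    """chi^2-type misfit between measured (F) and simulated (f) data.

    F, f: arrays of shape (m+1, n) holding the discrete distributions
    at the measurement times; eps guards against division by ~zero.
    """
    return np.sum((f - F) ** 2 / (f + eps))
```

For example, with one time instance, F = [1, 0] and f = [2, 1], the error is (2-1)^2/2 + (1-0)^2/1 ≈ 1.5: the half-full cell is weighted twice as strongly per squared unit of misfit as the fuller one.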
Since the objective is to determine a kernel that minimizes the error between the measured and simulated density distributions, every evaluation of E(U, S) requires the solution of (1), making it mandatory that efficient computational techniques are available. In fact, there are kn degrees of freedom in U and k(k+1)/2 degrees of freedom in S, and every finite-difference approximation of the gradient requires this number of evaluations of E(U, S). Most computational time during this optimization is spent in the evaluation of (6), which only becomes feasible with our Assumption 1 of a separable kernel and the fast algorithms designed in [8, 10].
We solve the minimization problem (9) using MATLAB's optimization routine lsqnonlin and use the routine ode45 to solve the underlying differential equation. The routine lsqnonlin uses the Levenberg-Marquardt algorithm to search for local optima, approximating gradients via finite differences.
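The structure of this simulate-then-fit loop can be illustrated in Python with scipy's analogues of lsqnonlin and ode45 (least_squares with method="lm" and solve_ivp), here on a deliberately tiny toy ODE f' = -θ f rather than the full population balance; all names and the synthetic data are illustrative only:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

# Synthetic "measurements" at 6 equidistant times, true parameter theta = 0.7.
t_meas = np.linspace(0.0, 5.0, 6)
F_meas = np.exp(-0.7 * t_meas)

def residuals(theta):
    # Each evaluation simulates the ODE with the current guess ...
    sol = solve_ivp(lambda t, f: -theta[0] * f, (0.0, 5.0), [1.0],
                    t_eval=t_meas, rtol=1e-8, atol=1e-10)
    # ... and returns the misfit against the measured data.
    return sol.y[0] - F_meas

fit = least_squares(residuals, x0=[0.1], method="lm")
```

This mirrors the expensive part of the real problem: every residual evaluation hides a full forward simulation, which is why the fast aggregation evaluation is essential.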
If U has full rank k, its columns can be chosen orthonormal which turned out favorable in our numerical tests. We obtain orthonormal columns in U by replacing the current estimate U, S by Q, RSR T where Q and R are the QR-factors of U .
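The re-orthonormalization step can be verified in a few lines (illustrative code): replacing (U, S) by (Q, R S R^T), where U = QR, leaves the kernel matrix K = U S U^T unchanged while giving the basis orthonormal columns.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 50, 4
U = rng.standard_normal((n, k))
S = rng.standard_normal((k, k)); S = 0.5 * (S + S.T)

Q, R = np.linalg.qr(U)      # reduced QR: Q is n x k with Q^T Q = I
S_new = R @ S @ R.T         # remains symmetric since S is symmetric
```

Since U S U^T = (Q R) S (Q R)^T = Q (R S R^T) Q^T, the represented kernel is exactly preserved.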
A physically meaningful kernel function is non-negative, which implies that the matrix K should be elementwise non-negative as well; this cannot be guaranteed without imposing complicated constraints on U and S. It is possible to restrict U and S to non-negative matrices (which guarantees K to be non-negative). This, however, significantly reduces the search space for a fixed rank k and does not allow for orthonormal columns of U. Details about non-negative matrix factorization (NNMF) are available in [4] and [9], but it will not be used in this work.

Numerical results
This section is devoted to numerical results using the proposed method to reconstruct a kernel from given (measured or simulated) data. In this work, in order to be able to validate our results, we will reconstruct the following four different kernels from "measurements" F(v_j, t_i) obtained through numerical simulation:

Brownian : κ_B(u, v) = (u^{1/3} + v^{1/3}) · (u^{-1/3} + v^{-1/3}),
Shear : κ_S(u, v) = 2 · (u^{1/3} + v^{1/3})^{7/3},   (11)
Sum : κ_Σ(u, v) = u + v,

as well as the Peglow kernel κ_P, all of which are plotted in Fig. 1. We note that the kernels κ_B and κ_Σ are separable, i.e. the kernel matrices can be represented in factored form (5) with k = 3 and k = 2, respectively. The discretized Brownian and sum kernels κ_B, κ_Σ can be written in the form

K_B = U S U^T with U = (v^{1/3} | 1 | v^{-1/3}) and S = antidiag(1, 2, 1),   (14)
K_Σ = U S U^T with U = (v | 1) and S = antidiag(1, 1),   (15)

respectively, where v_i, i = 1, ..., n, are the pivots defined in (4); this gives rise to initializing our optimizations with an antidiagonal matrix S. The other two kernels are not separable but can be approximated with a low rank k, so that the resulting low-rank error is dominated by the discretization error determined by the number of pivots n.
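The separability claim can be checked numerically (an illustration using the standard forms of the Brownian and sum kernels, which are assumed here): on any pivot grid, the Brownian kernel matrix has rank 3 and the sum kernel matrix rank 2.

```python
import numpy as np

n = 64
v = np.arange(1, n + 1) / n            # equidistant pivots on (0, 1]

# Brownian kernel: (u^{1/3} + v^{1/3}) (u^{-1/3} + v^{-1/3})
#                = 2 + (u/v)^{1/3} + (v/u)^{1/3}  -> three separable terms
brownian = (v[:, None] ** (1 / 3) + v[None, :] ** (1 / 3)) * \
           (v[:, None] ** (-1 / 3) + v[None, :] ** (-1 / 3))

# Sum kernel: u + v -> two separable terms
ksum = v[:, None] + v[None, :]
```

The shear and Peglow kernels, by contrast, have full-rank matrices that merely admit accurate low-rank approximations.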
We use a bimodal initial distribution f(v, 0), where the scaling coefficient c normalizes the function to the first moment μ_1(f, 0) = 10^{-2}, with v_max = 1. We will use v_max = 1 in all our numerical tests and hence leave out the superscript in G_p := G_p^1; the subscript p indicates a grid with 2^p pivots. We obtain reference solutions by solving (1) with respect to a very fine grid G_17 and take "measurements" at m + 1 = 6 equidistant time instances t_i = i for i ∈ {0, ..., 5}. We obtain F(t_i) ∈ R^{2^17} and consider it to be a distribution (perfectly) measured at time t_i. We coarsen it to the grid G_10 and gather the particles at the pivots by summing over each set of 2^7 entries to obtain F(t_i) ∈ R^{1024}. Throughout this section, capital letters (F, G) denote distributions computed with the exact reference kernel, while lowercase letters (f, g) denote distributions computed with a reconstructed kernel.
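The coarsening step can be sketched as follows (illustrative code with a toy grid): each coarse value gathers one consecutive block of fine-grid entries, which preserves the total particle count.

```python
import numpy as np

def coarsen(F_fine, factor):
    """Sum consecutive blocks of `factor` fine-grid entries.

    In the paper's setting, a fine vector of length 2^17 is reduced to
    2^10 coarse pivots by summing blocks of 2^7 entries.
    """
    return F_fine.reshape(-1, factor).sum(axis=1)

F_fine = np.arange(8, dtype=float)   # toy fine grid with 8 entries
F_coarse = coarsen(F_fine, 4)        # blocks of 4 -> 2 coarse cells
```

Summing (rather than averaging) is the natural choice here because the entries are particle counts attached to pivots, not pointwise density samples.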

The distributions F(v, t_i) for κ_S and κ_P are shown in Fig. 2. We see that the distributions have similar shapes and both have hardly any particles of mass greater than 0.5.
We present numerical tests for two variants of the optimization problem, one with a fixed matrix factor U , i. e. optimization only with respect to S, and one with both U and S (5) included in the optimization.

Optimization including the kernel basis U
For a variable U, we start the optimization process with rank k = 5 in (5) and an initial guess for the kernel factors U and S. We use this initial kernel because it is the pointwise evaluation of a smooth function and gives only moderate aggregation rates in the considered domain. High rates of aggregation would result in very small time steps in the solution of the differential equation to ensure the positivity of f. Our choice of S mimics the antidiagonal matrix present in the kernels κ_B (14) and κ_Σ (15). The kernel matrix that solves the optimization problem (9) is denoted by K_est = U_est S_est U_est^T for each of the four kernels. We also compute the pointwise relative errors |(K_est)_{ij} − K_{ij}| / |K_{ij}|   (18) and plot these in Fig. 3.
For all four kernels, we see a minimum of the relative error around (0.2, 0.2), which we attribute to our choice of initial condition with a peak at 0.2. We are not concerned about larger relative errors in the upper right triangle (v_i + v_j > 1): the aggregation of two particles to one with mass greater than 1, and hence outside our computational domain, should not occur, so there is no (or hardly any) data to estimate the kernel in this region. The relative error of the shear kernel κ_S is small over the entire domain; a similar result is observed for the Brownian kernel κ_B. The sum and Peglow kernel approximation errors are smallest where most data is available, which follows from comparison with the plots of F · F^T in Fig. 2 (right).
The sum kernel κ_Σ shows a large error for the aggregation rate involving very small particles even though there is enough data for an accurate estimation. We attribute this to the coarse discretization.
Fig. 3: (Logarithm of the) relative error (18) of the kernel with the kernel basis U included in the optimization.

For additional validation of our kernel estimates, we use them for simulations with a different initial distribution and compare the obtained results with simulations using the correct kernel. The approximation was obtained by fitting a given initial distribution f, but it should also attain a certain accuracy for other distributions in order to be of general use. We use the initial distribution g(v, 0) = c · v · e^{-200(v-0.1)^2} (the factor c is again used for normalization of g(v, 0) to the first moment μ_1(g, 0) = 10^{-2}) and calculate G(v, 10) with the reference kernel and g(v, 10) with the approximated kernel factors U_est and S_est, both with respect to the very fine grid G_17. We chose g(v, 0) in view of its maximum at v = 0.1, which is near the local minimum of f(v, 0). In Fig. 4, we show G(v, 10) and g(v, 10) on the interval [0, 0.4] (the simulation was computed on [0, 1]). We also calculate the relative L^2 error

||G(·, 10) − g(·, 10)||_2 / ||G(·, 10)||_2   (20)

between G(v, 10) and g(v, 10) for each kernel based on approximations for different grids G_5 to G_10 and present the results in Table 1.

Table 1: Relative L^2 error (20) between G and g with the kernel basis U included in the optimization, for kernel approximations based on different grids G_p.

p    κ_B           κ_S           κ_Σ           κ_P
5    4.53 · 10^-2  3.23 · 10^-2  3.89 · 10^-2  3.82 · 10^-2
6    1.35 · 10^-2  1.65 · 10^-2  3.01 · 10^-2  2.47 · 10^-2
7    1.22 · 10^-2  9.80 · 10^-3  2.79 · 10^-2  2.50 · 10^-2
8    1.13 · 10^-2  6.59 · 10^-3  2.64 · 10^-2  1.43 · 10^-2
9    1.40 · 10^-2  5.06 · 10^-3  2.60 · 10^-2  1.27 · 10^-2
10   1.40 · 10^-2  4.59 · 10^-3  2.59 · 10^-2  1.23 · 10^-2

On the positive side, we obtain relative approximation accuracies of order O(10^-2) already on a relatively coarse grid with 2^5 pivots, i.e. a grid width of h = 2^-5 = 0.03125.
However, we observe hardly any improvement with respect to further refinement of the grid even though, as we will see in the following subsection, the framework would indeed allow for higher accuracies.

Optimization with a fixed kernel basis U
We now fix the kernel basis to U = (u_{jγ}) ∈ R^{n×k} with k = 7 and entries u_{jγ} = v_j^{(γ-4)/3} for γ = 1, ..., 7, i.e. the columns of U contain the powers v^{-1}, v^{-2/3}, ..., v^{2/3}, v of the pivots v_j, j = 1, ..., n, defined in (4). The optimization now occurs only with respect to the symmetric matrix S ∈ R^{k×k}. We initialize S by S = 0, since we experienced better results compared to an initialization with an antidiagonal matrix (which was not the case in the previous setting where U was included in the optimization). We are able to allow for a larger rank k in this setting since the number of degrees of freedom is no longer linear in the number of pivots n but only quadratic in the maximum rank k of the kernel matrix.
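The fixed basis described above can be assembled in a few lines (illustrative code on a toy grid); the exponents (γ-4)/3 for γ = 1, ..., 7 run from -1 to 1 in steps of 1/3, with the middle column (γ = 4) constant equal to one:

```python
import numpy as np

def fixed_basis(v, k=7):
    """Basis U with entries u_{j,gamma} = v_j ** ((gamma - 4) / 3)."""
    exponents = (np.arange(1, k + 1) - 4) / 3.0   # -1, -2/3, ..., 2/3, 1
    return v[:, None] ** exponents[None, :]

v = np.arange(1, 9) / 8.0        # toy pivot grid with n = 8
U = fixed_basis(v)
```

Only the k(k+1)/2 = 28 entries of the symmetric S then remain as optimization variables, independent of n.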
In Fig. 5, we show the relative approximation errors (18). We observe a clear improvement compared to the kernels estimated with a variable U, whose errors were shown in Fig. 3, and offer two interpretations of this result: (i) The fixed basis U has been chosen to span a space that allows for accurate approximations of the exact kernels; hence, the search space has been reduced significantly, allowing for an easier optimization. (ii) Kernel estimation is an inverse problem which we propose to solve using a (non-convex) optimization, so several (local) minima may exist. In the end, we are estimating a kernel that produces simulation results close to the measured results, not a kernel that is close to another kernel (known in our experiments, but unknown in practice). Using these estimated kernels together with the initial distribution g(v, 0) = c · v · e^{-200(v-0.1)^2}, the resulting distributions g(v, 10) are shown in Fig. 6. They are indeed more accurate than those obtained for a variable U, which were shown in Fig. 4. The relative L^2 errors for approximations based on grids G_5 to G_10 are shown in Table 2. Compared to the respective results for an optimized U in Table 1, we see less accurate kernel approximations on the coarser grids (p = 5, 6), but then a decrease of order up to O(h^2) with respect to the grid width h = 2^{-p} (errors decrease by factors close to 4 when the grid is refined, i.e. when h is halved).

Conclusions and future work
We have presented a novel framework for the approximation of aggregation kernels in population balance equations from measured or previously simulated data. We do not require a high temporal resolution of the measured data, since we do not use it to approximate the (time) derivative on the left-hand side of (1) as is done in several previous works. This allows estimation of an aggregation kernel from population data that is coarsely spaced in time, without information at intermediate time instances. The main idea is the assumption of a discrete low-rank kernel of the form K = USU^T, which allows the fast evaluation of aggregation integrals introduced in [8, 10] within nonlinear optimization procedures (here: MATLAB's lsqnonlin). We presented two approaches, one with a fixed kernel basis U and optimization with respect to the k(k+1)/2 entries of the symmetric matrix S, and one with U included in the optimization. Our numerical tests indicate that the approach with a variable U is preferable only for a rather small number of pivots (here: fewer than 100), whereas using a fixed basis U yields approximation errors that decrease (up to) quadratically with the grid width h.
In addition to producing results of higher accuracy, the approach with a fixed kernel basis U is also preferable with respect to computational complexity, which is O(kn log n) compared to O(kn^2 log n) for a variable U. To give some actual simulation timings: using a variable U took 2 minutes for n = 2^7 = 128 pivots and 22 minutes for n = 2^10 = 1024, whereas with a fixed kernel basis, n = 2^10 = 1024 pivots can still be handled in under a minute.
Possible extensions in the future include the analysis for noisy data as it might occur in actual measurements of physical particles.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.