Data fitting with signomial programming compatible difference of convex functions

Signomial Programming (SP) has proven to be a powerful tool for engineering design optimization, striking a balance between the computational efficiency of Geometric Programming (GP) and the extensibility of more general optimization methods. While techniques exist for fitting GP compatible models to data, no models have been proposed that take advantage of the increased modeling flexibility available in SP. Here, a new Soft Difference of Max Affine function is constructed by utilizing existing methods of GP compatible fitting within a Difference of Convex (DC) construction. This new function class is fit to data in log-log space and becomes either a signomial or a set of signomials upon inverse transformation. Examples presented here include simple test cases in 1D and 2D, and a fit to the performance data of the NACA 24xx family of airfoils. In each case, RMS error is driven to less than 1%.

Despite these successes, the GP and SP formulations do not allow the use of black box analysis tools that are common in practical design problems [16]. In aircraft design, these tools include high fidelity analyses such as Computational Fluid Dynamics (CFD) and Finite Element Analysis (FEA), as well as low fidelity analyses like XFOIL [5] and AVL [6]. Previous work by Hoburg et al. [11] addressed this issue for Geometric Programming by proposing three methods for fitting a GP compatible function to a pre-computed dataset, but the additional modeling flexibility available in Signomial Programming went untreated. This work fills that gap by proposing a method for fitting an SP compatible function to data in cases where the methods of Hoburg et al. [11] lack sufficient accuracy.
2 Geometric Programming and Signomial Programming

Geometric Programming
Geometric Programs are built upon two fundamental building blocks: monomial and posynomial functions. A monomial function is defined as the product of a leading constant with each variable raised to a real power [2]:

  m(x) = c ∏_{i=1}^{n} x_i^{a_i},  c > 0        (1)

A posynomial is simply a sum of monomials [2], which can be defined in notation as:

  p(x) = Σ_{k=1}^{K} c_k ∏_{i=1}^{n} x_i^{a_{ik}},  c_k > 0        (2)

From these two building blocks, it is possible to construct the definition of a GP in standard form [2]:

  minimize   p_0(x)
  subject to m_i(x) = 1,  i = 1, ..., N
             p_j(x) ≤ 1,  j = 1, ..., M        (3)

When constraints and objectives can be written in the form specified in Equation 3, the problem is said to be GP compatible.
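As a minimal sketch of these building blocks (helper names are illustrative, not from the paper), the following evaluates a monomial and a posynomial and checks the property that makes GPs tractable: under the log transform x = log u, a monomial becomes an affine function.

```python
import numpy as np

def monomial(u, c, a):
    """m(u) = c * prod_i u_i ** a_i, with c > 0 and real exponents a."""
    return c * np.prod(u ** a)

def posynomial(u, cs, A):
    """p(u) = sum_k c_k * prod_i u_i ** A[k][i], all c_k > 0."""
    return sum(monomial(u, c, a) for c, a in zip(cs, A))

# Under x = log(u), a monomial becomes affine: log m(u) = log c + a . x,
# which is the source of the GP's log-log convexity.
u = np.array([2.0, 3.0])
c, a = 1.5, np.array([2.0, -1.0])
assert np.isclose(np.log(monomial(u, c, a)), np.log(c) + a @ np.log(u))
```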

Signomial Programming
Signomial Programs (SPs) are a logical extension of Geometric Programs that allow the inclusion of negative leading constants and a broader set of equality constraints. The key building blocks of Signomial Programming are signomials, which are the difference between two posynomials p(x) and n(x):

  s(x) = p(x) − n(x)        (4)

The posynomial n(x) is often referred to as a 'neginomial' because it is made up of all of the terms with negative leading coefficients. From this definition, the standard form for a Signomial Program can be constructed [13]:

  minimize   s_0(x)
  subject to s_i(x) ≤ 0,  i = 1, ..., N        (5)

however, another useful form is [13]:

  p_i(x) / n_i(x) ≤ 1,  i = 1, ..., N        (6)

In this alternative form, the neginomial is added to both sides and then used as a divisor, constructing an expression either equal to or constrained by a value of one. SPs are not convex upon log-log transformation, unlike their GP counterparts, and therefore must be solved using general non-linear solution methods. However, many signomial programs of interest still benefit from an underlying structure that is well approximated by a log-log convex formulation, and as a result can be solved efficiently by a series of GP approximations via the Difference of Convex Algorithm (DCA). In this process the various neginomials n(x) are replaced with local monomial approximations, yielding substantial benefits over other non-linear solution methods (see [14] and [21] for discussion).
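The local monomial approximation at the heart of the DCA can be sketched as follows: at a point u0, a posynomial is replaced by the monomial that matches its value and gradient in log space, whose exponents are the value-weighted average of the term exponents. The helper name and interface are assumptions for illustration.

```python
import numpy as np

def monomial_approx(cs, A, u0):
    """Return (c, a) with m(u) = c * prod(u**a) matching the posynomial
    n(u) = sum_k cs[k] * prod_i u_i**A[k][i] to first order at u0."""
    A = np.asarray(A, dtype=float)
    terms = np.array([ck * np.prod(u0 ** ak) for ck, ak in zip(cs, A)])
    n0 = terms.sum()
    # exponents: value-weighted average of the term exponents at u0
    a = (terms[:, None] * A).sum(axis=0) / n0
    c = n0 / np.prod(u0 ** a)
    return c, a

# example: n(u) = u^2 + 1/u approximated about u0 = 1
c, a = monomial_approx([1.0, 1.0], [[2.0], [-1.0]], np.array([1.0]))
```

At u0 = 1 this yields c = 2 and exponent a = 0.5, so the monomial 2·u^0.5 reproduces both n(1) = 2 and n'(1) = 1.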
3 Difference of Convex Functions for Data Fitting

Difference of Convex (DC) Functions
Signomials are extremely versatile functions, far more versatile than posynomials or monomials. Their construction is also distinctive: a signomial is the difference between two log-log convex functions.
Modern study of Difference of Convex (DC) functions is widely attributed to have started with a paper by Hartman [8], which states that a continuous function of bounded variation f(x) can be written as the difference of two convex functions g(x) and h(x) [8]:

  f(x) = g(x) − h(x)        (7)

which clearly mirrors signomial construction. Further extensions have established a wide variety of well known properties that DC functions possess [1, 17].
The implication here is that a significant subset of continuous functions can be decomposed into the difference of two convex functions, including nondifferentiable continuous functions.By extension, it should be possible to fit a DC function to nearly any dataset that can be well approximated with a continuous function.
Hoburg et al. [11] provided three methods for fitting functions that are convex under the log-log transformation to data, but these methods entirely ignore the additional modeling flexibility available in Signomial Programming. So consider the use of a DC function under the log-log transformation, specifically the difference between two of the functions proposed by Hoburg et al. [11].

Notation
Consider a data set sampled from a black box mapping R^N → R. Consistent with the notation in Hoburg [11], let the vector u_j represent the independent variables in R^N for data point j, and the scalar w_j represent the output in R for data point j. The logspace variables are then represented as x_j = log u_j and y_j = log w_j.

Difference of Max Affine (DMA) Functions
The first function proposed by Hoburg et al. [11] is the Max Affine (MA) function:

  f_MA(x) = max_{k=1,...,K} [b_k + a_k^T x]        (8)

This function class is known to create a convex epigraph. In fact, any convex function can be reasonably approximated as a max affine function given a sufficient number of affine functions, K. Now consider the difference between two of these max affine functions (Equation 8), which henceforth will be called the Difference of Max Affine (DMA) function:

  f_DMA(x) = max_{k=1,...,K} [b_k + a_k^T x] − max_{m=1,...,M} [g_m + h_m^T x]        (9)

The subtracted term is an entirely separate Max Affine function, defined by the fitting parameters M, h, and g.
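A minimal sketch of the MA function (Equation 8) and its DMA difference (helper names assumed): the difference of two convex piecewise-linear functions is non-convex in general, as the concave example −|x1| below shows.

```python
import numpy as np

def max_affine(x, b, A):
    """f_MA(x) = max_k (b_k + A[k] . x)."""
    return np.max(b + A @ x)

def dma(x, b, A, g, H):
    """Difference of two max-affine functions: non-convex in general."""
    return max_affine(x, b, A) - max_affine(x, g, H)

# the concave function -|x1| is representable as 0 - max(x1, -x1)
x = np.array([2.0])
val = dma(x, np.array([0.0]), np.array([[0.0]]),
          np.array([0.0, 0.0]), np.array([[1.0], [-1.0]]))
```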
While the Max Affine function has a realizable transformation back from logspace, the Difference of Max Affine function has no such transformation, due to the inability to readily construct a meaningful epigraph or subgraph. Despite this, the DMA function is quite rapid to fit, and could find application in other areas where a cheap surrogate is desired for non-convex fitting. Here, the DMA function is useful as an intellectual building block, and as a seed for the function class that follows.

Soft Difference of Max Affine (SDMA) Functions
The second function proposed by Hoburg et al. [11] is the Soft-Max Affine (SMA) function:

  f_SMA(x) = (1/α) log Σ_{k=1}^{K} exp(α(b_k + a_k^T x))        (10)

The SMA function uses a global softening parameter (α) to 'smooth' the sharp corners of the Max Affine function and has the benefit of requiring far fewer affine terms K to capture smooth convex functions with reasonable accuracy. However, the global softening parameter results in a poor representation in regions where the curvature deviates substantially from the global average. Consider the following function, which is the difference between two Soft-Max Affine functions (Equation 10):

  f_SDMA(x) = (1/α) log Σ_{k=1}^{K} exp(α(b_k + a_k^T x)) − (1/β) log Σ_{m=1}^{M} exp(β(g_m + h_m^T x))        (11)

which is henceforth referred to as the Soft Difference of Max Affine (SDMA) function. Just as SMA functions improve on MA functions, the SDMA function significantly decreases the number of terms K + M required (relative to the DMA function) to approximate smooth functions to a reasonable degree of accuracy, by introducing the scalar softening parameters α and β.
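A sketch of the SMA function (Equation 10) and the SDMA difference, using a numerically stable log-sum-exp (an implementation detail assumed here, not stated in the paper). As α grows large, the SMA recovers the MA function, which the spot check below illustrates.

```python
import numpy as np

def _logsumexp(z):
    """Numerically stable log(sum(exp(z)))."""
    zmax = np.max(z)
    return zmax + np.log(np.sum(np.exp(z - zmax)))

def softmax_affine(x, b, A, alpha):
    """f_SMA(x) = (1/alpha) log sum_k exp(alpha (b_k + A[k] . x))."""
    return _logsumexp(alpha * (b + A @ x)) / alpha

def sdma(x, b, A, alpha, g, H, beta):
    """f_SDMA(x): difference of two independently softened SMA terms."""
    return softmax_affine(x, b, A, alpha) - softmax_affine(x, g, H, beta)

x = np.array([0.5])
b, A = np.array([0.0, 0.0]), np.array([[1.0], [-1.0]])
# with a large softening parameter, SMA ~ max(x, -x) = |x|
smooth = softmax_affine(x, b, A, alpha=200.0)
```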
Transforming the SDMA function back to the optimization variables u_j and w_j proceeds as follows:

  w = exp(f_SDMA(x)) = [Σ_{k=1}^{K} exp(α(b_k + a_k^T x))]^{1/α} / [Σ_{m=1}^{M} exp(β(g_m + h_m^T x))]^{1/β}        (12)

At this point it is obvious from the definition of a posynomial function that, with x = log u, the form will reduce to:

  w = P(u)^{1/α} / N(u)^{1/β}        (13)

where P(u) and N(u) are posynomials. This is not compatible with the SP formulation, but consider the following substitutions with auxiliary variables v and z:

  v^α = P(u)        (14)
  z^β = N(u)        (15)

which then reduce Equation 13 to:

  w = v / z        (16)

Thus, taking the three constraints (Equations 14, 15, 16) as a set does result in an SP compatible scheme. This method of substitution is consistent with other approaches to constructing GP and SP compatible constraints [2]. Table 1 outlines the operators that should be used in the new constraint set, given the mapping in the original constraint.

Since the Implicit Soft-Max Affine (ISMA) function class provided a fit with lower RMS error than SMA, it is tempting to write an implicit soft difference of max affine (ISDMA) function. Fitting one of these functions to data would require that the data first be decomposed into convex (y_convex) and concave (y_concave) parts, which has no obvious a priori solution. Fortunately, in practice the SDMA function is highly versatile and captures regions of varying curvature even at low fit orders due to the DC construction, somewhat negating the desire for ISDMA functions in the first place. However, non-differentiable corners and cusps would benefit from this type of function if these issues could be successfully resolved.
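The inverse transformation can be checked numerically: exponentiating the logspace SDMA surrogate (the "(12)" form referenced above) must equal the ratio of the two posynomial-like expressions raised to 1/α and 1/β in the original variables. This sketch uses assumed helper names and random parameters.

```python
import numpy as np

def sdma_log(x, b, A, alpha, g, H, beta):
    """Logspace SDMA: difference of two scaled log-sum-exp terms."""
    num = np.log(np.sum(np.exp(alpha * (b + A @ x)))) / alpha
    den = np.log(np.sum(np.exp(beta * (g + H @ x)))) / beta
    return num - den

def sdma_orig(u, b, A, alpha, g, H, beta):
    """Same surrogate in the original variables:
    w = P(u)**(1/alpha) / N(u)**(1/beta), P and N posynomial-like."""
    P = np.sum(np.exp(alpha * b) * np.prod(u ** (alpha * A), axis=1))
    N = np.sum(np.exp(beta * g) * np.prod(u ** (beta * H), axis=1))
    return P ** (1.0 / alpha) / N ** (1.0 / beta)

rng = np.random.default_rng(1)
u = rng.uniform(0.5, 2.0, size=2)
b, A = rng.normal(size=3), rng.normal(size=(3, 2))
g, H = rng.normal(size=2), rng.normal(size=(2, 2))
w = np.exp(sdma_log(np.log(u), b, A, 2.0, g, H, 3.0))
assert np.isclose(w, sdma_orig(u, b, A, 2.0, g, H, 3.0))
```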

How Functions are Fit
The following process is used for fitting functions to the data points (u_j, w_j):
1. Apply the log transformation (x_j, y_j) = (log u_j, log w_j)
2. Fit a function f to the transformed data such that y_j ≈ f(x_j)
3. Relax the equality y_j = f(x_j) as appropriate to construct the desired constraint

This process is nearly identical to that proposed by Hoburg et al. [11], but two differences exist. First, the work here does not require convex functions in Step 2. Second, while Hoburg et al. [11] could readily relax each equality constraint to construct a convex epigraph in Step 3, there is no similar approach when considering functions that are generally not convex, and so selecting the appropriate constraint operator (=, ≤, or ≥) will depend on the constraint being considered.
For a set of m data points, the fitting problem can be written as an unconstrained least squares optimization problem:

  min_γ Σ_{j=1}^{m} [f(x_j; γ) − y_j]^2

where the fitting parameters are stacked in the vector γ.
Based on the work of Hoburg [11], all of the functions presented here were fit using a Levenberg-Marquardt (LM) algorithm. The LM algorithm computes a step at each iteration, but requires a Jacobian to be computed at each step. Derivatives are therefore presented for each function class with respect to the fitting parameters in the vector γ in the two sections that follow.
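As a minimal 1-D sketch of this least-squares setup, the residuals of a small SDMA can be driven down with SciPy's Levenberg-Marquardt implementation. Fixed softening parameters and K = M = 2 terms are simplifying assumptions made here for illustration; the paper fits α and β as well.

```python
import numpy as np
from scipy.optimize import least_squares

ALPHA = BETA = 10.0  # assumed fixed softening parameters

def sdma_1d(gamma, x):
    """1-D SDMA with 2 affine terms per side; gamma stacks b, a, g, h."""
    b, a, g, h = gamma.reshape(4, 2)

    def lse(z, s):  # stable (1/s) * log(sum(exp(z))) along axis 0
        m = z.max(axis=0)
        return (m + np.log(np.exp(z - m).sum(axis=0))) / s

    return (lse(ALPHA * (b[:, None] + np.outer(a, x)), ALPHA)
            - lse(BETA * (g[:, None] + np.outer(h, x)), BETA))

x = np.linspace(-1.0, 1.0, 60)
y = np.abs(x) - 0.4 * x ** 2                      # simple non-convex target
gamma0 = np.array([0, 0, 1, -1, -0.2, -0.2, 0.6, -0.6], dtype=float)
res = least_squares(lambda gam: sdma_1d(gam, x) - y, gamma0, method='lm')
rms = np.sqrt(np.mean(res.fun ** 2))
```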
For each SDMA function fit, a DMA function is first fit in order to provide an initial guess for the SDMA fitting algorithm. The ability to obtain a DMA initial guess quickly is critical to the success of SDMA functions, as starting from random positions rarely yields a good result.
In addition, the gradient-based nature of the LM algorithm means that the fitting process requires a number of random restarts from varying initial guesses. In general, 30-100 random restarts were performed for the work presented here, with the specific number varying with the individual fitting problem being considered.
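The restart strategy described above can be sketched as a simple loop that runs LM from several random initial guesses and keeps the best result. The helper name and its defaults are assumptions for illustration, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_with_restarts(residuals, n_params, n_restarts=30, seed=0):
    """Run Levenberg-Marquardt from random starts; keep the lowest cost."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        x0 = rng.standard_normal(n_params)
        result = least_squares(residuals, x0, method='lm')
        if best is None or result.cost < best.cost:
            best = result
    return best

# toy usage: recover a line y = 2x + 1 (solvable from any start)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0
best = fit_with_restarts(lambda p: p[0] * x + p[1] - y, n_params=2,
                         n_restarts=5)
```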
Non-gradient based methods such as a Genetic Algorithm or Particle Swarm Optimization were considered for fitting, but there was no evidence that this approach would be superior to the LM algorithm established in the literature.A Genetic Algorithm was actually tested, but had worse performance than the LM method even in simple 2D cases.

4.1 Derivatives for the DMA Function
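Since the MA function is piecewise affine, its partial derivatives reduce to indicators of the active affine term, and the DMA derivative is the difference of the two active-term contributions. A minimal sketch (helper names assumed) with a spot check:

```python
import numpy as np

def ma_grad(x, b, A):
    """Gradients of f_MA(x) = max_k (b_k + A[k] . x) w.r.t. b and A.

    d f / d b_k is 1 for the active term and 0 otherwise;
    d f / d a_k is x for the active term and 0 otherwise.
    """
    k = np.argmax(b + A @ x)     # index of the active affine term
    db = np.zeros_like(b)
    db[k] = 1.0
    dA = np.zeros_like(A)
    dA[k] = x
    return db, dA

x = np.array([0.5, -1.0])
b = np.array([0.0, 1.0])
A = np.array([[1.0, 0.0], [0.0, 1.0]])
db, dA = ma_grad(x, b, A)
```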
4.2 Derivatives for the SDMA Function

5 Demonstrations of the Fitting Models

The convex fitting methods all converge to nearly identical representations, none of which capture the significant non-convexity to any degree. In contrast, the SDMA function captures nearly all of the non-convexity in the function, to an RMS error of 1.5 × 10^-3. The key takeaway here is that unlike SMA functions, which can only capture a single curvature due to the single parameter α, the SDMA function can capture complex, multi-radius curvature as a direct result of the DC construction.
Increasing fit order also improves the RMS error as expected:

A 3D Fitting Problem
After success on the 2D case, the next logical step was to consider a 3D dataset. Similar to the 2D test case, it was desirable to have a non-differentiable region and complex 3D curvature; consider as a test function an eigenfunction of the wave equation. An SDMA fit yields a reasonably accurate surrogate (a 9th order fit is considered here). As might be expected, the fitting scheme struggles along the non-differentiable L-shaped curve, and at the corner of the L specifically. Error near the point (0, 1) is due to an inflection in the data, where the curvature changes from concave to convex over a very small region; this is difficult to capture in the fit.
And as before, plotting RMS error as a function of increasing fit order shows significant improvement for higher order fits. Given the success of the two test cases, the final step was to validate on a more representative engineering example. Hoburg et al. [11] were able to fit performance data for NACA 24xx airfoils generated from XFOIL [5], but only by considering curves of lift coefficient vs. drag coefficient. In many cases, it is more useful to have two separate curves of lift coefficient vs. angle of attack and drag coefficient vs. angle of attack, but the C_L vs. α curve is not compatible with log-log convex fitting techniques.
Consider the problem of fitting the following function:

  C_L = f(α, Re, τ)

where C_L is the airfoil lift coefficient, α is the airfoil angle of attack, Re is the Reynolds number, and τ is the airfoil thickness (e.g., τ = 0.12 corresponds to a NACA 2412 airfoil).
XFOIL was used to generate a grid of data over α = [1, 23], Re = [10^6, 10^7], and τ = [0.09, 0.21]. This data was then fit with SDMA functions of orders varying from 3 to 10. Plotted below are the results of a 5th order fit. As before, plotting the RMS error vs. increasing fit order shows improvement with an increased number of fit terms.

This work serves as a successful development and validation of a Signomial Programming compatible method for fitting black box data. The Soft Difference of Max Affine (SDMA) function is a crucial link in the chain from Geometric Programming, where low fidelity analysis methods must be used, to Sequential Quadratic Programming or the more recently proposed Logspace Sequential Quadratic Programming [12], which typically impose no bounds on the type of analysis that can be utilized.

5.1 A 2D Fitting Problem

Consider the 2D function, which uses the log transformed variables x and y:

  y = max(−6x − 6, x^4 − 3x^2)        (25)

This function was selected to test the SDMA function at a non-differentiable point and over variations in curvature. A set of 101 data points was sampled on even intervals over the function from x = [−2, 2], and 5th order MA, SMA, ISMA, and SDMA functions were fit to this data.
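The sampled data set for this test function can be reproduced as follows (a minimal sketch). The max creates a non-differentiable corner where the two branches cross, which is exactly the feature the SDMA fit is meant to capture.

```python
import numpy as np

# 101 evenly spaced samples of y = max(-6x - 6, x^4 - 3x^2) on [-2, 2]
x = np.linspace(-2.0, 2.0, 101)
y = np.maximum(-6.0 * x - 6.0, x ** 4 - 3.0 * x ** 2)

# spot-check the branches at the interval endpoints
assert y[0] == 6.0       # x = -2: max(6, 16 - 12) = 6 (linear branch)
assert y[100] == 4.0     # x =  2: max(-18, 16 - 12) = 4 (quartic branch)
```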

Fig. 2 Variation of RMS error with increasing fit order

Fig. 3 The eigenfunction of the wave equation

Fig. 4 The fitted 9th order function and error. Note the increased value at the top of the color axis compared to 8th order, indicating a larger maximum error

Fig. 5 Variation of RMS error with increasing fit order. (a) Reynolds number of 10^6, (b) Reynolds number of 10^6.5, (c) Reynolds number of 10^7

Fig. 6 Curves of Lift Coefficient vs. Angle of Attack for the NACA 24xx family of airfoils

Fig. 7 Variation of RMS error with increasing fit order

Table 1 Operators for a Given Initial Constraint