1 Introduction

Aircraft operations are mainly driven by the aircraft’s flight performance. Therefore, new developments always target better aerodynamic performance and propulsion system efficiency with less energy consumption to reduce the operational costs and environmental impact of each individual flight. Hence, the optimization of an aircraft’s flight performance is key to sustainable future aviation. Europe’s Flight Path 2050 [1] describes the vision of technologies and procedures available in the year 2050 that will allow a reduction of 75% of the CO\(_2\) emissions and 90% of the NO\(_\text {x}\) emissions per passenger kilometer compared to the year 2000. In addition, the perceived noise emission of an aircraft in flight must be reduced by 65%.

Within a short-term perspective, the reduction of emissions and pollution can only be achieved by new technologies enabling today’s modern aircraft to be even more environmentally friendly and allow a greener aviation. For the best possible reduction of emissions with current fleets an optimization of aircraft operations is required using smart flight control strategies. One of these strategies is DLR’s Low Noise Augmentation System (LNAS) [2, 3], which is able to optimize the aircraft’s descend and approach by advising the pilots with optimized autopilot commands and configuration points. It has proven during flight tests that its application can save up to 25% of fuel and emissions for certain approach flight phases. In addition, measurements on ground revealed that a significant noise reduction could be obtained when using LNAS: with an overall reduction on ground along the approach path certain areas show a reduction up to 5dB(A). But the system relies on a high-quality simulation model for aircraft flight performance evaluation which is used within the internal flight path analysis and optimization process.

There are several ways to obtain such simulation models which base on different sources of information about the distinct aircraft type. For example, the simulation model formulation could be based on published engine data, e.g., from handbooks or aircraft manuals [4]. Unfortunately, aircraft manuals do normally not contain the engine thrust values but only engine state information, which makes it impossible to obtain engine thrust models without additional information. Furthermore, one could use high-fidelity aircraft and engine design models to obtain the necessary information for generating aircraft flight performance models. Such an approach is mainly used by aircraft manufacturers because they have full insight in the aircraft design process. If available, ground test data and wind tunnel results can also be used to determine a propulsion system model, e.g., [5], but the corresponding information are also mostly limited to engine manufacturers. Another way to obtain the required information and determine flight performance models is the conduction of special flight test programs with an aircraft of the specific type and use flight data recordings to extract the information about the flight performance. But even with extensive and expensive flight test programs, additional a priori information on e.g., engine thrust might be necessary to develop reliable flight performance models.

A completely different fourth way is to only use flight data gathered during operational flights, which can be easily recorded daily. But without dedicated flight test procedures, this will pose a big data problem which has to be solved. Nevertheless, the desired information is inside this database, and if the database is large enough, the information can be extracted by application of proper methodologies and algorithms. Big data problems and artificial intelligence or machine learning methods to solve these are omnipresent today. But to solve a certain big data problem with minimal effort, the simple application of artificial intelligence algorithms is no smart solution. A smarter way to solve a big data problem in engineering is to apply as much fundamental knowledge about the underlying system as possible. This way big data is transferred to smart data and the initially posed problem can be solved with much simpler, faster and presumably more deterministic methods. For operational flight data this means that engineering knowledge about aircraft flight mechanics is applied and the data analysis process is designed accordingly. Doing so, well established model formulations can be used which further allow a direct interpretation of the resulting model, e.g., evaluation of lift-to-drag ratio. In this way, the big data analysis of operational data can directly help to reduce aircraft emissions and make aviation sustainable.

The challenge for such work is the correlation between aerodynamics and engine thrust which is present in the observed aircraft data during daily operations. The major task is to extract the corresponding information from flight data. Hence, the main assumption for this work is that big data allows decorrelating drag and thrust if the aerodynamics do vary due to flap/slat configuration changes and engine thrust is not mainly dependent on these changes. It means that similar engine thrust conditions do exist for different aircraft configurations and consequently drag which then allow determining each part individually. Moreover, using this assumption it is not only possible to estimate aircraft drag and thrust but also get unbiased values for both in terms of nearly correct zero-lift drag and engine idle thrust.

This paper presents a part of the work on solving the described big data problem in a smart and intelligent engineering way to reveal the flight performance, aerodynamics and engine thrust from operational flight data of three different Airbus A 320neo (New Engine Option) aircraft. It is based on previous DLR research on the determination of flight performance variations within operational flight data [6, 7] and has a different focus than already established methods for flight performance monitoring, e.g., [8,9,10]. A distinct process is developed to reduce the data size and necessary computational effort. First, a flight data preprocessing allows to select only the information required for the further flight performance analysis and already make all computations only required once in the process before starting any further processing (loop). Second, the data are processed and analyzed based on distinct engineering knowledge which does not require unstructured artificial intelligence algorithms or highly complex evaluation methods. The process does contain state-of-the-art data processing and system identification techniques and is mainly based on aerospace engineering knowledge. Again, this has the advantage that the resulting models can be directly evaluated by analyzing the corresponding parameters (e.g., aerodynamics) and model outputs (e.g., engine thrust).

Figure 1 gives an illustration of the data evaluation process used for this work. This process contains five individual blocks necessary to process and analyze the data and to determine the desired models:

  1. (a)

    Flight Data Processing: processing the flight data by e.g., performing unit conversions, transfer all relevant measurements to CG, determination of AC configuration, anti-ice system status and flight condition, calculation of the atmospheric parameters using the international standard atmosphere.

  2. (b)

    Aerodynamics Evaluation: calculation of aerodynamics with given model and from flight data depending on available thrust information with simplified or complete equations.

  3. (c)

    Engine Thrust Model Identification: extraction of relevant data for thrust model identification, calculation of required thrust for the thrust model parameter estimation, engine thrust prediction in relation to the given flight data measurements with the identified model.

  4. (d)

    Aerodynamics Calculation: calculation of aerodynamic coefficients based on flight data and predicted engine thrust as reference for the aerodynamic model update.

  5. (e)

    Aerodynamics Model Identification: new estimation of aerodynamic model parameters for all different aircraft configurations.

Fig. 1
figure 1

Basic scheme of the evaluation process used for the flight performance determination (aerodynamic model and thrust model identification)

The herein presented work is focused on the “Engine Thrust Model Identification” in block (c) using given information about the aircraft aerodynamics to obtain the required thrust information. The objective of this work is to obtain an engine thrust model will sufficient accuracy to reliably predict the aircraft flight performance (later-on) using the aerodynamic model predicted with the above given approach. The paper is structured as follows, giving information about:

  1. (1)

    the database of operational flight data (Sect. 2);

  2. (2)

    a general description of engine thrust modeling for dynamic aircraft simulation (Sect. 3);

  3. (3)

    definition and prediction of a global linear thrust model (Sect. 4);

  4. (4)

    formulation of local linear thrust models (Sect. 5);

  5. (5)

    estimation of a complex nonlinear multi-dimensional thrust model table (Sect. 6);

  6. (6)

    comparison of different engine thrust model approaches (Sect. 7);

  7. (7)

    estimation of fan speed-dependent temperature correction of the thrust prediction (Sect. 8);

  8. (8)

    a statistical analysis of the different thrust models (Sect. 9).

A summary and conclusions on the overall results are given in Sect. 10.

2 Flight database

This work is based on flight data recorded during operational flights on the pilots’ electronic flight bags (EFBs). The DLR-developed recorder was installed on the EFBs and used with different Airbus A 320neo aircraft of a major European airline in preparation of the SESAR Very Large Demonstration (VLD2) ALBATROSS project.Footnote 1 The recorder is able to record messages sent via ARINC 429, ARINC 834 or ARINC 717 using different interfaces depending on the EFBs it is installed on and was activated and deactivated manually by the pilots during climb and descent.

The flight database is limited to reasonable airspeeds and altitudes sufficiently high above ground to guarantee the aircraft flying outside of the ground effect. Therefore, the following initial limits are used for the evaluation presented herein: height above ground above 50 ft and barometric altitude above 500 ft, whichever comes first, as well as true airspeed above 130 kts. These limits do no restrict the evaluation results at all, but allow to circumvent any special conditions of flight near ground during landing and engine power reduction. A brief overview of the flight database as a result of the data pre-processing is given in Table 1. The number of data points does only include the data which was used for the evaluation and is therefore less than the total size of all data. Figure 2 visualizes the distribution of available flight data on an altitude-Mach plane split into the different aircraft flap/slat configurations.

Table 1 Brief overview of flight database used for evaluation
Fig. 2
figure 2

Flight data envelope: altitude-Mach diagram for all flap/slat configurations

The flight data sets used for the flight performance model determination are reduced to the relevant information. Initially, the data was recorded directly from aircraft buses/data streams and had to be processed to contain individual data channels with a sampling rate of 50 Hz, which is the in-house standard for an aircraft system identification task and normally used if available from flight data. For this task, the sample rate might appear very high because the aircraft flight performance and no dynamic behavior is targeted. But as the following calculations require several a priori transformations of, e.g., accelerations, it is justifiable to use a 50 Hz sample rate and not reduce the data size by downsampling. For each flight, the data sets finally contain:

  • time stamp;

  • body-fixed accelerations: longitudinal, lateral, vertical;

  • angular rates: pitch, yaw, roll;

  • inflow/atmosphere: corrected angle of attack, estimated angle of sideslip, true airspeed, Mach number, static temperature;

  • altitude: barometric, barometric corrected, GPS, radio height;

  • engine fan speed (left/right);

  • aircraft gross weight, longitudinal CG position;

  • configuration: landing gear extension (main left/right, nose), flap position, slat position;

  • control inputs: spoiler deflection (left/right, spoilers 2 to 5), horizontal stabilizer deflection;

  • anti-ice system status: wing, engine.

There is one major restriction with the available data sets: the records do not contain any information about the elevator deflection. It does only slightly affect the aircraft flight performance by a minor change of the overall aircraft lift, but is necessary to correctly predict the flight performance. But, with elevators only deflected during a short time during flight with the A 320, it can be considered as an additional uncertainty within the overall process and is mainly relevant for the aerodynamics prediction outside the work presented herein.

Not all data of the recorded flight data sets can be used for the evaluation and model determination. There are some requirements which are essential for the process shown in Fig. 1. First of all, the aircraft’s flap/slat configuration must be fixed to successfully determine the corresponding aerodynamics. There is no need to determine the aerodynamics during configuration changes. Moreover, this would introduce further uncertainties into the model determination and estimation process and would not be beneficial to achieve the goals of this work. Therefore, all data marked as containing moving slat, flap or gear is set to invalid for the further data evaluation. In addition, data with low airspeed and altitude is also excluded from the evaluation process as already described above. Also, the data is checked for segments containing unreliable measurements (e.g., not-a-number or completely wrong valuesFootnote 2), which lead to an exclusion of these segments or the whole flight depending on the severity of this unreliability. Finally, this leads to the selection of the about 55.5 million data points given in Table 1.

3 Engine thrust model for dynamic aircraft simulation

Engine thrust prediction is essential for any flight performance evaluation. Therefore, the engine thrust model development is one essential part of the process described herein (see block (c) in Fig. 1), because there was no a priori information about the Pratt & Whitney “PW1100G” engineFootnote 3 thrust available for this work. The given flight database is searched for specific symmetric engine conditions and the results are split into individual data sets for each anti-ice system configuration.Footnote 4 An overview of the data selection for the engine thrust model development is given in Table 2.

Table 2 Distribution of data in flight database: anti-ice system status

3.1 Required thrust using aircraft aerodynamics

In the first step, the required thrust is calculated using the initial guess for the aerodynamics. Later on in the process, the calculation is repeated with the updated aerodynamics. The sum of forces in longitudinal direction of the aerodynamic frame during flight is composed of the current engine required thrust, the drag force as well as the AC mass and the longitudinal acceleration. Solving this equation for the required total thrust yields

$$\begin{aligned} T_{\textrm{tot,x,a}} = m_{\textrm{AC}}\cdot g \cdot n_{\textrm{x,a}}+ C_{D{}} \cdot \overline{q}\cdot S , \end{aligned}$$
(1)

where the longitudinal load factor at CG in the aerodynamics frame is given by

$$\begin{aligned} n_{\textrm{x,a}}= n_\textrm{x}\cdot \cos (\alpha ) \cdot \cos (\beta ) + n_\textrm{y}\cdot \sin (\beta ) + n_\textrm{z}\sin (\alpha ) \cdot \cos (\beta ) , \end{aligned}$$
(2)

using the load factors at CG in the body axis \(n_\textrm{x},n_\textrm{y},n_\textrm{z}\) resulting from the transfer of acceleration measurements from sensor position to CG.

The required total thrust is further transferred to the body-fixed coordinate system (neglecting thrust force components in vertical and lateral direction):

$$\begin{aligned} T_{\textrm{tot,x}} = \frac{T_{\textrm{tot,x,a}}}{\cos (\alpha )\cdot \cos (\beta )} . \end{aligned}$$
(3)

For the thrust model definition symmetric engine conditions are considered to allow splitting total thrust on both engines equally. Using engine inclination and toe out angle, the approximated required thrust per engine resulting from acceleration measurements and drag force predicted by the assumed aerodynamic model finally results in

$$\begin{aligned} T_{\textrm{req}}= 0.5 \cdot \frac{T_{\textrm{tot,x}}}{\cos (\sigma )\cdot \cos (\varepsilon _{xy})}. \end{aligned}$$
(4)

Note that the data also includes different states of the anti-ice system. Hence, the data is selected according to the different anti-ice cases (off, engine anti-ice on, wing and engine anti-ice on) and individual thrust models are determined.

3.2 Thrust model regressors

The thrust of a jet engine depends on several engine states and external parameters. Hence, the model formulation must cover the main influences to reliably predict the engine thrust within the required flight envelope and for all relevant thrust settings. The general influence of different parameters on thrust is shown in Fig. 3. It is clearly visible that the jet engine thrust has a complex nonlinear behavior with the variation of individual parameters. In consequence, the main regressors for the engine thrust model used in this work are defined as follows:

  • engine fan speed \(N_1\),

  • Mach number \(Ma\),

  • barometric altitude H,

  • temperature offset \(\Delta \text {ISA}\).

The latter is used as model regressor but in a subsequent engine thrust correction, especially to reduce the necessary dimensions of the thrust table (see Sect. 6). A detailed description of the temperature correction used is given in Sect. 8. In general, the engine thrust model is defined by

$$\begin{aligned}{} {} T& = T_{\textrm{net}}\cdot f_\textrm{corr}(\Delta \text {ISA},\dots ) \nonumber \\{} & {} = f(N_1,Ma,H) \cdot f_\textrm{corr}(\Delta \text {ISA},\dots ) \approx T_{\textrm{req}} \end{aligned}$$
(5)

with the net thrust being a (arbitrary) function of the above named three main influence parameters. In a first step, the temperature correction function is neglected which leads to the following approximation:

$$\begin{aligned} T_{\textrm{req}}\approx T_{\textrm{net}}= f(N_1,Ma,H). \end{aligned}$$
(6)
Fig. 3
figure 3

General influence of different engine/aircraft states and atmospheric parameters on jet engine thrust, according to Refs. [11, 12]

4 Linear thrust model

The simplest model for net thrust is a linear model with the given three regressors and four parameters:

$$\begin{aligned} T_{\textrm{net}}= \theta _0 + \theta _1 \cdot N_1+ \theta _2 \cdot Ma+ \theta _3 \cdot H. \end{aligned}$$
(7)

Note that although it is well known (see Fig. 3) that engine thrust is strongly and nonlinearly dependent on the several chosen regressors, from an engineering point of view, a linear model is a good choice to start with. If necessary the model must be enhanced step by step to further include or approximate nonlinear thrust characteristics. Hence, the necessary effort to obtain the four parameters is also limited, because the mean square problem can be solved easily:

$$\begin{aligned} \widehat{\theta } = \underset{\theta }{\text {argmin}} \left( \theta \rightarrow \sum _{i=1}^{n} \left( z_i - y_i(\theta )\right) ^2 \right) , \end{aligned}$$
(8)

with \(z_i\) being the required thrust and \(y_i(\theta )\) the modeled net thrust \(T_{\textrm{net}}\). Using for example MATLAB®, the linear problem can be formulated and solved directly:

$$\begin{aligned}{} & {} A \theta = z \quad \Rightarrow \quad \theta = A~\backslash ~z, \quad \text {with} \quad A_i = [1,N_{1,i},Ma_i,H_i] \quad \nonumber \\{} & {} \quad \text {and} \quad z_i = T_{\textrm{req},i}. \end{aligned}$$
(9)

The standard error vector of the resulting parameters results from the unscaled covariance matrix C and the scaling factor \(\widehat{\sigma }^2\)

$$\begin{aligned} \sigma (\theta ){} & {} = \sqrt{\widehat{\sigma }^2 \cdot \text {diag}(C)}, \quad \text {with} \quad \widehat{\sigma }^2\nonumber = \frac{\Vert z - A \theta \Vert }{n-p}\\{} & {} \quad \text {and} \quad C = \left( A^T A\right) ^{-1}. \end{aligned}$$
(10)

The results for the model parameters and the limits of the linear thrust model are given in Table 3. These limits result from the available data within the thrust envelope. The very large standard error for the offset parameter \(\theta _0\) and the Mach derivative \(\theta _2\) directly indicate the low identifiability of these parameters using the available data set and the low ability of the linear model formulation to represent the given nonlinear engine thrust behavior. An example of the linear thrust model evaluation is given in Fig. 4. The left side plot (Fig. 4a) shows the predicted thrust variation with fan speed and Mach number at fixed altitude, whereas the right side plot (Fig. 4b) illustrates the thrust versus fan speed and altitude at fixed Mach number.

Table 3 Linear thrust model parameters and limits
Fig. 4
figure 4

Linear thrust model evaluations at example conditions; anti-ice off. a variation of engine fan speed and Mach number at H = 3000 m, b variation of engine fan speed and altitude at Ma = 0.4

With the linear model the whole nonlinear thrust envelope is covered. But the model delivers wrong predictions in certain areas of the envelope because the nonlinear thrust behavior cannot be predicted sufficiently by the linear model. Furthermore, the linear model will give completely wrong results at the borders of the envelope, where large negative thrust values are predicted (e.g., see Fig. 4 at low \(N_1\) and high Mach or altitude). Hence, the model quality and validity are strictly limited and extrapolation outside the limiting borders will off course give even worse predictions.

5 Local linear thrust models

A set of local linear models allows to better approximate nonlinear functions than one single linear model. Therefore, it is reasonable to use these in the given evaluations. Hence, the complete data is segmented and the regression problem formulated in Eq. (9) is solved for each data set.

The boundaries of the local linear models used for this evaluation —given in Table 4— result in 48 potential models within the envelope. Note that these boundaries could significantly exceed the given data —e.g., the maximum altitude represented in the data set— but this does not affect the model estimation results. Furthermore, the boundaries are extended for the regression by the values given in the last row of Table 4 to potentially reduce the gaps between the individual models: if the local model is forced to also cover the thrust slightly outside the boundaries, the discontinuous behavior at the boundaries is assumed to be reduced to a minimum, accepting that the models might lose some ability to predict the required thrust within the boundaries.

Table 4 Boundaries of local linear models for breakpoint definition

A measure for the quality of the local linear model is the coefficient of determination, which is defined by the ratio between the sum of squares of the residuals and the total sum of squares

$$\begin{aligned} R^2 = 1-\frac{\sum _i \left( z_i - y_i\right) ^2}{\sum _i \left( z_i - \overline{z}\right) ^2}. \end{aligned}$$
(11)

A value of 1 indicates a perfect match of model output and measurement whereas a negative value reveals a very poor measurement prediction. Using this, a set of valid models can be extracted from the local linear models within the thrust envelope. Table 5 gives an overview of the obtained models with a coefficient of determination above 0.6. The different numbers of data points used for regression indicate the unbalanced distribution of data within the whole thrust envelope, which further indicates the problem of identifying all potential 48 local models with the same quality. Although some models, e.g., number 13 or 33, show high values of \(R^2\), it is very questionable if these values are meaningful because less than 4000 data points were used for the regression and \(R^2\) itself does not contain any information about the specific data distribution within the local boundaries. Therefore, only results for models with high coefficients of determination and a large data set used for regression should be considered to be reliable.

Table 5 Valid local linear models after estimation with \(R^2>0.6\)

An example of the local linear thrust model evaluation is given in Fig. 5. Similar to Fig. 4, the model evaluations are made and visualized for an example altitude (Fig. 5a) and Mach number (Fig. 5b). The local linear models do better approximate the nonlinear engine thrust behavior with the given regressors than a global linear model. Furthermore, a change of the local model boundary definition could allow an even better approximation of the nonlinear engine thrust behavior. Nevertheless, the number of models is limited to the given data in the way that models can only be determined if enough data is available within a certain part of the envelope and therefore within the given boundaries. With focus on a high model quality, a distinct number of measurements with sufficient (local) variation of regressors must be available to estimate the individual model parameters. For this example, the minimum number of measurement was empirically set to 1000, with the assumption that data set containing more data points would also contain enough variation.

Fig. 5
figure 5

Local linear thrust model evaluations at example conditions; anti-ice off. a Variation of engine fan speed and Mach number at H = 3000 m, b variation of engine fan speed and altitude at Ma = 0.4

Furthermore, to allow a continuous evaluation of the engine thrust models beyond their local boundaries and to switch from one model to another, model stitching would be necessary. Therefore, the models must be, for example, steady across the boundaries on the one hand or a suitable interpolation between the models over the boundaries must be made on the other. But as shown in Fig. 5 the local linear models are not steady across the model boundaries, which means that a different model parameter estimation technique than a simple, unrestricted regression as given in Eq. (9) would be necessary to include a local steadiness requirement. But this will reduce the quality of the local linear model fit with the given data because the additional requirement limits its flexibility. Moreover, without a suitable local steadiness at the model boundaries a stitching is to achieve a smooth transition between the multi-dimensional models is highly complex, very difficult or gets even impossible. For example: Fig. 5a) shows for low Mach numbers and high fan speeds significantly different gradients of the resulting local linear models. Hence, no “simple” interpolation technique can be applied to stitch the models. Different approaches with e.g., additional linear intermediate models to fill the gaps between the local linear models would act similarly like a finer grid, but there is no guarantee of success. If the models provide local steadiness, e.g., model blending [5, 13] can be used as stitching technique.

Figure 6 shows two examples for local linear models: one with a coefficient of determination near 1 representing a very good model fit, and another with a negative value of \(R^2\) indicating a poor thrust prediction. The local linear model visualized in Fig. 6a) well predicts the required thrust which is represented by the model surface being very close to the data. Furthermore, the required thrust shows a nearly linear behavior with variation of the regressors which means that the local linear thrust model representation is a good choice in this case. In contrast, Fig 6b) shows a very poor fit of required thrust with the local linear model. Therefore, the model cannot be used to reliably predict thrust for further analysis, because e.g., the model underestimates the required thrust for the given example conditions. It is clearly visible that the (example) data for model estimation is locally concentrated and widely spread, which makes it difficult to well estimate the local linear model parameters in this case.

Fig. 6
figure 6

Examples of thrust predictions using local linear model compared to required thrust data: very good and bad model fit. a Local linear model thrust match: required thrust (data points from measurements) and model prediction (surface) with variation of different regressors; example conditions at H = 4000 m (left side) and Ma = 0.625 (right side); good model fit, R2 = 0.97192, Model 27, b local linear model thrust match: required thrust (data points from measurements) and model prediction (surface) with variation of different regressors; example conditions at H = 6000 m (left side) and Ma = 0.575 (right side); bad model fit, R2 = - 0.8662

All in all, for this work it was found that the estimation of a complete thrust table would be more beneficial than a complex local linear model determination for the whole thrust envelope and subsequent extensive model stitching.

6 Nonlinear engine thrust table

For this work, the representation of the engine thrust by a multidimensional table is a better choice than using (local) linear models. By this, the nonlinear behavior of engine thrust with variation of engine states and external influences can be modeled. The choice of breakpoints, the spacing of the grid and the interpolation method between the breakpoints defines the quality of the resulting thrust prediction. A coarse grid and the application of a linear interpolation method will give less accurate results as a very fine grid and/or a higher order interpolation method. Furthermore, a non-uniform grid will also be able to cover strong local nonlinearities, but will pose some additional challenges to the table definition and the entire estimation process. For this work, it was assumed that a moderately fine grid allows covering the nonlinear thrust behavior with the used regressors and a linear interpolation method is sufficient to predict the thrust between the breakpoints of this fine grid. Also, it was kept in mind that the table estimation should be performed on a state-of-the-art desktop computer with large but limited memory: because of the number of free parameter during the estimation directly increases with the number of breakpoints, the table size is simply limited by the available memory. Nevertheless, the grid definition in Table 6 was initially found sufficient and finally delivered together with (simple) linear multi-dimensional interpolation a good representation of the required thrust behavior. Note that the breakpoints are chosen to cover the operational limits of the aircraft but also consider the limitations of the given data sets, i.e. the altitude limit at 6500m.

Table 6 Nonlinear thrust model table breakpoint definition

To allow the estimation of the thrust table entries with e.g., the well-known Gauss–Newton method, the table is decomposed into the parameter vector \(\theta\). Hence, the number of parameters p respectively the length of the parameter vector is given by the product of lengths in each dimension

$$\begin{aligned} p = N_{\text {BP}_{N_1}} \cdot N_{\text {BP}_{Ma}} \cdot N_{\text {BP}_{H}} . \end{aligned}$$

The linear model (Sect. 4) is used to obtain an initial guess for the table entries respectively \(\theta\). Using the breakpoints defined in Table 6 the linear model is evaluated for the whole resulting grid and the table is initialized.

One major requirement for the engine thrust table is that it should have a smooth shape and no significant local changes of curvature. The basic influence in different regressors on engine thrust in Fig. 3 shows that there is no different, discontinuous behavior of thrust with fan speed, velocity or Mach number and altitude or pressure expectable. Furthermore, the engineering knowledge about jet engines and the way they produce thrust directly imposes the need to obtain “smooth” thrust curves: a jet engine within its normal operation limits will not produce an abrupt change of thrust due to a change of flight condition because the thermodynamic cycle within the engine is running continuously. As long as no major disturbance of this process occurs, like engine stall or any malfunction, there must be no significant change of thrust curvature with the used regressors. Note that this is the type of engineering knowledge which is implemented in the presented approach and differs from unstructured artificial intelligence techniques to solve the given big data problem. Without such assumptions, the resulting thrust model will not be able to reasonably and reliably predict the engine thrust. The optimization problem is now formulated as:

$$\begin{aligned} \widehat{\theta } = \underset{\theta }{\text {argmin}} \left( \theta \rightarrow \sum _{i=1}^{n} \left( z_i - y_i(\theta )\right) ^2 + \Vert \Gamma ^*_2 \theta \Vert ^2\right) . \end{aligned}$$
(12)

In this case, the Tikhonov regularization matrix \(\Gamma ^*_2\) [14, 15] is used to penalize the second-order derivatives of the multi-dimensional thrust table in each direction to smooth the table shape. It is defined to consider all break point combinations of the table except for the corners of the resulting break point cube. Its definition further considers the fact, that the faces of the cube are represented with the second-order derivative in only two dimensions and the edges in only one dimension. Figure 7 shows all break points and illustrates the named cube, faces and edges, which must be treated individually to compose the matrix. The matrix size is defined by the number of parameters p (columns) and the number of break-points to be considered for calculation of the curvature (rows) given by

$$\begin{aligned} \begin{aligned}{} & {} \text {{Num rows}}(\Gamma ^*_2) =~ 3 \cdot \Big (N_{\text {BP}_{N_1}}-2\Big ) \cdot \Big (N_{\text {BP}_{Ma}}-2\Big ) \\{} & {} \quad \cdot \Big (N_{\text {BP}_{H}}-2\Big ) \cdot \Big (N_{\text {BP}_{Ma}}-2\Big ) \\{} & {} \quad +~ 4\cdot \Big (N_{\text {BP}_{N_1}}-2\Big ) \cdot \Big (N_{\text {BP}_{Ma}}-2\Big )\\{} & {} \quad + 4 \cdot \Big (N_{\text {BP}_{Ma}}-2\Big ) \cdot \Big (N_{\text {BP}_{H}}-2\Big )\\{} & {} \quad +~ 4\cdot \Big (N_{\text {BP}_{N_1}}-2\Big ) \cdot \Big (N_{\text {BP}_{H}}-2\Big ) \\{} & {} \quad +~ 4\cdot \Big (N_{\text {BP}_{N_1}} + N_{\text {BP}_{Ma}} + N_{\text {BP}_{H}} - 6\Big ).\\ \end{aligned} \end{aligned}$$
Fig. 7
figure 7

Illustration of thrust table model breakpoints (gray dots) to be considered differently for Tikhonov matrix regularization during parameter estimation. Left side: inner cube to be considered with three-dimensional distances between the breakpoints for weighting. Right side: outside faces and edges which result in two- and one-dimensional distances for weighting

The matrix is based on the coefficients \([-1,+2,-1]\) of the well-known Mexican Hat wavelet filter or the Laplacian-of-Gaussian filter [16] of order two. The filter is scaled with the weighted Euclidean norm \(d_\gamma\) between two break-point combinations \(\text {BP}_{}\) (including the Tikhonov weights \(\gamma\) in each direction):

$$\begin{aligned}{} & {} d_{\gamma }(\text {BP}_{i},\text {BP}_{j}) =\nonumber \\{} & {} \quad \sqrt{\gamma _{N_1}\cdot \left( \text {BP}_{N_1,i}-\text {BP}_{N_1,j}\right) ^2 +\gamma _{Ma}\cdot \left( \text {BP}_{Ma,i}-\text {BP}_{Ma,j}\right) ^2 +\gamma _{H} \cdot \left( \text {BP}_{H,i} -\text {BP}_{H,j} \right) ^2}. \end{aligned}$$
(13)

Note that this is independent of the breakpoints’ location, i.e. within the inner cube, on the faces or the edges.

The Tikhonov weights \(\gamma\) are used to define the strength of the penalization against the least squares within the optimization. In the simplest case of a one-dimensional model, the Tikhonov matrix \(\Gamma ^*_2\) is given by the (scaled) filter coefficients on the matrix main and secondary diagonals, as given in [17] for a one-dimensional wind profile. But in case of this multi-dimensional thrust table the definition of the matrix entries is more complex. The scaled filter coefficients must be related to the distinct entry in the parameter vector representing the value for the \(\text {BP}_{j}\) as neighbor of \(\text {BP}_{i}\). Therefore, it is necessary to a priori connect all breakpoints and calculate their distances to each other as required to set-up \(\Gamma ^*_2\).

The optimization problem in Eq. (12) can be solved by application of the iterative Gauss–Newton method. The resulting least squares problem can be written as

$$\begin{aligned} \widehat{\theta } = \underset{\theta }{\text {argmin}} \left( \Vert r(\theta ) \Vert ^2+ \Vert \Gamma ^*_2 \theta \Vert ^2\right) , \end{aligned}$$
(14)

where the vector of residuals is composed of the n residuals resulting from the n measurements (in this case the required thrust values) and the model prediction using the given parameter vector of the current iteration k:

$$\begin{aligned} r_i \big (\theta ^{k}\big ) = z_i - y_i\big (\theta ^{k}\big ), \quad \forall i\in [1,n]. \end{aligned}$$
(15)

The Jacobian matrix entries are defined by the gradient of residuals with parameter variation

$$\begin{aligned} J_{i,j} \big (\theta ^{k}\big ) = \frac{\partial }{\partial \theta _j} r_i \big (\theta ^{k}\big ), \quad \forall i\in [1,n], \forall j\in [1,p]. \end{aligned}$$
(16)

The regularization of the estimation problem requires to extend the residuals vector by additional rows

$$\begin{aligned} r^*\big (\theta ^{k}\big ) = \left[ \begin{array}{c} r\big (\theta ^{k}\big ) \\ \Gamma ^*_2~\theta ^{k} \end{array}\right] , \end{aligned}$$
(17)

so that \(r^*\) does now contain the penalization of too large parameter variations within the vector respectively thrust table. Similarly, the Jacobian matrix is extended by the Tikhonov matrix

$$\begin{aligned} J^*\left( \theta ^{k}\right) = \left[ \begin{array}{c} J\big (\theta ^{k}\big ) \\ \Gamma ^*_2 \end{array}\right] , \end{aligned}$$
(18)

and the Gauss–Newton step to update the parameter vector (in each iteration) becomes

$$\begin{aligned} \theta ^{k+1} = \theta ^{k} + \left( \left[ J^*\big (\theta ^{k}\big )\right] ^T J^*\big (\theta ^{k}\big ) \right) ^{-1} \left[ J^*\big (\theta ^{k}\big )\right] ^T r^*\big (\theta ^{k}\big ). \end{aligned}$$
(19)

The iterative algorithm is terminated for this problem if the relative change of the cost function between two steps lies within a range of \(-10^{-4}\) and 0. The cost function is defined as the sum of residual squares

$$\begin{aligned} \mathbb {L}^k\big (\theta ^{k}\big ) = \sum _i r^2_i \big (\theta ^{k}\big ). \end{aligned}$$
(20)

The thrust table estimation described above requires large matrices which have to be inverted. To significantly reduce the size of these matrices, the necessary memory and the corresponding computational effort during the Gauss–Newton step, the data is clustered before the estimation algorithm is applied. This further allows to use a state-of-the-art desktop computer for the whole evaluation process in Fig. 1. During the clustering, similar data are combined to one individual point with larger weight during the table estimation which reduces the number of rows of \(J^*\) significantly. For comparable regressor points the required thrust values are averaged and the number of averaged points defines the weight of the resulting residual value in r. Using the clustering grid as defined in Table 7 the number of data points can be reduced by a factor of 235 or up to 50.7 million rows of \(J^*\) (in case of anti-ice off). If the clustering grid is fine enough, it can be further shown that the table estimation results are (nearly) identical to the case in which the data is not clustered.

Table 7 Definition of clustering grid to reduce the number of data used during the engine table estimation

As an example, Fig 8 shows some results of the engine thrust table evaluation similar to Fig. 8a and b. The results reflect the (steady) nonlinear engine thrust behavior which could not be predicted by the linear and local linear models.

Fig. 8
figure 8

Nonlinear thrust table evaluations at example conditions; anti-ice off. a Variation of engine fan speed and Mach number at H = 3000 m, b variation of engine fan speed and altitude at Ma = 0.4

7 Comparison of different thrust models

A selection of data around the example conditions (\(H = 3000\) m and \(Ma= 0.4\)) was done to provide a comparison of the different models with the required thrust data. For the fixed Mach number condition a selection criterion of \(Ma= 0.4 \pm 0.001\) was applied resulting in 17 424 data points. The fixed altitude data was selected by \(H = 3000 \pm 25\) m resulting in 7 146 points. Fig 9 shows a comparison of the different models with the required thrust data around the example conditions. In each plot, the linear model results are given together with the required thrust data and the local linear models (left plot) or the nonlinear table (right plot). It is clearly visible that the linear model does not allow to predict the required thrust data as the resulting surface lies far above the data because it fits the whole data across the envelope. Again, this points out that a global linear model for thrust gives a good initial guess for the engine thrust behavior but is not suitable for the given purpose. The other two modeling approaches do better represent the given required thrust, as expected. Figure 9a shows the results of thrust variation with fan speed and Mach number at fixed altitude and Fig. 9b the variation with fan speed and altitude at fixed Mach number. In both cases, the table model is the (clearly visible) better choice to cover the nonlinear thrust behavior as it follows the curvature of the required thrust. Note that a higher number of local linear models or a change to a higher-order interpolation polynomial of the local models could be used to better approximate the nonlinear thrust behavior, but with the remaining problem of model stitching. The nonlinear table model is still the better choice in this case.

For the three modeling approaches the calculations had been performed with MATLAB®2012b on a state-of-the-art desktop computer (3.4 GHz multi-core processor, 64 GB memory). Including the data handling, the computational effort for the local linear models and the nonlinear table were well comparable and could be done in less than an hour. The linear model can be obtained much faster with significant less accuracy.

Fig. 9
figure 9

Example of different thrust model evaluations for selected space in model envelope and comparison with given required thrust values; anti-ice off. a Variation of engine fan speed and Mach number at H = 3000 m; 7146 data points, b variation of engine fan speed and altitude at Ma = 0.4; 17424 data points

8 Temperature correction on engine thrust predictions

The temperature correction of the predicted net thrust given in Eq. (5) is formulated as product of correction factor and temperature offset

$$\begin{aligned} P_\textrm{corr}(\Delta \text {ISA},N_1) = P_{\Delta \text {ISA}}(N_1) \cdot \Delta \text {ISA}. \end{aligned}$$
(21)

Different models for the temperature correction were tested during the process development. Finally, a nonlinear table representation for \(P_{\Delta \text {ISA}}\) with dependency on engine fan speed was found the best way to reduce the remaining errors between required thrust and predicted net thrust in correlation with the temperature offset. For the resulting correction table, 41 breakpoints of fan speed between 20 % and 100 % with a step size of 2 % were chosen which is more detailed than for the thrust table (see Table 6) but necessary to cover the specific influences of temperature on engine thrust. The resulting least squares problem is rewritten as

$$\begin{aligned} \widehat{\theta } = \underset{\theta }{\text {argmin}} \left( \Vert r(\theta ) \Vert ^2 + \Vert \Gamma ^*_1 \theta \Vert ^2 + \Vert \Gamma ^*_2 \theta \Vert ^2\right) . \end{aligned}$$
(22)

Here, the Tikhonov matrices penalize the first- and second-order derivatives of the parameters with respect to engine fan speed in order to have a more or less steady behavior of the ISA correction factor with fan speed; this penalization is similar to the implementation for a free-form wind field estimation for lidar line-of-sight measurements in [17]. Nevertheless, there is no direct requirement for having a certain shape of correction factor with fan speed. But from an engineering point of view together with the basic knowledge about engine thrust behavior in Fig. 3, a completely unsteady behavior with alternating values for each fan speed breakpoint seems not to be reasonable. The matrix \(\Gamma ^*_1\) is of size \((p-1)\times p\) and \(\Gamma ^*_2\) of size \((p-2)\times p\) where \(p=41\) in the presented case. Note that \(\Gamma ^*_2\) is now different in size and contains a different weight than the one used for the engine table estimation in Eq. (14).

The choice of Tikhonov weights is no easy task. Several values for the parameters were tried and the resulting correction table respectively correction parameters analyzed. Starting with very small values, these weights were increased and the results showed that due to the low magnitude of the correction parameter values, larger values for the weights were necessary. The choice of Tikhonov weights has to result in an acceptable shape of the temperature correction table: the parameters must allow good results for calculating the total engine thrust as required, and also provide a somewhat smoothed change of the correction coefficient with the measured fan speed. In this case, the extended residual vector is defined by

$$\begin{aligned} r^*\big (\theta ^{k}\big ) = \left[ \begin{array}{c} r\big (\theta ^{k}\big ) \\ \Gamma ^*_1~\theta ^{k} \\ \Gamma ^*_2~\theta ^{k} \end{array}\right] , \end{aligned}$$
(23)

and the extended Jacobian is composed by J and the Tikhonov matrices \(\Gamma ^*_1\) and \(\Gamma ^*_2\):

$$\begin{aligned} J^*\left( \theta ^{k}\right) = \left[ \begin{array}{c} J\big (\theta ^{k}\big ) \\ \Gamma ^*_1 \\ \Gamma ^*_2 \end{array}\right] . \end{aligned}$$
(24)

The parameter update is calculated using Eqs. (19) and (20) similarly to the thrust model table estimation.

The optimization problem is in this case further formulated directly for the correction factor \(P_\textrm{corr}\) instead of using the required and model thrust values for the calculation of residuals. Hence, the residuals are defined using Eq. (6) as

$$\begin{aligned} r(\theta ){} & {} = z_i - y_i(\theta ) = \frac{T_{\textrm{req}}- T_{\textrm{net}}}{T_{\textrm{net}}} - y_i(\theta ), \\{} & {}\quad \text {with}\quad y_i(\theta ) = P_\textrm{corr}(\Delta \text {ISA}_i,N_{1,i}). \end{aligned}$$
(25)

This further leads to an optimized data clustering and a significantly smaller optimization problem, even smaller than for the thrust table estimation. Table 8 shows the grid definition for this case, which results for example in a reduction of 35.74 million data points or rows of \(J^*\) (anti-ice off), which is more than 554 times less than using the complete data. Note that the correction model estimation is done using the nonlinear engine thrust table predictions for \(T_{\textrm{net}}\) as this model showed the best characteristics for the given purpose, see Sect. 7.

Table 8 Definition of data clustering grid to reduce the number of data used during estimation of temperature offset correction on net thrust

The correction factor results are visualized in Fig 10. There, the final model evaluation (after parameter estimation) is given together with the data used for estimation derived from measurements and the previously predicted thrust table model (\(\Delta \text {ISA}= 0\) K). The model well predicts the necessary temperature correction and follows very well the majority of data points with engine fan speed. Note that there are more data points outside the displayed vertical limits, but these outliers are irrelevant for the model estimation and therefore the plot was zoomed to the majority of data distributed around the final model. Note that for low fan speeds (below 26%) the estimated correction factor shows a non-smooth distribution although the regularization of the optimization problem was used to smooth the correction model’s shape with fan speed. Within the data for low fan speeds, the distribution of the given measurements \(z_i\) varies significantly, which leads to the given free-form model result. The used smart data approach relies on the general assumption, that physical knowledge leads to a fast and accurate solution of the problem. But in addition, the usage of the free-form table models allows to compensate for effects which are e.g., difficult to model or are results of approximations used within the process. In case of the temperature correction, the overall model estimation results seem very reliable. For the foreseen application of the engine thrust model, the low fan speed part below operational engine idle is irrelevant. Furthermore, the given data results in the correction model’s shape and it is the engineer’s task to decide, if a manual adaptation or the usage of a different modeling approach is required to obtain a different behavior of the temperature correction for low fan speeds, or not.

Fig. 10
figure 10

Temperature correction factor on net thrust dependent on fan speed; final correction model and data used for model estimation derived from measurements and non-linear thrust table predictions; anti-ice off

9 Statistical analysis

For a further evaluation of the engine models’ thrust prediction abilities and qualities a statistical analysis is performed on the residuals composed from required and predicted thrust. This work is based on a unique big data set and a statistical analysis is the only way to reliably evaluate the resulting model quality. First, the probability distribution of the residuals is calculated for each model type (linear, local linear and final nonlinear table including the temperature correction) based on a histogram with 300 bins. The corresponding plots are given in Fig. 11. It is clearly visible that the (temperature corrected) nonlinear engine thrust table model provides the highest number of very small residuals with smallest overall deviation. The local linear models do also show small residuals but are not useful for the given purpose (as discussed in Sect. 7). The simple linear model has large deviations of the residuals which indicates a poor overall model fit.

Fig. 11
figure 11

Probability distribution of engine thrust residuals for the three different models: linear, local linear and temperature corrected table model; anti-ice off

The following four mathematical moments are used for this statistical analysis to further reveal the model’s ability to well predict the measurements:

  1. (1)

    mean value: expected value of residuals; for good model match it must be near zero;

  2. (2)

    standard deviation: variation of error between measurement and prediction; small values indicate very good predictions, large values point out that the model is not able to match the measurements well;

  3. (3)

    skewness: measure of asymmetry of the probability distribution in relation to its mean; negative values indicate longer left tail, positive values a longer right tail and zero values are obtained for balanced distributions;

  4. (4)

    kurtosis: “peakness” of the probability distribution, higher values correspond to a large peak around the mean; for interpretation of the kurtosis, it is often compared to the value 3 of the normal distribution leading to the “excess”.

The results of the mathematical moments evaluation are provided in Table 9. A very good model fit or model ability to predict the measurements results in very small residuals. In terms of the mathematical moments this means that a mean value near zero with small standard deviation, a skewness near zero and a high value of kurtosis are desired. The direct comparison of the results for the three different approaches shows also that the nonlinear thrust table including the temperature correction represents the required thrust values best.

Note, that this evaluation reflects the quality of the overall model fit. The mean value presented here should not mislead the reader as it is resulting from all engine conditions for all flight phases considered, which means that there are regions in the given thrust envelop with higher accuracy (lower residuals) and regions with less. For this work, the main focus is on the overall ability of the model to predict engine thrust with sufficient accuracy which is revealed by the additional statistical moments.

Table 9 Mathematical moments of probability distribution of engine thrust model residuals based on all data points (anti-ice off); comparison of results for different models: linear, local linear and nonlinear table (including temperature offset correction)

10 Conclusion

The paper presents the estimation of engine thrust models from operational flight data with limited a priori knowledge about the aircraft’s flight performance characteristics. It is part of a novel evaluation process for flight performance model determination which solves the posed big data problem with a smart data approach. The direct comparison of linear, local-linear and nonlinear modeling approaches revealed the necessity of the nonlinear table model to represent the required engine thrust. The specific data reduction and matrix regularization allow to reduce the overall computational effort and formulate a well-posed optimization problem. Therefore, the nonlinear thrust table estimation becomes reliable, efficient and relatively fast on a state-of-the-art desktop computer and does not require any specific hardware. Furthermore, the nonlinear thrust table model was extended by a temperature correction also estimated from the given flight data. Including this correction a final statistical analysis underpins the conclusion that the nonlinear table is the best choice in terms of model accuracy for the given purpose. Hence, this work is an important step for the overall flight performance model determination process and provides a significant contribution to solutions of big data problems in aviation.

As future work, the engine model prediction will be implemented in the outlined development process for aircraft performance model determination and the flight performance-related aircraft aerodynamics will be estimated.